First, we need good stem splitting.
What do you think about Meta's recent SAM Audio model? https://ai.meta.com/blog/sam-audio/