There is an important difference between openly training on scraped web data and license-ignored data from GitHub and training on data from your paying customers that you promised you wouldn't train on.
Anthropic had to pay $1.5bn after being caught downloading pirated ebooks.
So Anthropic had to pay less than 1% of their valuation despite approximately their entire business being dependent on this and similar piracy. I somehow doubt their takeaway from that is "let's avoid doing that again".