No specific figures, but see, for example:
https://annas-archive.gl/blog/ai-copyright.html
> Virtually all major companies building LLMs contacted us to train on our data. Most (but not all!) US-based companies reconsidered once they realized the illegal nature of our work. By contrast, Chinese firms have enthusiastically embraced our collection, apparently untroubled by its legality.
> We have given high-speed access to about 30 companies. Most of them are LLM companies, and some are data brokers, who will resell our collection. Most are Chinese, though we’ve also worked with companies from the US, Europe, Russia, South Korea, and Japan. DeepSeek admitted that an earlier version was trained on part of our collection, though they’re tight-lipped about their latest model (probably also trained on our data though).
That's about 30 companies, each of which paid hundreds of thousands of dollars.