> with AI-generated content excluded from pre-training.
> without distillation from third-party models
sounds like zero unless they are lying.
"how many of those shapes are rectangles?" "sounds like zero unless they are squares"
Adding "unless" to a statement makes it vacuous if the latter clause is weaker than the first clause. I find it hard to believe that a company willing to violate licenses would have scruples about lying about it.
> with AI-generated content excluded from pre-training.
Though this is largely impossible these days, unless they pre-trained on pre-AI era data.