Both ChatGPT 4o and Claude 3.5 Sonnet can identify the generated page content as "random words".
Given the size of the training data, I don’t think it would be economical to validate all of it with high-end LLMs.
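
To put a rough number on that, here is a back-of-envelope sketch. The function name, the corpus size, and the per-token price are all placeholders I made up for illustration, not real figures for any model or dataset; plug in actual values to get a meaningful estimate.

```python
def validation_cost_usd(total_tokens: int, usd_per_million_tokens: float) -> float:
    """Estimate the cost of passing every training token through an LLM once.

    Only input tokens are counted; output tokens, retries, and rate limits
    are ignored, so this is a lower bound under the stated assumptions.
    """
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical inputs, chosen only to show the order of magnitude.
corpus_tokens = 10 * 10**12  # assumed: a 10-trillion-token training corpus
price = 1.0                  # assumed: USD per million input tokens

print(f"Estimated cost: ${validation_cost_usd(corpus_tokens, price):,.0f}")
```

The cost scales linearly with corpus size, so even a cheap per-token price adds up quickly at training-data scale. A smaller classifier or sampling only a fraction of pages would change the picture, but that is a different approach from validating everything with a high-end model.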