Did they though? That is the lore. You can’t really compare recommender system performance across different populations and products.
There are no shared benchmarks for that, unlike with LLMs.