There is the preselection, which depends on the fanout queries the model comes up with and the contents performance across those queries on the search index.
After that content is actually assessed by the model. This paper tried different strategies to improve performance for this last step: https://arxiv.org/pdf/2311.09735. Adding statistics, sources, original data are all strategies that we apply.
In classic SEO, creating more and more content leads to "cannibalization". Generally this hurts performance of all overlapping content so much that it is not worth it.