> asked LLMs to compile list of 10-20 writers considered canon in each decade since 1800, then identify all their notable works and years of publication. After some iterations with coding agents I got over 2,000 works by 200 authors.
Wait, so the source data is just LLM hallucinations? It makes sense to use an LLM to build the data collection tooling, but not to generate the source data itself.
This is, in my opinion, a better use of tech that has an error rate (hallucination): you just assume that it's a fuzzy search, and sample the results to see how you did. I'd like to see a few of the results for sure!
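For what it's worth, the "sample the results" step could look something like the sketch below. It is only an illustration, not what the original poster did: `draw_sample` and `wilson_interval` are hypothetical helper names, the records are made-up placeholders, and the number of errors is something you would fill in after checking the sampled rows by hand. The Wilson score interval is my own choice here for turning a small spot-check into a rough range for the overall error rate.

```python
import math
import random


def draw_sample(records, k, seed=0):
    """Pick up to k records uniformly at random, without replacement, for manual checking."""
    rng = random.Random(seed)  # fixed seed so the same rows can be re-checked later
    return rng.sample(records, min(k, len(records)))


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for the error rate, given `errors` bad rows out of n checked.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if n == 0:
        raise ValueError("need at least one checked record")
    p = errors / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Placeholder rows standing in for the LLM-generated (author, work, year) list.
records = [(f"author_{i % 200}", f"work_{i}", 1800 + i % 220) for i in range(2000)]

to_check = draw_sample(records, k=50)

# Hypothetical outcome: after looking up each sampled row by hand,
# suppose errors_found rows turned out to be wrong. Replace with your real count.
errors_found = 3
low, high = wilson_interval(errors_found, len(to_check))
print(f"checked {len(to_check)} rows, estimated error rate between {low:.1%} and {high:.1%}")
```

Fifty rows is few enough to verify in one sitting, and the interval makes it obvious how wide the uncertainty still is at that sample size.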