You can probably do this in a day with a CLI-based LLM tool like Claude Code. It can write the tools you'd need to sort, test, and cross-check your doc sets.
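For a rough idea of what one of those tools might look like, here is a minimal sketch of a cross-check script. It assumes each doc set is just a directory of files and that "cross-check" means comparing them by relative path and content hash. The names `hash_docs` and `cross_check` are made up for illustration, so treat it as a starting point, not a finished tool.

```python
import hashlib
from pathlib import Path


def hash_docs(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def cross_check(set_a: Path, set_b: Path) -> dict[str, list[str]]:
    """Compare two doc sets (assumed here to be plain directories).

    Returns sorted lists of relative paths that exist only in one set,
    plus paths present in both sets whose contents differ.
    """
    a, b = hash_docs(set_a), hash_docs(set_b)
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

You would call it as `cross_check(Path("docs_old"), Path("docs_new"))` (both directory names are placeholders) and review the three lists. Hashing only catches byte-level differences, so anything fuzzier, like near-duplicates or reworded sections, would need a different comparison.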