> It's trivial to normalise the various formats,
Ha. Ha. ha ha ha.
As someone who has tried pretty broadly to normalize a pile of books and documents I have legitimate access to: no, it is not.
You can get good results 80% of the time, usable but messy results 18% of the time, and complete garbage the remaining 2%. More effort seems to only result in marginal improvements.
98% sounds good enough for the use case suggested here.