Hacker News

fake-name, last Wednesday at 7:35 AM

> It's trivial to normalise the various formats,

Ha. Ha. ha ha ha.

As someone who has tried pretty broadly to normalize a pile of books and documents I have legitimate access to: no, it is not.

You can get good results 80% of the time, usable but messy results 18% of the time, and complete garbage the remaining 2%. More effort seems to only result in marginal improvements.


Replies

bawolff, last Wednesday at 8:06 AM

98% sounds good enough for the use case suggested here.
