bawolff last Wednesday at 8:06 AM

98% sounds good enough for the use case suggested here.


Replies

pastage last Wednesday at 10:59 AM

Writing good validators for data is hard. You can be 100% sure that there will be bad data in that 98%. In my own experience, I thought I had 50% of the books converted correctly, then found I still had junk data and gave up. It is not an impossible problem; I just was not motivated to fix it on my own. Working with your own copies is fine, but when you try to share the result you get into legal issues that I just do not find interesting to solve.

Edit: my point is that I would like to share my work but that is hard to do in a legal way. That is the main reason I gave up.

landl0rd last Wednesday at 1:25 PM

2% garbage, if some of that garbage falls out the right way, is more than enough to seriously degrade search result quality.