Hacker News

verditelabs today at 7:10 PM (3 replies)

I am not on the research team, rather on the production side of things, so my knowledge on that is pretty limited. I think one of the main takeaways from a lot of the research, though, on both the segmentation side and the ink detection side, is that it's a lot less about what models and techniques and such you use, but how good your training data is. Gathering ground truth is hard, and if you don't have a lot of good ground truth, it doesn't matter if your code is perfect, you'll never get results.


Replies

EvanAnderson today at 8:47 PM

You brought up what I'm most curious about: where does the ground truth come from for this work, since you can't just unwrap a scroll to tell if the model got it right or, presumably, make a facsimile scroll and wrap it up?

rossdavidh today at 7:29 PM

That is a general truth of most ML; many models _can_ find the information in the data, if the data is good enough. If it is not, then likely no model can.

gekoxyz today at 7:28 PM

> it's a lot less about what models and techniques and such you use, but how good your training data is.

Ah, the good old bitter lesson strikes again.