I am not on the research team but on the production side of things, so my knowledge of that is pretty limited. I think one of the main takeaways from a lot of the research, though, on both the segmentation side and the ink detection side, is that it's a lot less about what models and techniques you use and a lot more about how good your training data is. Gathering ground truth is hard, and if you don't have a lot of good ground truth, it doesn't matter if your code is perfect; you'll never get results.
That is a general truth of most ML; many models _can_ find the information in the data, if the data is good enough. If it is not, then likely no model can.
> it's a lot less about what models and techniques you use and a lot more about how good your training data is.
Ah, the good old bitter lesson strikes again.
You brought up what I'm most curious about: where does the ground truth come from for this work, since you can't just unwrap a scroll to tell if the model got it right or, presumably, make a facsimile scroll and wrap it up?