
MattRogish, yesterday at 6:20 PM

I calculate* the appropriate overlap and the slicer overlaps a certain amount of the previous slice. There is some post-processing assembly required, but it's trivial.

[*] SWAG line height, trial and error to figure out the right amount of overlap given LLM error rates, etc.
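A rough sketch of the overlap-and-reassemble idea described above, not the commenter's actual code. It assumes the page is sliced by pixel height and that each slice comes back from the model as a list of text lines. The names `slice_ranges` and `stitch`, and the numbers in the usage note, are hypothetical.

```python
def slice_ranges(total_height, slice_height, overlap):
    """Return (top, bottom) pixel ranges covering the page.

    Each slice after the first re-covers `overlap` pixels of the previous one.
    """
    if not 0 <= overlap < slice_height:
        raise ValueError("overlap must be non-negative and smaller than slice_height")
    step = slice_height - overlap
    ranges = []
    top = 0
    while True:
        bottom = min(top + slice_height, total_height)
        ranges.append((top, bottom))
        if bottom >= total_height:
            break
        top += step
    return ranges


def stitch(chunks):
    """Join per-slice lists of text lines, dropping lines repeated in the overlap.

    Assumption: overlapping lines are transcribed identically in both slices.
    """
    merged = []
    for lines in chunks:
        # Find the longest suffix of `merged` that is also a prefix of `lines`.
        duplicated = 0
        for n in range(min(len(merged), len(lines)), 0, -1):
            if merged[-n:] == lines[:n]:
                duplicated = n
                break
        merged.extend(lines[duplicated:])
    return merged
```

For example, `slice_ranges(1000, 400, 100)` gives three slices that each share 100 pixels with their neighbor, and `stitch([["a", "b", "c"], ["c", "d"]])` keeps the shared line `"c"` only once. With real LLM output the repeated lines rarely match exactly, so the equality check would likely need to become a fuzzy comparison, which is presumably where the trial and error on overlap size comes in.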


Replies

ryanisnan, yesterday at 6:30 PM

Interesting. Do you have a uniform data set? E.g. documents of a specific type that you know consistently have similar formats, or is this training something you need to do per-document?
