Why not store just a small u8 count of newlines per chunk instead of their u128 positions, and then loop through only the last chunk for precision?
You don't need the positions of newlines in any of the chunks before the one your offset lands in.
As I understand it, they do exactly what you say. TFA is about optimizing the last chunk's loop.
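For anyone following along, here is a rough sketch of the scheme being discussed: a cached newline count per chunk, plus a byte scan of only the chunk the offset lands in. The `Chunk` struct and `offset_to_row` function are hypothetical names for illustration, not TFA's actual types, and a real rope would sum the counts through a tree rather than walk the chunks linearly.

```rust
/// Hypothetical chunk: its text plus a cached count of newlines.
struct Chunk {
    text: String,
    newlines: u8, // assumption: a chunk never holds more than 255 newlines
}

impl Chunk {
    fn new(text: &str) -> Self {
        let count = text.bytes().filter(|&b| b == b'\n').count();
        Chunk {
            text: text.to_string(),
            newlines: u8::try_from(count).expect("too many newlines for a u8 count"),
        }
    }
}

/// Returns the zero-based row containing byte `offset`.
/// Offsets past the end of the text map to the last row.
fn offset_to_row(chunks: &[Chunk], mut offset: usize) -> usize {
    let mut row = 0;
    for chunk in chunks {
        let len = chunk.text.len();
        if offset >= len {
            // The whole chunk lies before the offset: the cached count is enough.
            row += chunk.newlines as usize;
            offset -= len;
        } else {
            // The offset lands in this chunk: scan only the bytes before it.
            row += chunk.text.as_bytes()[..offset]
                .iter()
                .filter(|&&b| b == b'\n')
                .count();
            return row;
        }
    }
    row
}
```

The `else` branch is the "last chunk's loop" mentioned above; the sketch uses a plain byte scan there, which is the part an optimization would replace.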