Hacker News

ethan_smith, last Monday at 9:45 AM

You could try implementing a character count limit per chunk instead of sentence-based splitting. A hybrid approach that breaks at sentence boundaries but enforces a maximum chunk size of ~150-200 characters would likely solve the word-skipping issue while maintaining natural speech flow.
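Here is a minimal sketch of the suggested hybrid. It is written in Python, which is an assumption since the thread names no language. Whole sentences are packed into chunks of up to `max_chars` characters, and only a sentence that exceeds the cap on its own gets broken at word boundaries. The function name `chunk_text` and the naive regex sentence splitter are illustrative and do not come from any particular TTS library.

```python
import re

# Naive sentence splitter: break after ., !, or ? followed by whitespace.
# Assumption: real input may need a proper sentence tokenizer (abbreviations etc.).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars characters.

    A sentence longer than max_chars on its own is split on word boundaries.
    """
    chunks: list[str] = []
    current = ""
    for sentence in _SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        if len(sentence) > max_chars:
            # Flush the pending chunk, then fall back to word-level packing.
            if current:
                chunks.append(current)
                current = ""
            line = ""
            for word in sentence.split():
                candidate = f"{line} {word}" if line else word
                if len(candidate) <= max_chars or not line:
                    line = candidate  # a lone over-long word is kept whole
                else:
                    chunks.append(line)
                    line = word
            if line:
                chunks.append(line)
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the pending chunk
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

With `max_chars=15`, the input `"Hello world. This is a test."` yields `["Hello world.", "This is a test."]`, because the two sentences together would exceed the cap. With the default of 200, the same text stays in a single chunk.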


Replies

logicprog, yesterday at 1:45 AM

That's precisely what I'm doing. I'm splitting by sentences, and then for each sentence that's still too long, I split them by natural breakpoints like colons, semicolons, commas, dashes, and conjunctions, and if any of /those/ are still too long, I then break by greedy-filling words. Then I do some fun manipulation on the raw audio tensors to maintain flow.
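The cascade described above (sentences, then punctuation, then conjunctions, then greedy word filling) can be sketched as a recursive splitter. This is a minimal illustration under stated assumptions, not the commenter's actual code. The regexes, the conjunction list, and the names `split_for_tts`, `_refine`, and `greedy_words` are all hypothetical. The audio-tensor stitching step is not covered.

```python
import re

# Naive sentence splitter (assumption: no handling of abbreviations like "Dr.").
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

# Hypothetical fallback levels, tried in order on any piece still over the limit.
CLAUSE_LEVELS = [
    # After colons, semicolons, commas, en/em dashes, or a spaced hyphen.
    re.compile(r"(?<=[:;,\u2013\u2014])\s+|(?<=\s-)\s+"),
    # Before a conjunction (illustrative list, not exhaustive).
    re.compile(r"\s+(?=(?:and|but|or|so|because|while)\b)", re.IGNORECASE),
]


def greedy_words(text: str, max_chars: int) -> list[str]:
    """Last resort: pack words left to right until the next one would overflow."""
    chunks: list[str] = []
    line = ""
    for word in text.split():
        candidate = f"{line} {word}" if line else word
        if len(candidate) <= max_chars or not line:
            line = candidate  # a lone word longer than max_chars is kept whole
        else:
            chunks.append(line)
            line = word
    if line:
        chunks.append(line)
    return chunks


def _refine(piece: str, max_chars: int, level: int) -> list[str]:
    """Split piece at the current level, recursing only on parts still too long."""
    piece = piece.strip()
    if not piece:
        return []
    if len(piece) <= max_chars:
        return [piece]
    if level >= len(CLAUSE_LEVELS):
        return greedy_words(piece, max_chars)
    out: list[str] = []
    for part in CLAUSE_LEVELS[level].split(piece):
        out.extend(_refine(part, max_chars, level + 1))
    return out


def split_for_tts(text: str, max_chars: int = 200) -> list[str]:
    """Always split into sentences, then refine any sentence over max_chars."""
    chunks: list[str] = []
    for sentence in SENTENCE_END.split(text.strip()):
        chunks.extend(_refine(sentence, max_chars, 0))
    return chunks
```

One design caveat: splitting at every comma can leave very short fragments. A variant could greedily re-merge adjacent parts up to `max_chars` before recursing, at the cost of slightly more code.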