Hacker News

pama yesterday at 2:08 PM | 2 replies

Cool. Would it be possible to eliminate the small vocab-format conversion step I see in the test against tiktoken? It would be nice to have a fully compatible drop-in replacement that works without having to think about the details. It would also be nice to have examples that work the other way around: initialize tiktoken as you normally would, including any specialized extensions of the standard tokenizers, then use that initialized tokenizer to initialize a new tokendagger instance and test that the results are identical.
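The identity check described above could be as simple as encoding the same sample texts with both tokenizers and comparing the ids. Here is a minimal sketch under stated assumptions: it relies only on tiktoken-style `encode(text)` / `decode(ids)` methods, `assert_same_tokenization` and `ByteTokenizer` are hypothetical names, and `ByteTokenizer` is a toy stand-in so the harness runs without downloading any vocab files. It does not show how to build a tokendagger instance from a tiktoken encoding, since that API isn't given in this thread.

```python
from typing import Iterable, Protocol, Sequence


class Tokenizer(Protocol):
    """Anything exposing tiktoken-style encode/decode methods (assumed interface)."""

    def encode(self, text: str) -> Sequence[int]: ...

    def decode(self, tokens: Sequence[int]) -> str: ...


def assert_same_tokenization(
    reference: Tokenizer, candidate: Tokenizer, samples: Iterable[str]
) -> int:
    """Hypothetical helper: check that `candidate` matches `reference` token-for-token.

    Returns the number of samples checked; raises AssertionError on the
    first mismatch, naming the offending sample.
    """
    checked = 0
    for text in samples:
        ref_ids = list(reference.encode(text))
        cand_ids = list(candidate.encode(text))
        assert ref_ids == cand_ids, f"encode mismatch on {text!r}: {ref_ids} != {cand_ids}"
        # Decoding the shared ids should give the same string on both sides.
        assert reference.decode(ref_ids) == candidate.decode(ref_ids), (
            f"decode mismatch on {text!r}"
        )
        checked += 1
    return checked


class ByteTokenizer:
    """Toy stand-in tokenizer (hypothetical): one token per UTF-8 byte."""

    def encode(self, text: str) -> list[int]:
        return list(text.encode("utf-8"))

    def decode(self, tokens: Sequence[int]) -> str:
        return bytes(tokens).decode("utf-8")
```

In practice `reference` might be `tiktoken.get_encoding("cl100k_base")` (or a custom `tiktoken.Encoding` that extends it with extra special tokens), and `candidate` the tokendagger tokenizer initialized from it. tiktoken's `encode` raises on text containing special tokens unless they are explicitly allowed, so either keep such tokens out of the samples or adapt the helper to pass the same `allowed_special` argument to both sides.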


Replies

matthewolfe yesterday at 4:57 PM

Alright, 0.1.1 should now be a true drop-in replacement. I'll write up some examples soon.

matthewolfe yesterday at 4:25 PM

Ah good catch. Updating this right now.