Hacker News

notatallshaw, yesterday at 9:15 PM

It looks like TikToken is written in Rust (https://github.com/openai/tiktoken/tree/main/src). Are the gains here actually from porting to C++?


Replies

fhub, today at 3:11 AM

From the post:

Profiling TikToken’s Python/Rust implementation showed a lot of time was spent doing regex matching. Most of my perf gains come from a) using a faster jit-compiled regex engine; and b) simplifying the algorithm to forego regex matching special tokens at all.
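
To make point (b) concrete, here is a rough Python sketch of one way special tokens could be handled without a regex: find them with plain substring search, and run the pre-tokenization regex only on the ordinary text in between. This is an assumption about the general idea, not the post's actual code. The token table, the ids, the simplified `WORD_PATTERN`, and the function names are all made up for illustration.

```python
import re

# Hypothetical special-token table; strings and ids are illustrative only.
SPECIAL_TOKENS = {"<|endoftext|>": 100257, "<|fim_prefix|>": 100258}

# Simplified stand-in for a pre-tokenization pattern (not TikToken's real one).
WORD_PATTERN = re.compile(r"\s*\w+|\s*[^\w\s]+|\s+")


def split_on_special(text, special_tokens):
    """Yield (is_special, piece) pairs using substring search, no regex."""
    pos = 0
    while pos < len(text):
        # Locate the earliest special token at or after pos.
        # On a tie in position, prefer the longer token.
        best = None
        for token in special_tokens:
            start = text.find(token, pos)
            if start == -1:
                continue
            candidate = (start, -len(token), token)
            if best is None or candidate < best:
                best = candidate
        if best is None:
            # No special tokens left: the rest is ordinary text.
            yield False, text[pos:]
            return
        start, _, token = best
        if start > pos:
            yield False, text[pos:start]
        yield True, token
        pos = start + len(token)


def pre_tokenize(text, special_tokens=SPECIAL_TOKENS):
    """Split text into pieces; only ordinary segments go through the regex."""
    pieces = []
    for is_special, piece in split_on_special(text, special_tokens):
        if is_special:
            pieces.append(piece)
        else:
            pieces.extend(WORD_PATTERN.findall(piece))
    return pieces
```

Calling `pre_tokenize("hello world<|endoftext|>bye")` keeps the special token as a single piece, and the regex only ever sees `"hello world"` and `"bye"`. Whether this is faster in practice depends on the regex engine and the number of special tokens, so treat it as a sketch of the idea rather than a benchmark claim.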