I don't like that the comparison isn't apples to apples: fewer bits will of course take fewer tokens.
> Where UUIDs cost ~23 tokens and get hallucinated by LLMs
How does this solve the hallucination problem?
Just removing the - from the example UUID takes it from 26 tokens to 18
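For anyone who wants to try this: dropping the dashes is lossless, since the standard library can rebuild the canonical form from the 32 hex characters. A minimal sketch, assuming an arbitrary sample UUID (not the one from the project's README); token counts depend on the tokenizer and are not measured here.

```python
import uuid

# Arbitrary sample value, used only for illustration.
original = "123e4567-e89b-12d3-a456-426614174000"

# .hex is the same 128-bit value as 32 hex characters without dashes.
compact = uuid.UUID(original).hex

# uuid.UUID(hex=...) accepts the dashless form and restores the canonical string.
restored = str(uuid.UUID(hex=compact))

print(compact)
print(restored == original)
```

So the dashless form can go into the prompt and the dashed form can be restored on the way out, with no extra state to keep.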
LLMs are good at predicting words, since each word in the id is ~1 BPE token. UUIDs, on the other hand, are random hex characters, and this is where LLMs struggle to output the right ids.
You can use the .from method (https://github.com/vostride/id-agent/#idagentfrominput-opts) to convert a UUID or any text to an id-agent based id, run the LLM inference on that, and then convert the result back to the UUID.
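A rough sketch of that round trip, to make the flow concrete. This is not id-agent's API: `word_alias`, `AliasTable`, and the 16-word list are hypothetical stand-ins, and the sketch assumes the caller keeps a lookup table so that a hash-derived alias can be mapped back to its UUID.

```python
import hashlib
import uuid

# Hypothetical 16-word vocabulary; a real scheme would use a much larger list.
WORDS = [
    "amber", "birch", "cedar", "delta", "ember", "flint", "grove", "harbor",
    "iris", "jade", "kelp", "lotus", "maple", "nova", "onyx", "pearl",
]


def word_alias(u: uuid.UUID, n_words: int = 4) -> str:
    """Derive a deterministic, word-based alias from a UUID (one-way)."""
    digest = hashlib.sha256(u.bytes).digest()
    return "-".join(WORDS[b % len(WORDS)] for b in digest[:n_words])


class AliasTable:
    """Remembers alias -> UUID so aliases in model output can be resolved."""

    def __init__(self) -> None:
        self._by_alias = {}  # maps alias string to uuid.UUID

    def encode(self, u: uuid.UUID) -> str:
        alias = word_alias(u)
        existing = self._by_alias.setdefault(alias, u)
        if existing != u:
            # Short aliases can collide; fail loudly instead of guessing.
            raise ValueError(f"alias collision on {alias!r}")
        return alias

    def decode(self, alias: str) -> uuid.UUID:
        # Raises KeyError if the model produced an alias that was never issued.
        return self._by_alias[alias]


table = AliasTable()
record_id = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")
alias = table.encode(record_id)      # put this in the prompt instead of the UUID
resolved = table.decode(alias)       # map the model's answer back afterwards
print(resolved == record_id)
```

A side effect of the table is that a hallucinated alias fails the lookup instead of silently pointing at the wrong record.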
> Just removing the - from the example UUID takes it from 26 tokens to 18
And according to the table below, an id-agent with 120 bits of entropy (still 2 bits fewer than a UUID) uses 17 tokens on average. So unless you purposefully want to reduce the entropy, this whole scheme is just as good as simply removing the dashes from UUIDs. But that wouldn't make for a resume-worthy project (sorry, got a bit cynical there)
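To put numbers on it, here is the entropy per token for both options. It takes the figures quoted in this thread at face value (18 tokens for a dashless UUID, 17 for a 120-bit id-agent) and assumes a v4 UUID, which has 122 random bits.

```python
def bits_per_token(entropy_bits: int, tokens: int) -> float:
    """Average entropy carried by each token of an identifier."""
    return entropy_bits / tokens


# Assumption: v4 UUID with 122 random bits; token counts are the ones quoted above.
uuid_no_dashes = bits_per_token(122, 18)
id_agent_120 = bits_per_token(120, 17)

print(round(uuid_no_dashes, 2))  # 6.78
print(round(id_agent_120, 2))    # 7.06
```

On those numbers the difference is roughly 4% more entropy per token, or one token per id.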