Hacker News

Show HN: Id-agent – Token efficient UUID alternative for AI agents

24 points by pranshuchittora today at 11:16 AM | 39 comments

Comments

mrweasel today at 12:06 PM

Can someone explain why this would even be needed? Why is there a cost to generating, say, a UUIDv4? E.g. Claude Code has some regex in the client-side code that filters out "bad words", so why can't the agent just generate UUIDs client-side, using zero tokens?

I sort of get the "problem", but the fact that this is even needed is stupid.

yunusabd today at 12:43 PM

That's nice, I've had the issue where LLMs would return non-existent uids. But does this package actually help with that? Token savings are nice, but not really my main concern. If this can measurably reduce hallucinations, it would be really useful.

> Where UUIDs cost ~23 tokens and get hallucinated by LLMs, id-agent produces memorable word-based IDs at ~14 tokens with equivalent collision resistance.

nkmnz today at 12:20 PM

Neat idea! I'd argue that the collision risk is basically zero even though the entropy is lower, because you must validate the LLM output anyway, for two reasons:

1. LLMs might lack intrinsic entropy and reuse some UUIDs much more often.

2. Referential integrity is as important as collision resistance. An LLM must be able to reuse the correct id in the correct place.

On the other hand, using a dictionary for the IDs helps with readability, but depending on the model's strengths, it might also add a confounder. After all, tokens that represent real words will probably influence the attention in a different way than random numbers.
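
The validation step described in this comment can be sketched as a simple membership check. This is a minimal illustration, not code from id-agent: the function name `validate_ids` and the example word IDs are made up, and the set of known IDs is assumed to be tracked by the application.

```python
def validate_ids(returned_ids, known_ids):
    """Split model-returned IDs into those that exist and those that do not."""
    known = set(known_ids)
    valid = [i for i in returned_ids if i in known]
    unknown = [i for i in returned_ids if i not in known]
    return valid, unknown


# Hypothetical IDs the application actually issued.
known = {"calm-river-stone", "brisk-amber-fox"}

# The second ID looks plausible but was never issued (a hallucination).
valid, unknown = validate_ids(["calm-river-stone", "calm-river-fox"], known)
```

Any entry in `unknown` would be rejected or sent back to the model for correction, regardless of how much entropy the ID format has.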

synthos today at 12:19 PM

Isn't this solving a subproblem of the overall issue of uncompressed tool calls polluting the context?

Furthermore, this could be compressed even further with a dynamic legend of every UUID in the context. So UUID@Bravo and UUID@Delta would be the actual symbols in the context but dynamically replaced when calling tools.
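
A rough sketch of such a dynamic legend, assuming UUIDs appear in the canonical hyphenated form and using numbered aliases (`UUID@1`, `UUID@2`, ...) instead of the `UUID@Bravo` style from the comment. The `Legend` class and its method names are hypothetical.

```python
import re

UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)
ALIAS_RE = re.compile(r"UUID@(\d+)")


class Legend:
    """Keeps a two-way mapping between UUIDs and short aliases."""

    def __init__(self):
        self.alias_by_uuid = {}
        self.uuid_by_alias = {}

    def compress(self, text):
        """Replace every UUID in text with its alias before it enters the context."""
        def repl(match):
            u = match.group(0).lower()
            if u not in self.alias_by_uuid:
                alias = f"UUID@{len(self.alias_by_uuid) + 1}"
                self.alias_by_uuid[u] = alias
                self.uuid_by_alias[alias] = u
            return self.alias_by_uuid[u]
        return UUID_RE.sub(repl, text)

    def expand(self, text):
        """Replace aliases with real UUIDs before calling a tool.

        Raises KeyError if the model produced an alias that was never issued.
        """
        return ALIAS_RE.sub(lambda m: self.uuid_by_alias[m.group(0)], text)
```

The `KeyError` on an unknown alias doubles as a check against invented identifiers.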

jy14898 today at 12:03 PM

I don't like that they're not apples to apples; fewer bits, so of course it'll take fewer tokens.

> Where UUIDs cost ~23 tokens and get hallucinated by LLMs

How does this solve the hallucination problem?

Just removing the hyphens from the example UUID takes it from 26 tokens to 18.

Falimonda today at 11:52 AM

A benchmark comparing conventional UUIDs and AIDs across models (hallucination rate, token usage) would be cool!

railka today at 12:02 PM

Why do people choose the hyphen ("-") as the separator in an identifier? When double-clicking, the ID does not select completely, unlike when an underscore ("_") is used.

nither today at 11:49 AM

Smart idea, but one concern is that tokenization techniques and libraries may change in the future. This also looks like a very edge-case optimization to me. But overall, it deserves to exist. Good job.

whazor today at 11:50 AM

I would be afraid of accidental prompt injection.

simedw today at 11:53 AM

Nice package, not only is using words more token-efficient [saving time and money], but weaker models are also less likely to make mistakes when providing the key, at least in my tests.

That said, for `createAliasMap`, don't you think you could create a deterministic mapping from and to UUIDs <-> word chains? That way, no additional state would be needed. [Might require fairly long word chains...]
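
A sketch of what such a stateless, reversible mapping could look like. It does not use id-agent's `createAliasMap` or its wordlist: the 4096 "words" here are synthetic four-letter strings generated from syllables, standing in for a real 4096-entry dictionary. At 12 bits per word, a 128-bit UUID needs 11 words, which illustrates the "fairly long word chains" caveat.

```python
import uuid

# Synthetic wordlist (an assumption for illustration): 16 consonants x 4 vowels
# give 64 syllables, and every pair of syllables gives 4096 = 2**12 unique words.
CONSONANTS = "bdfgklmnprstvzhj"
VOWELS = "aeio"
SYLLABLES = [c + v for c in CONSONANTS for v in VOWELS]
WORDS = [a + b for a in SYLLABLES for b in SYLLABLES]
INDEX = {w: i for i, w in enumerate(WORDS)}

BITS_PER_WORD = 12
WORDS_PER_UUID = 11  # ceil(128 / 12)


def uuid_to_words(u):
    """Encode a UUID as a hyphen-separated chain of 11 words."""
    n = u.int
    parts = []
    for _ in range(WORDS_PER_UUID):
        parts.append(WORDS[n & 0xFFF])  # take the lowest 12 bits
        n >>= BITS_PER_WORD
    return "-".join(reversed(parts))


def words_to_uuid(chain):
    """Decode a word chain back into the original UUID."""
    parts = chain.split("-")
    if len(parts) != WORDS_PER_UUID:
        raise ValueError(f"expected {WORDS_PER_UUID} words, got {len(parts)}")
    n = 0
    for w in parts:
        n = (n << BITS_PER_WORD) | INDEX[w]  # KeyError on an unknown word
    return uuid.UUID(int=n)  # ValueError if the value exceeds 128 bits
```

Because the mapping is a pure function of the UUID, no alias table has to be stored; the cost is that the chain is much longer than a short random word ID.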

thrance today at 11:54 AM

An even better solution is to present the AI with local IDs and map those to UUIDs outside of its context. So when giving a list of items for the LLM to choose from, just list them with incremental numbers (1, 2, 3...) and ask for these numbers in tool schemas.
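
A minimal sketch of this local-ID approach. The `LocalIds` class, its method names, and the example labels are hypothetical; the real UUIDs never enter the prompt.

```python
import uuid


class LocalIds:
    """Shows items to the model as 1, 2, 3... and keeps the UUIDs outside its context."""

    def __init__(self, items):
        # items: list of (uuid, label) pairs; numbering starts at 1
        self._items = list(items)

    def render(self):
        """Text to place in the prompt: one numbered line per item."""
        return "\n".join(
            f"{n}. {label}" for n, (_, label) in enumerate(self._items, start=1)
        )

    def resolve(self, number):
        """Map the number returned by the model back to the real UUID."""
        if not isinstance(number, int) or not 1 <= number <= len(self._items):
            raise ValueError(f"unknown local id: {number!r}")
        return self._items[number - 1][0]


items = [(uuid.uuid4(), "Invoice A"), (uuid.uuid4(), "Invoice B")]
ids = LocalIds(items)
prompt_list = ids.render()   # goes into the model's context
chosen = ids.resolve(2)      # the model answered "2" in its tool call
```

An out-of-range number raises `ValueError`, so a wrong answer is caught before any tool is called.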

Tiberium today at 11:52 AM

Is this just a reinvented humanhash?

diimdeep today at 12:46 PM

just nanoid(5) https://github.com/ai/nanoid
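
For a sense of what a 5-character ID trades away, here is a back-of-the-envelope birthday-bound estimate. It assumes nanoid's default 64-symbol alphabet (6 bits per character) and the 122 random bits of a UUIDv4; the function name is made up for this sketch.

```python
import math


def collision_probability(bits, n):
    """Birthday approximation: chance that n random IDs of the given
    bit length contain at least one duplicate."""
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))


NANOID5_BITS = 5 * 6   # 5 characters, 6 bits each with a 64-symbol alphabet
UUID4_BITS = 122       # random bits in a version 4 UUID

p_nanoid = collision_probability(NANOID5_BITS, 10_000)
p_uuid = collision_probability(UUID4_BITS, 10_000)
print(f"nanoid(5), 10k IDs: {p_nanoid:.4f}")
print(f"UUIDv4,   10k IDs: {p_uuid:.3e}")
```

Whether 30 bits is enough depends on how many IDs coexist in one context and on whether the application already validates IDs on the way out.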

felipeyanez today at 11:52 AM

Any plans for a Python port?
