Ask HN: We just had an actual UUID v4 collision...

264 points • by mittermayr • today at 7:57 AM • 224 comments

I know what you're thinking... and I still can't believe it, but...

This morning, our database flagged a duplicate UUID (v4). I checked, thinking it may have been a double-insert bug or something, but no.

The original UUID was from a record added in 2025 (about a year ago), and today the system inserted a new document with a fresh UUIDv4 and it came up with the exact same one:

b6133fd6-70fe-4fe3-bed6-8ca8fc9386cd

We're using this: https://www.npmjs.com/package/uuid

I thought this is technically impossible, and it will never happen, and since we're not modifying the UUIDs in any way, I really wonder how that.... is possible!? We're literally only calling:

import { v4 as uuidv4 } from "uuid";

const document_id = uuidv4();

... and then insert into the database, that's it.

Additionally, the database only has about 15,000 records, and now one collision. Statistically... impossible.

Has that ever happened to anyone?! What in the...

Comments

jandrewrogers • today at 4:41 PM

This is surprisingly common.

The security of UUIDv4 is based on the assumption of a high-quality entropy source. This assumption is invalidated by hardware defects, normal software bugs, and developers not understanding what "high-quality entropy" actually means and that it is required for UUIDv4 to work as advertised.

It is relatively expensive to detect when an entropy source is broken, so almost no one ever does. They find out when a collision happens, like you just did.

UUIDv4 is explicitly forbidden for a lot of high-assurance and high-reliability software systems for this reason.

throwaway_19sz • today at 10:39 AM

Funny story no one will believe, but it’s true. A good friend of mine joined a startup as CTO 10 years ago, high growth phase, maybe 200 devs… In his first week he discovered the company had a microservice for generating new UUIDs. One endpoint with its own dedicated team of 3 engineers …including a database guy (the plot thickens). Other teams were instructed to call this service every time they needed a new ‘safe’ UUID. My pal asked wtf. It turned out this service had its own DB to store every previously issued UUID. Requests were handled as follows: it would generate a UUID, then ‘validate’ it by checking its own database to ensure the newly generated UUID didn’t match any previously generated UUIDs, then insert it, then return it to the client. Peace of mind I guess. The team had its own kanban board and sprints.

CodesInChaos • today at 6:17 PM

This is usually caused by an insufficiently seeded PRNG.

Are you generating the UUID in the backend, or the frontend? Frontend is fundamentally unreliable for many reasons, including deliberate collisions. So in that case you'll need to handle collisions somehow. Though you can still engineer around common sources of collisions, the specifics depend on the environment.

On the other hand making a backend reliable is feasible. What kind of environment is your code running in? Historically VMs sometimes suffered from this problem, though this should be solved nowadays. Heavily sandboxed processes might still run into this, if the RNG library uses an unsafe fallback. Forking processes or VMs can cause state duplication and thus collisions.
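
A minimal sketch of the state-duplication failure described above, not taken from the `uuid` package. `toyRandomBytes` is a hypothetical, deliberately weak linear congruential generator standing in for a PRNG whose state was copied by a fork or a VM snapshot; `uuidV4From` is likewise an illustrative helper. Two copies that start from the same state emit the same "random" UUID:

```typescript
// Toy, non-cryptographic byte source (assumption: for illustration only).
function toyRandomBytes(seed: number): (count: number) => number[] {
  let state = seed >>> 0;
  return (count) => {
    const out: number[] = [];
    for (let i = 0; i < count; i++) {
      // One linear congruential step, kept within 32 bits.
      state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
      out.push(state >>> 24); // use the top 8 bits as one byte
    }
    return out;
  };
}

// Formats 16 bytes as a version-4 UUID string.
function uuidV4From(randomBytes: (count: number) => number[]): string {
  const b = randomBytes(16);
  b[6] = (b[6] & 0x0f) | 0x40; // version nibble = 4
  b[8] = (b[8] & 0x3f) | 0x80; // variant bits = 10
  const hex = b.map((x) => ("0" + x.toString(16)).slice(-2)).join("");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

// Two "processes" that inherited identical generator state.
const idFromProcessA = uuidV4From(toyRandomBytes(42));
const idFromProcessB = uuidV4From(toyRandomBytes(42));
console.log(idFromProcessA === idFromProcessB); // true: a guaranteed collision
```

The same effect would be expected from any deterministic source, however good the formatting code around it is.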

_kst_ • today at 7:16 PM

This reminds me of a passage from the book "Pro Git".

<https://git-scm.com/book/en/v2>

"Here’s an example to give you an idea of what it would take to get a SHA-1 collision. If all 6.5 billion humans on Earth were programming, and every second, each one was producing code that was the equivalent of the entire Linux kernel history (6.5 million Git objects) and pushing it into one enormous Git repository, it would take roughly 2 years until that repository contained enough objects to have a 50% probability of a single SHA-1 object collision. Thus, an organic SHA-1 collision is less likely than every member of your programming team being attacked and killed by wolves in unrelated incidents on the same night."

Deliberate collisions are addressed in the following paragraph.

SHA-1 hashes are not random, so the issue of poor pseudo-random number generation doesn't apply as it does to uuidv4. And SHA-1 hashes are 160 bits, vs. 128 for uuidv4.

But I love the idea of unrelated wolf attacks.

sedatk • today at 10:11 PM

> Duplicate UUIDs (Googlebot)

> This module may generate duplicate UUIDs when run in clients with deterministic random number generators, such as Googlebot crawlers. This can cause problems for apps that expect client-generated UUIDs to always be unique. Developers should be prepared for this and have a strategy for dealing with possible collisions, such as:

> - Check for duplicate UUIDs, fail gracefully

> - Disable write operations for Googlebot clients

https://github.com/uuidjs/uuid/commit/91805f665c38b691ac2cbd...

e12e • today at 5:10 PM

Some discussion here:

https://github.com/uuidjs/uuid/issues/546

Eg:

> FWIW, I just tested crypto.getRandomValues() behavior on googlebot and it is also deterministic(!)

juancn • today at 1:52 PM

Something off on how the RNG is initialized? Lack of entropy?

If the rng is not customized it will use:

    const rnds8 = new Uint8Array(16);
    export default function rng() {
        return crypto.getRandomValues(rnds8);
    }

getRandomValues doesn't specify a minimum amount of entropy.

adyavanapalli • today at 9:40 AM

What you're talking about is so extremely rare that it's much more likely that the entire Earth is destroyed by an asteroid right this inst...

beejiu • today at 6:04 PM

Are your UUIDs generated client side or server side? If it's client side, it could be due to a crawling bot. Googlebot for example executes Javascript using deterministic "randomness".

dweez • today at 4:50 PM

Good moment to revisit this fun article: https://jasonfantl.com/posts/Universal-Unique-IDs/

If the entire universe were turned into a giant computer and did nothing but generate uuids until its heat death, how many bits would you need for the ID space?

mittermayr • today at 8:20 AM

I fully agree. It makes no sense. Yet...

The only guess I have is that we originally generated UUIDv4s on a user's phone before sending them to the database, and the UUID generated this morning that collided was created on an Ubuntu server.

I don't fully know how UUIDv4s are generated and what (if anything) about the machine it's being generated on is part of the algorithm, but that's really the only change I can think of: that it used to be generated on-device by users and, for many months now, has been generated on the server.
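
On the question of what goes into a v4 UUID: per RFC 9562, nothing about the machine does. Of the 128 bits, 4 are a fixed version field, 2 are a fixed variant field, and the remaining 122 are supposed to be random. The sketch below decodes those fixed fields from the colliding ID; `describeUuid` is a hypothetical helper written for this illustration, not part of the `uuid` package.

```typescript
interface UuidFields {
  version: number;        // 4 for a random UUID
  isRfcVariant: boolean;  // true when the variant bits are binary 10
}

function describeUuid(uuid: string): UuidFields {
  const hex = uuid.replace(/-/g, "").toLowerCase();
  if (!/^[0-9a-f]{32}$/.test(hex)) {
    throw new Error("not a UUID: " + uuid);
  }
  // Hex digit 12 is the version nibble; digit 16 carries the variant bits.
  const version = parseInt(hex.charAt(12), 16);
  const variantNibble = parseInt(hex.charAt(16), 16);
  return {
    version,
    isRfcVariant: (variantNibble & 0b1100) === 0b1000,
  };
}

const collided = describeUuid("b6133fd6-70fe-4fe3-bed6-8ca8fc9386cd");
console.log(collided.version, collided.isRfcVariant);
```

Since everything else in the ID is random, two generators can only agree if their random sources agree, which is why the phone-versus-server change is worth a close look.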

Geee • today at 11:22 AM

According to the many-worlds interpretation of quantum mechanics, there's bound to be one branch of universe where every UUID is the same. Can you imagine what those guys are thinking?

smokel • today at 8:05 PM

Multiple times have I blamed compilers, cosmic rays, quantum effects, or at the very least an obscure kernel bug, before realizing that I was the source of a bug.

A collision at 15,000 records is so unlikely that I would first suspect something else. Duplicate processing, replayed requests, reused objects, misleading logs, or another code path reusing the identifier.

Could you share a bit more of the surrounding code so we can check?

jbverschoor • today at 4:37 PM

Most plausible cause: uuid package depends on some random number generator package, which has recently been compromised in order to make “random” numbers predictable. As a result, many crypto (ssl + currency) projects are compromised due to a supply-chain attack.

merlindru • today at 2:28 PM

Gotta be a seeding issue. If it's not, and you can prove it, you're about to be a little famous probably :P

8organicbits • today at 8:55 PM

I wrote about real world collisions, including that particular library last year (https://alexsci.com/blog/uuid-oops/).

There are a bunch of constraints that must be strictly held for UUIDs to be collision resistant, I'd guess there is a problem with your random number generator.

Lammy • today at 8:40 PM

> I thought this is technically impossible, and it will never happen

I always hated this meme/mindset, because if you dig into their history you'll see that their original purpose was to collide. They were labels to identify messages in Apollo's distributed computing architecture. UID and later UUIDs were a reversible way to mark an intersection point between two dimensions.

Any two nodes in a distributed system would generate the same UID/UUID for the same two inputs, and a recipient of an identified message could reverse the identifier back into the original components. They were designed as labels for ephemeral messages so the two dimensions were time and hardware ID (originally Apollo serial number, later 802.3 hwaddress etc).

I think a lot of the confusion can be traced to the very earliest AEGIS implementation where the Apollo engineers started using “canned” (their term, i.e. static or well-known) UIDs to identify filesystems. Over time the popular usage of UUID fully shifted from ephemeral identifiers where duplicates were intentional toward canned identifiers where duplicates were unwanted and the two dimensions were random-and-also-random.

tumdum_ • today at 9:48 AM

Poorly seeded prng.

zie • today at 9:23 PM

You forgot to use https://www.random.org/ as your source of randomness :)

leni536 • today at 10:34 AM

It's not happening by chance, there is a bug somewhere.

From what I skimmed the package should just call to the js runtime's crypto.randomUUID(). I think it should always be properly seeded.

I think it is extremely unlikely that the runtime has a bug here, but who knows? What js runtime do you use?

serf • today at 8:05 AM

1 in 4.73 × 10²⁸

1 in 47.3 octillion.

I'd be suspecting a race condition or some other naive mistake, otherwise I'd be stocking up on lottery tickets.

(lol at the other user posting at the same time about the lottery ticket.. great minds and all that.)
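
For anyone who wants to reproduce a figure of that size, here is a rough sketch using the standard birthday approximation, p ≈ n(n−1) / (2 · 2^bits). It assumes a perfectly uniform generator and 122 random bits per v4 UUID; `collisionProbability` is a name made up for this example, and the approximation is only valid while p is tiny.

```typescript
// Approximate probability that at least two of n random IDs collide,
// given `randomBits` uniformly random bits per ID.
function collisionProbability(n: number, randomBits: number): number {
  const pairs = (n * (n - 1)) / 2; // number of distinct pairs of IDs
  return pairs / Math.pow(2, randomBits);
}

const probability = collisionProbability(15_000, 122);
console.log("about 1 in " + (1 / probability).toExponential(2));
```

With 15,000 records this lands in the 10²⁸ range, which is why most replies suspect the random source or the surrounding code rather than chance.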

jordiburgos • today at 8:57 AM

Please, do not use b6133fd6-70fe-4fe3-bed6-8ca8fc9386cd, I checked my database and I was using it already.

nu11ptr • today at 6:00 PM

Ultimately it comes down to your entropy source. I always generate and insert in a loop for this reason; if there is a collision, it gets handled gracefully.
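
A generate-and-insert loop of that kind might look roughly like the sketch below. Everything here is an assumption for illustration: `DocumentStore` is an in-memory stand-in for a table with a unique index on the ID, `tryInsert` mimics an insert that reports a conflict instead of overwriting, and `generateId` is whatever generator you pass in (for example `uuidv4` from the `uuid` package).

```typescript
// Hypothetical stand-in for a table with a unique constraint on `id`.
class DocumentStore {
  private rows = new Map<string, unknown>();

  // Returns false instead of overwriting when the id already exists.
  tryInsert(id: string, doc: unknown): boolean {
    if (this.rows.has(id)) return false;
    this.rows.set(id, doc);
    return true;
  }

  get size(): number {
    return this.rows.size;
  }
}

// Generates a fresh id per attempt and retries only on a duplicate.
function insertWithRetry(
  store: DocumentStore,
  doc: unknown,
  generateId: () => string,
  maxAttempts = 3,
): string {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const id = generateId();
    if (store.tryInsert(id, doc)) return id;
  }
  // Repeated duplicates point at a broken generator, so fail loudly.
  throw new Error("no unique id after " + maxAttempts + " attempts");
}
```

The attempt limit is a deliberate choice in this sketch: with a healthy entropy source one retry is already astronomically unlikely, so hitting the limit is a signal worth alerting on rather than papering over.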

baq • today at 4:32 PM

the vm you're running on virtualized all the entropy away.

rglover • today at 6:38 PM

A check inside the generator function is the best way I've found to avoid this. Wrap uuid or whatever random generator with a check against an ID cache. If it already exists, just run the generator recursively.
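
One way such a wrapper could be written is sketched below. Note two assumptions: it uses a bounded loop instead of recursion, so a stuck generator throws rather than overflowing the stack, and the "ID cache" is a plain in-process `Set`, which only knows about IDs seen by this process. `withCollisionCheck` is a hypothetical name.

```typescript
// Wraps any id generator so that ids already present in `seen` are skipped.
function withCollisionCheck(
  generate: () => string,
  seen: Set<string> = new Set<string>(),
  maxAttempts = 5,
): () => string {
  return () => {
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
      const id = generate();
      if (!seen.has(id)) {
        seen.add(id); // remember it so later calls cannot return it again
        return id;
      }
    }
    throw new Error("generator kept returning known ids");
  };
}
```

Because the cache is local, this would not have caught a collision between an ID minted on a phone a year ago and one minted on a server today unless the cache were seeded from the database; a unique constraint in the database remains the authoritative check.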

mdavid626 • today at 5:49 PM

Or there is some other explanation, eg. somebody messed with the request manually, or with the db.

sudb • today at 4:56 PM

This is the first time I have experienced some vindication that choosing CUID2[1] for one of my projects was actually a good idea.

1. https://github.com/paralleldrive/cuid2

glaslong • today at 10:08 AM

Buy some lava lamps

nozzlegear • today at 6:02 PM

> I thought this is technically impossible, and it will never happen,

In an eternal universe, even the most unlikely of events will happen an infinite number of times.

coldtea • today at 8:26 PM

Were the chances that an npm package is crap factored in?

sqquima • today at 5:41 PM

Meta, but if I had a question like this, I'd likely have asked on Twitter or Reddit first. I'll keep in mind using HN as an alternative Q&A site.

sbuttgereit • today at 5:00 PM

> I thought this is technically impossible

No, very technically possible... though, with good randomness, very, very unlikely.

But nothing technically prevents a UUIDv4 from generating a duplicate value.

danfritz • today at 5:42 PM

Always let your db generate UUIDs. On Postgres this is easy: since v18 it supports UUID v7!

There is no need to set uuids through javascript or node imo

beardyw • today at 8:30 AM

Just a stupid question, but why not append the date, even in seconds as hex. It's just a few bytes and would guarantee that everything OK now will be OK in the future?

NKosmatos • today at 10:40 AM

> I thought this is technically impossible

Actually it's not impossible, but very very improbable.

P.S. You should play a lottery/powerball ticket

P.P.S. Whenever I use the word improbable, the https://hitchhikers.fandom.com/wiki/Infinite_Improbability_D... comes to mind

wg0 • today at 8:39 AM

Would UUID v7 be more collision-proof? Hard to say: it takes time into account, but the number of entropy bits is reduced, so UUIDs generated at exactly the same time have a higher chance of a collision because the random bits span a much smaller space.

Thoughts?
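
To make the trade-off concrete: in its simplest form, RFC 9562 lays out a v7 UUID as a 48-bit Unix millisecond timestamp, 4 version bits, 12 random bits, 2 variant bits and 62 more random bits. That is 74 random bits instead of 122, but two IDs can only collide if they were created in the same millisecond. The sketch below builds that layout; `uuidV7Sketch` is a hypothetical helper, and the caller is assumed to supply 10 bytes from a good random source.

```typescript
// Builds a UUIDv7-shaped string from a millisecond timestamp and 10 random bytes.
function uuidV7Sketch(unixMs: number, randomBytes: number[]): string {
  if (randomBytes.length !== 10) {
    throw new Error("expected exactly 10 random bytes");
  }
  // 48-bit timestamp as 12 hex digits, zero-padded on the left.
  const ts = ("000000000000" + Math.floor(unixMs).toString(16)).slice(-12);

  const r = randomBytes.map((x) => x & 0xff);
  r[0] = (r[0] & 0x0f) | 0x70; // version nibble = 7; 12 random bits follow
  r[2] = (r[2] & 0x3f) | 0x80; // variant bits = 10; 62 random bits follow

  const hex = ts + r.map((x) => ("0" + x.toString(16)).slice(-2)).join("");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}
```

Note that the timestamp only helps if the random part is actually random: a deterministic source called twice in the same millisecond would still collide.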

shortercode • today at 5:29 PM

Fun thing about random is that these things happen. UUIDv7 is less prone to this as it includes both a time component and a random one. I’ve been using ULID in a few projects, which has similar attributes to UUIDv7 but is more space efficient.

not_math • today at 10:38 AM

Reminds me of some code I saw running in production. Every time we added a new entry, we were pulling all the UUIDs from this table, generating a new UUID, and checking for collisions up to 10 times.

dist-epoch • today at 6:24 PM

It's much more likely that you hit an "impossible bug" due to a bit flip somewhere.

Imagine the database having the old UUID in a memory buffer due to a recent index scan, and a bit flip happened somewhere in the logic which basically copied the old UUID into the memory location of the new UUID, or some buffer addresses got swapped, or the operation which allocated the new UUID received a memory buffer containing the old one, and due to a bit flip the memcpy operation was skipped, or something along that line.

Facebook wrote extensively about this, stuff like "if (false) { do_x(); }" and do_x being called. For example their critical RocksDB kv store has extensive redundant protections to defend against such "impossible bugs".

lyfeninja • today at 11:02 AM

Although incredibly rare, it's not impossible, so it's probably best to just plan for collisions. A simple retry should suffice. But I agree, I feel like something is going on somewhere else ...

AndreyK1984 • today at 10:26 AM

Why not have a timestamp-UUID instead?

nhumrich • today at 7:36 PM

> technically impossible

Not at all! Just very unlikely. It's about odds and statistics. Not physics.

zuzululu • today at 7:22 PM

just uuidv5

OutOfHere • today at 12:52 PM

This is why I prefer to use a random base32 string over UUID. At least you get a proper 128 bit entropy instead of just a 122 bit entropy as with UUIDv4. That's a 64x difference in collision probability. I always thought UUIDs were a toy, not for serious use. If you control the strings, you can even make a longer ID.
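
As a rough illustration of that idea, the sketch below encodes 16 random bytes (128 bits) as a 26-character base32 string and shows where the 64x figure comes from (2^(128−122)). `base32Id` is a hypothetical helper, the alphabet is the RFC 4648 one in lowercase, and the caller is assumed to supply bytes from a cryptographically secure source; like UUIDv4, the result is only as unique as that source is random.

```typescript
const BASE32_ALPHABET = "abcdefghijklmnopqrstuvwxyz234567";

// Encodes bytes as unpadded base32, 5 bits per output character.
function base32Id(bytes: number[]): string {
  let out = "";
  let buffer = 0;   // bits read but not yet emitted
  let bitCount = 0; // how many bits of `buffer` are valid
  for (const byte of bytes) {
    buffer = (buffer << 8) | (byte & 0xff);
    bitCount += 8;
    while (bitCount >= 5) {
      out += BASE32_ALPHABET.charAt((buffer >>> (bitCount - 5)) & 31);
      bitCount -= 5;
    }
    buffer &= (1 << bitCount) - 1; // drop the bits already emitted
  }
  if (bitCount > 0) {
    // Left-align the leftover bits in a final 5-bit group.
    out += BASE32_ALPHABET.charAt((buffer << (5 - bitCount)) & 31);
  }
  return out;
}

// 128 random bits versus 122: the ID space is 2^6 times larger.
const spaceRatio = Math.pow(2, 128 - 122);
console.log(spaceRatio); // 64
```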

Also, numerous applications that use a unique ID per record frequently need to check for ID collisions. I know I do for a short URL generator.

ares623 • today at 10:30 AM

Buy a lottery ticket

naikrovek • today at 8:56 AM

The chance of a UUIDv4 collision is very low, but it is never zero.

If everything is done properly, then this is very likely the one and only time anyone involved in the telling or reading of this account will ever experience this.

kittikitti • today at 7:49 PM

Almost all pseudo-random number generators are absolute garbage. They need you to believe they work because the NSA needs backdoors and to foolproof ransomware attacks. This isn't surprising at all to me.
