Hi, I’m the author of uuidv47. The idea is simple: keep UUIDv7 internally for database indexing and ...

aabbdev • yesterday at 2:04 PM • 5 replies • view on HN

Hi, I’m the author of uuidv47. The idea is simple: keep UUIDv7 internally for database indexing and sortability, but emit UUIDv4-looking façades externally so clients don’t see timing patterns.

How it works: the 48-bit timestamp is XOR-masked with a keyed SipHash-2-4 stream derived from the UUID’s random field. The random bits are preserved, the version flips between 7 (inside) and 4 (outside), and the RFC variant is kept. The mapping is injective: (ts, rand) → (encTS, rand). Decode is just encTS ⊕ mask, so round-trip is exact.

Security: SipHash is a PRF, so observing façades doesn’t leak the key. Wrong key = wrong timestamp. Rotation can be done with a key-ID outside the UUID.

Performance: one SipHash over 10 bytes + a couple of 48-bit loads/stores. Nanosecond overhead, header-only C11, no deps, allocation-free.

Tests: SipHash reference vectors, round-trip encode/decode, and version/variant invariants.

Curious to hear feedback!

Replies

JimDabell • yesterday at 4:13 PM

I like the idea.

UUIDs are often generated client-side. Am I right in thinking that this isn’t possible with this approach? Even if you let clients give you UUIDs and they gave them back the masked versions, wouldn't you be vulnerable to a client providing two UUIDs with different ts and the same rand? So this is only designed for when you are generating the UUIDv7s yourself?

the_mitsuhiko • yesterday at 3:53 PM

Two pieces of feedback here:

1. You implicitly take away someone else's hypothetical benefit of leveraging UUID v7, which is disappointing for any consumer of your API.

2. By storing the UUIDs differently on your API service from internally, you're going to make your life just a tiny bit harder because now you have to go through this indirection of conversion, and I'm not sure if this is worth it.

➕ show 3 replies

inopinatus • today at 3:34 AM

My biggest concern is the entropic quality of the random bits, since the design of UUIDv7 is fundamentally more concerned with collisions than predictability; consequently, although the standard says SHOULD for their nonguessability it isn't a MUST, and leaves room for implementations that use a weak PRNG, or that increment a counter, or even place additional clock data in the apparently random bits (ref. RFC9562 s6.2 & s6.9).

So there's definitely some gotchas with relying on rand_a and rand_b in UUIDv7 for seeding a PRF, and when ingesting data from devices outside of your trust boundary (as may be the case with high-volume telemetry), even if you wrote the code they basically can't be trusted for this purpose, and if those bits are undisturbed in the output it's certainly a problem if the idea was to obfuscate serialisation, timing, or correlation.

Even generations we might assume are safe may not be completely safe; for example, the new uuidv7() in PostgreSQL 18 fills rand_a entirely from the high precision part of the timestamp, and this is RFC compliant. So if an import routine generates a big batch of such UUIDs, this v7-to-v4 scheme discloses output bits that can be used to relate individual records as part of the same group. That might be fine for data points pertaining to a vehicle engine. It might not be fine for identifiers that relate to people.

So, since not all UUIDv7 is created alike, I'd add a strong caveat: unless generating the rand_a and rand_b bits entirely oneself with a high degree of confidence in their nonguessibility, then this scheme may still leak information regarding timing, sequence, or correlation of records, and you will have to read the source code of your UUIDv7 implementation to know for sure.

sergeyprokhoren • yesterday at 6:04 PM

Bad idea. In PostgreSQL 18 the optional parameter shift will shift the computed timestamp by the given interval

https://www.postgresql.org/docs/18/functions-uuid.html

➕ show 1 reply

alt Hacker News

Replies