Hacker News

Unless you have specific needs, the only type of UUID you should care about is v4.

v1: MAC address + timestamp + clock sequence

v4: completely random

v5: namespace + name, hashed with SHA-1 (deterministic: same input, same UUID)

v7: time + random (distributed sortable ids)
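For reference, here's a quick sketch of the versions above using Python's stdlib `uuid` module (v7 isn't available in older Python stdlibs, so it's omitted here):

```python
import uuid

# v1: timestamp + node; leaks creation time and machine identity
u1 = uuid.uuid1()

# v4: 122 random bits
u4 = uuid.uuid4()

# v5: SHA-1 of namespace + name; same input always yields the same UUID
u5a = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
u5b = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")

print(u1.version, u4.version, u5a.version, u5a == u5b)
```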




As someone who only cares about v4, I periodically wonder why I don't just use fully random 128-bit identifiers instead (without the version bits).


That'd be perfectly fine, though you need to take some care to avoid duplicates or patterns. Databases might not ship with a crypto-random generator out of the box, but they usually do have UUID support; similarly, in application code you need to use the crypto-random routines in your standard library, not a plain PRNG.

Depending on the implementation you might still have to worry about seeding issues. That's probably moot, though, since a UUID library would be vulnerable to the same seeding problem under the hood.
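In Python, for example, a fully random 128-bit identifier is one call to the `secrets` module, which draws from the OS CSPRNG (a sketch of the idea, not a drop-in UUID replacement; the function name is made up here):

```python
import secrets

def random_id128() -> str:
    # 16 bytes = 128 bits from the OS CSPRNG; no version/variant bits reserved
    return secrets.token_bytes(16).hex()

a, b = random_id128(), random_id128()
print(a, b)  # two 32-hex-char identifiers, all 128 bits random
```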


Because the rest of the world uses uuidv4, and the extra couple of bits don't really buy you anything.


UUID v4 isn't large enough to prevent collisions; that's why segment.io created https://github.com/segmentio/ksuid, which is 160 bits vs. the 128 bits of a UUIDv4.


I think you vastly overestimate the likelihood of collisions. If all of the 2 billion computers in the world produced a new uuidv4 every millisecond, we still wouldn't expect a collision for the next 5 quintillion years.

Or if the same number of bits is used in a more structured manner, like uuidv1, which combines a 48-bit MAC address, a 60-bit 100-nanosecond timestamp, and a 14-bit uniquifier (an effective resolution of about 6 picoseconds), you could have 281 trillion computers each make 160 billion uuidv1s per second with guaranteed zero collisions until the 60-bit timestamp rolls over around the year 5200 (it's referenced to 1582, the Gregorian calendar reform, for some silly reason). And any collisions that do happen after that point would be completely irrelevant, because they'd be colliding with db entries created millennia prior.
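The v1 bit budget is easy to sanity-check with quick arithmetic (note: the node field is 48 bits, so it covers about 281 trillion machines, and 100 ns divided by the 14-bit uniquifier is about 6 picoseconds):

```python
nodes = 2 ** 48                  # 48-bit node (MAC) field
ticks_per_sec = 10 ** 7          # 100 ns timestamp resolution
clock_seq = 2 ** 14              # 14-bit uniquifier per tick

per_node_per_sec = ticks_per_sec * clock_seq      # IDs/s one node can mint
span_years = (2 ** 60) / ticks_per_sec / (365.25 * 24 * 3600)

print(f"{nodes:,} nodes, {per_node_per_sec:,} IDs/s each, "
      f"timestamp spans ~{span_years:.0f} years")
```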


> If all of the 2 billion computers in the world produced a new uuidv4 every millisecond, we still wouldn't expect a collision for the next 5 quintillion years.

This would generate 2^127.8 UUIDs.

First, no: a collision would be expected in less than one year, roughly after exhausting the square root of the available space (about 2^61 generated UUIDs for 122 random bits): https://en.wikipedia.org/wiki/Birthday_problem

Second, no: UUIDv4 has 122 random bits, not 128 as you assumed: https://en.wikipedia.org/wiki/Universally_unique_identifier#...
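The birthday bound is easy to check numerically: with n random values drawn from a space of size d, a collision becomes ~50% likely around n ≈ sqrt(2 d ln 2), so at the stated rate the first collision shows up in weeks, not quintillions of years (a sketch):

```python
import math

bits = 122                        # random bits in a UUIDv4
d = 2 ** bits
rate = 2_000_000_000 * 1000       # 2 billion machines, one UUID per millisecond

n_half = math.sqrt(2 * d * math.log(2))   # ~50% collision probability point
days = n_half / rate / 86_400

print(f"{n_half:.3e} UUIDs, ~{days:.0f} days at the stated rate")
```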


Sure, some collision somewhere is still likely to happen, but the chance that it will actually matter for anything is vanishingly small.

In the real world, we don't spend 100% of our species's computing capacity generating uuids and doing nothing else, and we aren't burning through uuids at 2 billion per millisecond. Even if we were, it wouldn't matter, because the true denominator is the scope of the data system the uuid is referenced in: if your hard drive partition and my webapp user entry happen to get the same uuid, we will never know. And if for some reason collisions do matter, we can use uuidv1 or uuidv7, which embed a timestamp so that IDs generated at different times can never collide.


No number of bits is large enough to _prevent_ collisions.


That doesn't help me. My code may generate a whole bunch of IDs in a microsecond, so a 32-bit timestamp isn't going to keep them time-sorted. I may as well just use a UUIDv4 at that point.


Looks like you got things a bit mixed up. You'll not see a v4 collision in your lifetime; that's not why they built ksuid. At the time, UUIDv1 was the only standard alternative with a time component, but it leaves far fewer bits for randomness, its byte layout isn't sortable, and the fixed MAC address takes a lot of space away from the random bits.

ksuid is similar to Twitter Snowflake: the main goal is distributed generation of collision-free, sortable IDs. The UUIDv7 proposal is meant to address the same use case. You don't need to worry about collisions as much here, as the timestamp is monotonically increasing and there's a 42-bit counter for every millisecond plus 32 random bits at the end. You'd have to be generating trillions of IDs per second to have a chance of a collision.
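The v7 layout is simple enough to sketch: a 48-bit Unix-millisecond timestamp up front, then the version and variant bits, with everything else random. This follows the layout that ended up in RFC 9562, not the draft counter scheme described above:

```python
import secrets
import time
import uuid

def uuidv7() -> uuid.UUID:
    # Minimal UUIDv7-style generator (illustrative sketch, not a vetted library)
    ms = time.time_ns() // 1_000_000          # Unix timestamp in milliseconds
    value = (ms & ((1 << 48) - 1)) << 80      # top 48 bits: timestamp
    value |= 0x7 << 76                        # 4 bits: version = 7
    value |= secrets.randbits(12) << 64       # 12 bits: rand_a
    value |= 0b10 << 62                       # 2 bits: RFC 4122 variant
    value |= secrets.randbits(62)             # 62 bits: rand_b
    return uuid.UUID(int=value)

a, b = uuidv7(), uuidv7()
print(a)  # IDs minted later sort later (ties possible within the same ms)
```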


It would seem sequential keys for database performance are more than a 'specific' need.


Those are not helpful for database performance in a general sense. The last product I worked on used v4 uuids as primary keys without any issues: single-master database, with sorting by creation time only used in admin panels. You can index on created_at if needed. Sortable IDs would have been a non-feature.

Generating sortable IDs in very high volume, in a distributed architecture, is a problem very specific to systems like social networks, metrics collection etc. You won’t need that for your average e-commerce app or machine parts database.


v4 being completely random has terrible properties even on a non-distributed database: every insert lands at a random point in the primary-key index, which hurts locality. Probably better to use v7.



