Integers are by far the best primary key material possible. Only for lack of dis...

ignoramous · on Feb 23, 2021

> The benefits of an integer key vs a guid key are quite profound when you get into the academics of information theory. They provide implicit creation order of things, whereas GUIDs cannot.

We had this requirement, but a semi-UUID solution is desirable in a distributed setting. Ref prior art by Instagram engineering: https://archive.is/Dydln
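As a rough sketch of what such a semi-UUID can look like: the bit layout below (41 bits of milliseconds since a custom epoch, 13 bits of shard ID, 10 bits of per-shard sequence) follows my reading of the linked Instagram post. The epoch value and the function names `make_id` and `split_id` are made up for illustration.

```python
import time

# Hypothetical custom epoch: 2011-01-01 00:00:00 UTC, in milliseconds.
CUSTOM_EPOCH_MS = 1_293_840_000_000

def make_id(shard_id, sequence, now_ms=None):
    """Pack timestamp, shard and sequence into one 64-bit integer."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    ts = now_ms - CUSTOM_EPOCH_MS
    if not 0 <= ts < (1 << 41):
        raise ValueError("timestamp does not fit in 41 bits")
    if not 0 <= shard_id < (1 << 13):
        raise ValueError("shard_id does not fit in 13 bits")
    # Timestamp in the high bits keeps IDs roughly ordered by creation time.
    return (ts << 23) | (shard_id << 10) | (sequence % 1024)

def split_id(id_):
    """Recover (ms since custom epoch, shard_id, sequence) from an ID."""
    return id_ >> 23, (id_ >> 10) & 0x1FFF, id_ & 0x3FF
```

Each shard only needs its own local sequence counter, so no cross-node coordination is required, and the IDs still sort by creation time to millisecond granularity.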

(from your linked comment)

> If you are worried about security (i.e. someone hitting sequential keys in your URLs), then this is arguably an application problem. You should probably generate an additional column that stores some more obfuscated representation of the primary key and index it separately.

Yep, see: https://github.com/ai/nanoid and https://hashids.org/
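A minimal sketch of the "additional column" idea, using Python's standard `secrets` module as a stand-in for nanoid (this is not the nanoid or hashids API). The in-memory dicts and the names `insert` and `new_public_id` are hypothetical and stand in for a table plus a secondary index.

```python
import secrets

rows = {}          # integer primary key -> row (stands in for the table)
by_public_id = {}  # public ID -> primary key (stands in for the extra index)

def new_public_id(nbytes=12):
    # 12 random bytes, URL-safe base64 encoded: 16 characters.
    return secrets.token_urlsafe(nbytes)

def insert(name):
    pk = len(rows) + 1            # sequential integer key, internal only
    public_id = new_public_id()   # random token, safe to expose in URLs
    rows[pk] = {"id": pk, "public_id": public_id, "name": name}
    by_public_id[public_id] = pk
    return rows[pk]
```

Note the difference in approach: a random token like this has to be stored and indexed, whereas hashids derives a reversible encoding from the integer itself.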

baix777 · on Feb 19, 2021

The creation of integers doesn't scale for large data volumes. Plus you need a place to create these integers, and a failover location, which adds complexity. Multiple machines can each be creating their own guids in a very simple manner.

And the odds of a guid collision are extremely low, which is acceptable for most applications. Having worked with petabytes of data, I can say guid performance isn't really an issue, as there are more important factors to worry about.
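To put a number on "extremely low", here is a back-of-the-envelope sketch using the standard birthday approximation. It assumes random (version 4) UUIDs, which carry 122 random bits; the function name `collision_probability` is made up for illustration.

```python
import math

# Assumption: UUIDv4, where 6 of the 128 bits are fixed version/variant bits.
RANDOM_BITS = 122

def collision_probability(n, bits=RANDOM_BITS):
    """Approximate chance of at least one collision among n random IDs.

    Birthday approximation: p ~= 1 - exp(-n * (n - 1) / (2 * 2**bits)).
    expm1 keeps precision when the probability is tiny.
    """
    return -math.expm1(-n * (n - 1) / (2 * 2**bits))
```

Even at a trillion generated IDs (`collision_probability(10**12)`), the result stays well below one in a trillion.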

bob1029 · on Feb 19, 2021

> The creation of integers doesn't scale for large data volumes.

At which specific integer does the scaling start to slow down?

closeparen · on Feb 19, 2021

One concurrent insertion, or one network partition.

UUIDs can be generated on many machines with no awareness of each other and merged later.

bob1029 · on Feb 19, 2021

I would recommend reviewing my prior comments on this, as I address the concerns of multiple nodes needing to be able to independently produce identities without collisions or coordination.

If you know beforehand the maximum number of participants in your system, you can divide the keyspace across that quantity. If you are using BigInteger or equivalent, you have an infinite number of these things to work with, so it doesn't really matter if you wind up skipping trillions of identities at first. The original article even advocates for this as its first point, but without as much practical justification.
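One way to read "divide the keyspace" is interleaving: with a known maximum of N nodes, node i hands out i, i + N, i + 2N, and so on. This is only a sketch of that reading (contiguous ranges per node would work too), and `node_ids` is a hypothetical name.

```python
from itertools import count, islice

def node_ids(node_index, max_nodes):
    """Endless stream of integer IDs owned by one node.

    Streams for different node_index values never overlap, so nodes
    need no coordination as long as max_nodes is fixed up front.
    """
    if not 0 <= node_index < max_nodes:
        raise ValueError("node_index must be in [0, max_nodes)")
    return count(start=node_index, step=max_nodes)

# Usage: the first few IDs from two of three nodes.
first = list(islice(node_ids(0, 3), 4))
second = list(islice(node_ids(1, 3), 4))
```

Python integers are arbitrary precision, which matches the BigInteger assumption. The catch is that `max_nodes` cannot change later without re-partitioning.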

T-hawk · on Feb 19, 2021

If you're going to use a key space in the trillions, and partitioned and sparsely populated rather than sequential... isn't that just reinventing what the UUID already is?

A UUID is just a 128-bit integer, with creation algorithms designed to partition that space by things that already have enough entropy to need no further synchronization.

What you're proposing sounds like roll-your-own-UUID, which might be as inadvisable as roll-your-own-crypto.
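The "UUID is just a 128-bit integer" point is easy to check with Python's standard `uuid` module; the variable names here are only for illustration.

```python
import uuid

u = uuid.uuid4()       # random UUID, generated with no coordination
as_int = u.int         # the same value viewed as a plain integer

# The integer always fits in 128 bits and converts back losslessly.
fits_in_128_bits = 0 <= as_int < 2**128
round_trip = uuid.UUID(int=as_int)
```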

andrewflnr · on Feb 19, 2021

Don't try to compare this to rolling your own crypto. The stakes are nowhere near the same for IDs as for crypto, and the stakes are the defining feature of the "don't roll your own crypto" meme.

T-hawk · on Feb 20, 2021

Yes, I'm comparing it, and you don't get to tell me I can't.

It's not about the stakes. It's the idea that in rolling your own, you're going to get it wrong, or otherwise do worse than existing ways that have already solved the same problem.

andrewflnr · on Feb 20, 2021

That applies to literally all code. So if it's not about the stakes, only experts are ever allowed to deploy code of any kind that they've written themselves.