Integers are by far the best primary key material possible. Only for lack of discipline do they seem inferior to alternatives.

See a prior book I wrote on HN regarding this: https://news.ycombinator.com/item?id=25309248

The current domain model I am working with utilizes a global integer sequence to key all entities. This implicitly eliminates the class of bugs where the same keys of different types overlap and would otherwise mask exceptions. It also enables powerful domain modeling techniques in which the identities of things are themselves to be thought of as first class entities and referred to as a common class of thing. This is a little mind-bending at first, but it enables some really powerful abstractions that would otherwise be infeasible if we had to switch over all possible types of keys.

The benefits of an integer key vs a guid key are quite profound when you get into the academics of information theory. They provide implicit creation order of things, whereas GUIDs cannot. They are deterministic in that there will never be a collision. Their range can be made to be infinite. Integers are perfectly efficient, even if the computer representation isn't necessarily so - BigInteger types scale gracefully.




> The benefits of an integer key vs a guid key are quite profound when you get into the academics of information theory. They provide implicit creation order of things, whereas GUIDs cannot.

We had this requirement, but a semi-UUID solution is desirable in a distributed setting. Ref prior art by Instagram engineering: https://archive.is/Dydln
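
For illustration, a minimal sketch of that kind of time-ordered ID. The bit layout (41 bits of milliseconds since a custom epoch, 13 bits of shard ID, 10 bits of per-shard sequence) follows my reading of the linked post; the epoch value and the names `make_id` and `split_id` are assumptions made up for this example.

```python
import time
from typing import Optional, Tuple

# Assumed layout: [41 bits ms since EPOCH_MS][13 bits shard][10 bits sequence].
EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch, chosen for this sketch
SHARD_BITS = 13
SEQ_BITS = 10


def make_id(shard_id: int, seq: int, now_ms: Optional[int] = None) -> int:
    """Pack timestamp, shard and sequence into one sortable integer."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id out of range")
    elapsed = now_ms - EPOCH_MS
    seq_part = seq % (1 << SEQ_BITS)  # sequence wraps within its 10 bits
    return (elapsed << (SHARD_BITS + SEQ_BITS)) | (shard_id << SEQ_BITS) | seq_part


def split_id(value: int) -> Tuple[int, int, int]:
    """Recover (timestamp_ms, shard_id, seq) from an ID built by make_id."""
    seq = value & ((1 << SEQ_BITS) - 1)
    shard = (value >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
    ms = (value >> (SHARD_BITS + SEQ_BITS)) + EPOCH_MS
    return ms, shard, seq
```

Because the timestamp sits in the high bits, IDs sort by creation time to millisecond resolution, and each shard can mint them without talking to any other shard.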

(from your linked comment)

> If you are worried about security (i.e. someone hitting sequential keys in your URLs), then this is arguably an application problem. You should probably generate an additional column that stores some more obfuscated representation of the primary key and index it separately.

Yep, see: https://github.com/ai/nanoid and https://hashids.org/


The creation of integers doesn't scale for large data volumes. Plus you need a place to create these integers, and a failover location, which adds complexity. Multiple machines can each be creating their own guids in a very simple manner.

And the odds of a GUID collision are extremely low, and acceptable for most applications. Having worked with petabytes of data, I've found that GUID performance isn't really an issue, as there are more important factors to worry about.
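
To put a rough number on "extremely low", here is a back-of-the-envelope sketch using the standard birthday approximation. It assumes random (version 4) GUIDs, which carry 122 random bits, and a good random source; the function name is made up for this example.

```python
import math

RANDOM_BITS = 122  # a version-4 UUID fixes 6 of its 128 bits


def collision_probability(n: int, bits: int = RANDOM_BITS) -> float:
    """Approximate chance of at least one collision among n random IDs.

    Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * 2**bits)).
    expm1 keeps precision when the result is very close to zero.
    """
    space = 2 ** bits
    return -math.expm1(-n * (n - 1) / (2 * space))


p = collision_probability(10 ** 9)  # a billion random GUIDs
```

For a billion IDs this comes out on the order of 1e-19, which is the sense in which the risk is acceptable for most applications.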


> The creation of integers doesn't scale for large data volumes.

At which specific integer does the scaling start to slow down?


One concurrent insertion, or one network partition.

UUIDs can be generated on many machines with no awareness of each other and merged later.


I would recommend reviewing my prior comments on this, as I address the concerns of multiple nodes needing to be able to independently produce identities without collisions or coordination.

If you know beforehand the maximum number of participants in your system, you can divide the keyspace across that quantity. If you are using BigInteger or equivalent, you have an infinite number of these things to work with, so it doesn't really matter if you wind up skipping trillions of identities at first. The original article even advocates for this as its first point, but without as much practical justification.
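
A minimal sketch of one way to divide the keyspace, assuming interleaving (node k of N issues k, k + N, k + 2N, ...). Contiguous ranges per node would work just as well; the function name `node_ids` is made up for this example. Python integers are arbitrary precision, so they stand in for BigInteger here.

```python
import itertools
from typing import Iterator


def node_ids(node_index: int, max_nodes: int) -> Iterator[int]:
    """Yield IDs only this node can produce: node_index, node_index + max_nodes, ..."""
    if max_nodes <= 0 or not 0 <= node_index < max_nodes:
        raise ValueError("node_index must be in [0, max_nodes)")
    return itertools.count(start=node_index, step=max_nodes)


# Three nodes generating independently, with no coordination between them.
first_of_node_0 = list(itertools.islice(node_ids(0, 3), 4))
first_of_node_1 = list(itertools.islice(node_ids(1, 3), 4))
```

Every ID is congruent to its node's index modulo `max_nodes`, so two nodes can never produce the same value. The catch is that `max_nodes` has to be fixed up front.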


If you're going to use a key space in the trillions, and partitioned and sparsely populated rather than sequential... isn't that just reinventing what the UUID already is?

A UUID is just a 128-bit integer, with creation algorithms designed to partition that space by things that already have enough entropy to need no further synchronization.
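
As a quick illustration using Python's standard `uuid` module (the variable names are just for this example):

```python
import uuid

u = uuid.uuid4()

# The same value, viewed as a plain integer in [0, 2**128).
as_int = u.int

# Round-trip: the integer alone is enough to rebuild the UUID.
rebuilt = uuid.UUID(int=as_int)

# A version-1 UUID partitions the space by timestamp and node identifier.
u1 = uuid.uuid1()
timestamp_field = u1.time  # 60-bit timestamp
node_field = u1.node       # 48-bit node identifier
```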

What you're proposing sounds like roll-your-own-UUID, which might be as inadvisable as roll-your-own-crypto.


Don't try to compare this to rolling your own crypto. The stakes are nowhere near the same for IDs as for crypto, and the stakes are the defining feature of the "don't roll your own crypto" meme.


Yes, I'm comparing it, and you don't get to tell me I can't.

It's not about the stakes. It's the idea that in rolling your own, you're going to get it wrong, or otherwise do worse than existing ways that have already solved the same problem.


That applies to literally all code. So if it's not about the stakes, only experts are ever allowed to deploy code of any kind that they've written themselves.



