
That was my first thought... who on earth thought that a number (albeit 64-bit) would be enough for Twitter? Who even thought that a 32-bit int would have been enough?

I didn't know that JavaScript couldn't handle numbers bigger than 53 bits, but honestly, these should have been strings from the beginning.




I didn't know that JavaScript couldn't handle numbers bigger than 53 bits

The JavaScript Number type can't handle more precision than 53 bits. Magnitude is orthogonal due to floating-point representation. Precision is governed by the size of the mantissa, which is 52+1 bits long in the 64-bit IEEE 754 representation used by JavaScript.
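A quick way to see the 53-bit limit for yourself. This is only a sketch: it uses Python rather than JavaScript, relying on the fact that Python's `float` is the same 64-bit IEEE 754 double that the JavaScript Number type uses. The helper name `survives_double` is made up for illustration.

```python
# Python floats are IEEE 754 doubles, the same format as JavaScript's Number.
LIMIT = 2 ** 53  # integers up to this value are all exactly representable


def survives_double(n: int) -> bool:
    """Return True if the integer n round-trips through a double unchanged."""
    return int(float(n)) == n


print(survives_double(LIMIT))      # True
print(survives_double(LIMIT + 1))  # False: the nearest double is 2**53
print(survives_double(2 ** 63))    # True: huge magnitude, but only 1 significant bit
```

The last line is the "magnitude is orthogonal" point: a double can hold very large values, it just cannot hold more than 53 significant bits of them.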


The problem isn't that their ids have already passed the 53-bit (much less the 64-bit) mark by counting up sequentially. The problem is that they are going to start generating ids in a different fashion, and that new scheme is what causes the issue.

This seems to be the relevant id generating code: https://github.com/twitter/snowflake/blob/master/src/main/sc...


64 bits is enough to have everyone on Earth send over a billion Tweets and still have enough room to find a new solution. That sounds like more than enough to me.
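Back-of-the-envelope check of that claim, as a sketch. The population figure is my own rough assumption, not something from the thread.

```python
ID_SPACE = 2 ** 64                 # number of distinct 64-bit values
WORLD_POPULATION = 7_000_000_000   # assumption: rough world population

# How many ids each person could use before the space runs out.
ids_per_person = ID_SPACE // WORLD_POPULATION
print(f"{ids_per_person:,} ids per person")
```

That works out to roughly 2.6 billion ids per person, so "over a billion each with room to spare" holds as long as the ids are handed out densely.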


The problem is that they added a timestamp, and people had assumed they would not need the full 64-bit ID.

IMO, it's not a bad idea on their part. A 32-bit UNIX timestamp * 2^32 + a 32-bit sequential id lets them track up to 4.2 billion tweets a second and should work just fine up to the year 2106.
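A minimal sketch of that 32 + 32 layout, to show where the numbers come from. The function names are hypothetical and this is not Twitter's actual code, just the arithmetic described above.

```python
from datetime import datetime, timedelta, timezone

SEQ_BITS = 32  # low 32 bits: per-second sequence number


def make_id(unix_seconds: int, sequence: int) -> int:
    """Pack timestamp * 2**32 + sequence into one 64-bit integer."""
    if not 0 <= unix_seconds < 2 ** 32:
        raise ValueError("timestamp does not fit in 32 bits")
    if not 0 <= sequence < 2 ** SEQ_BITS:
        raise ValueError("sequence does not fit in 32 bits")
    return (unix_seconds << SEQ_BITS) | sequence


def split_id(id_: int) -> tuple[int, int]:
    """Recover (unix_seconds, sequence) from a packed id."""
    return id_ >> SEQ_BITS, id_ & (2 ** SEQ_BITS - 1)


# The largest unsigned 32-bit timestamp falls in 2106.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
rollover = epoch + timedelta(seconds=2 ** 32 - 1)
print(rollover.year)  # 2106

# Any present-day timestamp already pushes the id far past 2**53.
print(make_id(1_300_000_000, 0) > 2 ** 53)  # True
```

The last line is why this bites JavaScript clients right away: the timestamp sits in the high bits, so every id is immediately larger than a double can represent exactly.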

Edit: As to why it's a good idea, you can have different systems handing out IDs without stepping on each other's toes or even talking to each other. The full ID is composed of a timestamp, a worker number, and a sequence number. Granted, I would probably put the sequence number ahead of the worker number so sub-second tweets are better ordered, rather than being ordered strictly by the system that generated them.
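To make the ordering point concrete, here is a sketch of both field orders. The bit widths (41/10/12) and the function names are assumptions for illustration; the thread does not state the real layout.

```python
# Assumed field widths for illustration only.
TIMESTAMP_BITS, WORKER_BITS, SEQUENCE_BITS = 41, 10, 12


def pack_worker_first(ts: int, worker: int, seq: int) -> int:
    """Layout: timestamp | worker | sequence (worker in the higher bits)."""
    return (ts << (WORKER_BITS + SEQUENCE_BITS)) | (worker << SEQUENCE_BITS) | seq


def pack_sequence_first(ts: int, worker: int, seq: int) -> int:
    """Layout: timestamp | sequence | worker (sequence in the higher bits)."""
    return (ts << (WORKER_BITS + SEQUENCE_BITS)) | (seq << WORKER_BITS) | worker


# Two ids issued in the same timestamp tick by different workers.
first = (5, 2, 0)   # (timestamp, worker 2, its 1st id in this tick)
second = (5, 1, 1)  # (timestamp, worker 1, its 2nd id in this tick)

# Worker-first sorts by worker number within the tick.
print(pack_worker_first(*second) < pack_worker_first(*first))      # True
# Sequence-first sorts by sequence number within the tick.
print(pack_sequence_first(*first) < pack_sequence_first(*second))  # True
```

Either way the workers never need to coordinate, since the worker number keeps their ids distinct. Sequence numbers are per worker, though, so sequence-first only gives approximate ordering across machines within a tick.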


Yeah, I guess this is why I didn't understand how they could use 64-bit numbers to begin with... I couldn't see how they were going to be able to generate them all without leaving huge gaps in the number space. With 64-bit ints you couldn't have a single machine do all the generating, so at the very least you'd need some sort of offset identifying the worker that generated each ID.

And once you've done that, why not just go all out and use UUIDs?


This is exactly what they have done with their new ids and the snowflake system. They have a timestamp, a sequence number and a system identifier, plus a few neat properties.

I don't think they can be blamed for using a trivial incremental key when they had 10 users, I am sure they were not expecting to have 200M :)



