Hacker News

An issue I've always had with UUIDs and ULIDs is that, as far as I can tell, there isn't a great way to generate one deterministically. For a lot of use cases, being able to reprocess data and generate identical IDs is really useful, and there isn't a standard way that I know of to achieve this.



That's UUID v5 (it uses a SHA-1 hash of a namespace UUID plus the input name).
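For reference, Python's standard `uuid` module implements v5 directly: `uuid.uuid5(namespace, name)` hashes the namespace UUID together with the name using SHA-1, so identical inputs always yield the identical UUID. The order URL below is just a made-up example name.

```python
import uuid

# Same namespace + same name -> same UUID, on every run and on every machine.
a = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/orders/12345")
b = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/orders/12345")

print(a == b)     # True
print(a.version)  # 5
```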


Are you looking for something other than just a custom seed in the RNG?
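A minimal sketch of the seeded-RNG approach, assuming the seed is something stable such as a batch or job identifier (the seed string and the `seeded_uuids` helper here are hypothetical): draw 128 bits from a seeded `random.Random` and let `uuid.UUID(int=..., version=4)` set the version and variant bits.

```python
import random
import uuid

def seeded_uuids(seed, count):
    """Yield `count` v4-format UUIDs from a seeded PRNG (not cryptographically random)."""
    rng = random.Random(seed)
    for _ in range(count):
        # getrandbits(128) supplies 128 pseudo-random bits; version=4 overwrites
        # the version and variant bits so the result is a well-formed v4 UUID.
        yield uuid.UUID(int=rng.getrandbits(128), version=4)

run1 = list(seeded_uuids("batch-2024-01-01", 3))
run2 = list(seeded_uuids("batch-2024-01-01", 3))
print(run1 == run2)  # True
```

The catch is that each ID depends on its position in the draw order, so replaying only part of the input, or the same input in a different order, produces different IDs.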



Sure, there are workarounds in various languages, but it would be nice to have a standardized, hash-based UUID or ULID.


If it's a standardized sequence, then that's no different than just 0, 1, 2, 3 but with different names. If you just want a non-sequential but deterministic sequence, then that's every random number generator that accepts a seed value, and being any more standardized than that makes zero sense.


The problem with autoincrement in this context is you can’t reproduce the right value when replaying the input streams for your stream processing job. Hashing some combination of values and using that as a primary key solves this problem nicely and, when you’re using bitemporal data modeling, makes it easy to correct mistakes. The point of standardization is compatibility, not standardizing the sequence of keys used.
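A rough sketch of "hash some combination of values and use that as the primary key" built on v5. Assumptions: the namespace name `events.example.com`, the `event_key` helper and the field names are all hypothetical, and canonical JSON is just one possible way to serialize the key fields unambiguously.

```python
import json
import uuid

# Hypothetical per-application namespace; derive it once and never change it.
EVENTS_NS = uuid.uuid5(uuid.NAMESPACE_DNS, "events.example.com")

def event_key(record, key_fields):
    """Deterministic primary key derived only from the record's business-key fields."""
    key = {f: record[f] for f in key_fields}
    # Canonical JSON (sorted keys, fixed separators) so the same values always
    # serialize to the same string, and values containing a delimiter such as
    # "a|b" cannot collide the way a naive "|".join(...) encoding could.
    name = json.dumps(key, sort_keys=True, separators=(",", ":"))
    return uuid.uuid5(EVENTS_NS, name)

fields = ("source", "account_id", "event_time")
first = {"source": "kafka", "account_id": 42, "event_time": "2024-01-01T00:00:00Z", "amount": 10}
replay = {"event_time": "2024-01-01T00:00:00Z", "account_id": 42, "source": "kafka", "amount": 10}
print(event_key(first, fields) == event_key(replay, fields))  # True
```

Replaying the stream regenerates the same key for the same event, so a corrected row can be written against the same primary key. Which fields belong in `key_fields` is a per-table business decision.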


I agree with all the points you're making, but you can't standardize on hashing when the data being hashed will vary for business reasons. I just can't see any way this can realistically be standardized outside of a single business, maybe even a single business unit depending on the kind of company.

Perhaps you mean something like "standardized hash of all columnar data for the table row," but then you're just reinventing Elasticsearch/Lucene, with all its pros and cons. The power of foreign keys in an RDBMS is that they are pointers, and as pointers, the mutability of their underlying data is what makes them powerful. I think I get what you're asking for, but I also think no reasonable standard is possible unless you have the technology to take a total snapshot of the universe, at which point, why not just measure the universe itself as your database? Perfect storage system.


From the article, it sounds like this is v5?


I missed that because I typically use ULIDs these days. But, yeah, some standardized format for a hash of message data is what I want.
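Since ULIDs have no hash-based variant, here is a hypothetical sketch of a ULID-shaped ID built from message data: the 48-bit millisecond timestamp comes from the event itself (not the wall clock), and the 80 "random" bits are the first 80 bits of SHA-256 of the payload. This is not part of any ULID standard; the `hashed_ulid` name and the choice of SHA-256 are assumptions.

```python
import hashlib

# Crockford base32 alphabet used by ULID (no I, L, O, U); it is in ASCII order,
# so the encoded strings sort the same way as the underlying 128-bit integers.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def hashed_ulid(timestamp_ms, payload):
    """ULID-shaped ID: 48-bit event timestamp + first 80 bits of SHA-256(payload)."""
    if not 0 <= timestamp_ms < 2**48:
        raise ValueError("timestamp must fit in 48 bits")
    rand80 = int.from_bytes(hashlib.sha256(payload).digest()[:10], "big")
    value = (timestamp_ms << 80) | rand80
    # 26 base32 chars hold 130 bits; the top 2 bits are always zero.
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))

msg = b'{"account_id":42,"amount":10}'
print(hashed_ulid(1704067200000, msg) == hashed_ulid(1704067200000, msg))  # True
```

IDs still sort by event time, but two messages in the same millisecond are ordered by their hashes rather than by arrival.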


Why wouldn't you use some sort of collision-resistant hash function on the data to achieve this instead?
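A minimal sketch of the plain-hash approach, assuming the record can be serialized as canonical JSON (the `content_id` helper is hypothetical): SHA-256 over the sorted, compactly encoded record gives a stable 64-character hex ID.

```python
import hashlib
import json

def content_id(record):
    """Hex SHA-256 of a canonical JSON encoding of the whole record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = content_id({"account_id": 42, "amount": 10})
b = content_id({"amount": 10, "account_id": 42})
print(a == b)  # True
print(len(a))  # 64
```

The result is 256 bits rather than 128, so it won't fit a UUID-typed column as-is; v5 is essentially the standardized way of squeezing a (SHA-1) hash into that shape.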


Some systems expect UUIDs so you don't always have that choice.


v5... I use them all the time.



