Why would you ever be in a situation where you have a UUID but don't know what i...

StreamBright · on Jan 22, 2021

Because somebody (not necessarily you) wrote the following code:

printf("%s - %s", id0, id1)

You can even do this:

printf("user-id %s group-id %s", group-id, user-id)

Do you think you will be able to quickly debug what is wrong in a high-severity situation when a customer-facing system is down?

I do not, so we use eids (external ids) that are easy to identify.

williamdclt · on Jan 23, 2021

It happened to me a few times:

- We got a "400 bad request" because we received a UUID for something that wasn't even supposed to be a UUID. I had to fish around for what this UUID was actually referring to, to track down the bug. It was an integration with a partner, and some faulty branching logic.
- We ask our customer support team to add IDs to engineering support requests (customer ID, order ID, etc.) so that we don't lose time. Sometimes they make a mistake (or it's not always clear what they're supposed to do) and we get a user ID that doesn't refer to any user. A bit of fishing around then again.
- Wild logs: "could not find entity f5fac1d0-5d0f-11eb-9ab7-3e22fb071270".

I'm sure I could find more examples, but you get the gist of it.

seppel · on Jan 22, 2021

* You stumble upon them in log files or error messages.

* Somebody asks you why some API call is not working.

* You send around Excel files to higher management (for example in incidents).

globular-toast · on Jan 22, 2021

Now I'm even more baffled. What kind of stupid logger logs a uuid without a clue as to what kind of object it identifies? Why would you need to communicate uuids to higher management?

This makes me think of Douglas Adams. You know the answer is 42 but you forgot what the question was. I really can't imagine how this could ever happen in a business. "Yes, sir, there was a problem with 1087." "1087 what?" "I don't know, but it was number 1087."

seppel · on Jan 22, 2021

> What kind of stupid logger logs a uuid without a clue as to what kind of object it identifies?

The code cannot know what kind of object a UUID identifies; it can only assume that it identifies what it expects. So you might see:

"Unknown person 550e8400-e29b-11d4-a716-446655440000" in the log files. Much more helpful for debugging, however, is: "Unknown person order-550e8400-e29b-11d4-a716-446655440000". Then you know somebody accidentally put an order id where a person id belongs.

This is an easy example; in the real world you have persons, groups, representatives, accounts, ACLs, etc., which are easily confused. If you just have a UUID, you will have a hard time figuring out what it actually is.
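The type-prefix idea in this comment can be sketched roughly as follows. This is only an illustration in Python: the `KNOWN_TYPES` set and the helper names `format_typed_id`, `parse_typed_id` and `require_kind` are hypothetical and not taken from any system mentioned in the thread.

```python
import uuid
from typing import Tuple

# Hypothetical set of entity types; a real system would have its own list.
KNOWN_TYPES = {"person", "order", "group"}


def format_typed_id(kind: str, value: uuid.UUID) -> str:
    """Render an ID as '<kind>-<uuid>', e.g. 'order-550e8400-...'."""
    if kind not in KNOWN_TYPES:
        raise ValueError(f"unknown id type: {kind!r}")
    return f"{kind}-{value}"


def parse_typed_id(text: str) -> Tuple[str, uuid.UUID]:
    """Split '<kind>-<uuid>' at the first '-' into its type and UUID parts."""
    kind, sep, rest = text.partition("-")
    if not sep or kind not in KNOWN_TYPES:
        # A bare UUID ends up here: its first group is not a known type.
        raise ValueError(f"missing or unknown type prefix in {text!r}")
    return kind, uuid.UUID(rest)


def require_kind(text: str, expected: str) -> uuid.UUID:
    """Return the UUID if the prefix matches, else raise an error naming both types."""
    kind, value = parse_typed_id(text)
    if kind != expected:
        raise ValueError(f"expected a {expected} id but got a {kind} id: {text}")
    return value
```

Calling `require_kind("order-550e8400-e29b-11d4-a716-446655440000", "person")` raises an error whose message already says that an order id arrived where a person id was expected, which is the debugging hint the comment is after.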

SigmundA · on Jan 22, 2021

This is why you have column headers?

If you just want it to look that way in a text log, then sure prepend the type with the id wasting some space there, but to use it throughout your db eats up tons of space from both text encoding the id and repeating the column name in the data over and over.

seppel · on Jan 22, 2021

> This is why you have column headers?

I'm not sure I can follow. You get a UUID from somewhere, probably external code that you cannot even look at. The UUID comes stringly-typed, let's say as JSON. Then you have no chance to know what type it is; you can only assume it is the type that you expect.

Just putting "550e8400-e29b-11d4-a716-446655440000" into a person column does not make it a person id.

If you get instead "order-550e8400-e29b-11d4-a716-446655440000" as a typed-uuid, you know for sure that this is not a person id. It is an order id. Somebody made a mistake somewhere. You also know that you probably have to look in the order DB to debug what it actually refers to.

> but to use it throughout your db eats up tons of space from both text encoding the id

You don't need to put your UUID into the DB in this format. You can add/remove the type before/after accessing the DB.
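A minimal sketch of that boundary, assuming the DB column stores the plain 16-byte UUID and the prefix only exists on the wire and in logs. The function names `to_db_key` and `from_db_key` are hypothetical.

```python
import uuid


def to_db_key(typed_id: str, expected_kind: str) -> bytes:
    """Strip and check the type prefix, returning the 16-byte value to store."""
    kind, sep, rest = typed_id.partition("-")
    if not sep or kind != expected_kind:
        raise ValueError(f"expected a {expected_kind} id, got {typed_id!r}")
    return uuid.UUID(rest).bytes


def from_db_key(kind: str, key: bytes) -> str:
    """Re-attach the type prefix when an ID leaves the DB layer."""
    return f"{kind}-{uuid.UUID(bytes=key)}"
```

With this split, the stored key stays at 16 bytes, and the cost of the prefix is one string split on the way in and one string format on the way out.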

SigmundA · on Jan 22, 2021

Just putting "order-550e8400-e29b-11d4-a716-446655440000" does not make it an order id either, that's what foreign keys are for, mistakes can be made either way.

If you're sending an id via json you should be using json to encode the type, either:

{ order:"550e8400-e29b-11d4-a716-446655440000" }

Or if for some reason the value could contain multiple types:

{ thing: { t:"order", id:"550e8400-e29b-11d4-a716-446655440000" } }

Even then just use:

{ thing: { order :"550e8400-e29b-11d4-a716-446655440000" } }

This is redundant and unnecessary, and adds overhead:

{ order: "order-550e8400-e29b-11d4-a716-446655440000" }

Storing that in your db as json is even worse (Mongo, JSONB); storing it as varchar in a normal column is not much better.

If you're not storing the ID with a type prefix in the db, then that's not the form of your IDs, that's just the UI/encoding for them in your app, which eliminates half the supposed advantage, which was easy debugging in, say, db query tools. It also means you are parsing/deparsing "order-550e8400-e29b-11d4-a716-446655440000" somewhere in your db mapper code instead of just using your json or query string parser or protobuf or whatever. Why?
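For comparison, the tagged-JSON alternative from this comment (the `{ thing: { t: ..., id: ... } }` shape) could be read with a stock JSON parser along these lines. This is a sketch only; the function name `read_tagged_id` is hypothetical, and the payload is assumed to be valid JSON with quoted keys.

```python
import json
import uuid


def read_tagged_id(payload: str, field: str, expected_kind: str) -> uuid.UUID:
    """Read {"<field>": {"t": "<kind>", "id": "<uuid>"}} and check the kind."""
    tagged = json.loads(payload)[field]
    kind = tagged.get("t")
    if kind != expected_kind:
        raise ValueError(f"expected a {expected_kind} id but got {kind!r}")
    return uuid.UUID(tagged["id"])
```

Here the type travels next to the ID as structured data, so no custom string parsing is needed, but the type is lost as soon as someone copies only the `id` value into a log line or a support ticket.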

seppel · on Jan 22, 2021

> Just putting "order-550e8400-e29b-11d4-a716-446655440000" does not make it an order id either, thats what foreign keys are for, mistakes can be made either way.

I'm not talking about the DB level here, I'm talking about what goes over the wire or ends up in log files. And if you are handing out the UUIDs to your consumers, you can be reasonably sure that you only get back the UUIDs you handed out. So if "order-550e8400-e29b-11d4-a716-446655440000" is not an order, it is an error on your side and not on your client's side.

> If your sending id via json you should be using json to encode type either:

Yes, and you should also not make any mistakes.

SigmundA · on Jan 22, 2021

So it all comes down to fewer mistakes made with:

order:"order-550e8400-e29b-11d4-a716-446655440000"

vs

order:"550e8400-e29b-11d4-a716-446655440000"

Either one will error in a sane system that checks to make sure the UUID is actually in the db either one can have mishandling sending the wrong id, you just have some extra redundant bytes double-confirming the key type, while the actual ID will still have to be verified.

The nice thing about UUIDs over, say, ints is that there should be no overlap, so it should be easy to confirm an error like that. Double-encoding the key from external APIs gives, sure, I guess, a very slight reduction in the time taken to verify an error. For logs, aren't you logging the json with the key already?

Of course this whole discussion was about time-ordered UUIDs, which are mostly useful just for DBs. If we are just talking about how IDs are formatted for external use in the app, well geez, I thought that was what query strings and json were for, but ok.

seppel · on Jan 22, 2021

> Either one will error in a sane system that checks to make sure the UUID is actually in the db either one can have mishandling sending the wrong id

Yes of course both will error. What is easier to debug:

order:"person-550e8400-e29b-11d4-a716-446655440000"

or

order:"550e8400-e29b-11d4-a716-446655440000"

In which case could you locate 550e8400-e29b-11d4-a716-446655440000 in one of your DBs quicker in an emergency situation?

SigmundA · on Jan 23, 2021

Yes redundantly encoding type everywhere in your id will be easier for a specific kind of bug yet you're adding that overhead all the time in both your db mapper and extra bytes everywhere.

Or we are getting reports of an invalid order ID, let me run a scan across the db's and see if it's valid anywhere: oh hey, it shows up in persons, somebody mixed them up somewhere, or no, it's just invalid, somebody messed up in some other way.

If it happens often enough to justify encoding it everywhere in IDs, you could just have a dev tool that looks for an ID across your DBs with a copy-paste. That takes a little longer to scan, versus overhead always in the running app so you can quickly determine that one specific bug's cause. Either way it's an invalid ID, and being a UUID it won't collide and cause worse problems with overlapping person and order IDs, as would occur with integers.

I just don't see the value but hey at least you're not storing it that way in the db right?
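The kind of dev tool suggested in this comment might look roughly like the following sketch. It assumes a single SQLite database and a hypothetical schema in which every table keeps its key as a 16-byte BLOB in a column named `id`; the table list and the function name `find_id` are made up for illustration. It also assumes you can reach every relevant table from one place.

```python
import sqlite3
import uuid
from typing import List

# Hypothetical tables to scan; each is assumed to have an "id" BLOB column.
TABLES = ("persons", "orders", "accounts")


def find_id(conn: sqlite3.Connection, raw_id: str) -> List[str]:
    """Return the names of all tables in which the pasted UUID exists."""
    key = uuid.UUID(raw_id).bytes
    hits = []
    for table in TABLES:
        # Table names come from the fixed tuple above, never from user input.
        row = conn.execute(
            f"SELECT 1 FROM {table} WHERE id = ?", (key,)
        ).fetchone()
        if row is not None:
            hits.append(table)
    return hits
```

An empty result means the ID is simply invalid, while a hit in `persons` for a supposed order ID points at a mix-up somewhere upstream.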

seppel · on Jan 23, 2021

> Yes redundantly encoding type everywhere in your id will be easier for a specific kind of bug yet you're adding that overhead all the time in both your db mapper and extra bytes everywhere.

I can save the extra bytes by just removing the '-' from the UUIDs and using a more compact encoding (base whatever). And at the same time I'll save countless debugging hours, because everybody is aligned on what kind of IDs we are talking about.

> Or we are getting reports of invalid order ID let me run a scan across db's and see if its valid anywhere, oh hey it shows up in persons somebody mixed them up somewhere or no it's just invalid somebody messed up in some other way.

Because you have access to all the DBs and you can easily write a query over all your DBs. That is naturally assumed.

SigmundA · on Jan 23, 2021

>uuids and using the more compact enconding (base what ever). And at the same time I'll safe countless of debugging hours, because everybody is aligned what kind of ids we are talking about.

Will never be as compact as 16 byte binary UUID's. Will never be as compact as the same compact text encoding without the type prefix. Will always have the runtime overhead of parsing the type and id and rendering them back from the db.
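The size trade-off argued over in these two comments can be made concrete with a small sketch. It assumes URL-safe base64 without padding as the "more compact encoding" (the thread does not name one) and uses a hypothetical helper `encoded_sizes`.

```python
import base64
import uuid
from typing import Dict


def encoded_sizes(kind: str, value: uuid.UUID) -> Dict[str, int]:
    """Length in bytes/characters of several encodings of the same UUID."""
    # Assumption: unpadded URL-safe base64 stands in for "base whatever".
    b64 = base64.urlsafe_b64encode(value.bytes).rstrip(b"=").decode("ascii")
    return {
        "binary": len(value.bytes),             # raw 16-byte form
        "canonical": len(str(value)),           # hex with four '-' separators
        "hex": len(value.hex),                  # hex without separators
        "base64url": len(b64),                  # compact text form
        "typed_base64url": len(f"{kind}-{b64}"),  # compact text plus prefix
    }
```

For the prefix `order` this gives 16, 36, 32, 22 and 28 respectively: the prefixed compact form is shorter than the canonical 36-character text, but still longer than both the unprefixed compact text and the 16-byte binary value. Note that URL-safe base64 can itself contain `-`, so a parser for this form has to split at the first `-` only.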

Again, if that one specific bug is so common that it justifies adding extra ID and DB mapping overhead ALL THE TIME in the application, maybe a simple devops tool to paste an ID and verify its type would be a better use of computing resources, removing the constant runtime overhead.

Seriously, I have seen this type of bug once in a great while, and it sucks with integer keys because they can easily overlap, causing pretty nasty outcomes. UUIDs eliminate that. If I had to dig further I might do a search to see if it's actually a valid UUID somewhere, but seriously, someone is just sending the wrong ID; it should be pretty quick to track down without the prefix since it's globally unique.

StreamBright · on Jan 22, 2021

>> but to use it throughout your db eats up tons of space

I think human resources and outages cost a lot more than disk space these days. If I can trade disk space for either of those I will do it in a blink of an eye.

SigmundA · on Jan 22, 2021

Not just disk space: a larger index size means less fits in caches and memory, more I/O, etc.

These are keys which are typically used constantly by an app in the db with joins etc. They are some of the hottest data typically and you want to avoid cache misses on them if possible.

kube-system · on Jan 22, 2021

I can think of a lot of reasons why it might happen.

Maybe you have a process that accepts multiple types of objects. Maybe you think you are passing in the correct type of object but you are not. Maybe the person reporting the error to you omitted information. Maybe the person who wrote the application didn't log enough context. Yes, all of these situations are not ideal, but if the process was ideal you wouldn't be debugging it in the first place.

> I really can't imagine how this could ever happen in a business. "Yes, sir, there was a problem with 1087." "1087 what?" "I don't know, but it was number 1087."

This happens all the time in many businesses. Users rarely generate perfect error reports, and it's not uncommon for intermediaries to incorrectly communicate details either. What developer hasn't seen an error report that consists of a vague screenshot with "doesn't work, got this error" as the description?

globular-toast · on Jan 22, 2021

> Users rarely generate perfect error reports, and it's not uncommon for intermediaries to incorrectly communicate details either. What developer hasn't seen an error report that consists of a screenshot with "doesn't work, got this error" as the description?

And the solution is to have them repeat a UUID to you? I don't think so...

kube-system · on Jan 22, 2021

Something can be not a solution to a problem, but contribute to making your life easier. This is that.

There are certainly applications that have type-ambiguous IDs which work just fine too. Not all engineers make the same decisions; that's ok.

jamescampbell · on Jan 22, 2021

I am with you. This makes no sense to me. But I also have wrapped userdata or account data in a generated hash to quickly have access to the underlying account info / expiration info etc. I think it is easier to justify why to implement vs. why not to implement.