Hacker News new | past | comments | ask | show | jobs | submit login

> Do you leave your windows open when you leave from home just because the burglar can kick the front door in instead?

Yes (or at least: I ain't terribly worried if I do forget to close the windows before leaving), because the likelihood of a burglar climbing up multiple floors to target my apartment's windows is far lower than the likelihood of the burglar breaking through my front door.

But I digress...

> You're missing the point and you're not thinking like a hacker yet.

I'd say the same about you. To a hacker who's sufficiently motivated to go through all those steps you describe, a lack of publicly-known autoincremented IDs is hardly an obstacle. It might deter the average script kiddie or botnet, but only for as long as they're unwilling to rewrite their scripts to iterate over alphanumerics instead of just numerics, and they ain't the ones performing attacks like you describe in the first place.

> For example your profile here is '/user?id=yellowapple' not '/user?id=1337'.

In either case that's public info, and it's straightforward to build a list of at least some subset of Hacker News users by other means (e.g. scraping comments/posts - which, mind you, are sequential IDs AFAICT). Yes, it's slightly more difficult than looping over user IDs directly, but not by a significant enough factor to be a worthwhile security mitigation, even in depth.

Unless someone like @dang feels inclined to correct me on this, I'm reasonably sure the decision to use usernames instead of internal IDs in HN profile URLs has very little to do with this particular "mitigation" and everything to do with convenience (and I wouldn't be all that surprised if the username is the primary key in the first place). '/user?id=worksonmine' is much easier to remember than '/user?id=8675309', be it for HN or for any other social network / forum / etc.

> Sometimes an incrementing ID is perfectly fine, but it's more difficult to shard across services so I usually default to UUID anyway except when I really want an incrementing ID.

Sharding is indeed a much more valid reason to refrain from autoincrementing IDs, publicly-known or otherwise.




> multiple floors

But not if you lived on the ground floor I assume?

> iterate over alphanumerics instead of just numerics

Why would you suggest another sequential ID as a solution to a sequential ID? I didn't, UUIDs have decent entropy and are not viable to brute force. Don't bastardize my point just to make a response.

> I'm reasonably sure the decision to use usernames instead of internal IDs in HN profile URLs has very little to do with this particular "mitigation" and everything to do with convenience

It wasn't intended as an audit of HN, I was holding your hand when walking you through an example and chose the site we're on. I missed my mark and I don't think a third attempt will do much difference when you're trying so hard not to get it. If you someday have public data you don't want a competitor to enumerate you'll know the solution exists.


> But not if you lived on the ground floor I assume?

Still wouldn't make much of a difference when my windows are trivial to break into even when closed.

> Why would you suggest another sequential ID

I didn't.

> UUIDs have decent entropy and are not viable to brute force.

They're 128 bits (for UUIDv4 specifically; other varieties in common use are less random). That's a pretty good amount of entropy, but far from insurmountable.

And you don't even need to brute-force anything; if these IDs are public, then they're almost certainly being used elsewhere, wherein they can be captured. There's also nothing stopping an attacker from probing IDs ahead of time, prior to an exploit being found. The mitigations to these issues are applicable no matter the format or size of the IDs in question.

And this is assuming the attacker cares about the specific case of attacking all users. If the attacker is targeting a specific user, or just wants to attack the first user one finds, then none of that entropy matters in the slightest.

Put simply: there are enough holes in the "random IDs are a meaningful security measure" argument for it to work well as a strainer for my pasta.

> when you're trying so hard not to get it

Now ain't that the pot calling the kettle black.

> If you someday have public data you don't want a competitor to enumerate you'll know the solution exists.

If I don't want a competitor to enumerate it then I wouldn't make the data public in the first place. Kinda hard to enumerate things when the only result of attempting to access them is a 404.


> I didn't

Then how would you iterate it? You said:

>> iterate over alphanumerics instead of just numerics

I assume you mean kind of like youtube IDs where 123xyz is converted to numerics. I wouldn't call brute-forcing iterating, at least not in the sense we're discussing here.

> They're 128 bits (for UUIDv4 specifically; other varieties in common use are less random). That's a pretty good amount of entropy, but far from insurmountable.

If you generate a billion UUIDs every second for 100 years you have a 50% chance to have 1 (one!) collision. It's absolutely useless to try to guess even a small subset.

> And you don't even need to brute-force anything; if these IDs are public, then they're almost certainly being used elsewhere, wherein they can be captured.

Sessions can be hijacked anyway, so why not leave the user session db exposed to the web unprotected. Right? There will always be holes left when you fix one. That doesn't mean you should just give up and make it too easy.

> And this is assuming the attacker cares about the specific case of attacking all users. If the attacker is targeting a specific user, or just wants to attack the first user one finds, then none of that entropy matters in the slightest.

So your suggestion is to leave them all incrementing instead, do I understand you correctly?

> Put simply: there are enough holes in the "random IDs are a meaningful security measure" argument for it to work well as a strainer for my pasta.

It's such a simple thing to solve though that it doesn't really matter.

> If I don't want a competitor to enumerate it then I wouldn't make the data public in the first place. Kinda hard to enumerate things when the only result of attempting to access them is a 404.

There are lots of things you may want to have public but not easily crawlable. There might not even be the concept of users and restrictions. A product returning 404 for everything to everyone might not be very useful to anyone. You've been given plenty of examples in other comments, and I am sure you understand the points.

One example could be recipes, and you want to protect your work while giving real visitors (not users, there are no users to hack) the ability to search by name or ingredients. With incremental IDs you can scrape them all no problem and steal all that work. With a UUID you have to guess either the UUID for all, or guess every possible search term.

Another could be a chat app, and you don't want to expose the total number of users and messages sent. If the only way to start a chat is knowing a public key and all messages have a UUID as PK how would you enumerate this? With incrementing IDs you know you are user 1337 and you just sent message 1,000,000. With random IDs this is impossible to know. Anyone should still be able to add any user if the public key is known, so 404 is no solution.

I'm sure you'll have something to say about that as well. The point is to make it difficult to abuse, while still being useful to real visitors. I don't even understand your aversion to random IDs, are they so difficult to implement? What's the real problem?


> Then how would you iterate it?

Like with any other number. UUIDs are just 128-bit numbers, ranging from 00000000-0000-0000-0000-000000000000 to FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF.

I'll concede that iterating through the entirety of that range would take a very long time, but this still presumes that said iteration in its entirety is necessary in the first place.

> If you generate a billion UUIDs every second for 100 years you have a 50% chance to have 1 (one!) collision.

Maybe, if they're indeed randomly-generated. Are they indeed UUIDv4? Is your RNG up to snuff?

And the probability doesn't need to be as high as 50% to be a tangible danger. A 0.5% chance of an attacker guessing a UUID to exploit is still more than enough to consider that to be insufficient as a layer of security. Meanwhile, you're likely fragmenting your indices and journals, creating performance issues that could be exploited for a denial-of-service. Tradeoffs.

> Sessions can be hijacked anyway, so why not leave the user session db exposed to the web unprotected.

Hijacking a session is harder than finding a publicly-shared ID floating around somewhere.

> So your suggestion is to leave them all incrementing instead, do I understand you correctly?

My suggestion is to use what makes sense for your application. That might be a random number, if you've got a sharded database and you want something resistant to collisions. That might be a sequential number, if you need cache locality or simply want to know at a glance how many records you're dealing with. It might be a number with sequential high digits and random low digits, if you want the best of both worlds. It might be a text string if you just care about human readability and don't care about performance. It might even be multiple fields used together as a composite primary key if you're into that sort of thing.

My point, as was obvious in my original comment and pretty much every descendant thereof, is that of the myriad useful factors around picking a primary key type, security ain't really among them.

> There might not even be the concept of users and restrictions.

In which case everything served from the database should be treated as public - including the quantities thereof.

> You've been given plenty of examples in other comments, and I am sure you understand the points.

Correct, and if those examples were satisfactory for arguing a security rationale for avoiding sequential IDs, then I wouldn't still be here disagreeing with said arguments, now would I?

> One example could be recipes, and you want to protect your work while giving real visitors (not users, there are no users to hack) the ability to search by name or ingredients. With incremental IDs you can scrape them all no problem and steal all that work. With a UUID you have to guess either the UUID for all, or guess every possible search term.

You don't need to guess every possible search term. How many recipes don't use the most basic and common ingredients? Water, oil, sugar, salt, milk, eggs... I probably don't even have to count with my toes before I've come up with a list of search terms that would expose the vast majority of recipes for scraping. This is peak security theater.

> Another could be a chat app, and you don't want to expose the total number of users and messages sent. If the only way to start a chat is knowing a public key and all messages have a UUID as PK how would you enumerate this?

Dumbest way would be to DDoS the thing. At some number of concurrent sessions and/or messages per second, it'll start to choke, and that'll give you the upper bound on how many people are using it, at least at a time.

Smarter way would be to measure request times; as the user and message tables grow, querying them takes longer, as does inserting into them if there are any indices to update in the process - even more so when you're opting for random IDs instead of sequential IDs, because of that aforementioned index and journal fragmentation, and in general because they're random and therefore entail random access patterns.

> I don't even understand your aversion to random IDs, are they so difficult to implement? What's the real problem?

My aversion ain't to random IDs in and of themselves. They have their place, as I have repeatedly acknowledged in multiple comments in this thread (including to you specifically).

My aversion is to treating them as a layer of security, which they are not. Yes, I'm sure you can contrive all sorts of narrowly-tailored scenarios where they just so happen to provide some marginal benefit, but in the real world "sequential IDs are insecure" is the sort of cargo-cultish advice that's impossible to take seriously when there are countless far-bigger fish to fry.

My additional aversion is to treating them as a decision without cost, which they also are not. As touched upon above, they carry some rather significant performance implications (see e.g. https://www.2ndquadrant.com/en/blog/sequential-uuid-generato...), and naïvely defaulting to random IDs everywhere without considering those implications is a recipe for disaster. There are of course ways to mitigate those impacts, like using UUIDs that mix sequential and random elements (as that article advocates), but this is at the expense of the supposed "security" benefits (if you know how old a record is you can often narrow the search range for its ID), and it still requires a lot more planning and design and debugging and such compared to the tried-and-true "just slap an autoincrementing integer on it and call it a day".


> I'll concede that iterating through the entirety of that range would take a very long time

You don't say?

> but this still presumes that said iteration in its entirety is necessary in the first place.

It is, because it's compared to sequential IDs where you know exactly the start and end. No way of knowing with UUIDs.

> Maybe, if they're indeed randomly-generated. Are they indeed UUIDv4? Is your RNG up to snuff?

Stop constantly moving the goalposts, and assume they're used correctly, Jesus Christ. Anytime someone talks about UUID it's most likely v4, unless you want non-random/deterministic v5 or time-sortable v7. But the most common is v4.

> Hijacking a session is harder than finding a publicly-shared ID floating around somewhere.

Even firebase stores refresh tokens accessible by javascript (localstorage as opposed to HTTPOnly). Any extension with sufficient users is more viable than finding a single collision of a UUID. Programs with access to the cookies on the filesystem, etc. It's much easier to hijack sessions than guessing UUIDs.

> Dumbest way would be to DDoS the thing. At some number of concurrent sessions and/or messages per second, it'll start to choke, and that'll give you the upper bound on how many people are using it, at least at a time.

Won't tell you a single thing. Might be a beefy server serving 3 users, or a slow server with lots of downtime for millions of users. They may all be using it at once, or few of them sporadically. No way of knowing which. It's a retarded guesstimate. But this only shows the mental gymnastics you're willing to use to be "right". Soon you'll tell me "I'll just ask my contact at the NSA".

> sequential IDs are insecure

Nobody claimed this. They can be enumerated which may or may not be a problem depending on the data. Which I said in my first comment, and you seem to agree. This entire thread was a complete waste of my time.


> This entire thread was a complete waste of my time.

The feeling is mutual. Have a nice day.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: