Eli5?

tejohnso · 2025-02-10T20:45:27 1739220327

If you're trying to find an available cubby hole for your boots, you'll find one very quickly if they're almost all empty. But if they're almost all full, it will take longer for you find one. This discovery shows that it won't take as long as previously thought.

laweijfmvo · 2025-02-10T20:38:11 1739219891

hash tables are most famous for their O(1) lookup time, but then you have to deal with collisions. There's many famous ways of dealing with collisions and you can convince yourself of their [worst case] lookup time. The article is discussing a new upper bound on that worst case.

SilasX · 2025-02-10T21:33:54 1739223234

This is always a point of frustration for me. Under the normal big-O model, hashtables aren't, and can't be O(1) lookup, even for the mythic perfect hash function! You can only get the O(1) figure by breaking the big-O model and bolting on some in-practice assumptions[1] that are used nowhere else in discussion of asymptotic complexity.

Specifically, as you resize the table, you need a function with a bigger output space to give you more places to store the item. To have a bigger output space, you unavoidably have to do more computations. If you go from a function with n possible outputs to one with 2n possible outputs, as n goes to infinity, you eventually have to use a function with more steps.

[1] elaborated on in this earlier discussion: https://news.ycombinator.com/item?id=9807739

Specifically this comment about the extra assumptions: https://news.ycombinator.com/item?id=9813524

magicalhippo · 2025-02-11T00:58:18 1739235498

Your sense of frustration seems to stem from thinking about something which is similar to big-O, but not actually being big-O.

All of the points you raise are valid if you want to have a good understanding of actual runtime cost of an algorithm.

None of it is relevant if you want to compare algorithms according to big-O, in order to find one which doesn't lead to failure[1] when faced with large amounts of data.

Perhaps you should define your own big-R notation for what you're after.

[1]: https://arstechnica.com/information-technology/2013/12/expon...

SilasX · 2025-02-11T03:55:06 1739246106

What are you taking about? By the actual standards of big-O that allow you to derive the constant run time? See the second linked comment which establishes that it only becomes constant when you add in practical hardware considerations.

No bounded run time function can have an infinite output space (while still halting, anyway).

magicalhippo · 2025-02-11T10:16:23 1739268983

What you are saying in that post is that a hashmap should be O(log(n)) because if you have a word size w you need k_w = log(n)/w operations to compute the hash of a key that's large enough to not have collisions.

However by that logic the comparison function should also have a similar factor, since by the same argument the key values need to be large enough not to collide. Ie if the key values are just the natural numbers up to n for example, you'll need k_w words to store them and hence comparison takes O(k_w) = O(log(n)).

Thus a binary tree should also have an additional log(n) factor due to this, and hence be O(log(n)^2)).

However the point of big O is to compare, and since both have this log(n) factor you could just cancel that in both, leaving you with the familiar O(1) and O(log(n)) respectively for the hashmap and binary tree.

Is it sloppy to do that prematurely? Kinda yes, but it makes comparisons much easier as you don't have to go cancelling those log(n) factors all the time. And the point of big O is to compare the algorithms, not accurate running time.

SilasX · 2025-02-11T15:42:43 1739288563

Okay, so some communication tips:

1) You should have led with that to get to the point immediately instead of cryptically alluding to the existence of such a point and adding an irrelevant example.

2) Your point was already made in an initial reply[A]. You didn't need to make your comment at all, and if you were going to lazily allude to the existence of an argument, you could have just linked that one. Or, just upvote it. Comment sections don't need repeats of points already made unless you have something to add.

To address the point you did add:

3) You're still conceding the core point that you have to go outside the usual big-O model to get hashtables being O(1). In no other case does anyone take big-O to mean asymptotic complexity relative to comparable solutions for the same problem -- not unless that's explicitly stated. You can't have a data structure with constant-time lookup, period.

[A] https://news.ycombinator.com/item?id=43006957

sigbottle · 2025-02-11T21:28:07 1739309287

I actually kind of get where you're coming from, I think. I'm trying to understand your argument, I think it goes something like this:

1. "But if the universe grows too big, then the integers get too big!". Fair point.

2. You concede that the Word-RAM model does accurately describe an O(1) hashing operation.

3. We now define a new computation model, where hashing is just O(1).

4. "this is still wrong, because this doesn't model the behavior as keys grow to infinity"

The important thing here is that the hashing oracle isn't defined to even care about integer arithmetic complexity. I think one thing you may be confused about is that it seems like the hashing oracle can only be implemented with integer-sized keys, actually the thing that epeople are trying to minimize in hashtable research is calls to that oracle. That oracle can be some genie in a lamp, my deadbeat cousin, or a computer program.

---

I think you even get that though on a logical level, so the crux probably goes something like, "The word-RAM model forces me to be of bounded size, but for higher level intuition, I just forget that? What? And then I stitch them back together? I mean sure, given that it's O(1), I get it, but there's no way this 'stitches together' cleanly, it's not actually O(1)!"

I didn't even know this term, but it seems like people have invented a term for this, the trans-dichotomous model. Basically, word size (integer size) scales with problem definition size. So I imagine that it's just taken for granted that trans-dichotomy sort of just "propagates up" your stack of logic, informally, if you want to understand each and every operation.

I mean, for me, the first paragraph was mostly fine enough for me - analyzing abstract models asymptotically w.r.t. various kinds of cost is already something I'm familiar with, so just saying hashing is an O(1) cost by definition makes sense.

The proofs doesn't have to know about what the hashing oracle actually is - the "integer" in the Word-RAM model is not the same as the "integer" produced by the hashing oracle, even though they both have "O(1)" cost within their respective models. I think viewing it as hooking them up together hurts you.

Word-RAM Integer is O(1) because it's trans-dichotomous, and in this specific hashtable analysis, the hashing oracle is O(1) just because it is by definition, we only care about # of calls to that function.

Even though the intuition is that they're hooking up together - recall that your hashing oracle backend can be your drunk cousin at 3am spouting out a consistent uniform random number mapping in the range [0...n-1], we're not really particularly interested in anything else.

Oh, and also, the oracle must take in two parameters, (key, size of space), and the assumption is that, for each size of space, the key is randomly uniformly consistently mapped. Although in the paper they don't even need to deal with variable hashtable resize schemes so they don't even need to re-define this.

But like again, like, we make these "constants" by saying, "I don't care how the hash function is performed, as long as it satisfies these properties; all I'm gonna try to do is implement an algorithm that does the minimal hashing possible". That's not nefarious, that's standard abstraction practice.

I did try to sit with your doubt for a while though and this is the best explanation I can come up with. If I didn't understand, oh well.

https://en.wikipedia.org/wiki/Word_RAM https://en.wikipedia.org/wiki/Transdichotomous_model

SilasX · 2025-02-12T18:00:41 1739383241

The problem is, big-O doesn't use the word-RAM model mentioned in the comment. That's something you have to bolt on to the get the O(1) hashtable result. Which is the very point I was making originally, that you have to go outside standard big-O and incorporate practical considerations, which isn't done anywhere else as a matter of course.

magicalhippo · 2025-02-12T19:46:44 1739389604

> That's something you have to bolt on to the get the O(1) hashtable result.

No, you brought that on by claiming not considering it was an error.

In standard big-O you take as an assumption that memory operations are O(1), comparisons are O(1) and calculating hash values are of the same order as making a comparison (and vice versa).

Once you do that, then a hash table becomes O(1) on average, as it's just a O(1) calculation followed by a O(1) memory lookup (assuming no collisions on average).

SilasX · 2025-02-15T16:57:51 1739638671

>In standard big-O you take as an assumption that memory operations are O(1)

No. If it look up big-O in a math book, it's not going to say anything about that. You can add assumptions on, but that's not big-O anymore, but something different. That's the point.

sigbottle · 2025-02-12T19:51:32 1739389892

Again, Big-O is only relevant w.r.t a specific cost analysis.

Hashtable analyses don't care how you get the hash function - it can take exactly 72 hours to arrive, but as long as the oracle satisifes some probability distribution guarantee, it's good. That is an O(1) operation, a single hash.

Like, there's no going "outside" for big O, there's no "standard". It's always backed by some notion of what you are computing. There's pointer machines, turing machines, automata, circuits, for your complexity. Pointer machines can't directly address at offsets, Word-RAM can. Different tradeoffs for different analyses - pointer machines often model GC-collected languages and heavy pointer data structures, while Word-RAM deals with more compact data structures.

Again, as mentioned in the paper, Big O isn't used as a notion for number of total operations, it's only trying to *minimize the number of calls to the hashing oracle*. So your analyses can get even more fine-grained.

You might even be excited to learn about the External Memory Model. In that model, you have two blocks of memory; one block where *every single operation* is free, and one block in which you can't access, you must fetch into memory. You fetch blocks of size B, your local memory is B << N, and external memory is N << M. Even operation in block N is free, but and the only thing you can do on block M is read/write a block of size B. The question then is, to fetch the minimum number of blocks for your specific algorithm.

I've been originally confused by algorithms in this model, too - sometimes the "real number" of operations would be an order of magnitude more than the number of block accesses, but we just don't even count them? How does that make sense?

Turns out this actually models real hardware surprisingly well - it's often faster to simply recompute and do extra computation in order to minimize memory accesses. It's used a lot in databases. Although I can't find the exact wording "External memory model", the FlashAttention paper uses the wording "IO/Aware" and analyzes block transfers like this on the GPU, so... yeah.

So going 'higher up' in the abstraction stack doesn't necessarily sacrifice practicality either; in fact, it can sometimes help.

Sure, we can view the intuition as chaining together layers of abstraction, I get that view. But each layer functions perfectly logically, independently of the others. It's just like good decoupled software practice.

> you have to go outside standard big-O and incorporate practical considerations, which isn't done anywhere else as a matter of course.

Hopefully I've explained it better. Different models of computation are a fundamental part of theoretical computer science, since of course, as you've pointed out, you have to be very careful about defining operations. At the same time, part of the great part about CS (and math in general) is, that you can specify, "Assuming these conditions, what can I logically derive?" and that's just done everywhere.

SilasX · 2025-02-15T17:02:57 1739638977

>Hashtable analyses don't care how you get the hash function - it can take exactly 72 hours to arrive, but as long as the oracle satisifes some probability distribution guarantee, it's good. That is an O(1) operation, a single hash.

The problem is that, as n goes to infinity, you're not using the same hash function -- you can't. So even if you can assumed that one hash function has constant time, the later ones you need as you scale up wouldn't be.

And I don't know why you think you need to bring up probability distribution properties -- I made clear by now that the above points apply even to the mythic perfect hash function.

>Hopefully I've explained it better. Different models of computation are a fundamental part of theoretical computer science, since of course,

And it's fine to use different models! It's not fine to silently switch between them in an ad hoc way, which is what's necessary to make the exceptions that get you O(1) lookup for hashtables. I have never seen anyone expect the relevant caveats when quizzing coders for the correct answer on that question. That makes this part of computer science more like memorization or stamp collecting than a genuine mathematical model from which you consistently derive answers.

Aunche · 2025-02-10T23:59:35 1739231975

To be fair, in tree tables, we also assume that comparison is a constant time operation as well when it also necessarily needs to be O(log(n)) for an arbitrary sized table, making it an O(log(n)^2) data structure by this standard.

meribold · 2025-02-11T12:53:20 1739278400

You don't need to compare the whole key most of the time. I think the amortized cost for a comparison is in O(1).

sltkr · 2025-02-11T14:40:14 1739284814

Not in a balanced tree, since the further down the tree you go, the longer the shared prefix between the search key and node key becomes.

So the complexity becomes something like 1 + 2 + 3 + .. + log n = O(log^2 n).

meribold · 2025-02-11T15:58:55 1739289535

Good point. Perhaps we could store the length of the common prefix from the last time we went left as well as from the last time we went right. The minimum of those two should remain a common prefix with the search key for the rest of the lookup and should therefore be safe to skip during subsequent comparisons.

SilasX · 2025-02-12T17:59:08 1739383148

But that would actually help it because you can store how far down the match is by that point in the tree, and know how few digits you have to actually compare.

anothernewdude · 2025-02-11T03:23:36 1739244216

O(1) lookup is bullshit that ignores the memory hierarchy.