Hans Peter Luhn and the Birth of the Hashing Algorithm (ieee.org)
126 points by sohkamyung on Jan 31, 2018 | 28 comments



Hashing functions are cool and all, but let's talk about that groovy "Cocktail Oracle" he invented. It's basically a clever physical representation of bitwise arithmetic.

The oracle works with bit strings all of the same length. Each bit position represents a single cocktail.

Each ingredient card is a bit string. Each bit is 1 (opaque) if the ingredient is needed for the cocktail at that bit position.

You take all the ingredients you have and discard their bitstrings ("turn down their cards"). Then you OR together the remaining bitstrings, i.e. those of all the ingredients you don't have (look through their overlapping cards). In other words, not having an ingredient forbids certain cocktails. Any bit positions that remain zero (transparent all the way through to the key card on the back showing the names of the cocktails at each position) are cocktails that you have enough ingredients to make.

Using opacity to represent bitwise "or" is pretty clever. Turning down the ingredient cards you don't have is really clever. I kind of want to make one of these now.
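
Here's a rough Python sketch of how the oracle maps to bitmasks (the cocktails and ingredient cards are made up for illustration):

    # Bit position i is cocktail i; an ingredient card has bit i
    # set (opaque) iff cocktail i requires that ingredient.
    COCKTAILS = ["martini", "daiquiri", "margarita"]  # the "key card"

    CARDS = {
        "gin":      0b001,  # martini
        "vermouth": 0b001,  # martini
        "rum":      0b010,  # daiquiri
        "lime":     0b110,  # daiquiri, margarita
        "tequila":  0b100,  # margarita
    }

    def makeable(on_hand):
        # Turn down the cards for ingredients we have, then OR together
        # the rest: a 1 bit means a missing ingredient forbids that drink.
        forbidden = 0
        for ingredient, card in CARDS.items():
            if ingredient not in on_hand:
                forbidden |= card
        # Bits still 0 are transparent through the whole stack: makeable.
        return [name for i, name in enumerate(COCKTAILS)
                if not forbidden & (1 << i)]

    print(makeable({"rum", "lime", "tequila"}))  # ['daiquiri', 'margarita']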



I mean this is really about set operations; thinking of it in terms of bits and integers doesn't really add anything.


The mechanics do matter because it's fundamentally a mechanical device.


What about it involves thinking of elements as bits and sets as whole numbers? I'm not seeing it. Everything you just described is just a physical representation of sets and operations on sets. I don't see any step where elements are mapped to powers of 2 and added up (unless you for some reason insist on thinking that way of aggregating elements into a set), nor any step where a number is broken into bits (again, unless you for some reason insist on thinking that way of breaking a set into its elements).


> What about it involves thinking of elements as bits and sets as whole numbers?

I said bitwise arithmetic, not binary numbers. :)

> Everything you just described is just a physical representation of sets and operations on sets.

Sure, and how is the set represented and its operations implemented? In the Oracle, each set is represented by a bitmask with a bit position for each element in the set. Set operations are implemented as bitwise operations on those bitmasks.
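
For instance, with toy masks over eight elements:

    a = 0b00110101           # the set {0, 2, 4, 5}
    b = 0b01100110           # the set {1, 2, 5, 6}
    union        = a | b     # bitwise OR  -> {0, 1, 2, 4, 5, 6}
    intersection = a & b     # bitwise AND -> {2, 5}
    difference   = a & ~b    # AND NOT     -> {0, 4}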


I’m more used to bits than sets, so it worked for me. People have different backgrounds.


It's a real shame they don't talk more specifically about Luhn's role in the 'record addressing' problem.

Back in 1953, IBM was investigating disk storage for record retrieval in large volumes of data. Seek/read time on these machines was on the order of a second, so a 10-level binary search could take around 10 seconds in total. Unfortunately for IBM, this wasn't an improvement over the best human-driven manual lookup processes... which is an issue if you're trying to sell/lease expensive computer hardware.

With this as motivation, Luhn essentially invented what we would now recognize as a hash table, which was key to the 1956 RAMAC release.
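
To make the payoff concrete, here's a toy Python sketch of the idea: hash the key straight to a bucket address so a lookup costs roughly one seek instead of ten. (The bucket count and hash function here are placeholders, not Luhn's actual scheme.)

    NUM_BUCKETS = 1000
    buckets = [[] for _ in range(NUM_BUCKETS)]

    def bucket_for(key):
        # Any cheap hash of the key gives a bucket (i.e. disk) address.
        return sum(ord(c) for c in key) % NUM_BUCKETS

    def store(key, record):
        buckets[bucket_for(key)].append((key, record))

    def lookup(key):
        # One "seek" to the hashed bucket, then a short scan within it,
        # instead of ~10 seeks for a binary search over the whole file.
        for k, record in buckets[bucket_for(key)]:
            if k == key:
                return record

    store("luhn", "record 42")
    print(lookup("luhn"))  # record 42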


>"Seek/read time on these machines was on the order of a second, so a 10 level binary search could take around 10 seconds of total time. Unfortunately for IBM, this wasn't an improvement over the best human-driven manual lookup processes"

I am curious what human-driven manual lookup processes existed that could be done in a single second?


A tub file:

https://en.wikipedia.org/wiki/Tub_file

(Note that the overall search takes 10 seconds, not the one second of an individual seek... so the human had a bit more time to be competitive with the disk than a single second. :-) )


I have no information as to what the parent is referring to but 2^10 is 1024. That seems roughly in line with what a double-page spread of a phone book could show you; I don't think it's unreasonable that a trained human could search through one of those in under 10 seconds.


Always amazing to realize that at some point in the past everything we know was unknown, and someone had to figure it out the first time. For everything we do today we stand on the shoulders of giants.


Sometimes we do, but a lot of things get invented again, by a different giant (or midget) who didn't have time to read all the existing literature.


This is my biggest gripe with hardcore self-made/individualist evangelists. Pretty much everything anyone has ever achieved was because of the hard work and determination of thousands if not millions of geniuses and craftspeople whose work enabled the future.


That's true, but there's also something unique about being able to take an existing idea and turn it into something relevant and marketable. Look at everything Xerox did to advance the state of the art, and compare that with their inability to turn any of it into viable products. As much as it's easy to say "Xerox messed up", that diminishes the value added by the likes of Jobs and Gates when they turned those ideas into something valuable for the masses. (Which is where the money is, among other things.)

I'm not saying it's necessarily fair, just that there's value in that 'last mile' between a long line of development and the end user of an idea or product.


I don't think your observation contradicts my anecdote. I appreciate the last milers as much as anyone.


Or on top of piles of corpses.

If you look around you at various safety systems, they were almost all paid for in souls.

The New London School explosion and odorised natural gas is one example I've learnt of in the past year or so. But that's only one of very, very many.


I have a strange recollection of playing with Linux text-processing command-line tools in the 90s, stumbling upon something that looked like the "concordance" output, and having no idea why it might be useful, nor any way to find out. Now I know! (And I can finally fix a dangling reference in my brain.)


>Much like a culinary hash of corned beef and potatoes, a hash algorithm chops and mixes up data in various ways

Perfectly obvious in hindsight but I never made the connection.


It would be interesting to read about how he came up with that specific checksum algorithm. Trial and error?


Should be: Hans Peter Luhn and the Birth of the Checksum Algorithm

Ctrl+F the article for "hash". What Luhn invented was a checksum, one that is apparently still used today in IMEIs (among other things), but the article says nothing about him furthering checksums into hash functions (and there is quite a distinction). Sure, a hash function can hardly be invented without going through the checksum stage, but it's like saying the person who discovered there's oil in the ground invented cars just because they laid the groundwork.


> What Luhn invented was a checksum,

Agreed there is a distinction, but he invented hashing too.

Citations:

https://mitpress.mit.edu/books/ibms-early-computers

https://en.wikipedia.org/wiki/Hash_table


Luhn's checksum can detect any single-digit error, but there's a stronger checksum algorithm that can also detect any transposition of adjacent digits:

https://en.wikipedia.org/wiki/Verhoeff_algorithm
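
For comparison, here's a minimal Python version of the standard Luhn mod-10 check itself:

    def luhn_valid(number: str) -> bool:
        # Double every second digit from the right; if doubling yields
        # a two-digit number, sum its digits (same as subtracting 9).
        total = 0
        for i, ch in enumerate(reversed(number)):
            d = int(ch)
            if i % 2 == 1:
                d *= 2
                if d > 9:
                    d -= 9
            total += d
        return total % 10 == 0

    print(luhn_valid("79927398713"))  # True: a classic Luhn test number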


Reading a bit further into the article, I believe it does. Look for "buckets". Of course, it also touches on the checksum problem, where it's explicitly called a "checksum", not a "hash".


Aren't checksum functions suitable hash functions though?


They are. I suspect GP is confusing them with cryptographic hash functions, which are a strict subset.


Handkerchief in the suit pocket, tie, cool haircut. When engineers knew how to dress.


From the article:

> Within this nascent computer world, Luhn cut an unusual figure. An elegant dresser throughout his life, Luhn knew more about the textiles industry than computer science when he arrived at IBM in 1941.

So it might be more the exception than the rule there as well.



