Can someone explain why this is cool and what it’s used for?

tommiegannert · 2024-05-05T06:44:55 1714891495

It seems to be a tool that generates instructions for building hash functions, then evaluates the hash functions to see how good they are. The chosen goal metric is avalanche: flipping any single input bit should flip each output bit with a probability as close to 50% as possible, so that about half of the output bits change, in an unpredictable pattern.

It outputs C code for the best hash function it generated.
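The generate-and-evaluate loop can be sketched in a few dozen lines of C. This is a minimal illustration, not the tool's code. The hash template (xorshift-multiply rounds with two searchable multipliers), the xorshift32 random generator, the sample counts, and the scoring formula (mean absolute deviation of each input-bit/output-bit flip probability from 50%) are all assumptions chosen for the example.

```c
#include <stdint.h>

/* Hypothetical hash template: the two multipliers are the parameters being
   searched; the shift amounts are fixed for simplicity. */
static uint32_t candidate(uint32_t x, uint32_t m1, uint32_t m2) {
    x ^= x >> 16;
    x *= m1;
    x ^= x >> 15;
    x *= m2;
    x ^= x >> 16;
    return x;
}

/* xorshift32 PRNG with a fixed seed, so every run is reproducible. */
static uint32_t rng_state = 2463534242u;
static uint32_t next_rand(void) {
    uint32_t x = rng_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    rng_state = x;
    return x;
}

/* Avalanche bias: for every (input bit i, output bit j) pair, estimate how
   often flipping bit i of a random input flips bit j of the hash. The ideal
   is 50%; the score is the mean absolute deviation from 0.5 (0 is perfect,
   0.5 is the worst possible). */
static double avalanche_bias(uint32_t m1, uint32_t m2, int samples) {
    static unsigned counts[32][32];
    for (int i = 0; i < 32; i++)
        for (int j = 0; j < 32; j++)
            counts[i][j] = 0;

    for (int s = 0; s < samples; s++) {
        uint32_t x = next_rand();
        uint32_t h = candidate(x, m1, m2);
        for (int i = 0; i < 32; i++) {
            uint32_t diff = h ^ candidate(x ^ (1u << i), m1, m2);
            for (int j = 0; j < 32; j++)
                counts[i][j] += (diff >> j) & 1u;
        }
    }

    double total = 0.0;
    for (int i = 0; i < 32; i++) {
        for (int j = 0; j < 32; j++) {
            double d = (double)counts[i][j] / samples - 0.5;
            total += d < 0 ? -d : d;
        }
    }
    return total / (32.0 * 32.0);
}

/* Random search: draw `trials` pairs of odd multipliers (odd, so each
   multiply is invertible mod 2^32), score each, and keep the best pair. */
static double search(int trials, int samples, uint32_t *best_m1, uint32_t *best_m2) {
    double best = 1.0; /* higher than any achievable score */
    for (int t = 0; t < trials; t++) {
        uint32_t m1 = next_rand() | 1u;
        uint32_t m2 = next_rand() | 1u;
        double score = avalanche_bias(m1, m2, samples);
        if (score < best) {
            best = score;
            *best_m1 = m1;
            *best_m2 = m2;
        }
    }
    return best;
}
```

Calling `search(1000, 2000, &m1, &m2)` would try 1000 candidate pairs; a score near 0 means bit flips look like fair coin tosses. Setting both multipliers to 1 leaves only XOR-shifts, a linear function, which scores the worst possible 0.5: each input bit then always flips the same fixed set of output bits.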

So it's useful if you want a hash function and don't think any of the existing ones are good enough, or if you're researching hash functions and want new ideas for structures.

Cool? Well, generating code is cool. Doing it randomly is a first step towards genetic programming, which would be even cooler. ... and making computers burn CPU cycles computing (mostly unused) hashes seems to be something we humans have wanted to do for around 15 years.

eru · 2024-05-05T08:54:54 1714899294

> [...] making computers burn CPU cycles computing (mostly unused) hashes seems to be something we humans have wanted to do for around 15 years.

Keep in mind that there's a big difference between cryptographic hash functions and the kind of hash functions investigated here.

baq · 2024-05-05T10:29:10 1714904950

Yeah, notably the examples are given with their inverse functions. I’m pretty sure you don’t want that in a crypto hash ;)
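For a sense of what such an inverse looks like, here is a sketch in C assuming the common xorshift-multiply structure. The function names and the two odd multipliers are illustrative, not taken from the tool's output. Each step is undone separately: an xorshift is undone by XOR-ing in further shifts of its output, and a multiply by an odd constant is undone by multiplying by its modular inverse, computed at runtime here rather than hard-coded.

```c
#include <stdint.h>

/* Forward mixer: xorshift, multiply, xorshift, multiply, xorshift.
   The multipliers are example odd constants; any odd values work. */
static uint32_t hash32(uint32_t x) {
    x ^= x >> 16;
    x *= 0x7feb352du;
    x ^= x >> 15;
    x *= 0x846ca68bu;
    x ^= x >> 16;
    return x;
}

/* Inverse of an odd a modulo 2^32 via Newton's iteration: each step doubles
   the number of correct low bits (3 -> 6 -> 12 -> 24 -> 48). */
static uint32_t mul_inverse32(uint32_t a) {
    uint32_t inv = a; /* a*a == 1 (mod 8) for odd a, so 3 bits are correct */
    for (int i = 0; i < 4; i++)
        inv *= 2u - a * inv;
    return inv;
}

/* Undo y = x ^ (x >> s) by XOR-ing in y >> s, y >> 2s, ... */
static uint32_t unxorshift32(uint32_t y, int s) {
    uint32_t x = y;
    for (int k = s; k < 32; k += s)
        x ^= y >> k;
    return x;
}

/* Apply the inverse steps in reverse order. */
static uint32_t hash32_inverse(uint32_t x) {
    x = unxorshift32(x, 16);
    x *= mul_inverse32(0x846ca68bu);
    x = unxorshift32(x, 15);
    x *= mul_inverse32(0x7feb352du);
    x = unxorshift32(x, 16);
    return x;
}
```

Because `hash32_inverse(hash32(x)) == x` for every `x`, the mixer is a bijection on 32-bit integers: perfectly fine for a hash table, and exactly why it is useless as a one-way crypto hash.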

keepamovin · 2024-05-05T14:07:09 1714918029

Haha! :) No, not overall, but you do want reversible steps in a crypto hash. Reversible steps make better mixing functions because you’re not losing information, so you avoid iteratively shrinking the state space, which would lead to more state collisions.
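A small experiment can illustrate the information-loss argument. This sketch (hypothetical names; the round function and multipliers are arbitrary choices for illustration) pushes every 16-bit input through several rounds of xorshift-then-multiply and counts how many distinct states survive. With an odd multiplier each round is reversible, so nothing is lost; with an even multiplier each round discards information and the reachable state space can only shrink.

```c
#include <stdint.h>
#include <string.h>

/* One mixing round on a 16-bit state, held in a uint32_t so the multiply
   cannot overflow a signed int after integer promotion. */
static uint32_t round16(uint32_t x, uint32_t mult) {
    x ^= x >> 7;                 /* reversible xorshift */
    return (x * mult) & 0xFFFFu; /* reversible only if mult is odd */
}

/* Push all 65536 inputs through `rounds` rounds and count distinct outputs. */
static unsigned distinct_states(int rounds, uint32_t mult) {
    static unsigned char seen[1u << 16];
    unsigned count = 0;
    memset(seen, 0, sizeof seen);
    for (uint32_t x = 0; x < (1u << 16); x++) {
        uint32_t y = x;
        for (int r = 0; r < rounds; r++)
            y = round16(y, mult);
        if (!seen[y]) {
            seen[y] = 1;
            count++;
        }
    }
    return count;
}
```

`distinct_states(8, 0x2c1bu)` keeps all 65536 states, while `distinct_states(1, 0x2c1au)` already drops to 32768, because every result of multiplying by an even constant is even. Further rounds with the even multiplier can never recover the lost states.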

AgentOrange1234 · 2024-05-05T12:59:06 1714913946

These functions are essential to hash tables (related names include hash maps and hash sets).

A hash table is a remarkable data structure that can be used to implement many algorithms simply and efficiently.

This efficiency depends on being able to create small (say 32- or 64-bit) and nearly-unique “hashes” for your data. For example, to hash user names, it wouldn’t work very well to just use the ASCII code for the first letter of the names, because then a lot of usernames would map to the same number. That’s called a collision. If there are a lot of collisions, hash tables become very inefficient.

A better approach would be to grab bits from the entire username and smash them together somehow, so that “throwaway_1237” and “throwaway_12373” still get different numbers.
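To make the contrast concrete, here is a sketch in C. The first function is the first-letter hash described above. The second swaps in FNV-1a, a well-known simple string hash chosen here for illustration (not something this tool produces), which XORs in every byte and then multiplies by a constant, so every character, including the trailing "3", feeds into the result.

```c
#include <stdint.h>

/* The poor hash: only the first character matters. */
static uint32_t first_letter_hash(const char *s) {
    return (unsigned char)s[0];
}

/* FNV-1a, 32-bit: mix in each byte, then multiply by the FNV prime. */
static uint32_t fnv1a32(const char *s) {
    uint32_t h = 2166136261u; /* FNV offset basis (0x811c9dc5) */
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;       /* FNV prime (0x01000193) */
    }
    return h;
}
```

Both usernames get first-letter hash 116 (the ASCII code for 't'), as would every other name starting with "t". FNV-1a is fast but not a strong mixer by modern standards, which is the kind of speed/quality tradeoff tools like this one explore.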

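Hash functions do this mapping. The “avalanche” property describes how thoroughly they mix their input: flipping any one input bit should flip about half of the output bits, so similar inputs end up with unrelated hashes and are less likely to collide.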

There’s generally a tradeoff between (1) how fast the actual hash function is, and (2) how good of a job it does of avoiding collisions.

World-class hash functions look remarkably weird, multiplying and xoring and shifting by strange amounts. It’s very hard for humans to look at these cryptic functions and guess how well they will do.

So this code tries out a bunch of hash functions at random and pits them against each other in a bake-off. This is cool because success here could improve the real-world performance of a core data structure used across many languages and libraries.

masklinn · 2024-05-05T07:50:50 1714895450

Hash functions on integers, so if you want a fast integer hash for a set or map. If the functions diverge sufficiently it also provides fast hashes for a bloom filter.
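A sketch of the Bloom filter use, assuming two integer mixers that behave independently enough: the standard double-hashing trick (Kirsch and Mitzenmacher) derives all k probe positions from just two hashes as h1 + i*h2. The mixer, its constants, the filter size, and k = 4 are illustrative assumptions, not output from this tool.

```c
#include <stdbool.h>
#include <stdint.h>

#define BLOOM_BITS 1024u /* power of two, so masking replaces modulo */
#define BLOOM_K 4        /* probe positions per key */

/* Hypothetical integer mixer; different odd m give different hash functions. */
static uint32_t mix32(uint32_t x, uint32_t m) {
    x ^= x >> 16;
    x *= m;
    x ^= x >> 15;
    x *= 0x2c1b3c6du;
    x ^= x >> 16;
    return x;
}

typedef struct { uint8_t bits[BLOOM_BITS / 8]; } bloom;

/* Double hashing: probe i is h1 + i*h2. Forcing h2 odd makes the
   BLOOM_K probes land on distinct slots of the power-of-two table. */
static uint32_t probe(uint32_t key, int i) {
    uint32_t h1 = mix32(key, 0x7feb352du);
    uint32_t h2 = mix32(key, 0x846ca68bu) | 1u;
    return (h1 + (uint32_t)i * h2) & (BLOOM_BITS - 1u);
}

static void bloom_add(bloom *b, uint32_t key) {
    for (int i = 0; i < BLOOM_K; i++) {
        uint32_t p = probe(key, i);
        b->bits[p >> 3] |= (uint8_t)(1u << (p & 7u));
    }
}

static bool bloom_maybe_contains(const bloom *b, uint32_t key) {
    for (int i = 0; i < BLOOM_K; i++) {
        uint32_t p = probe(key, i);
        if (!(b->bits[p >> 3] & (1u << (p & 7u))))
            return false; /* definitely absent */
    }
    return true; /* possibly present (false positives are allowed) */
}
```

For a plain hash set or map, the same mixer would be used once per key, e.g. `mix32(key, m) & (nbuckets - 1)` for a power-of-two bucket count.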

alegeaa · 2024-05-05T06:36:06 1714890966

I am asking myself the same question.