Hacker News new | past | comments | ask | show | jobs | submit login
Group actions and hashing unordered multisets (2021) (jeremykun.com)
67 points by KqAmJQ7 4 months ago | hide | past | favorite | 5 comments



Interesting, not often you see a non-associative variant of commutativity. It confused me for a bit that 'h' itself is not commutative, but summing arbitrary sequences is order independent provided you start with the same seed and sum from left to right.

Edit: Not sure the definition of phi is right though, once you have h(a, h*(T + S)) you're pretty much stuck since the commutativity doesn't allow you to rearrange things from that point. I think I understand the gist, you want to start accumulating from a different seed, except that h(a, h*(T)) is not the hash of T if you replace the seed with a. You'd need something like:

    h_s*({}) = s
    h_s*(T + {x}) = h_s*(h_s(T), h(x))
    phi(T) = (a => h_a*(T))
then commutativity could be written

    h_(h_s*({a}))({b}) = h_(h_s*({b}))({a})
which is slightly more symmetric, but maybe not better.


There’s a nice writeup on group hashes here: https://cronokirby.com/posts/2021/07/on_multi_set_hashing/

In particular, if you choose a group where discrete log is hard (such as prime order elliptic curves), multiset hashing falls out for free


I think there are a few errors here. First there is afaict no reason the image of phi has to break up into power-of-two cyclic groups.

Second and more importantly, it seems very difficult to start with the decomposition into cyclic groups and then choose a map from the multiset group into the permutation group that corresponds to the given decomposition in a good way.

Relatedly, the isomorphism between the image of phi (i.e., the action of accumulating hashes) and the decomposition into cyclic groups may be difficult to compute, which can make finding collisions infeasible for an attacker when they could do it easily if given the explicit representation.

So overall the conclusion that “you might as well make this forced structure explicit, and just pick the block structure you want to use in advance” seems incorrect.

The blog post someone linked on multiset hashing with elliptic curves proves the foregoing points. The cyclic groups do not have power-of-two orders and the group action is very complicated even though the description in terms of elliptic curve addition is quite simple.


The "neat result" article linked at the top has some of the missing math: https://kevinventullo.com/2018/12/24/hashing-unordered-sets-...

Restated, if abelian G acts transitively on a set X, X and G have the same size. There's a tacit assumption, then, that you want as many possible states as possible, which the group action result immediately belies.

I'm not sure the author of TFA really thought through the implications of the "block" stuff, all of the conclusions feel pretty uninspiring. The elliptic curve solution is just taking G to be cyclic with prime order (smaller than 2^n). This avoids some pathological behavior that power-of-two abelian groups give you for the multi-set use case - collision probabilities are sort of bunched up around power-of-two multiples, with some unlucky hashes having extremely low order and e.g. adding two of an element doubling the number of potential collisions.


I think they skip a few steps, but in this derivation im phi is exactly Z/2^nZ so every subgroup should have cardinality 2^k for some k.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: