We're looking for a lossy compression function that maps strings of length 23 (answers) to strings of length 12 (bits of information in 5400 min). We seek to minimize the number of errors, aka the Hamming distance. Under the Hamming distance metric, the space of strings forms a hypercube where two strings are adjacent if they are related by a bit flip (one error).[1]
This gives an elegant visual interpretation: we're searching for a way to partition the hypercube of 23-bit strings into 2^12 balls[2] containing 2^11 elements each.
What is the minimum radius of the balls? There is 1 element in the center of the ball, 23C1=23 elements at distance 1, 23C2=253 elements at distance 2, and 23C3=1771 at distance 3, which sums to precisely 2^11. This combinatorial coincidence makes it possible to build a beautifully symmetric 23->12 compression function!
In order to avoid overlap, the centers of any two balls must be 7 bits apart. One idea to evenly space the centers: write all possible first 12 bits (1 bit distance), then distribute the remaining bits 6 bits apart.
We can also see that inverting the compression function gives an error-correcting code going 12->23, mapping 12-bit inputs to optimally-spaced 23-bit strings. There is a duality between lossy compression (aka rate-distortion theory[3]) and error correction.
As it turns out, the optimal 12->23 error-correcting code is the perfect binary Golay code discovered in 1949.[4] The inverse of the Golay code is the compression function we're looking for.
A bit of historical trivia: Golay's 1949 paper was reviewed by Berlekamp, who in 1974 called it the "best single published page" in coding theory. At the time, Berlekamp was working as a code breaker with Jim Simons at the Institute for Defense Analyses. Later, Berlekamp would help Simons found Renaissance Technologies, which remains today the most successful quant hedge fund in history.[6] Renaissance was famous, of course, for hiring many of the best minds in string theory.
We're looking for a lossy compression function that maps strings of length 23 (answers) to strings of length 12 (bits of information in 5400 min). We seek to minimize the number of errors, aka the Hamming distance. Under the Hamming distance metric, the space of strings forms a hypercube where two strings are adjacent if they are related by a bit flip (one error).[1]
This gives an elegant visual interpretation: we're searching for a way to partition the hypercube of 23-bit strings into 2^12 balls[2] containing 2^11 elements each.
What is the minimum radius of the balls? There is 1 element in the center of the ball, 23C1=23 elements at distance 1, 23C2=253 elements at distance 2, and 23C3=1771 at distance 3, which sums to precisely 2^11. This combinatorial coincidence makes it possible to build a beautifully symmetric 23->12 compression function!
In order to avoid overlap, the centers of any two balls must be 7 bits apart. One idea to evenly space the centers: write all possible first 12 bits (1 bit distance), then distribute the remaining bits 6 bits apart.
We can also see that inverting the compression function gives an error-correcting code going 12->23, mapping 12-bit inputs to optimally-spaced 23-bit strings. There is a duality between lossy compression (aka rate-distortion theory[3]) and error correction.
As it turns out, the optimal 12->23 error-correcting code is the perfect binary Golay code discovered in 1949.[4] The inverse of the Golay code is the compression function we're looking for.
A bit of historical trivia: Golay's 1949 paper was reviewed by Berlekamp, who in 1974 called it the "best single published page" in coding theory. At the time, Berlekamp was working as a code breaker with Jim Simons at the Institute for Defense Analyses. Later, Berlekamp would help Simons found Renaissance Technologies, which remains today the most successful quant hedge fund in history.[6] Renaissance was famous, of course, for hiring many of the best minds in string theory.
[1] https://en.wikipedia.org/wiki/Hamming_distance
[2] https://en.wikipedia.org/wiki/Ball_(mathematics)
[3] https://en.wikipedia.org/wiki/Rate%E2%80%93distortion_theory
[4] https://en.wikipedia.org/wiki/Binary_Golay_code
[5] https://en.wikipedia.org/wiki/Hamming_bound
[6] https://en.wikipedia.org/wiki/More_Money_Than_God