Pick 5400 23-bit vectors (the answers you will write) and your friend picks the one that minimizes its Hamming distance to the true answer.
5400 + 5400*23 + 5400*23*22/2 < 2**23
so answering 21 questions with certainty is impossible. (An adversarial test-writer can pick the true answers so as to maximize the Hamming distance between them and whatever answer you decide to write, regardless of the scheme used.)
And there is a size 4096 < 5400 covering code [1], so answering 20 questions is possible (by picking the 23-bit vectors in that code).
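For anyone who wants to check the arithmetic, here's a quick Python sketch (variable names are mine; it only verifies the counting, it doesn't construct the code):

```python
# Sanity check of the counting argument above.
from math import comb

seconds = 90 * 60            # 5400 possible leave times
n = 23

# answer sheets within Hamming distance 2 of one prearranged vector
ball2 = comb(n, 0) + comb(n, 1) + comb(n, 2)
print(seconds * ball2 < 2**n)      # True: distance 2 can't always be reached,
                                   # so 21 correct answers can't be guaranteed

# with radius 3, a 4096-word covering code suffices, and 4096 <= 5400
ball3 = ball2 + comb(n, 3)
print(4096 * ball3 >= 2**n, 4096 <= seconds)   # True True
```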
Sadly, I can only give the Google answer to this one:
> Working on a branch of physics called supersymmetry, Dr. James Gates Jr., discovered what he describes as the presence of what appear to resemble a form of computer code, called error correcting codes, embedded within, or resulting from, the equations of supersymmetry that describe fundamental particles.
I don't remember the details, but the possible transitions between particles have the same pattern as the valid codes in a Reed-Solomon code. So any particle is possible, but the transitions seem to have been error-corrected? I remember thinking it was a little flimsy at the time.
This was a neat brain-teaser (I took an information theory class years ago so it was a nice refresh), but the answer ended up being a lot less interesting than I had hoped :P
Expanding on this, though, you can improve the average result against an arbitrarily-powerful test-writer by first XOR'ing the test answers with a pre-shared value (key); the resulting value is random (even to very powerful test-writers), so e.g. encoding possible answers into the unused values improves your expected result (i.e. data rate.)
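For what it's worth, the masking step itself is tiny; a minimal sketch of just the XOR part, with all names my own:

```python
# Pre-shared key masking: the value the friend actually signals looks
# uniformly random even to the test-writer.
import secrets

N = 23
key = secrets.randbits(N)            # agreed on before the exam

def mask(answers: int) -> int:       # friend applies this before encoding
    return answers ^ key

def unmask(masked: int) -> int:      # you invert it before writing answers
    return masked ^ key

truth = 0b10110010101001110001011   # some 23-bit answer pattern
assert unmask(mask(truth)) == truth
```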
The naive approach (which is what I did) is to assign each second to a binary number and work out how high the number goes - this works out at 2^12.
The clever way other people did it (and which I've been thinking about all morning) is:
The complete answer to the test is a set of true/false answers that we map to 23-bit vectors.
We know that there are 2^23 vectors, which is way more than the 90*60=5400 ways we have to identify one of them.
Pretty sure we're all clear on that.
The hard part comes when we start saying that we can let some of the bits be wrong. This lets us say that some of the 2^23 23-bit vectors are pretty much the same as some others. I think of it like grouping vectors together into groups that are all related to each other by flipping up to 3 bits.
So let's take an example with 4-bit vectors instead because I find it more intuitive. This is a test with 4 true/false answers. Our naive method would make us think we need 2^4=16 seconds to correctly identify true/false answers. But if we look at what happens when we say one of those answers can be wrong, it gets interesting. Let's look at the first 4-bit vector:
[0 0 0 0]
Now we can let any bit be wrong:
[0 0 0 0]
[0 0 0 1]
[0 0 1 0]
[0 1 0 0]
[1 0 0 0]
...and we have FIVE possible vectors identified by the SAME second. Just by looking at the first vector we've reduced our space from 2^4=16 to 16-5=11. And remember we can group all of our 4-bit vectors together like this.
If you let 2 answers be wrong you have fewer, larger groups.
If you let 3 answers be wrong you only need one bit of information! Your cheating friend gets up on second 1 or second 2.
This is my intuitive way of looking at it anyway. For the actual maths and such look at other answers...
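If it helps, here's a small Python sketch of that grouping idea for the 4-bit example (my own illustration, not from the thread):

```python
# List every vector within a given number of bit flips of a chosen center,
# i.e. the "group" that one second can stand for.
from itertools import product

def ball(center, radius, n=4):
    return [v for v in product([0, 1], repeat=n)
            if sum(a != b for a, b in zip(v, center)) <= radius]

print(ball((0, 0, 0, 0), 1))        # 5 vectors: 0000 plus its 4 one-bit neighbours
print(len(ball((0, 0, 0, 0), 3)))   # 15 of the 16 vectors, so two centers
                                    # (0000 and 1111) cover everything: 1 bit
```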
The table basically says it's possible to pick a set of 4096 (2^12) different vectors of T/F combinations in such a way that you only need to flip up to 3 (R=3) of the T/F answers to cover all 2^23 (n=23) possible solutions to the quiz. Thus you only need 68 minutes and 16 seconds (4096 seconds) in order to guarantee you get at least 20 answers correct.
You can use the remaining 1304 seconds to encode some more vectors to increase the chance of getting more answers correct, but you cannot guarantee 21 or more correct answers in the allotted time. The lower & upper bound (not sure what this part means) for n=23, R=2 is 30686-32768, which is about 9 hours!
Very simple example with n=3. I have listed all the vectors with Hamming distance of 1 or less (R=1). Looking at the table, we see that for n=3, R=1, we can cover all the vectors with 2 different code words. Heads and tails, for example. It's easy to see which combinations we can pick to guarantee coverage: a&h, b&g, c&f, d&e. Using any of those combinations and some way to signal to the other person 'heads' or 'tails' will guarantee they get at least 2 answers correct.
a TTT <= TTT TTF TFT FTT
b TTF <= TTT TTF TFF FTF
c TFT <= TTT TFT TFF FFT
d FTT <= TTT FTT FTF FFT
e TFF <= TTF TFT TFF FFF
f FTF <= TTF FTT FTF FFF
g FFT <= TFT FTT FFT FFF
h FFF <= TFF FTF FFT FFF
We always have to be wrong, but we can bunch up all the ways we can be wrong so they're always close to being right. Each way to be right will have a few ways to be wrong as neighbours, so even if we're wrong, we aren't wrong by much.
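A quick script to double-check the a & h pairing from the table above (my own sketch):

```python
# Verify that TTT and FFF together cover all 8 answer sheets within one
# wrong answer (n=3, R=1).
from itertools import product

def dist(u, v):
    return sum(a != b for a, b in zip(u, v))

codewords = [("T", "T", "T"), ("F", "F", "F")]           # a and h
everything = list(product("TF", repeat=3))
print(all(min(dist(v, c) for c in codewords) <= 1 for v in everything))  # True
```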
The test is a list of 23 true/false questions, so you can think of each question as a bit.
You've prearranged a list of 5400 unique 23-bit vectors, so he determines the bit vector with the most correct answers (ie. smallest Hamming distance) and leaves at whatever time offset corresponds to that bit pattern.
It's not a perfect scheme since you only have 5400 possible bit vectors, and 2^23 > 5400.
The game is you're trying to guess a word 23 letters long and your buddy can only tell you a number below 10K, although you are allowed to share dictionaries...
You ask your buddy what the correct word is. A dictionary holding all 23-letter combinations would be incredibly long; however, in English there aren't really all that many words, so your buddy can simply tell you the right word is on page 2352 of the dictionary. The word you pick might not be exactly it, but it'll be close enough.
I tried to avoid explaining EE telecom coding in five-year-old words, which wasn't going to help much, so this is more "in the spirit of the problem" than an exact analogy.
I think this attempt is closest to the "five years old" level:
So you have 23 questions, and on each you can either answer yes or no. Your goal is for your friend to "transmit" some information to you so that you get as many of the answers right as possible. As the problem is set up, what he can transmit to you is just an exact second of a 90-minute span, that is, one number between 0 and 5399.
Having 23 questions, each answered with 0 or 1, gives 23 bits of information to pass. 4096 is 2 to the 12th power, so the friend can pass you only 12 complete bits.
Now the major point: you're lucky that you don't care which answers exactly you get right. You would need all 23 bits passed to get the exact 100% right solution. But you don't care. It turns out that if you cleverly write down 4096 different "solutions" before the test, and your friend transmits which one of these 4096 you should use to fill out the form, the mentioned cleverness can guarantee that you get N answers right; you just don't know which ones specifically in advance, and you don't care, as you prepare these 4096 solutions. The friend must also know the same table, and has to select one of the prepared 4096 solutions only once he discovers what the "100%" solution is.
The remaining question was how many answers you can be sure you can get right (N), and it turns out, N is 20.
Why? If you carefully construct each of these 4096 solutions to be "as different" from all the other 4095 solutions as possible, it can be proven that then 20 is the result.
So you have to take to the test a list of 4096 longer binary numbers, carefully prepared; your friend can transmit which one to write through the time of his leaving the test, and you'll have 20 answers right. Or you can train yourself to perform a special algorithm to produce a pattern of 23 answers from the time, with the friend doing the opposite.
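Roughly, the mechanics look like this in code; this is a generic prepared-list version, not the actual Golay construction, and the function names are mine:

```python
# Shared beforehand: `prepared`, a list of up to 5400 candidate answer sheets.
def friend_leave_second(true_answers, prepared):
    """Friend: find the prepared solution closest to the real answers and
    leave at the second equal to its index in the shared list."""
    def dist(u, v):
        return sum(a != b for a, b in zip(u, v))
    return min(range(len(prepared)), key=lambda i: dist(prepared[i], true_answers))

def my_answers(leave_second, prepared):
    """You: just transcribe the prepared solution for that second."""
    return prepared[leave_second]
```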
The math details are, thanks to allenz, here (not on "five years old" level): https://en.wikipedia.org/wiki/Binary_Golay_code There are also some algorithms. The 20 in the solution can be figured out from the sentence "G23 is a perfect code. That is, the spheres of radius three around code words form a partition of the vector space." Which for our task means that not more than 3 answers will be "wrong" from 23 answers known to your friend, if the whole trick is performed correctly, therefore 23-3=20.
And of course, this kind of cleverness (but used for error correction, that is, to allow sending more data than minimally necessary, so that even when some of the data is lost before it reaches the other side everything still works) is actually built into modern communication and data storage equipment, but we typically just reap the benefits and don't think about it.
On the other hand, human languages also evolved to contain redundancies, so evolution is quite capable of finding acceptable solutions for message encoding too.
--
1) This way you haven't used all 5400 possible values, but now you can split the remaining thousand-something seconds between you and your friend: for him to prepare the transmission, and for you to fill out the form after you've received the code but before the test is over.
I'm not sure if you're looking for an ELI5 of the question or the answer. I'll explain the answer since others already explained the question.
You'll only have the time at which your friend leaves (in seconds since the start of the exam). Knowing only this, whatever you two scheme, you'll have to write down your answers (from a scheme you both agreed on beforehand like "leaving after 324 seconds means answer true for the first 7 questions and false for the rest", "leaving after 325 seconds means answer false everywhere", ...) once your friend leaves. So really, you're just a transcriber. During the exam, it's your friend who decides what you're going to write down by picking the right second to leave at.
Of course, they'll pick the second which makes you write as many correct answers as possible. (And of course, you'll want to pick the best scheme to help with that. But let's see what you can't do, no matter the scheme, first.)
Why can't you just always get all answers right? Because there are too many possible sets of answers (2^23 for 23 questions) and your friend can only choose from the 90 * 60 = 5400 you two agreed on before the start of the exam.
In fact, you can't even be sure of getting 22 questions right. For each second (from 0 to 5399), there are only 23 sets of true answers to the exam (among 2^23) for which leaving at that second gets you exactly one answer wrong (namely, getting exactly one of the 23 questions wrong). So 5400 + 5400 * 23 is an upper bound on the number of sets of true answers for which you'll get at most one answer wrong (using whatever scheme). Since 5400 + 5400 * 23 is still (much) less than 2^23, there will be many sets of true answers to the exam where your friend has no choice but to make you write down many wrong answers.
The same goes for getting 21 answers right. For each second, there are 23 * 22 / 2 sets of true answers for which leaving the exam at that time gets you exactly two questions wrong. (Pick the first question to get wrong, pick the second question to get wrong, and divide by 2 because there are two ways to pick the same two questions.)
5400 + 5400 * 23 + (5400 * 23 * 22 / 2)
is still too small. So there are bound to be sets of true answers where you'll get more questions wrong.
It turns out that getting 20 answers right is possible. Now you have to actually pick a scheme. We can't just make the same calculation and add
5400 * 23 * 22 * 21 / (2 * 3)
(which is now bigger than 2^23) because that's only an upper bound. Each second from 0 to 5399 "covers"
1 + 23 + 23 * 22 / 2 + 23 * 22 * 21 / (2 * 3)
sets of true answers (so if those are the true answers to the test, your scheme works). But if some set of true answers is covered by more than one of the seconds (it happens when there are two different times at which your friend could leave and give you at most 3 answers wrong), it means that some other set of true answers might be left uncovered.
In your scheme, the only thing that matters is which 5400 set of answers your friend can make you write (swapping which answer goes with which second doesn't change anything).
As it happens, people have made tables of the minimum and maximum number of seconds needed to cover all 2^23 possibilities for getting at most 3 answers wrong. And it's 4096 seconds, which is less than 5400. In fact, the table contains ranges for different total numbers of questions and different numbers of questions that you can get wrong.
(I think there's something more interesting about this particular scheme that gets you 4096, but I haven't read all the other comments yet and haven't thought about this question more since.)
Sure, if you only need to specify one out of 4096 sets of answers by leaving the room at a certain time (accurate to the second) then you have 5400-4096 = 1304s = 12.7 min of time left over. Which means your friend needs a slightly less superhuman ability to solve string theory problems.
A simple approach that gives 13 correct answers minimum is to have him walk out after X seconds, where X is the number of true answers. Answer true for everything if X is greater than 12. Otherwise answer false for everything.
This uses only 6 bits of information, and you should be able to pack some extra info rather easily into the remaining 7.
Thinking about alternative approaches leads me to think that the only cases that you should focus on to solve this are those where the number of true/false is almost equal. Figure out how to pack them together, layer a few solutions together, and you should be able to answer the exam to within some N guaranteed.
EDIT: 12 correct answers. not 13. And you can easily pack this into 1 bit. Leaving my wrong answer for posterity.
You can actually transmit this information in a single bit: just have your friend send a 0 if you should answer false on everything or 1 if you should answer true on everything. To use the other 11 bits, transmit the first 11 answers first, and then encode whether to guess true or false on all of the remaining 12 questions in the last bit. This method gives 11 + (23-11)/2 = 17 answers correct in the worst case, but (according to the author) this is still suboptimal.
Doesn't your method only guarantee 12 answers -- half of 23 (rounded up)?
That can be reduced to a single bit:
If you should mark 'true' on all the answers, he leaves during the first half of the exam; if you should mark 'false' on all the answers, he leaves during the second part of the exam. This gets you at least 12 correct answers in 1 bit.
This leaves you with 11 bits to modify the pattern with to improve your score.
You can get at least 17 by using the 1 bit to express preference for just the first 12 questions, and the remaining 11 bits the answers to the last 11 questions exactly.
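A small brute-force check (my own) that this scheme really bottoms out at 17: the worst case over all 2^12 patterns for the majority block is 6 correct, plus the 11 answers transmitted exactly.

```python
# 11 questions are transmitted exactly; one extra bit says whether to answer
# true or false on all of the remaining 12 questions.
from itertools import product

worst = min(
    11 + max(sum(block), 12 - sum(block))   # exact 11 + best uniform guess on 12
    for block in product([0, 1], repeat=12)
)
print(worst)   # 17
```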
I've been trying to think if there's a clever scheme to spread the knowledge around to beat 17 with statistical tricks.
(As a fun aside: 17/23 is about 73%, which typically gets you a C -- so you'd pass the exam.)
I wrote a simple program to test all possible majority-of-group breakdowns for generic number of questions and bitlengths, and found that the optimal result in all cases for this class was to use all but one bit for individual questions and then use that one bit for majority of the rest.
You can probably get better bit usage by doing some sort of more complex error-correction coding scheme: imagine having overlapping groups that you know the majority vote on. It's been a decade since I've touched that stuff, so I don't know any concrete numbers here.
Which answer do you use the extra bit on that's guaranteed to give you one more correct?
For any you choose, there are answer patterns where you already counted that answer being correct in your guaranteed number as part of the 2/3rds correct.
So I don't think you can (trivially) use that bit to guarantee another correct answer.
More practically (and specifically ruled out in the question): you can see when your friend "starts" the exam (picks up his pencil) and when he ends (gets up from his seat). Also it probably takes him at least 10 minutes to read the whole test and figure out the answers.
This makes it easy mode: the friend takes 10 minutes to read the test and get the answers in his head, and we start the timer (0 seconds) at the 10 minute mark. The friend "starts" sometime in the next 2048 seconds (a hair over 34 minutes), which is 11 bits of info. Then he "finishes" between 44 minutes and 78 minutes: another ~34 minute window and another 11 bits of info. 22 of 23 questions answered. The friend coughs on the way out if the last answer is true. Boom, all questions answered.
Yes, OP said we can't do this, but it is more practical than memorizing 5400 bit vectors, doesn't assume the friend can instantly finish the test, and gives the friend some lead time to actually do the binary conversions and get the timing right down to the second.
Presumably the extra credit is that the binary Golay code is closely related to the Leech Lattice, and thus to the entire moonshine situation, which gets you to string theory. See:
Hey, I'm a mathematician, not a physicist. We tend only to get as far as "oh yeah, here's all this pretty math we care about. I hear it has some applications to physics..."
Sure. Moonshine I know a lot about, relative to physics at least. Given that I now work somewhere that specializes in term embeddings, it kind of feels like I've spent half my life injecting stuff into vector spaces...
Unless I am misunderstanding something, all the answers here seem to assume that the friend is either able to finish instantly, or to leave before he is done. That isn't stated in the question, so I don't think it's valid to assume you can use all of 90 minutes * 60 seconds to transmit information.
"A fraction of the time" isn't a very clear problem statement, as 9999999/10000000 is a fraction, but that's certainly not what is meant by the common sense understanding of this phrase.
I am wondering what we can do without clarifying the statement further.
We can still encode 90*60 23-bit vectors and have the friend pick the one, in the remaining time after he finishes, that minimizes the distance to the true answer, but without knowing how early he finishes. That is probably still the way you can get the best score on average, but I don't think it does better for the guaranteed number of correct answers than just transmitting 1 bit of information (which guarantees at least 12 correct answers).
Would it help to use odd vs even seconds to indicate whether the friend finished early enough to use the clock to indicate an arbitrary time, vs the time at which he should have left having already passed when he finished his exam? I can't think of how to use this (or a similar) signal to improve the worst-case odds.
On the other hand, while I cannot think of a way to improve your worst-case situation, you should be able to improve the average score by sorting your 5400 23-bit vectors so that the distances between the last ones are as large as possible.
I'm thinking: create a set of 90 * 60 = 5400 different True/False patterns. Pick the one that gets the most answers correct. I'm assuming there is some algorithm with a name I don't know that can select the optimal set of patterns. Then the minimum they get correct in any situation probably has something to do with Shannon entropy ...
We're looking for a lossy compression function that maps strings of length 23 (answers) to strings of length 12 (the bits of information in 5400 seconds). We seek to minimize the number of errors, aka the Hamming distance. Under the Hamming distance metric, the space of strings forms a hypercube where two strings are adjacent if they are related by a bit flip (one error).[1]
This gives an elegant visual interpretation: we're searching for a way to partition the hypercube of 23-bit strings into 2^12 balls[2] containing 2^11 elements each.
What is the minimum radius of the balls? There is 1 element in the center of the ball, 23C1=23 elements at distance 1, 23C2=253 elements at distance 2, and 23C3=1771 at distance 3, which sums to precisely 2^11. This combinatorial coincidence makes it possible to build a beautifully symmetric 23->12 compression function!
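Checking that coincidence numerically (a one-off sketch of my own):

```python
# A radius-3 Hamming ball in 23 dimensions holds exactly 2^11 strings, so
# 2^12 disjoint balls account for every one of the 2^23 strings.
from math import comb

ball = 1 + comb(23, 1) + comb(23, 2) + comb(23, 3)
print(ball)                   # 2048 == 2**11
print(2**12 * ball == 2**23)  # True: 2^12 radius-3 balls can exactly tile the cube
```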
In order to avoid overlap, the centers of any two balls must be at least 7 bits apart. One idea to evenly space the centers: write all possible first 12 bits (1 bit distance), then distribute the remaining bits 6 bits apart.
We can also see that inverting the compression function gives an error-correcting code going 12->23, mapping 12-bit inputs to optimally-spaced 23-bit strings. There is a duality between lossy compression (aka rate-distortion theory[3]) and error correction.
As it turns out, the optimal 12->23 error-correcting code is the perfect binary Golay code discovered in 1949.[4] The inverse of the Golay code is the compression function we're looking for.
A bit of historical trivia: Golay's 1949 paper was reviewed by Berlekamp, who in 1974 called it the "best single published page" in coding theory. At the time, Berlekamp was working as a code breaker with Jim Simons at the Institute for Defense Analyses. Later, Berlekamp would help Simons found Renaissance Technologies, which remains today the most successful quant hedge fund in history.[6] Renaissance was famous, of course, for hiring many of the best minds in string theory.
Similar to the lottery problem and other covering problems.
Let's say the lottery has N numbers and tickets contain K numbers. What is the least number of tickets M that you need to buy so that you guarantee, when the winning ticket is pulled, that you match at least R numbers on that winning ticket with your pool of tickets? LP(N, K, R) = M.
LP(N, K, K) = (N choose K): that is winning the lottery outright, and to be sure of that we need to buy all possible tickets.
LP(N, K, 2) is solved, I believe, for many values (theoretically).
> answer at least N out of the 23 questions correctly
This represents S(N) = C(23,N) + C(23,N+1) + ... + C(23,23) acceptable states of correct answers out of all 2^23 possible states, which requires -log2(S(N)/2^23) bits of self-information to be transmittable with a code of alphabet size 5400, i.e. at most log2(5400) ≈ 12.4 bits. Therefore the largest N that satisfies this is 20.
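The same calculation in a few lines of Python (my own transcription of the argument above):

```python
# Find the largest N for which the required information fits in log2(5400) bits.
from math import comb, log2

def bits_needed(N):
    S = sum(comb(23, i) for i in range(N, 24))   # acceptable answer patterns
    return -log2(S / 2**23)

budget = log2(5400)                              # ~12.4 bits
print(max(N for N in range(24) if bits_needed(N) <= budget))   # 20
```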
Actually you can't. There is no prior information about the distribution of correct answers given in the question so everything is uniform distribution and there is nothing to set up "pre-arranged" entropy encoding with.
The trick (to get around the information theoretic limit of 12) is to be uncertain about which of the answers are correct.
For example, with 1 bit, you can transmit whether the majority of answers are 1 or 0, already getting you at least 12 right answers but no information about which ones are right.
This is really cool, since I'm currently working on a way to transmit extremely small amounts of data (~30 bits) in a situation where you cannot use any RF waves.
Of course, it's banned here but the simplest thing (for the person taking the test) would be to take a short amount of time - or a long amount of time - to fill in each bubble. A short scribble = false, a long scribble = true. To thwart the proctors you could swap the keys every other question, such that for question 1 short=false, long=true; question 2 short=true, long=false, etc etc.
Of course, for any of these answers the proctors could trivially catch you cheating. The best way to prevent this would be applying a simple one-time pad to them - exchange a small number beforehand and use that.
Shouldn't the friend also communicate the time he actually finished the exam (which can be different depending on the difficulty of the questions)? In other words, it took him 23 minutes to answer, left after 53, which means that the information about the answers is encoded in "30" (53-23). If it would have taken him 17 min to solve the questions, he would have left after 17+30=47 min.
This also means that the more difficult the questions, the less information he can provide (because the "time" allocated to coding the answers is smaller, as in a smaller keyspace).
I rephrased the question slightly for clarity after you posted this: You're looking for the largest value N such that you can guarantee that you get at least N questions correct. Since log(5400)/log(2) is only ~12.3987, your approach only counts as 12 correct answers.
We can start with the prior that there will probably be a similar number of trues and falses. Since we are mapping 23 bits of state into 12 bits, we can leave out those states where T >> F or vice versa. If the test turns out to be one of those unexpected states, pick the answer that gives the best score.
Edit: Rereading, it looks like we're optimizing the worst case, rather than the average. So we're looking for a 12-bit to 23-bit map where at most n bits are inaccurate, minimizing n. I'm sure there's a signal-processing algorithm that does this...
log_2 of 60*90 is about 12.4. This shows that a naive scheme would expect to get about 12.4 + (23 - 12.4)/2 = 17.7 questions right. I would be very skeptical about any scheme that purports to guarantee better than this. A reasonable approach would be trade the possibility of doing better than this with the possibility of doing worse.
In fact, I can guarantee 17 correct questions, with only 11 bits which has the same floor for expected value (11 + (12/2) = 17):
Divide the first 18 questions into 6 groups of three.
Use the first 6 bits of time to indicate the majority in each group of three, giving you 12 guaranteed answers. The last 5 questions can be encoded directly.
It seems like there should be room for improvement, as this does very well on each group in the 1/4 time that each is uniform. I also worry that the average here is 12 + 1.5 + 5 > 17.7.
Perhaps overlapping the groups could help, which starts to look like an LDPC code, but with max, rather than parity. The difficulty is that overlapping bits can no longer guarantee more, because the previous ones could have already done so.
Getting 12 out of 23 should get him to the average, on the lucky assumption the teacher was not a joker who set all the answers higher than bit 6 to false ...
The friend could easily transmit far more than 23 answers' worth of information if need be. He just needs to go to the toilet.
Step 1: 10 bits of information - first 10 answers encoded as seconds from the start of the exam. After x seconds, the friend goes to the toilet.
Step 2: Restart counter
Step 3: On return from the toilet, take the time spent in the toilet. You may want to limit this to 6 bits of information, i.e. about 1 minute in the toilet. Anything longer might be suspicious.
Step 4: Restart counter
Step 5: Only 7 bits of information left to hand over to cover all 23 questions. Count the time until he leaves.
Step 6: Repeat toilet-going process for more answers in different situations.
There should be no suspicions, as the answer-receiver will not go to the toilet.
In order to understand the solution, we need to first understand the basics of coding theory. Unlike envelopes, when we send messages over a wire or through the air (wireless), it is possible that the messages we send get corrupted. Specifically, if we are sending "Hello" it can end up becoming "Hgllo", where e became g because of some electric/electromagnetic distortion. If we had typed it incorrectly on our mobile keyboards, autocorrect would have fixed "Hgllo" to "Hello". This is possible because our natural languages have a lot of redundant information. Otherwise, if we lost one word in a conversation, the whole sentence would not make sense.
Coding theory is about how we systematically add redundant information so that computers can correct errors. We are only going to talk about one type of error, where a letter gets replaced by another letter. If ABCDEF becomes GBCDEH, two letters are wrong, and we say the two strings have a distance of 2. What happens if we simply repeat each letter? The message ABC becomes AABBCC. Now if a single error happens, GABBCC, we know the original message got corrupted, but we still can't figure out what the original message was. It could be either ABC or GBC. So a better technique is to repeat each letter three times. The message ABC becomes AAABBBCCC. Now if a single error happens, GAABBBCCC, we can say that if there is at most one error then the correct message is ABC, by taking the majority in each block. Note that this cannot correct two errors, because if you see GGABBBCCC you will think the real message is GBC.
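Here's roughly what that repeat-three-times scheme looks like in code (a toy sketch, names mine):

```python
# Repetition code: repeat each letter 3 times, decode by majority vote.
from collections import Counter

def encode(msg):                       # "ABC" -> "AAABBBCCC"
    return "".join(ch * 3 for ch in msg)

def decode(received):                  # majority vote inside each block of 3
    blocks = [received[i:i + 3] for i in range(0, len(received), 3)]
    return "".join(Counter(b).most_common(1)[0][0] for b in blocks)

print(decode("GAABBBCCC"))   # "ABC": a single error per block is corrected
```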
What if the message has 4 letters? ABCD becomes AAABBBCCCDDD. This scheme requires you to send 12 letters just to correct 1 error in a 4-letter message. We denote this by [12, 4, 1], where 12 is the length of the code word, 4 is the original message length, and 1 is the number of errors we can correct.
For now let us temporarily restrict our focus to binary letters (0, 1). This is because in computers we communicate purely using 0s and 1s. For example, the letter A would be represented by the number 65, which would be written as 0100 0001, an 8-bit message, also called a byte.
It turns out that if we want to send 4 bits, then using 12 bits is actually not very efficient. You can do better. In fact just 7 bits is enough to carry 4 bits of data while correcting a single error. This is called the Hamming code.
Let us understand in more detail why our repeating scheme works. Consider any two valid code sequences, for example AAABBBCCCDDD and AAAEEECCCDDD: even though only one character changed in the input, 3 letters changed in the output. So if we make sure the distance between any two coded messages is at least 3, then we can correct one error.
There are 16 possible words using 4 bits (0000, 0001, 0010, ..., 1111). If we can assign code words to each of them such that the distance between any two is at least 3, then the code can correct 1 error. The Hamming code essentially achieves this. If we are encoding ABCD, then the first 4 bits are the same as the input, ABCD; the 5th bit is A + B + D, the 6th bit is A + C + D, and the 7th bit is B + C + D. Of course a sum like 3 is replaced by 1 by taking the remainder mod 2. Such codes are also called linear codes, where each bit can be written as a linear combination of the input bits. You can read chapter 5 of Jiri Matousek's 33 Miniatures for a quick proof of why the Hamming code is indeed a valid code, using a bit of linear algebra. Or you can simply write down all 16 code words and check every pair to make sure they differ in at least three places (you will need to compare 120 pairs though). http://kam.mff.cuni.cz/~matousek/stml-53-matousek-1.pdf
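If you'd rather let a computer do the 120 comparisons, here's a small sketch of the encoder exactly as described above, plus a brute-force minimum-distance check (my own code):

```python
# Hamming(7,4) encoder with the three parity bits described above, and a
# check that every pair of the 16 codewords differs in at least 3 places.
from itertools import product

def encode(a, b, c, d):
    return (a, b, c, d, (a + b + d) % 2, (a + c + d) % 2, (b + c + d) % 2)

codewords = [encode(*bits) for bits in product([0, 1], repeat=4)]
min_dist = min(sum(x != y for x, y in zip(u, v))
               for i, u in enumerate(codewords) for v in codewords[i + 1:])
print(len(codewords), min_dist)   # 16 codewords, minimum distance 3
```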
What if we want to correct more than 1 error? The Golay code is one such code. G23 and G24 are two codes that encode 12 bits in 23 and 24 bits respectively. In the first case the Hamming distance is 7 and in the second case it is 8. For our purposes there is no need for distance 8; distance 7 is enough, because
Imagine two valid code words (x0) and (x7) which are a distance of 7 apart. Then using 7 single-bit changes you can get from x0 to x7.
(x0) - x1 - x2 - x3 - x4 - x5 - x6 - (x7)
Even if (x0) suffers 3 errors and (x7) suffers 3 errors, you will only end up at (x3) or (x4) respectively, and the two can never be confused. So if we want to correct k errors, then the Hamming distance between any two code words must be 2k + 1 or higher.
It turns out that G23 is not just a good code, it is a "perfect" code. Meaning that if we want to send 12 bits with up to 3 errors corrected, then 23 bits is exactly what you need.
Now back to our problem: you have the 23 true answers in front of you; treat them as a (possibly corrupted) G23 code word and decode back to 12 bits. You get up at a time that represents these 12 bits. We know that when your friend encodes these 12 bits back into a G23 code word, at most 3 bits will be out of place. In other words your friend will get at most 3 answers wrong, hence getting at least 20 of them correct.
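In code, that last step looks roughly like this, assuming hypothetical golay_decode / golay_encode helpers for the [23,12] code (placeholders, not a real library API):

```python
def leave_second(true_answers_23, golay_decode):
    # The person who knows the 23 true answers decodes them as a noisy
    # G23 word and leaves at the second named by the 12 data bits (0..4095).
    bits12 = golay_decode(true_answers_23)
    return int("".join(str(b) for b in bits12), 2)

def answers_to_write(second, golay_encode):
    # The friend re-encodes those 12 bits; the resulting 23-bit codeword
    # differs from the true answers in at most 3 places.
    bits12 = [int(b) for b in format(second, "012b")]
    return golay_encode(bits12)
```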
Now the relationship with string theory requires a bigger write-up. But essentially it is because, given the internal structure of a field theory on some kind of donut-shaped universe, the types of transformations you can do there will look something like the types of transformations you can do on the Golay codes.
[1] see n = 23, R = 3 here http://www.sztaki.hu/~keri/codes/2_tables.pdf