What do new Sudoku techniques teach us about real-world problem solving? (desystemize.substack.com)
244 points by Ariarule on March 31, 2022 | 67 comments



I took an AI class back in college 16-17 years ago. In that class we had to solve a sudoku puzzle using multiple approaches (backtracking, backward propagation, forward propagation) to find which was fastest on a few difficult puzzles.

I definitely don’t remember much of the content in any of my college courses. But I could reimplement those algorithms today without a problem. It’s amazing how “doing” really contributes to good memory retention. It was also one of the projects that sparked a fire in me. It really showed me the possibilities of computers and computer science.


I don't think I remember the implementation details of any algorithm I learned in AI class (for me, it was just 13 years ago). If I were going to do it today, I'd have to re-read the concepts again.

PS: I did all the projects myself, while my friends just sat watching and collected the grade.


I'd be interested in whether solving them via brute force would be faster or slower than any AI-based technique. A first glance at the problem would say no, but if you start in from each corner sequentially, the search shouldn't run into the billions.


By AI-based do you mean "something with heuristics"? IIRC Algorithm X / Dancing Links is the main way to brute-force it quickly.
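For comparison, plain backtracking is short enough to sketch. This is a minimal brute-force solver (not Algorithm X, just the naive depth-first search, and the grid encoding is an illustrative choice):

    # Minimal brute-force Sudoku solver: depth-first search with undo.
    # grid is a 9x9 list of lists, with 0 marking an empty cell.
    def candidates(grid, r, c):
        used = set(grid[r]) | {grid[i][c] for i in range(9)}
        br, bc = 3 * (r // 3), 3 * (c // 3)
        used |= {grid[i][j] for i in range(br, br + 3)
                            for j in range(bc, bc + 3)}
        return [d for d in range(1, 10) if d not in used]

    def solve(grid):
        for r in range(9):
            for c in range(9):
                if grid[r][c] == 0:
                    for d in candidates(grid, r, c):
                        grid[r][c] = d
                        if solve(grid):
                            return True
                    grid[r][c] = 0  # dead end: undo and backtrack
                    return False
        return True  # no empty cells left, so the grid is solved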


That's depressing - the sudoku craze was nearly 2 decades ago.


Why is that depressing?


Because I'm 20 years older?


It seems/feels rather recent…


Very true. Wisdom is knowledge applied.


Phistomefel’s Theorem (and the more general "set equivalence theory" described in the article) pops out directly from a standard technique known as the "Linear Programming relaxation" for Sudoku.

Essentially, the Linear Programming relaxation of a puzzle is a standard way of approximating the solution space with a system of linear equations and inequalities, replacing discrete yes/no answers to questions like "is the digit inside this box a 7?" with real numbers between 0 and 1 (which can be interpreted as probabilities, if you like). This system of linear inequalities and equations can then be solved efficiently with techniques from convex optimization.

Even the example from the Cracking the Cryptic video, with the conclusion that those three boxes at the bottom have to be 1, 2, and 3 in some order, would be deduced immediately from the Linear Programming relaxation of Sudoku. You don't need ontological remodeling when you know how to apply convex optimization :)


Do you know any sources on how set equivalence theory is related to integer programming? Because I don't see the immediate connection, and I could not find anything on the internet, but maybe I used the wrong terms.


I don't know any reference for this - it's something I worked out after hearing about Phistomefel's Theorem from a friend. The idea goes like this:

The standard Linear Programming relaxation of the constraint "the nine variables x_1, ..., x_9 are a permutation of 1, ..., 9" is defined in the following way. First we make real variables p_ij which we think of as representing the "probability" that x_i is equal to j. Each p_ij has to be between 0 and 1, of course, and they have to satisfy the following system of linear equations:

- for each i, the sum over j of p_ij is equal to 1 (since every x_i has to have some value), and

- for each j, the sum over i of p_ij is equal to 1 (since every value from 1, ..., 9 has to show up in the list of x_is somewhere).

The fact that this system of equations, together with the inequalities 0 <= p_ij <= 1, corresponds exactly to the "convex hull" of the constraint that the x_i are a permutation of 1, ..., 9 is a fairly famous early result from the theory of the Linear Programming relaxation of the bipartite matching problem.
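In symbols, the system is:

    0 \le p_{ij} \le 1, \qquad \sum_{j=1}^{9} p_{ij} = 1 \ (i = 1, \dots, 9), \qquad \sum_{i=1}^{9} p_{ij} = 1 \ (j = 1, \dots, 9)

(The result in question is the Birkhoff-von Neumann theorem: the vertices of this polytope are exactly the permutation matrices.)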

To describe the full Linear Programming relaxation of Sudoku, instead of just having 9 variables x_i you have 81 variables x_ab, which leads to 729 real "probability" variables p_abj between 0 and 1, and for each row, column, and square of the Sudoku these probabilities have to satisfy the equations listed above (applied to the relevant variables). That gives you a system of 324 (slightly redundant) linear equations in 729 unknown real variables p_abj, each of which is constrained to be between 0 and 1 - a piece of cake for a computer.
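Here's a sketch of that system as code, using scipy's LP solver. The indexing scheme is just an illustrative choice, and a real solver built this way would still need to branch whenever the LP solution isn't integral:

    # LP relaxation of Sudoku: 729 "probability" variables p_abj and
    # 324 equality constraints, fed to scipy. A sketch, not a solver.
    import numpy as np
    from scipy.optimize import linprog

    def lp_relaxation(givens):  # givens: dict {(a, b): digit}, 0-indexed cells
        idx = lambda a, b, j: 81 * a + 9 * b + j  # flatten p_abj
        rows_A, rows_b = [], []

        def eq(cells):  # constraint: sum of p over these (a, b, j) == 1
            row = np.zeros(729)
            for a, b, j in cells:
                row[idx(a, b, j)] = 1.0
            rows_A.append(row)
            rows_b.append(1.0)

        for a in range(9):
            for b in range(9):
                eq([(a, b, j) for j in range(9)])      # every cell has a value
        for j in range(9):
            for a in range(9):
                eq([(a, b, j) for b in range(9)])      # each row: one of each digit
            for b in range(9):
                eq([(a, b, j) for a in range(9)])      # each column: one of each digit
            for box in range(9):                       # each box: one of each digit
                ra, rb = 3 * (box // 3), 3 * (box % 3)
                eq([(a, b, j) for a in range(ra, ra + 3)
                              for b in range(rb, rb + 3)])

        bounds = [(0, 1)] * 729
        for (a, b), d in givens.items():               # clues pin p to 1
            bounds[idx(a, b, d - 1)] = (1, 1)

        res = linprog(np.zeros(729), A_eq=np.array(rows_A),
                      b_eq=np.array(rows_b), bounds=bounds, method="highs")
        return res.x.reshape(9, 9, 9)                  # per-cell digit "probabilities"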

For a human, you don't want to write down that entire system of equations - you want to use just a few of them to quickly figure out some piece of the puzzle. The technique of set equivalence theory does just this: instead of focusing on all of the probabilities p_abj, you just use the fact that the sum of the probabilities p_ab1 along every row/column is 1, and add/subtract the equations you get from some rows and columns to notice that the sum of the p_ab1s for the (ab)s corresponding to the corners is equal to the sum of the p_ab1s for the (ab)s corresponding to the ring around the center. Then you do the same for the 2s, and so on.
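Concretely, here is that bookkeeping for Phistomefel's configuration and a single digit j. Write S(T) for the sum of p_abj over a set of cells T, let "corners" be the four 2x2 corner regions, "ring" the 16 cells around the central box, and "overlap" the 20 cells where columns 1, 2, 8, 9 cross rows 3-7. Then:

    S(\text{cols } 1,2,8,9) = S(\text{corners}) + S(\text{overlap}) = 4
    S(\text{rows } 3\text{-}7) = S(\text{center } 5 \times 5) + S(\text{overlap}) = 5

Subtracting cancels the overlap, so S(center 5x5) = S(corners) + 1. The center 5x5 is the ring plus the central box, and the box contributes exactly 1 per digit, so S(ring) = S(corners) for every digit - which is Phistomefel's Theorem.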


Set equivalence is something a human can do while solving the puzzle, though, and forming overlapping sets and then removing the common elements is very satisfying when you hit on the right sets. (Hitting on the right sets is hard and needs a good feel for the specific constraints in the grid. Mark and Simon are, of course, very good at it.)


Agreed on all counts.


Cracking the Cryptic (the Youtube channel mentioned in the article) is a great channel to follow. Especially Simon's videos offer great insight into his thought process solving through some pretty hard sudokus (and variants). They've also taken on different puzzle games (e.g. The Witness, Baba is You), and it's absolutely fascinating to see how being very highly skilled in solving one type of puzzle does or does not translate into other classes of puzzle.


I occasionally watch Simon on Cracking the Cryptic and find it interesting how different his approach to solving Sudoku is from mine. I wouldn't even be able to start most of the puzzles he does, but nevertheless, I almost always find myself yelling at him in my head at some point when he overlooks something that seems obvious to me.


Yeah, he's commented many times that his scanning is sometimes terrible (especially, I notice, when he's working with brand-new puzzle mechanics), but I remain super impressed at his ability to not only work out the complex logic, but do it (a) mostly in his head when there's no easy way to notate it on screen while (b) simultaneously explaining it to viewers in a very understandable fashion.


I would "yell at the screen" whenever he misses things. Then I realised that he still solves these puzzles much faster than I did, which makes me realise he probably just better than me at paying attention to the parts that are important to the solve.


It's been pretty amazing to watch the evolution of sudoku theory over even the last two years - some truly genius work by e.g. Phistomefel, Aad van de Wetering, and Totally Normal Cat, as well as the enormous contribution of Cracking the Cryptic.


The author describes a process called "ontological remodeling", which is when a change in viewpoint radically simplifies a previously intractable problem. This is the story, not just of Sudoku, but all of mathematics.


3Blue1Brown has a few videos where he discusses and uses this technique to solve some complex problems. One of those was for a mathematical competition.

It's also common in audio manipulation, e.g. change to the frequency domain in order to modify pitch, then change back to the time domain.


> It's also common in audio manipulation, e.g. change to the frequency domain in order to modify pitch, then change back to the time domain.

Funnily enough, yes but actually no. For understanding and mathematical proofs the Fourier Transform is obviously essential. And when you first get into audio DSP programming it might seem that the FFT is crucial as well. But virtually all digital audio filters operate directly in the time domain.

The problem is that we always work with a sampled signal. And while the Nyquist theorem tells us that sampling loses no information as long as our sample frequency is at least twice the highest frequency in our input, we still have to be aware of it.

When you convert a fixed-frequency sampled signal into the frequency domain, you get a sum of a fixed set of (co)sine functions, because we're still sampled. Now some operations you can do exactly and simply in this frequency domain. E.g. if 100Hz is part of our fixed set of functions and we wish to subtract a 100Hz signal, we can do that directly on our coefficient. A pitch shift by an exact multiple of the frequency sample delta is possible exactly too, assuming the lowest/highest frequencies shifted off are inaudible.

But you almost never want to do these operations. A classic example of something you might want to do is a low-pass filter. Simple, right? FFT to the frequency domain, zero out the coefficients above the cutoff frequency, and convert back. No! Zeroing out coefficients is equivalent to subtracting those specific sine waves. But that is not a low-pass. As an example, suppose our sample frequency is such that the FFT's sine wave coefficients are 10Hz apart, and we wish to do a 30Hz low-pass filter. This means that a 25Hz signal should be completely unaffected, but a 25Hz signal can't be represented just using the 10Hz and 20Hz coefficients - you need higher-order terms! Thus if we zero those out, we distort our 25Hz signal.
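You can watch this happen numerically in a few lines of numpy (the numbers are just the ones from the example: 10Hz bin spacing, a 25Hz tone, a 30Hz cutoff):

    # "Zero the FFT bins above the cutoff" is not a clean low-pass:
    # a 25Hz tone, well below the 30Hz cutoff, comes back distorted.
    import numpy as np

    fs = 1000                        # sample rate (Hz)
    n = 100                          # 0.1s window -> bins are 10Hz apart
    t = np.arange(n) / fs
    x = np.sin(2 * np.pi * 25 * t)   # 25Hz tone, not on a bin frequency

    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, 1 / fs)
    spectrum[freqs > 30] = 0         # the naive "low-pass"
    y = np.fft.irfft(spectrum, n)

    print(np.max(np.abs(y - x)))     # clearly nonzero: the tone is distorted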

So to solve this in the real world you get into the difficult topic of filter design, or other audio DSP algorithms that are vastly more involved than a simple FFT.


Is there any scenario where the discrepancy between FFT and its analog cousin cannot be resolved by upping the sampling density? Distortion is always there, whether it comes from an imprecise sensor/instrument or from extrapolating sampled data. In the audio space, nothing above 20 kHz is audible anyways, so even a bog-standard 44.1 kHz sampling rate should do "good enough" for most DSP operations there.


Transients. The FFT is time-symmetric, so signals that start because of an event (like a mallet hitting something) have a lot of "nothing audible" energy above 20kHz because of the abrupt start. Forcing a Nyquist limit on such a signal always causes pre-ringing, which breaks causality - you have sound starting before the event happened.

(edit) E.g. a bandwidth-limited signal with only a single non-zero sample does not represent a rectangular function, but a sinc.

So, the FFT is a lie. But a very useful one.


> Distortion is always there, whether it comes from an imprecise sensor/instrument or from extrapolating sampled data.

Specifically regarding the latter part of "extrapolating sampled data", I would highly recommend watching this video: https://xiph.org/video/vid2.shtml. As long as your input signal is low-passed to below 22kHz, the 44.1kHz sampling is perfect. No information is lost, no distortion.

I, however, am not qualified enough to tell you how the distortion of the naive FFT filter approach changes as you raise the sampling frequency.


I'm just using simple IIR filters because it's one float of storage and one line of code, but I'm curious: which filtering algorithms are you talking about?
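Concretely, the kind of thing I mean - a one-pole low-pass where the whole filter state is the single float y (choosing the coefficient for a given cutoff is where the design math comes in):

    # One-pole IIR low-pass: one float of state, one line per sample.
    # a is in (0, 1]; smaller a means a lower cutoff frequency.
    def one_pole_lowpass(samples, a):
        y = 0.0
        out = []
        for x in samples:
            y += a * (x - y)   # the "one line of code"
            out.append(y)
        return out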


That's almost the beauty of a lot of audio DSP filters. They can be 1-10 lines of code, but to understand why they work and how to create similar ones with properties you want you need a heavy math background and a 500+ page book (e.g. https://www.native-instruments.com/fileadmin/ni_media/downlo...).

Other complicated DSP algorithms I've read about (but haven't come close to fully grokking) are:

- Band-limited oscillators. Ask yourself, how do you generate a square wave signal? It's "just" 1 for t seconds and -1 for t seconds, repeating, right? But what if t isn't a whole number of sample periods? Also, a square wave can be interpreted as an infinite sum of sine waves of ever-increasing frequency, but Nyquist tells us we can't have the higher frequencies. So even if t were a whole number of sample periods, it still wouldn't be right to output that many samples of 1 followed by that many samples of -1. But just taking the first N terms of the infinite sum (those below Nyquist) is slow, so you get things like polyblep (see the first sketch after this list): https://www.kvraudio.com/forum/viewtopic.php?t=375517.

- Resampling. A sampled bandlimited signal is perfectly represented by its samples through the Whittaker–Shannon interpolation formula, so to resample we simply have to evaluate that formula at new equidistant points. However this is slow (and global, so not real-time), so there's a bunch of techniques to speed this up with minimal distortion (see the second sketch after this list): http://ldesoras.free.fr/doc/articles/resampler-en.pdf

- Pitch shift. This seems innocuous enough, right? But suppose it were easy to shift pitch. Then we could resample our audio to 2x the frequency, shift the pitch up by an octave and play back at the original speed. Now we've made our audio twice as long without changing the pitch! Pitch shifting and timescaling are two sides of the same coin. And I don't understand either side. Fundamentally I don't really understand what it means to make an audio signal "longer", without changing the pitch, at the waveform level. But people do it anyway: https://en.wikipedia.org/wiki/Audio_time_stretching_and_pitc...

- The rabbit hole goes on...
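For the band-limited oscillator item, here's a rough polyBLEP square-wave sketch in Python. The two-sample polynomial correction follows the standard polyBLEP recipe from that thread; treat the details as illustrative, not production code:

    # polyBLEP square: take the naive square wave and smooth each
    # discontinuity with a short polynomial correction.
    def poly_blep(t, dt):
        # t = phase in [0, 1), dt = phase increment per sample (freq / fs)
        if t < dt:               # just after a discontinuity
            t /= dt
            return t + t - t * t - 1.0
        elif t > 1.0 - dt:       # just before a discontinuity
            t = (t - 1.0) / dt
            return t * t + t + t + 1.0
        return 0.0

    def square_osc(freq, fs, n):
        dt, phase, out = freq / fs, 0.0, []
        for _ in range(n):
            s = 1.0 if phase < 0.5 else -1.0          # naive square
            s += poly_blep(phase, dt)                 # fix the rising edge
            s -= poly_blep((phase + 0.5) % 1.0, dt)   # fix the falling edge
            out.append(s)
            phase = (phase + dt) % 1.0
        return out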
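And for the resampling item, the naive Whittaker-Shannon version - each output sample sums over every input sample, which is exactly the slow, global computation that the techniques in the linked paper approximate:

    # Naive Whittaker-Shannon resampling: evaluate the sinc interpolation
    # formula at new sample positions. O(len(x)) work per output sample,
    # and for downsampling you'd also need to low-pass the input first.
    import numpy as np

    def resample_naive(x, ratio):      # ratio > 1 stretches, < 1 shrinks
        n = np.arange(len(x))
        m = int(len(x) * ratio)
        t = np.arange(m) / ratio       # output positions, in input-sample units
        return np.array([np.sum(x * np.sinc(ti - n)) for ti in t])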


Excellent explanation. Thank you!


That's fascinating. Thanks for the information.


Since you mention it, 3b1b has a splendid video juxtaposing two attitudes to problem solving via an example here: https://www.youtube.com/watch?v=ltLUadnCyi0 . I highly recommend watching it, I think it is one of his best.


Chapman's writings on the subject are worth a read:

https://metarationality.com/remodeling


> "ontological remodeling", which is when a change in viewpoint radically simplifies a previously intractable problem. This is the story, not just of Sudoku, but all of mathematics.

it's such a powerful tool that the only problem it leaves is, what benefit is there to calling it ontological remodeling instead of "a change in viewpoint"?


Indeed. After reading the title of the post, I thought it might be about the Cook-Levin theorem, which shows that every NP problem can be reduced to SAT - and hence that NP-complete problems (of which generalized Sudoku is one) are reducible to one another.


"Ontological remodeling" is a lovely term. I think it's ubiquitous actually, but another nice example is the puzzle about tiling a chessboard with dominoes when the board is missing two opposite corners. Can you do it? If so, how? If not, why not?

Btw is the footnote a joke? I don't really get it:

The sum of the digits 1 to 9 is 45[1]

[1] This is a secret that Simon only tells his closest friends.


Using the 'secret' is a common way of deriving further digits in a Sudoku - if you know the sum of digits in a row or a box has to be X, then the missing digits must sum to 45-X. In the 'Cracking The Cryptic' videos, Simon is always careful to explain this, and always prefaces it with a warning that he only tells it to his closest friends (all 400k of them).


It's a joke. The numerical value of the digits has no meaning. There is no addition in sudoku.


There is plenty of addition in modern sudoku, as visible in the example shown in the linked article featuring killer cages.


Also, using a similar trick as with the dominoes, can a certain Legend of Zelda puzzle be solved?

https://gazj.substack.com/p/python-and-the-legend-of-zelda?s...

Article doesn't contain a mathematical proof (only a brute force one), but I wrote one up. Spoilers: https://news.ycombinator.com/item?id=30639211


That's a cool puzzle! It's the same as the Bridges of Königsberg, right? If each square is a node, connected by edges to adjacent squares, then it's only solvable if there are at most two squares with an odd number of edges. Think about it like this: except for the squares where you start and finish, you must use one edge to enter and another to leave.

I guess the Zelda puzzle is different because in Königsberg you can revisit islands, just not recross bridges. But that feels like something you can finesse somehow. . . . Ah, just swap nodes & edges, right? Squares : bridges :: sides : islands. Indeed, that lines up not just the restriction but the goal too.

EDIT: Oh your second link is a much nicer solution. But still it feels like there is a relationship to Königsberg.


The Zelda puzzle isn't the same as Königsberg because in Königsberg the goal is to traverse every edge, while the Zelda puzzle doesn't care about that. Think about the rows of tiles along the edges of the room - the Zelda room has dozens of tiles with 3 edges each, but it's no problem walking across them exactly once. You enter each tile from one edge and leave by a second, and it doesn't matter if you never traverse the third edge.

Swapping nodes and edges doesn't work, because many of the Zelda nodes have 3 or 4 edges, but an edge is defined as 2 endpoints. It doesn't make sense to talk about an edge with 3 or 4 ends.

The proof of non-solution to the Zelda puzzle is a simple checkerboard argument. The room's dimensions are 13 x 9, both odd, so all the corners are the same color (call it black) and there is one more black square than white. And the prize square replaces a white square. So there are two more black squares than white squares, making the problem unsolvable: a path must alternate black and white squares, so it can cover at most one more square of one color than the other. The statues are a red herring - there are two on each color and so they don't affect this proof.
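A few lines of Python confirm the count (statues ignored, since two sit on each color and cancel out):

    # Checkerboard-color the 13x9 room; corners get color 0 ("black").
    black = sum((r + c) % 2 == 0 for r in range(9) for c in range(13))
    white = 9 * 13 - black
    print(black, white - 1)  # 59 vs 57 once the prize tile removes a white square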


Great article.


"A secret I only tell my friends" is how Simon usually introduces 1 to 9 sum to 45 when he needs that fact to explain his train of thought in Cracking the Cryptic videos.


Yes, it’s a joke. Simon discloses “the secret” in almost every video, so the joke is that it’s always presented as mysterious insider knowledge even though it’s clearly not.


> a chessboard with dominoes when the board is missing two opposite corners.

A trickier version asks which square remains (unique up to symmetry) when covering a chessboard with 21 trominoes, each of which covers 3 adjacent board squares, i.e. 1x3 or 3x1.


What does missing two opposite corners mean? Opposite to each other? Opposite to the player?


Opposite to each other. Imagine a chess board, but with the bottom-left corner and the top-right corner removed.


Either A1 and H8 or A8 and H1


The extrapolated concept is great for life in general. Quite literally use different vocabulary to reframe an existing challenge to force a shift in perspective and thereby arrive at a different problem.

A quite simple observation by the article on how to approach challenging life issues as well as, say, mathematical ones - since shifting perspectives when in the trenches of complex life challenges is really hard.


The article I link at the beginning of this one, Representation and Uncertainty, is about exactly this - how to think about the extrapolated concept of representation for life in general. Self-promo I know but I think anyone who resonates with your comment would really enjoy it: https://desystemize.substack.com/p/representation-and-uncert...


That wasn’t skimmable, lol. But a striking point nonetheless.

Put another way, a lot of the economic advantage of "problem solving" in big tech in particular is so dangerously devoid (or "bankrupt", as you put it) of suitable representation that the framing leads to the dangerous social precipice we now find ourselves at.

It’s questions without representation, with lowly symbolism if any at all.

Eg “connecting people” is a useless solution without the correct representation.

I hope I didn’t totally miss your point :)


For anyone following along the article, it should be "Row 4, Row 7, Box 4, and Box 6" where it says "Row 4, Row 7, Box 4, and Box 7". Somebody please correct me if I'm misunderstanding.


You are quite right, it's a typo in the article.


So why is it row 4 and not row 3?


If I'm not mistaken, you can in fact get the same digit three times in the ostrich head.

EDIT: I misread the article — it said you can't place the same digit more than three times.

Such a deep dive on ontological remodeling that I found myself starved for oxygen about the time palindromic lines came up.

Phistomefel’s Theorem though blew my mind. I'm still trying to convince myself it is legit. (Probably where I started to think about heading back to the surface.)


The article says "No digit can appear in the ostrich head _more than_ three times."


I don’t even care what the “real-world” implications are; this is just an awesome way to think about Sudoku, and number sets.


Makes me wonder, if we were able to make a fun puzzle out of training machine learning models by hand, would we soon find better training algorithms?


Here is another experiment where you can hand-tune the weights of a neural network for a regression problem: https://asdf10.com/nn.html


Here's a fun experiment along those lines: https://koaning.github.io/human-learn/ It's not a game, but it is investigating how having a human in the loop can improve ML models.


I am not familiar with the set theory field, or whatever it's called exactly. The language in the article, from a layman's point of view, is vague, and I think it also uses many words with a somewhat different meaning from their mainstream one, which makes it pretty much impossible for me to read.

Maybe this is where a science journalist could do a good job! I'm thinking Vi Hart for example.


https://www.youtube.com/watch?v=e9_FkcNAZcA - scroll to 9:20 if you want to watch Simon explain Phistomefel's theorem.


Thanks, a good and patient explanation!


This feels like it’s related to abstraction but not exactly the same thing. Can’t really put my finger on it.


It strikes me as a kind of modularisation. There's a constraint on the solution that involves dozens of rows, columns and boxes in combination; but you can treat combinations of that style as a reusable module, like "center ring and corners".

I'm not sure that's an accurate way of thinking of it, but if so it would make it similar to Rubik's Cube moves, where you string dozens of individual low-level moves together to build the desired effect.

Edit to add: hmm, thinking about it, many Cube moves have the form A-B-A', where you get into position with A, apply the key move B, then back out of A again. The Sudoku equivalent might be applying a grid permutation A that preserves the constraints, solving cell B, then backing out of A again. But that's probably not a convenient way for human solvers to think about it, as you can't physically permute a Sudoku like you can with a Cube.


I'd love to read something similar about nonogram/picross


setter: noun. One who does set theory.


Is this really a new technique? Dividing a puzzle up into a different set of regions that can still be reasoned about is something I’ve done occasionally while solving them, and I’ve definitely done it while designing and generating puzzles because it can really help cut down the degrees of freedom.



