What do new Sudoku techniques teach us about real-world problem solving? (desystemize.substack.com)
244 points by Ariarule on March 31, 2022 | 67 comments



I took an AI class back in college 16-17 years ago. In that class we had to solve a sudoku puzzle using multiple approaches (backtracking, backward propagation, forward propagation) to find which was fastest on a few difficult puzzles.

I definitely don’t remember much of the content in any of my college courses. But I could reimplement those algorithms today without a problem. It’s amazing how “doing” really contributes to good memory retention. It was also one of the projects that sparked a fire in me. It really showed me the possibilities of computers and computer science.


I don't think I remember the implementation details of any algorithm I learned in AI class (for me, it was just 13 years ago). If I were going to do it today, I'd have to re-read the concepts again.

PS: I did all the projects myself, while my friends just sat watching and collected the grade.


I'd be interested in whether solving them via brute force would be faster or slower than any AI-based technique. A first glance at the problem would say no, but if you start in from each corner sequentially, the search shouldn't run into the billions.


By AI-based do you mean "something with heuristics"? IIRC Algorithm X / Dancing Links is the main way to brute-force it quickly.
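For comparison, plain backtracking is short enough to sketch. This is a minimal brute-force solver (not Algorithm X, just the naive depth-first search, and the grid encoding is an illustrative choice):

    # Minimal brute-force Sudoku solver: depth-first search with undo.
    # grid is a 9x9 list of lists, with 0 marking an empty cell.
    def candidates(grid, r, c):
        used = set(grid[r]) | {grid[i][c] for i in range(9)}
        br, bc = 3 * (r // 3), 3 * (c // 3)
        used |= {grid[i][j] for i in range(br, br + 3)
                            for j in range(bc, bc + 3)}
        return [d for d in range(1, 10) if d not in used]

    def solve(grid):
        for r in range(9):
            for c in range(9):
                if grid[r][c] == 0:
                    for d in candidates(grid, r, c):
                        grid[r][c] = d
                        if solve(grid):
                            return True
                    grid[r][c] = 0  # dead end: undo and backtrack
                    return False
        return True  # no empty cells left, so the grid is solved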


That's depressing - the sudoku craze was nearly 2 decades ago.


Why is that depressing?


Because I'm 20 years older?


It seems/feels rather recent…


Very true. Wisdom is knowledge applied.


Phistomefel’s Theorem (and the more general "set equivalence theory" described in the article) pops out directly from a standard technique known as the "Linear Programming relaxation" for Sudoku.

Essentially, the Linear Programming relaxation of a puzzle is a standard way of approximating the solution space with a system of linear equations and inequalities, replacing discrete yes/no answers to questions like "is the digit inside this box a 7?" with real numbers between 0 and 1 (which can be interpreted as probabilities, if you like). This system of linear inequalities and equations can then be solved efficiently with techniques from convex optimization.

Even the example from the Cracking the Cryptic video, with the conclusion that those three boxes at the bottom have to be 1, 2, and 3 in some order, would be deduced immediately from the Linear Programming relaxation of Sudoku. You don't need ontological remodeling when you know how to apply convex optimization :)


Do you know any sources on how set equivalence theory is related to integer programming? Because I don't see the immediate connection, and I could not find anything on the internet, but maybe I used the wrong terms.


I don't know any reference for this - it's something I worked out after hearing about Phistomefel's Theorem from a friend. The idea goes like this:

The standard Linear Programming relaxation of the constraint "the nine variables x_1, ..., x_9 are a permutation of 1, ..., 9" is defined in the following way. First we make real variables p_ij which we think of as representing the "probability" that x_i is equal to j. Each p_ij has to be between 0 and 1, of course, and they have to satisfy the following system of linear equations:

- for each i, the sum over j of p_ij is equal to 1 (since every x_i has to have some value), and

- for each j, the sum over i of p_ij is equal to 1 (since every value from 1, ..., 9 has to show up in the list of x_is somewhere).

The fact that this system of equations, together with the inequalities 0 <= p_ij <= 1, corresponds exactly to the "convex hull" of the constraint that the x_i are a permutation of 1, ..., 9 is a fairly famous early result from the theory of the Linear Programming relaxation of the bipartite matching problem.
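In symbols, the system is:

    0 \le p_{ij} \le 1, \qquad \sum_{j=1}^{9} p_{ij} = 1 \ (i = 1, \dots, 9), \qquad \sum_{i=1}^{9} p_{ij} = 1 \ (j = 1, \dots, 9)

(The result in question is the Birkhoff-von Neumann theorem: the vertices of this polytope are exactly the permutation matrices.)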

To describe the full Linear Programming relaxation of Sudoku, instead of just having 9 variables x_i you have 81 variables x_ab, which leads to 729 real "probability" variables p_abj between 0 and 1, and for each row, column, and square of the Sudoku these probabilities have to satisfy the equations listed above (applied to the relevant variables). That gives you a system of 324 (slightly redundant) linear equations in 729 unknown real variables p_abj, each of which is constrained to be between 0 and 1 - a piece of cake for a computer.
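Here's a sketch of that system as code, using scipy's LP solver. The indexing scheme is just an illustrative choice, and a real solver built this way would still need to branch whenever the LP solution isn't integral:

    # LP relaxation of Sudoku: 729 "probability" variables p_abj and
    # 324 equality constraints, fed to scipy. A sketch, not a solver.
    import numpy as np
    from scipy.optimize import linprog

    def lp_relaxation(givens):  # givens: dict {(a, b): digit}, 0-indexed cells
        idx = lambda a, b, j: 81 * a + 9 * b + j  # flatten p_abj
        rows_A, rows_b = [], []

        def eq(cells):  # constraint: sum of p over these (a, b, j) == 1
            row = np.zeros(729)
            for a, b, j in cells:
                row[idx(a, b, j)] = 1.0
            rows_A.append(row)
            rows_b.append(1.0)

        for a in range(9):
            for b in range(9):
                eq([(a, b, j) for j in range(9)])      # every cell has a value
        for j in range(9):
            for a in range(9):
                eq([(a, b, j) for b in range(9)])      # each row: one of each digit
            for b in range(9):
                eq([(a, b, j) for a in range(9)])      # each column: one of each digit
            for box in range(9):                       # each box: one of each digit
                ra, rb = 3 * (box // 3), 3 * (box % 3)
                eq([(a, b, j) for a in range(ra, ra + 3)
                              for b in range(rb, rb + 3)])

        bounds = [(0, 1)] * 729
        for (a, b), d in givens.items():               # clues pin p to 1
            bounds[idx(a, b, d - 1)] = (1, 1)

        res = linprog(np.zeros(729), A_eq=np.array(rows_A),
                      b_eq=np.array(rows_b), bounds=bounds, method="highs")
        return res.x.reshape(9, 9, 9)                  # per-cell digit "probabilities"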

For a human, you don't want to write down that entire system of equations - you want to use just a few of them to quickly figure out some piece of the puzzle. The technique of set equivalence theory does just this: instead of focusing on all of the probabilities p_abj, you just use the fact that the sum of the probabilities p_ab1 along every row/column is 1, and add/subtract the equations you get from some rows and columns to notice that the sum of the p_ab1s for the (ab)s corresponding to the corners is equal to the sum of the p_ab1s for the (ab)s corresponding to the ring around the center. Then you do the same for the 2s, and so on.
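Concretely, here is that bookkeeping for Phistomefel's configuration and a single digit j. Write S(T) for the sum of p_abj over a set of cells T, let "corners" be the four 2x2 corner regions, "ring" the 16 cells around the central box, and "overlap" the 20 cells where columns 1, 2, 8, 9 cross rows 3-7. Then:

    S(\text{cols } 1,2,8,9) = S(\text{corners}) + S(\text{overlap}) = 4
    S(\text{rows } 3\text{-}7) = S(\text{center } 5 \times 5) + S(\text{overlap}) = 5

Subtracting cancels the overlap, so S(center 5x5) = S(corners) + 1. The center 5x5 is the ring plus the central box, and the box contributes exactly 1 per digit, so S(ring) = S(corners) for every digit - which is Phistomefel's Theorem.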


Set equivalence is something a human can do while solving the puzzle, though, and forming overlapping sets and then removing the common elements is very satisfying when you hit on the right sets. (Hitting on the right sets is hard and needs a good feel for the specific constraints in the grid. Mark and Simon are, of course, very good at it.)


Agreed on all counts.


Cracking the Cryptic (the Youtube channel mentioned in the article) is a great channel to follow. Especially Simon's videos offer great insight into his thought process solving through some pretty hard sudokus (and variants). They've also taken on different puzzle games (e.g. The Witness, Baba is You), and it's absolutely fascinating to see how being very highly skilled in solving one type of puzzle does or does not translate into other classes of puzzle.


I occasionally watch Simon on Cracking the Cryptic and find it interesting how different his approach to solving Sudoku is from mine. I wouldn't even be able to start most of the puzzles he does, but nevertheless, I almost always find myself yelling at him in my head at some point when he overlooks something that seems obvious to me.


Yeah, he's commented many times that his scanning is sometimes terrible (especially, I notice, when he's working with brand-new puzzle mechanics), but I remain super impressed at his ability to not only work out the complex logic, but do it (a) mostly in his head when there's no easy way to notate it on screen while (b) simultaneously explaining it to viewers in a very understandable fashion.


I would "yell at the screen" whenever he misses things. Then I realised that he still solves these puzzles much faster than I did, which makes me realise he probably just better than me at paying attention to the parts that are important to the solve.


It's been pretty amazing to watch the evolution of sudoku theory over even the last two years - some truly genius work by e.g. Phistomefel, Aad van de Wetering, and Totally Normal Cat, as well as the enormous contribution of Cracking the Cryptic.


The author describes a process called "ontological remodeling", which is when a change in viewpoint radically simplifies a previously intractable problem. This is the story, not just of Sudoku, but all of mathematics.


3Blue1Brown has a few videos where he discusses and uses this technique to solve some complex problems. One of those was for a mathematical competition.

It's also common in audio manipulation, e.g. change to the frequency domain in order to modify pitch, then change back to the time domain.


> It's also common in audio manipulation, e.g. change to the frequency domain in order to modify pitch, then change back to the time domain.

Funnily enough, yes but actually no. For understanding and mathematical proofs the Fourier Transform is obviously essential. And when you first get into audio DSP programming it might seem that the FFT is crucial as well. But virtually all digital audio filters operate directly in the time domain.

The problem is that we always work with a sampled signal. And while the Nyquist theorem tells us that sampling loses no information as long as our sample frequency is at least twice the highest frequency in our input, we still have to be aware of it.

When you convert a fixed-frequency sampled signal into the frequency domain, you get a sum of a fixed set of (co)sine functions, because we're still sampled. Now some operations you can do exactly and simply in this frequency domain. E.g. if 100Hz is part of our fixed set of functions and we wish to subtract a 100Hz signal, we can do that directly on our coefficient. A pitch shift by an exact multiple of the frequency sample delta is possible exactly too, assuming the lowest/highest frequencies shifted off are inaudible.

But you almost never want to do these operations. A classic example of something you might want to do is a low-pass filter. Simple, right? FFT to the frequency domain, zero out the coefficients above the cutoff frequency, and convert back. No! Zeroing out coefficients is equivalent to subtracting those specific sine waves. But that is not a low-pass. As an example, suppose our sample frequency is such that the FFT's sine wave coefficients are 10Hz apart, and we wish to do a 30Hz low-pass filter. This means that a 25Hz signal should be completely unaffected, but a 25Hz signal can't be represented just using the 10Hz and 20Hz coefficients - you need higher-order terms! Thus if we zero those out, we distort our 25Hz signal.
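You can watch this happen numerically in a few lines of numpy (the numbers are just the ones from the example: 10Hz bin spacing, a 25Hz tone, a 30Hz cutoff):

    # "Zero the FFT bins above the cutoff" is not a clean low-pass:
    # a 25Hz tone, well below the 30Hz cutoff, comes back distorted.
    import numpy as np

    fs = 1000                        # sample rate (Hz)
    n = 100                          # 0.1s window -> bins are 10Hz apart
    t = np.arange(n) / fs
    x = np.sin(2 * np.pi * 25 * t)   # 25Hz tone, not on a bin frequency

    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, 1 / fs)
    spectrum[freqs > 30] = 0         # the naive "low-pass"
    y = np.fft.irfft(spectrum, n)

    print(np.max(np.abs(y - x)))     # clearly nonzero: the tone is distorted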

So to solve this in the real world you get into the difficult topic of filter design, or other audio DSP algorithms that are vastly more involved than a simple FFT.


Is there any scenario where the discrepancy between FFT and its analog cousin cannot be resolved by upping the sampling density? Distortion is always there, whether it comes from an imprecise sensor/instrument or from extrapolating sampled data. In the audio space, nothing above 20 kHz is audible anyways, so even a bog-standard 44.1 kHz sampling rate should do "good enough" for most DSP operations there.


Transients. The FFT is time-symmetric, so signals that start because of an event (like a mallet hitting something) have a lot of "nothing audible" energy above 20kHz because of the abrupt start. Forcing a Nyquist limit on such a signal always causes pre-ringing, which breaks causality - you have sound starting before the event happened.

(edit) E.g. a bandwidth-limited signal with only a single non-zero sample does not represent a rectangular function, but a sinc.

So, the FFT is a lie. But a very useful one.


> Distortion is always there, whether it comes from an imprecise sensor/instrument or from extrapolating sampled data.

Specifically regarding the latter part of "extrapolating sampled data", I would highly recommend watching this video: https://xiph.org/video/vid2.shtml. As long as your input signal is low-passed to below 22kHz, the 44.1kHz sampling is perfect. No information is lost, no distortion.

I, however, am not qualified enough to tell you how the distortion of the naive FFT filter approach changes as you raise the sampling frequency.


I'm just using simple IIR filters because it's one float of storage and one line of code, but I'm curious: which filtering algorithms are you talking about?
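Concretely, the kind of thing I mean - a one-pole low-pass where the whole filter state is the single float y (choosing the coefficient for a given cutoff is where the design math comes in):

    # One-pole IIR low-pass: one float of state, one line per sample.
    # a is in (0, 1]; smaller a means a lower cutoff frequency.
    def one_pole_lowpass(samples, a):
        y = 0.0
        out = []
        for x in samples:
            y += a * (x - y)   # the "one line of code"
            out.append(y)
        return out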


That's almost the beauty of a lot of audio DSP filters. They can be 1-10 lines of code, but to understand why they work and how to create similar ones with properties you want you need a heavy math background and a 500+ page book (e.g. https://www.native-instruments.com/fileadmin/ni_media/downlo...).

Other complicated DSP algorithms I've read about (but haven't come close to fully grokking) are:

- Band-limited oscillators. Ask yourself, how do you generate a square wave signal? It's "just" 1 for t seconds and -1 for t seconds, repeating, right? But what if t isn't a whole number of sample periods? Also, a square wave can be interpreted as an infinite sum of sine waves of ever-increasing frequency, but Nyquist tells us we can't have the higher frequencies. So even if t were a whole number of sample periods, it still wouldn't be right to output that many samples of 1 followed by that many samples of -1. But just taking the first N terms of the infinite sum (those below Nyquist) is slow, so you get things like polyblep (see the first sketch after this list): https://www.kvraudio.com/forum/viewtopic.php?t=375517.

- Resampling. A sampled bandlimited signal is perfectly represented by its samples through the Whittaker–Shannon interpolation formula, so to resample we simply have to evaluate that formula at new equidistant points. However this is slow (and global, so not real-time), so there's a bunch of techniques to speed this up with minimal distortion (see the second sketch after this list): http://ldesoras.free.fr/doc/articles/resampler-en.pdf

- Pitch shift. This seems innocuous enough, right? But suppose it were easy to shift pitch. Then we could resample our audio to 2x the frequency, shift the pitch up by an octave and play back at the original speed. Now we've made our audio twice as long without changing the pitch! Pitch shifting and timescaling are two sides of the same coin. And I don't understand either side. Fundamentally I don't really understand what it means to make an audio signal "longer", without changing the pitch, at the waveform level. But people do it anyway: https://en.wikipedia.org/wiki/Audio_time_stretching_and_pitc...

- The rabbit hole goes on...
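For the band-limited oscillator item, here's a rough polyBLEP square-wave sketch in Python. The two-sample polynomial correction follows the standard polyBLEP recipe from that thread; treat the details as illustrative, not production code:

    # polyBLEP square: take the naive square wave and smooth each
    # discontinuity with a short polynomial correction.
    def poly_blep(t, dt):
        # t = phase in [0, 1), dt = phase increment per sample (freq / fs)
        if t < dt:               # just after a discontinuity
            t /= dt
            return t + t - t * t - 1.0
        elif t > 1.0 - dt:       # just before a discontinuity
            t = (t - 1.0) / dt
            return t * t + t + t + 1.0
        return 0.0

    def square_osc(freq, fs, n):
        dt, phase, out = freq / fs, 0.0, []
        for _ in range(n):
            s = 1.0 if phase < 0.5 else -1.0          # naive square
            s += poly_blep(phase, dt)                 # fix the rising edge
            s -= poly_blep((phase + 0.5) % 1.0, dt)   # fix the falling edge
            out.append(s)
            phase = (phase + dt) % 1.0
        return out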
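And for the resampling item, the naive Whittaker-Shannon version - each output sample sums over every input sample, which is exactly the slow, global computation that the techniques in the linked paper approximate:

    # Naive Whittaker-Shannon resampling: evaluate the sinc interpolation
    # formula at new sample positions. O(len(x)) work per output sample,
    # and for downsampling you'd also need to low-pass the input first.
    import numpy as np

    def resample_naive(x, ratio):      # ratio > 1 stretches, < 1 shrinks
        n = np.arange(len(x))
        m = int(len(x) * ratio)
        t = np.arange(m) / ratio       # output positions, in input-sample units
        return np.array([np.sum(x * np.sinc(ti - n)) for ti in t])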


Excellent explanation. Thank you!


That's fascinating. Thanks for the information.


Since you mention it, 3b1b has a splendid video juxtaposing two attitudes to problem solving via an example here: https://www.youtube.com/watch?v=ltLUadnCyi0 . I highly recommend watching it, I think it is one of his best.


Chapman's writings on the subject are worth a read:

https://metarationality.com/remodeling


> "ontological remodeling", which is when a change in viewpoint radically simplifies a previously intractable problem. This is the story, not just of Sudoku, but all of mathematics.

it's such a powerful tool that the only problem it leaves is, what benefit is there to calling it ontological remodeling instead of "a change in viewpoint"?


Indeed. After reading the title of the post, I thought it might be about the Cook-Levin theorem, which shows that every NP problem can be reduced to SAT - and hence that NP-complete problems (of which generalized Sudoku is one) are reducible to one another.


"Ontological remodeling" is a lovely term. I think it's ubiquitous actually, but another nice example is the puzzle about tiling a chessboard with dominoes when the board is missing two opposite corners. Can you do it? If so, how? If not, why not?

Btw is the footnote a joke? I don't really get it:

The sum of the digits 1 to 9 is 45[1]

[1] This is a secret that Simon only tells his closest friends.


Using the 'secret' is a common way of deriving further digits in a Sudoku - if you know the sum of digits in a row or a box has to be X, then the missing digits must sum to 45-X. In the 'Cracking The Cryptic' videos, Simon is always careful to explain this, and always prefaces it with a warning that he only tells it to his closest friends (all 400k of them).


It's a joke. The numerical value of the digits has no meaning. There is no addition in sudoku.


There is plenty of addition in modern sudoku, as visible in the example shown in the linked article featuring killer cages.


Also, using a similar trick as with the dominoes, can a certain Legend of Zelda puzzle be solved?

https://gazj.substack.com/p/python-and-the-legend-of-zelda?s...

Article doesn't contain a mathematical proof (only a brute force one), but I wrote one up. Spoilers: https://news.ycombinator.com/item?id=30639211


That's a cool puzzle! It's the same as the Bridges of Königsberg, right? If each square is a node, connected by edges to adjacent squares, then it's only solvable if there are at most two squares with an odd number of edges. Think about it like this: except for the squares where you start and finish, you must use one edge to enter and another to leave.

I guess the Zelda puzzle is different because in Königsberg you can revisit islands, just not recross bridges. But that feels like something you can finesse somehow. . . . Ah, just swap nodes & edges, right? Squares : bridges :: sides : islands. Indeed, that lines up not just the restriction but the goal too.

EDIT: Oh your second link is a much nicer solution. But still it feels like there is a relationship to Königsberg.


The Zelda puzzle isn't the same as Königsberg because in Königsberg the goal is to traverse every edge, while the Zelda puzzle doesn't care about that. Think about the rows of tiles along the edges of the room - the Zelda room has dozens of tiles with 3 edges each, but it's no problem walking across them exactly once. You enter each tile from one edge and leave by a second, and it doesn't matter if you never traverse the third edge.

Swapping nodes and edges doesn't work, because many of the Zelda nodes have 3 or 4 edges, but an edge is defined as 2 endpoints. It doesn't make sense to talk about an edge with 3 or 4 ends.

The proof of non-solution to the Zelda puzzle is a simple checkerboard argument. The room's dimensions are 13 x 9, both odd, so all the corners are the same color (call it black) and there is one more black square than white. And the prize square replaces a white square. So there are two more black squares than white squares, making the problem unsolvable: a path must alternate black and white squares, so it can cover at most one more square of one color than the other. The statues are a red herring - there are two on each color and so they don't affect this proof.
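A few lines of Python confirm the count (statues ignored, since two sit on each color and cancel out):

    # Checkerboard-color the 13x9 room; corners get color 0 ("black").
    black = sum((r + c) % 2 == 0 for r in range(9) for c in range(13))
    white = 9 * 13 - black
    print(black, white - 1)  # 59 vs 57 once the prize tile removes a white square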


Great article.


"A secret I only tell my friends" is how Simon usually introduces 1 to 9 sum to 45 when he needs that fact to explain his train of thought in Cracking the Cryptic videos.


Yes, it’s a joke. Simon discloses “the secret” in almost every video, so the joke is that it’s always presented as mysterious insider knowledge even though it’s clearly not.


> a chessboard with dominoes when the board is missing two opposite corners.

A trickier version asks which square remains (unique up to symmetry) when covering a chessboard with 21 trominoes, each of which covers 3 adjacent board squares, i.e. 1x3 or 3x1.


What does missing two opposite corners mean? Opposite to each other? Opposite to the player?


Opposite to each other. Imagine a chess board, but with the bottom-left corner and the top-right corner removed.


Either A1 and H8 or A8 and H1


The extrapolated concept is great for life in general. Quite literally use different vocabulary to reframe an existing challenge to force a shift in perspective and thereby arrive at a different problem.

A quite simple observation by the article on how to approach challenging life issues as well as, say, mathematical ones - since shifting perspectives when in the trenches of complex life challenges is really hard.


The article I link at the beginning of this one, Representation and Uncertainty, is about exactly this - how to think about the extrapolated concept of representation for life in general. Self-promo I know but I think anyone who resonates with your comment would really enjoy it: https://desystemize.substack.com/p/representation-and-uncert...


That wasn’t skimmable, lol. But a striking point nonetheless.

Put another way, a lot of the economic advantage of "problem solving" in big tech in particular is so dangerously devoid (or "bankrupt", as you put it) of suitable representation that the framing leads to the dangerous social precipice we now find ourselves at.

It’s questions without representation, with lowly symbolism if any at all.

Eg “connecting people” is a useless solution without the correct representation.

I hope I didn’t totally miss your point :)


For anyone following along the article, it should be "Row 4, Row 7, Box 4, and Box 6" where it says "Row 4, Row 7, Box 4, and Box 7". Somebody please correct me if I'm misunderstanding.


You are quite right, it's a typo in the article.


So why is it row 4 and not row 3?


If I'm not mistaken, you can in fact get the same digit three times in the ostrich head.

EDIT: I misread the article — it said you can't place the same digit more than three times.

Such a deep dive on ontological remodeling that I found myself starved for oxygen about the time palindromic lines came up.

Phistomefel’s Theorem though blew my mind. I'm still trying to convince myself it is legit. (Probably where I started to think about heading back to the surface.)


The article says "No digit can appear in the ostrich head _more than_ three times."


I don’t even care what the “real-world” implications are; this is just an awesome way to think about Sudoku, and number sets.


Makes me wonder, if we were able to make a fun puzzle out of training machine learning models by hand, would we soon find better training algorithms?


Here is another experiment where you can hand-tune the weights of a neural network for a regression problem: https://asdf10.com/nn.html


Here's a fun experiment along those lines: https://koaning.github.io/human-learn/ It's not a game, but it is investigating how having a human in the loop can improve ML models.


I am not familiar with the set theory field, or whatever it's called exactly. The language in the article, from a layman's point of view, is vague, and I think it also uses many words with a somewhat different meaning from their mainstream one, which makes it pretty much impossible for me to read.

Maybe this is where a science journalist could do a good job! I'm thinking Vi Hart for example.


https://www.youtube.com/watch?v=e9_FkcNAZcA - scroll to 9:20 if you want to watch Simon explain Phistomefel's theorem.


Thanks, a good and patient explanation!


This feels like it’s related to abstraction but not exactly the same thing. Can’t really put my finger on it.


It strikes me as a kind of modularisation. There's a constraint on the solution that involves dozens of rows, columns and boxes in combination; but you can treat combinations of that style as a reusable module, like "center ring and corners".

I'm not sure that's an accurate way of thinking of it, but if so it would make it similar to Rubik's Cube moves, where you string dozens of individual low-level moves together to build the desired effect.

Edit to add: hmm, thinking about it, many Cube moves have the form A-B-A', where you get into position with A, apply the key move B, then back out of A again. The Sudoku equivalent might be applying a grid permutation A that preserves the constraints, solving cell B, then backing out of A again. But that's probably not a convenient way for human solvers to think about it, as you can't physically permute a Sudoku like you can with a Cube.


I'd love to read something similar about nonogram/picross


setter: noun. One who does set theory.


Is this really a new technique? Dividing a puzzle up into a different set of regions that can still be reasoned about is something I’ve done occasionally while solving them, and I’ve definitely done it while designing and generating puzzles because it can really help cut down the degrees of freedom.



