Can Neural Networks Crack Sudoku? (github.com/kyubyong)
141 points by kyubyong on June 12, 2017 | 105 comments



To me, this answers the question "How well can neural networks crack sudoku if we forget all of our domain knowledge, and just try to brute force our way through with insufficient training data and a neural network that's poorly sized?".

I suggest an alternate approach that would use a training set consisting of real-world data of sudoku puzzles that are in the process of being solved: before a box is filled in, and then after it's filled in. This would teach the network to solve the puzzle one step at a time, instead of all at once.

If you insist on only using the as-received puzzle and the solution for training, my intuition is that you'll need more than 10 layers in the NN, and much, much more data.

A taxonomy of what complex solving strategies need to be learned already exists: see e.g. http://www.sudokuwiki.org/y_wing_strategy and the categories "basic", "tough", "diabolical", and "extreme". My guess is that the NN (using the original author's strategy, but with more layers and lots more data) will eventually do well on "basic" and "tough", but will start to miss on some of the harder categories depending on the training set and how the network is set up.


To me, it's an interesting question to ask how well NNs can perform without hand-holding, i.e. can they figure out all of the complex strategies that humans have figured out, starting from a blank slate?

C.f. AlphaGo, which invented new strategies that proved interesting to human players.

Of course, if you're training your NN to solve a real-world problem, you wouldn't choose to train it in this way, you would do as you suggested and feed it as much relevant training data as possible, to give it a head-start.

(Your points RE: sizing and quantity of training data sound reasonable and are outside my expertise, so nothing to add there.)


There is a very simple strategy that solves all Sudoku puzzles.

Pick one open square, try a number that is possible for that square, and backtrack if you get stuck.

For better results, pick the open square with the fewest possible choices.

Human solvers might try something more efficient, but the above strategy is not at all bad for computer implementation.
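
A minimal sketch of that strategy in Python (assuming the grid is a 9x9 list of lists with 0 marking open squares):

    def candidates(grid, r, c):
        # Digits not already used in the row, column, or 3x3 box of (r, c).
        used = set(grid[r]) | {grid[i][c] for i in range(9)}
        br, bc = 3 * (r // 3), 3 * (c // 3)
        used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
        return [n for n in range(1, 10) if n not in used]

    def solve(grid):
        # Backtracking search, branching on the open square with the
        # fewest remaining candidates (the heuristic described above).
        cells = [(r, c) for r in range(9) for c in range(9) if grid[r][c] == 0]
        if not cells:
            return True  # no open squares left: solved
        r, c = min(cells, key=lambda rc: len(candidates(grid, *rc)))
        for n in candidates(grid, r, c):
            grid[r][c] = n
            if solve(grid):
                return True
            grid[r][c] = 0  # dead end: undo and try the next number
        return False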

It's very much a solved problem with existing technology, see

https://en.wikipedia.org/wiki/Satisfiability_modulo_theories

Adding a neural net to it doesn't help in any way.


Yeah, yeah, sudoku is easy for computers. That's not the point. The point is learning about neural networks. So yeah, "adding a neural net" does help with that.


In which case the hard part is finding a suitable problem; you might be better off learning about SMT solvers, because you don't have Mark Cuban telling everybody to do it.


We don't really know if it's suitable until we try it.


I just feel bad about this "my intuition is that you'll need more layers than 10 in the NN, and much much more data."

I still see current AI as basically a filter: image to text, speech to text, etc.

A smart AI would be able to figure out sudoku rules from a very small sample of games; it would then figure out a rudimentary backtracking algorithm. It would then sleep on it and figure out more patterns like naked singles, etc., and build a really fast sudoku solver with 100% accuracy. All this happening in a matter of seconds.


When a pig suddenly starts to dance, the remarkable thing is not that it cannot do a tango, but that it can dance at all.


Pigs dance all the time for suitable definitions of dance and time.

I don't think optimizing a rat's nest of functions with SGD is that impressive. If a model lacks explanatory power, what use is it to people?

In this case, when the solution from the NN is incorrect, what recourse do you have? The actual solution might be an arbitrary permutation of what the NN gave you, and there is no way to tell which rows and columns will have to be reshuffled to get the actual solution, or even if there is a solution. The constraints might be inconsistent and you will never know.


I suspect there may be a method by which one could convert solved sudokus into numerous candidate puzzles.

I imagine strategically removing numbers from solved puzzles so as to reinforce the neural connections for the solution and filter out possible 'noise'.


My immediate thought was to generate training data by making up solved puzzles, and then removing numbers one at a time (while ensuring there's still a unique solution), with a few different branches at each step from an initial solution.

You could generate massive datasets that way -- it would be pretty easy to generate a few billion pre-images of a solution. I mean, a solution has 81 numbers and a puzzle about 20. 81 choose 20 is about 10^18, or a billion billion.
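
A sketch of that generation loop, assuming the same 9x9 list-of-lists representation with 0 for blanks; count_solutions is a plain backtracking counter that stops at 2, since only uniqueness matters:

    import random

    def count_solutions(grid, limit=2):
        # Count completions by backtracking, stopping early once `limit`
        # is reached (we only care whether the count is exactly 1).
        for r in range(9):
            for c in range(9):
                if grid[r][c] != 0:
                    continue
                total = 0
                for n in range(1, 10):
                    br, bc = 3 * (r // 3), 3 * (c // 3)
                    if (n not in grid[r]
                            and all(grid[i][c] != n for i in range(9))
                            and all(grid[i][j] != n
                                    for i in range(br, br + 3)
                                    for j in range(bc, bc + 3))):
                        grid[r][c] = n
                        total += count_solutions(grid, limit - total)
                        grid[r][c] = 0
                        if total >= limit:
                            return total
                return total
        return 1  # no blanks left: exactly one completion

    def dig_puzzle(solution, blanks=60):
        # Blank out cells of a solved grid in random order, skipping any
        # removal that would make the puzzle ambiguous.
        grid = [row[:] for row in solution]
        cells = [(r, c) for r in range(9) for c in range(9)]
        random.shuffle(cells)
        removed = 0
        for r, c in cells:
            if removed == blanks:
                break
            saved, grid[r][c] = grid[r][c], 0
            if count_solutions(grid) == 1:
                removed += 1
            else:
                grid[r][c] = saved  # uniqueness lost; restore the clue
        return grid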


Well, sudoku is a solved problem. This would amount to teaching a neural net graph coloring.


There's a very elegant solution using dancing links; I wonder if you could teach a NN to essentially do that. Somehow.


So much missing-the-point in this thread. The point isn't "can we solve sudoku with a completely overwrought solution", it's "can NNs be applied to X class of problems with no human-added domain-specific knowledge". And that, I think, is quite cool.


   "can NN be applied to X class of problems with no human-added domain specific knowledge"
The universal approximation theorem suggests that for a significant class of problems, the answer to this is "obviously" [1]. The problem that remains is how effective the learning is.

I'm not convinced applying them to areas like this one, where there are clearly much better approaches, teaches us anything useful for more interesting cases, but maybe it does.

[1] NB it's not immediately clear that the OP's problem is in that class.


> teaches us anything useful...

Maybe the useful thing here is that Sudoku is a problem space a lot of people understand pretty well even though it's not trivial. That makes it a nice domain for thinking about and understanding NNs. I guess that people who are familiar with the problem space of solving Sudoku and similar puzzles, but who don't know much about NNs, will find this pretty interesting, but people who already understand NNs pretty well (or aren't familiar with solving puzzles like Sudoku) won't.


I think your guess is reasonable. It's the reason I wasn't very prescriptive: i.e. NN is not a very sensible approach if the goal is a great Sudoku solver, but the exercise might have other value I just can't immediately see.


Implications around strategy learning, reinforcement, and learning logical systems.


The point is - would you be better off learning these with a more applicable problem domain? Do the things you learn here transfer there well?

It's not obvious.


Universal approximation just says there exists a solution, not that there's a reasonable way to find it with a bounded amount of training data. The networks used in proofs typically have contrived and impractical architectures.


That's exactly what I meant by:

    The problem that remains is how effective is the learning.
The distinction I'm trying to draw is a tiny bit nuanced - since we know NNs are broadly applicable if we can figure out how to train them, the poster's "can NN be used here" is really "can we figure out how to train it". My question is: since Sudoku solving obviously has better, non-NN approaches, does spending time on that lead to anything generally useful, or would you be better off spending the same time working out how to train a NN on a more appropriate problem domain?

Too late to add a comment to the original.


> I'm not convinced applying them to areas like this one, where there are clearly much better approaches, teaches us anything useful for more interesting cases, but maybe it does.

I don't think it teaches us much about sudoku, but any formerly 'hard' problem that we've solved using traditional means is perfect for rapidly generating arbitrary amounts of training data, which makes it very useful for learning about neural nets.


I wonder how the speed of this method (or a slightly more fine tuned version) compares to the regular method.


I agree. I was a bit surprised that this approach worked as well as it did.

(I still think it would fail spectacularly on "super-human" sudokus.)


> The point isn't "can we solve sudoku with a completely overwrought solution", it's "can NNs be applied to X class of problems with no human-added domain-specific knowledge"

But why would you care about solving a class of problems with NNs when that class of problems already has much better solutions? Too many new programmers are running to NNs out of pure laziness. NNs are like magic. They solve the problem for you so you don't have to learn how to solve it yourself. Or at least that's the enticing promise.


> why would you care about solving a class of problems with NNs when that class of problems already has much better solutions?

> NNs are like magic. They solve the problem for you so you don't have to learn how to solve it yourself.

Sounds like you answered your own question; am I missing something?


The part you are missing is that the magic is illusory.


And especially when humans don't use intuition to solve it, but basically an inefficient version of the same algorithms that are used by computers.


Agreed, it's impressive.


Evidence that DNNs are somewhere between phase 2 and 3 of https://en.wikipedia.org/wiki/Hype_cycle


I'm sometimes cynical about NNs too, but even if I were to take the hype cycle theory at face value, I have to admit it's possible that NNs are reaching maturity ("plateau of productivity") now, given that there were large hype cycles over them in the 1960s and 1980s, and given that they are starting to really work and are being deployed in large scale applications. But, this hype cycle theory isn't scientific, so all supporting evidence is anecdotal and subjective, right?


Sudoku is trivially solvable by propagating constraints, for puzzles that are not at the hard/ultra-hard level.

For cases where that's not possible, iterative approaches can be very effective -- with not particularly optimized Python, the worst solution time I ever saw was 1 sec on a 1st-gen Eee PC netbook. That had about 6 layers of iterative backtracking before it came up with the solution.
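
For the easy/medium tier, the propagation can be as simple as repeatedly clearing naked singles from their peers; a rough sketch (assuming a dict from (row, col) to a set of candidate digits, singletons meaning solved cells):

    def peers(r, c):
        # All cells sharing a row, column, or 3x3 box with (r, c).
        br, bc = 3 * (r // 3), 3 * (c // 3)
        shared = ({(r, j) for j in range(9)} |
                  {(i, c) for i in range(9)} |
                  {(i, j) for i in range(br, br + 3)
                          for j in range(bc, bc + 3)})
        return shared - {(r, c)}

    def propagate(cands):
        # Repeatedly remove each solved cell's digit from its peers until
        # nothing changes. Returns False if a cell runs out of candidates.
        changed = True
        while changed:
            changed = False
            for (r, c), s in cands.items():
                if not s:
                    return False  # contradiction: some cell has no options
                if len(s) == 1:
                    n = next(iter(s))
                    for p in peers(r, c):
                        if n in cands[p]:
                            cands[p].discard(n)
                            changed = True
        return True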


I wrote a Sudoku solver in Python while waiting in the airport on a long layover; it solves even the most difficult puzzles in a few seconds on a now-old laptop. It requires some backtracking, but it's really not that hard.

Edit: I decided to publish my implementation on GitHub. https://github.com/jcoffland/fsudoku


Yeah, I was seeing if I could do the complete solution faster than a single solution. But I was at the in-laws' dinner table, not the airport.

This is it, uploaded a few years later in a repo for sudoku over SMS:

https://github.com/wiredfool/sms-doku/blob/master/so.py


You can solve sudoku with a SAT solver; you don't need neural networks. This was an assignment in CS 251 at UIC.


Indeed! Here is Sudoku, formulated as a SAT problem. You can use, for example, SICStus Prolog to run it:

    sudoku(Rows) :-
            length(Rows, 9), maplist(same_length(Rows), Rows),
            maplist(row_booleans, Rows, BRows),
            maplist(booleans_distinct, BRows),
            transpose(BRows, BColumns),
            maplist(booleans_distinct, BColumns),
            BRows = [As,Bs,Cs,Ds,Es,Fs,Gs,Hs,Is],
            blocks(As, Bs, Cs), blocks(Ds, Es, Fs), blocks(Gs, Hs, Is).

    blocks([], [], []).
    blocks([N1,N2,N3|Ns1], [N4,N5,N6|Ns2], [N7,N8,N9|Ns3]) :-
            booleans_distinct([N1,N2,N3,N4,N5,N6,N7,N8,N9]),
            blocks(Ns1, Ns2, Ns3).

    booleans_distinct(Bs) :-
            transpose(Bs, Ts),
            maplist(card1, Ts).

    card1(Bs) :- sat(card([1],Bs)).

    row_booleans(Row, Bs) :-
            same_length(Row, Bs),
            maplist(cell_boolean, Row, Bs).

    cell_boolean(Num, Bs) :-
            length(Bs, 9),
            sat(card([1],Bs)),
            element(Num, Bs, 1).
This uses the sat/1 constraint to denote Boolean satisfiability.

Here is an example Sudoku problem, taken from Donald Knuth's The Art of Computer Programming:

    knuth(b, [[1,_,3,_,5,6,_,8,9],
              [5,9,7,3,8,_,6,1,_],
              [6,8,_,1,_,9,3,_,5],
              [9,5,6,_,3,1,8,_,7],
              [_,3,1,5,_,8,9,6,_],
              [2,_,8,9,6,_,1,5,3],
              [8,_,9,6,_,5,_,3,1],
              [_,6,5,_,1,3,2,9,8],
              [3,1,_,8,9,_,5,_,6]]).
Sample query and result:

    ?- knuth(b, Rows),
       sudoku(Rows),
       maplist(portray_clause, Rows).
    [1, 2, 3, 4, 5, 6, 7, 8, 9].
    [5, 9, 7, 3, 8, 2, 6, 1, 4].
    [6, 8, 4, 1, 7, 9, 3, 2, 5].
    [9, 5, 6, 2, 3, 1, 8, 4, 7].
    [7, 3, 1, 5, 4, 8, 9, 6, 2].
    [2, 4, 8, 9, 6, 7, 1, 5, 3].
    [8, 7, 9, 6, 2, 5, 4, 3, 1].
    [4, 6, 5, 7, 1, 3, 2, 9, 8].
    [3, 1, 2, 8, 9, 4, 5, 7, 6].


SAT - Satisfiability

https://en.wikipedia.org/wiki/Boolean_satisfiability_problem

Unexpanded acronyms are always a pet peeve of mine. I would know what SAT is - since I've spent more than 50% of my life with computer science. However, not many people would understand the acronym you'd use. So, it would be great if, in general, one mentioned what SAT refers to rather than just using the acronym: mention the long form the first time and then use the short form later.

Sorry if this came across as an affront; it's just a personal observation. I saw this a couple of days ago with GRUs on another thread too.


I think "SAT solver" gives a sufficient hint?


Yes, a NN isn't the best tool for the job. I guess it's just an experiment to see if one is able to do this with a reasonable rate of success.


Yeah, I figured as such. I assume you are trying to learn about NN?


There is no easy way to solve SAT at the size of sudoku. Modern SAT solvers are very large and embody years of experience, with tens of heuristics and complex data structures. If we can simplify that using a neural network, I think it's a great step.


Reasonably modern SAT solvers (say, MiniSAT) have a lot less complexity than a modern neural network.

Also, I set writing a SAT solver as an intro-to-C practical, and it can easily solve any real-world Sudoku instance.


Are you serious? SAT solvers have some of the hardest code I have seen. Look at the SAT solvers that won the SAT competition. On the other hand, IMO, implementing a neural network on your own is very easy. But that is not the point. I consider the neural network libraries (e.g. TensorFlow) as given, since they are not tied to the problem and can be used for many different problems. I would count only the code above the neural network API towards the complexity (models, encoding, etc.). But a SAT solver has a specific function and is a core part of the problems it is trying to solve.


I think you are running afoul of not realizing that "world class" neural networks are a ton of code, just as a trivial SAT solver is easy and can be implemented rather quickly.

Now, neither will be that good at the job they are made for. But that is far from unique to this field. Consider, a car/bike/house/whatever is not exactly hard to build for backyard use. Getting world class, on the other hand, escalates to difficult very quickly.


Yes, competitive SAT solvers are very complicated. However, they are not outside of most developers' abilities. During a verification course at my University we had to build a simple SAT solver. Most students were able to complete the project in ~20-30 hours. Mine wasn't fast, but it worked well enough to be a proof of concept.

Also, saying that NNs are easy with a library like TensorFlow is akin to saying that Sudoku solvers are easy with Prolog. Of course they are! The library/language does the vast majority of the heavy lifting.


There are super-complicated SAT solvers, certainly, but MiniSAT has (in my opinion) reasonably easy-to-understand code. It's not very long either. Have you tried reading the code of TensorFlow?

Also, in my personal experience, SAT is currently applicable to more problems than Neural Networks (although that is changing as Neural Networks get applied to more things). You can use SAT solvers to solve any problem in NP (well, other things as well but it rapidly gets painful). That's a lot of problems!


I just want to mention that this NN code is as simple as it gets, and that anyone remotely knowing what they're doing could code the OP's thing in about 2 hrs.


Well, it is really more of an exact cover problem, which can be solved quite simply and elegantly with Knuth's Algorithm X.

A neural network isn't simpler in any sense of the word. You might as well throw a simulated annealer at the problem.


> A neural network isn't simpler in any sense of the word. You might as well throw a simulated annealer at the problem.

Great idea! I'm sure D-WAVE would be happy to work with you on a quantum sudoku solver.


I am not saying it is not. My point is: what if a neural network can solve a problem without any domain knowledge? I think that's a huge win even if we don't understand anything about the solution. Be it well-solved problems like approximate minimum distance in a graph, or practically unsolved ones like automatically proving unproved theorems in mathematics.


I take it this way: how many problems can you solve knowing only exact cover? Not very many. But if we can build ML systems that can solve almost any problem (we can't today), even without giving any insight, I would say it will be one of the things whose impact will surpass anything. Note, I am not getting into AGI, just a system that can solve objective problems. And while sudoku is not a very good example, I think it shows we can do many cool things with neural networks. Neural networks are the most sure bet for such a system.


You'd be surprised how far exact cover can get you. The trick isn't in knowing exact cover per se, but in seeing how to map problems to different problems.

That is, the "exact cover" nature of Sudoku is not immediately obvious to everyone. At least, it wasn't obvious to me. Seeing how quickly you can map it to that and then get a solution was a lot of fun and ridiculously educational.
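
For anyone curious, the mapping really is compact: every candidate placement (row r, column c, digit n) covers exactly four constraints, and a generic exact cover search does the rest. A sketch, with a dict-based Algorithm X in the style of Ali Assaf's well-known short implementation (dancing links is the same search with a fancier data structure):

    from itertools import product

    def sudoku_cover():
        # 729 candidates x 324 constraints: each placement (r, c, n)
        # fills cell (r, c) and puts n in row r, column c, and box b.
        Y = {}
        for r, c, n in product(range(9), range(9), range(1, 10)):
            b = (r // 3) * 3 + c // 3
            Y[(r, c, n)] = [("cell", r, c), ("row", r, n),
                            ("col", c, n), ("box", b, n)]
        X = {}  # constraint -> set of candidates that satisfy it
        for cand, cons in Y.items():
            for con in cons:
                X.setdefault(con, set()).add(cand)
        return X, Y

    def select(X, Y, r):
        # Commit to candidate r: remove every constraint it satisfies,
        # and every other candidate that clashes with it.
        cols = []
        for j in Y[r]:
            for i in X[j]:
                for k in Y[i]:
                    if k != j:
                        X[k].remove(i)
            cols.append(X.pop(j))
        return cols

    def deselect(X, Y, r, cols):
        # Undo select(), restoring constraints in reverse order.
        for j in reversed(Y[r]):
            X[j] = cols.pop()
            for i in X[j]:
                for k in Y[i]:
                    if k != j:
                        X[k].add(i)

    def algorithm_x(X, Y, solution=None):
        # Knuth's Algorithm X: branch on the tightest constraint first.
        solution = [] if solution is None else solution
        if not X:
            yield list(solution)
            return
        col = min(X, key=lambda c: len(X[c]))
        for r in list(X[col]):
            solution.append(r)
            cols = select(X, Y, r)
            yield from algorithm_x(X, Y, solution)
            deselect(X, Y, r, cols)
            solution.pop()

Given a puzzle, you select() each clue's (r, c, n) triple first and then enumerate; solutions come out as sets of placements.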


None of the alternatives I proposed require domain knowledge, they just require you to state the sudoku rules, which is inevitable.


I don't know that neural networks are particularly "simpler" than SAT solvers. They're both complex general tools with different requirements of problem knowledge.


I'd be surprised if NNs come up with a simpler result. At least, not one that has anything close to intuitive results.

That is, yes, SAT solvers have a lot of heuristics to speed them up. But, almost by definition, we understand those heuristics, so we can reason about the answers they give. NNs, however, are notorious for being incredibly opaque. They give us probabilistic answers, but my understanding is we really only trust them probabilistically and can't explain their answers.


We did use a library for the SAT solving. Hmm didn't think of that though.


Even a SAT solver is overkill.


Obligatory reference to the (former?) Prime Minister of Singapore : https://arstechnica.com/information-technology/2015/05/prime...


Every time sudoku comes up on HN I think back to when Ron Jeffries tried to solve it using Test Driven Development ;)

http://ravimohan.blogspot.se/2007/04/learning-from-sudoku-so... - enjoy :D


I was going to say those really don't seem comparable, but after reading Jeffries', that really was awful.


I find it interesting that when the NN fails to complete the whole puzzle, it seems to fail spectacularly (github.com/Kyubyong/sudoku#results) -- that is, there aren't a lot of good partial attempts (90% or above). Does anyone know why this might be the case? Does the first wrong placement of a number in a gap essentially ruin the rest of the guesses?


My intuition says yes; if a wrong number is placed in a square and then taken as ground truth for solving the rest of the puzzle, it certainly seems like the error would propagate.

As a concrete example, say you misplace a '2' somewhere within a given puzzle. Obviously, this cell is incorrect. But depending on the nature of what the NN has learned, it may believe the row (resp. column, 3x3 box constraint) already has the '2' in it, so it tries to fill its correct spot with another number. Which of course then leads the column and/or 3x3 box of that cell to take on an incorrect value, starting the process over again.

This same phenomenon can be seen in the game KenKen; depending on the strategies you use at any given point in the game, one mistake can propagate outward pretty quickly and spoil large sections of the puzzle.


I'm not certain, but based on the description by the author, this solver is sometimes doing probabilistic guessing. You're more likely to make a mistake early when you do this -- there are fewer chances to guess incorrectly near the end of the game as inferences grow stronger, and more chances in the beginning and middle to guess wrong which will then snowball into a more wrong answer. So statistically, it makes sense that failures tend to be large.

I thought it was interesting that the "hard" category seemed to have more wrong answers than the 2 harder categories. Maybe there just aren't enough samples, but in the data shown in the table, it seems like a big difference.


Reminds me of Watson's Jeopardy! exhibition match. Watson was right far more often than the humans, but when it was wrong it was way off base.


One big problem is the network is never trained on incorrect Sudokus, so at that point it's effectively reduced to random guessing.


Once you get a few right, the solution space closes very fast. A valid sudoku puzzle has a unique solution, so if you had 90% correct, there would only be one thing to choose from for each of the remaining squares.


Probably an early-onset snowballing of what you said. When one wrong placement is made, it's pretty hard to recover for a full puzzle.


As a combinatorial optimization problem, integer programming is the best tool for solving Sudoku. Fast, accurate, and guarantees the optimality of the solution. I'm impressed at how well this neural network did anyway though!


> 1M games were generated using generate_sudoku.py for training. I've uploaded them on the Kaggle dataset storage.

No need to do that. It would suffice to just include the random seed in the script so that its results are reproducible.


One should do both. Scripts and their dependencies have bugs, potentially leading to another sequence of games. At the very least one should specify a couple of the games, so one can check against them.


That looks interesting. Sudoku is a fairly useful constraint satisfaction problem/testbed to test neural network architectures on, since the most general version of sudoku is NP-complete.

There was some interesting work a while ago that solved sudoku using spiking neural networks too [1] (Fig. 5) with some nice associated theory.

[1] http://journals.plos.org/ploscompbiol/article?id=10.1371/jou...


I thought Sudoku can be solved in constant time. It would be O(9! * 9!) or something, which is just O(1), isn't it?


That's not really constant time, is it? Constant time implies that the time taken to solve the problem is not a function of the input size (the input in this case being the sudoku board). For a board size of 9x9 it's O(9!x9!).

The paper which showed that the general case of Sudoku is NP-complete was apparently this one [1] (Linked from wikipedia)

[1] http://www-imai.is.s.u-tokyo.ac.jp/~yato/data2/SIGAL87-2.pdf


Can we say that the number of possible states that the universe can assume is finite? Can we further conclude that everything that is possible must fit within the boundaries of the universe and therefore everything is either O(1) or unrealistic?


A constant time solution in years or eons isn't very helpful. There are around 6.7e21 complete sudoku boards if you want to brute force enumerate them.

https://en.m.wikipedia.org/wiki/Mathematics_of_Sudoku#Enumer...


> if you want to brute force enumerate them

But you don't. Even what you might call "brute force" Sudoku solvers start from the given hints in the puzzle, which reduce the search space considerably. Given a set of Sudoku hints with a unique solution, you won't need "years or eons" with even a very simple solver.


Yes, of course you don't, I think you misunderstood my comment. The comment I replied to was asking about true brute force enumeration of all boards -- not a solver that uses inferences based on the hints -- he guessed that you only need 9! * 9! total attempts which implies enumerating all boards and picking the one that matches the initial hints. This is in fact a constant time solution like he guessed, but it is also intractable.

A normal "very simple" solver of the kind you're talking about will find the solution very quickly, in seconds or less, and it will also not be a constant-time algorithm.


9!*9! is just a little over 131 billion boards. Checking correctness (no number is repeated in a row or column) is as simple as counting a sum over all boxes, rows, and columns.

Generating all possible boards is a little more complex, but I can hardly see how it could take "eons" - with a proper implementation (Apache Spark :P ) and reasonably powerful hardware (1000 CPU core cluster, ha! ha! ha!) it should run in under 1 day and take just a little over 13 TB of disk/RAM space :).


9! * 9! is not the number of boards, that is a bad estimate. If you don't account for symmetries, rotations, re-numberings and other things, the number of boards is 6.7e21. Even if you could check a full board in a nanosecond (which you can't) enumerating that number would take more than 200,000 cpu-years.

The paper linked to in the OP's article says: "Due to the sheer number of sudoku solution grids a brute force search would have been infeasible, but we found a better approach to make this project possible. Our software for exhaustively searching through a completed sudoku grid, named checker, was originally released in 2006. However, this first version was rather slow. Indeed, the paper [1] estimates that our original checker of late 2006 would take over 300,000 processor-years in order to search every sudoku grid."

https://arxiv.org/pdf/1201.0749.pdf

The estimate in the comment (9!*9!) seems to be implying a simple enumeration, not a complex strategy of symmetry-folding. But even if you do reduce the enumeration, the authors of that paper say their software requires 800 CPU years. I'm not making any claims about whether getting that down to a day might be possible, but I wish you good luck. By all means, show everyone how to do it with a proper implementation and a large cluster! ;)
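
For what it's worth, here's a quick back-of-the-envelope check of both figures (a sketch; one board per nanosecond is wildly optimistic):

    import math

    SECONDS_PER_YEAR = 365 * 24 * 3600

    naive = math.factorial(9) ** 2          # the 9! * 9! estimate
    actual = 6_670_903_752_021_072_936_960  # count of valid complete grids

    # At one board per nanosecond: the 9!*9! figure is ~131.7 billion
    # boards, about two minutes of checking; the actual grid count is
    # ~6.7e21 boards, over 200,000 CPU-years.
    print(naive, naive * 1e-9, "seconds")
    print(actual, actual * 1e-9 / SECONDS_PER_YEAR, "years")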


O(1) means bounded by a constant upper bound, not that every actual run of the program takes the exact same amount of time. Just like "simple quick sort is O(n^2)" does not mean that every invocation will take a quadratic amount of time. The parent's "can be solved in constant time" means the same thing, informally.

Anyway, we are in agreement that enumerating all boards would be quite inefficient.


Are you claiming that all sudoku solvers are constant time algorithms, because the upper bound is not greater than enumerating the total constant number of boards? There are no O(1) sudoku solvers in existence that run in a short amount of time, yet you seem to be suggesting that the "very simple" sudoku solvers you referred to can be classified as O(1) algorithms.


I'm claiming it's possible to write Sudoku solvers that have some constant upper bound on execution time. If you read this as "a short amount of time", you are misreading it.

I'm also not saying that this is true for all Sudoku solvers, since you could write one that sometimes just decides to uselessly burn cycles for a few centuries before returning a correct solution.

Here's that simple argument again: Sudoku can be solved in time at most exponential in the input size. The input size of 9x9 Sudoku is constant. The exponential of that constant is a constant. Not a very interesting argument? I agree, so I'm dropping this now.


"For the most general version of Sudoku" -- post doesn't detail what "general" means, but if the variable is board size, then it's O(N! * N!).


Sure, but so can decision trees.


It's a fascinating subject. Worth reading this entry for more details:

https://en.wikipedia.org/wiki/Sudoku_solving_algorithms

And apparently quantum computers can solve them using sheer brute force alone, but these articles don't go into enough detail (i.e., can they solve the 'evil' puzzles?):

https://www.engadget.com/2007/02/14/worlds-first-commercial-...

https://www.scientificamerican.com/article/first-commercial-...


Coming up next: CNNs applied to training CNNs


That actually happened a long time ago.


Peter Norvig cracked Sudoku using a clever algorithm: http://norvig.com/sudoku.html


Using 10 convolutional layers seems questionable - the precise value and location of each number is essential for getting the right answer, so it seems like the model design is poorly suited to the task. The obvious approach to me would be to train a neural net as a heuristic to order exploration in a constraint satisfaction solver, mirroring how AlphaGo combines Monte Carlo tree search with neural nets. Might net a nice speed-up.
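
A sketch of what that hybrid might look like; score_digits is a hypothetical stand-in for a trained model (here just a uniform stub), and everything else is ordinary backtracking:

    def legal_moves(grid, r, c):
        # Digits not already used in the row, column, or box of (r, c).
        used = set(grid[r]) | {grid[i][c] for i in range(9)}
        br, bc = 3 * (r // 3), 3 * (c // 3)
        used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
        return [n for n in range(1, 10) if n not in used]

    def score_digits(grid, r, c):
        # Hypothetical learned heuristic: P(digit | board) for cell (r, c).
        # A real version would query the trained network here.
        return {n: 1.0 / 9 for n in range(1, 10)}

    def guided_solve(grid):
        # Backtracking as usual, but branches are tried in order of the
        # model's confidence; a good model keeps the search tree shallow.
        for r in range(9):
            for c in range(9):
                if grid[r][c] == 0:
                    scores = score_digits(grid, r, c)
                    for n in sorted(legal_moves(grid, r, c),
                                    key=scores.get, reverse=True):
                        grid[r][c] = n
                        if guided_solve(grid):
                            return True
                        grid[r][c] = 0
                    return False
        return True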


A more interesting problem would be to train on an n x n sudoku and test the model on an m x m sudoku where m < n, or even m > n. Does that make sense?

edit: formatting


Here is my collection of standard sized Sudoku test puzzles: https://github.com/jcoffland/fsudoku/tree/master/puzzles

Some of these are the hardest possible puzzles, with a solution, for the size.


Funny, I taught my AI engine to solve easy, medium, and hard NYT sudoku last week.

It needed only 150 tries to solve the hard puzzle.


I would like to know whether it can be proven that NNs for problems involving permutations are as efficient as a SAT solver, less efficient, or more efficient (e.g. whether using the training just as an accelerator to avoid learning futile permutations saves training time, or whether the effect is not noticeable).


This is interesting. My intuition says NNs will be terrible at combinatorial optimization types of problems. Then again Go is a combinatorial type of game and AlphaGo beat everyone.

Maybe an NN for Sudoku should use AlphaGo type of architecture.


Indeed. A simple solution is to add a differentiable optimization layer (see Amos et al.): https://arxiv.org/abs/1703.00443


One underlying question is: who pays for diffuse crowdsourcing in 2017? I have indirect notice of at least a dozen small enterprises suspected of raiding GitHub repos daily and refactoring them for commercial purposes.


I'd say that using neural networks to solve sudoku-like puzzles is a bad idea -- a SAT solver, constraint program, or integer program would all be quicker to run and quicker to implement.


I can't help thinking that a GOFAI approach would be a better way to solve sudoku than an NN.


Interesting -- there must be a large literature on neural nets and NP-completeness, no?


If the wet ones can, I don't see why the ones of an arbitrary size made out of bits can't. Not a very good argument, but there's this: recurrent neural networks are Turing complete.


Humans don't learn how to play Sudoku by staring at a bunch of solved puzzles. We are given the rules up front.


Nail, meet hammer.


Or rather, vaguely nail shaped object, meet hammer.


When you have a hammer, everything starts looking like a nail.


Except maybe the point is to make the hammer.



