How We Built the ARKit Sudoku Solver

bnjmn · on Oct 10, 2017

Since the author spent so much time on the optical character recognition step, it's worth mentioning that you don't even really need OCR for this task.

You can just find the squares, group the characters in the squares into visual equivalence classes, assign each class an arbitrary number, solve the puzzle in terms of those numbers, then fill in each empty square with the (average?) image of the equivalence class it matches.

This would allow you to solve a Sudoku puzzle with letters or WingDings instead of numbers, and the output font would naturally match that of the original puzzle.
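
A minimal Python sketch of the grouping step described above. It is an illustration, not anything the app does. The `same_symbol` comparison is a hypothetical stand-in for whatever image-similarity test you choose (for example a thresholded pixel distance), and plain strings stand in for cell images.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

Cell = TypeVar("Cell")


def label_symbols(
    cells: Sequence[Optional[Cell]],
    same_symbol: Callable[[Cell, Cell], bool],
) -> Tuple[List[int], List[Cell]]:
    """Assign an arbitrary number to each visual equivalence class.

    cells: one entry per square, None for an empty square.
    Returns (labels, representatives): labels uses 0 for empty squares,
    and representatives[k - 1] is the first cell seen for class k.
    """
    representatives: List[Cell] = []
    labels: List[int] = []
    for cell in cells:
        if cell is None:
            labels.append(0)  # empty square, to be solved later
            continue
        for index, representative in enumerate(representatives):
            if same_symbol(cell, representative):
                labels.append(index + 1)
                break
        else:
            # No existing class matched, so this cell starts a new one.
            representatives.append(cell)
            labels.append(len(representatives))
    return labels, representatives


# Strings stand in for cell images here; equality stands in for similarity.
labels, representatives = label_symbols(
    ["A", None, "B", "A", "C", None], lambda a, b: a == b
)
```

The solver then works on `labels`, and each solved square is drawn with the representative image of its class. If the puzzle shows fewer than nine distinct symbols, `representatives` ends up shorter than nine and the missing symbols have no image to draw.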

spaceheeder · on Oct 10, 2017

If the app still wanted to support puzzles that had notes scribbled in them then it would need some kind of OCR to tell the difference between a "starter" cell with known-good data and "puzzle" cells it needs to solve.

But treating the symbols in the starter cells as arbitrary is ingenious, imo!

gok · on Oct 10, 2017

Using OCR means you can solve a Sudoku with some printed and some hand-written numbers though.

yeldarb · on Oct 10, 2017

That's an interesting idea! What happens if the source puzzle doesn't have all of the digits present though?

ndh2 · on Oct 10, 2017

Take random characters from WingDings.

bnjmn · on Oct 10, 2017

This! Still, admittedly, this is a drawback of my strategy.

amrrs · on Oct 10, 2017

An earlier thread about the same app, where the dev promised to write this Medium post: https://news.ycombinator.com/item?id=15299822

Simon_says · on Oct 10, 2017

> By the time we launched the app it was trained on over a million images of Sudoku squares.

This is super cool, but I can't help but think that something is missing if it takes hundreds of thousands of examples of digits for a machine learning algorithm to be able to differentiate them. It wouldn't take a human child this many. The available machine learning algos are not using anywhere near the amount of information available.

empthought · on Oct 10, 2017

I'll venture to say that one missing piece is the hundreds of millions of years of nervous system evolution to distinguish signal from noise.

mcintyre1994 · on Oct 10, 2017

FWIW the MNIST dataset they talked about (handwritten digits) only has 60,000 training samples - and that dataset surely has more variance than printed puzzles from magazines.

rfish · on Oct 10, 2017

I think today's machine learning algorithms mimic evolution more than they mimic how humans actually learn. However, I think current machine learning techniques could one day teach a machine how to learn, by building a fixed neural net with memory, where the net itself is the learning algorithm.

Cthulhu_ · on Oct 10, 2017

You'd think recognising a square (containing squares and more squares if need be) would be relatively simple and not require advanced machine learning / training. Or that recognising a square doesn't take as much. The demo also indicates the sudoku needs to be fairly accurately scanned, similar to a QR code.

dr_zoidberg · on Oct 10, 2017

I think he meant for character recognition -- it's explained in the post that they went for an in-house dataset (from sudoku magazines I understand) instead of MNIST. They ran into some issues, found the way to solve them, and improved their training set. This allowed them to reach 98.6% accuracy, and after a few updates to the app over 99%.

zimpenfish · on Oct 10, 2017

I don't think the square recognition used any machine learning.

> We use iOS11’s Vision Library to detect rectangles in the image.

Looking at https://github.com/gunapandianraj/iOS11-VisionFrameWork - this definitely doesn't touch CoreML

yeldarb · on Oct 10, 2017

It's unclear whether Vision uses machine learning behind the scenes though. It's kind of implied in their docs that it uses CoreML behind the scenes (which makes sense with the other things it does like Face recognition and object tracking).

The nice thing is it detects "projected rectangular regions" so even if the puzzle isn't aligned with the camera it still works.

I do wish I had more control though; it runs into trouble sometimes and there's not much I can do other than apply heuristics afterwards to determine whether I should throw out the sample or continue.

Example of a bad read from Vision Rectangle Detection: https://imgur.com/a/RSpTG

zimpenfish · on Oct 13, 2017

> Example of a bad read from Vision Rectangle Detection: https://imgur.com/a/RSpTG

Well, it's technically correct - it did find a rectangle :)

zaroth · on Oct 10, 2017

Really enjoyed reading this. The process is really explained well including all the fun rabbit holes and unexpected pitfalls on launch day, and the technical steps to overcome.

Interesting limitations to work around such as vertical planes vs horizontal, and focal length.

Not at all surprised they saw better performance with almost immediate payback by training models on their own $1,200 hardware than running in the cloud.

Very interesting they trained their own character recognition model and not only that but built their own custom crowd-sourced image labeling system complete with accuracy checks and review screens.

Overall, fantastic write-up!

yeldarb · on Oct 10, 2017

"They" haha :P

I was surprised to read that IKEA had 70 employees working on their ARKit app! (https://twitter.com/DanielZarick/status/917472837295837185)

This whole thing (including the backend tools) took me about the equivalent of 1 month of full-time work (I was doing it mostly nights & weekends though since our games are what pay the bills).

I brought in one of my (excellent) designers from Hatchlings a couple days before launch to make the cool grid "scanning" animation and to do our branding and logo.

Someone · on Oct 10, 2017

Only 70? I can see lots of work creating accurate (in size and in colors) 3D models of every item in their catalog that look good, and maybe even more discussing with management whether the current model accurately portrays the product.

I don’t know whether the functionality is present (last time I checked, the app wasn’t available in ‘my’ App Store), but integrating the app with their inventory system(s) and translating it also can’t be free.

kemayo · on Oct 10, 2017

They have some in-store kiosks for building mocked up rooms already, which would plausibly have some usable assets there.

sgt101 · on Oct 10, 2017

The machine vision part of the ARKit project is definitely "wow"!

I had fun writing a Sudoku solver in Julia.

https://github.com/sgt101/simons-silly-sudoko-in-julia

DannyBee · on Oct 10, 2017

So, 6 years ago, Google Goggles could solve sudoku puzzles for you from pictures.

It's interesting to compare how far the interface/speed has come in 6 years if you watch the videos of how it was done then: http://googlemobile.blogspot.com/2011/01/google-goggles-gets...

(Goggles did not do the image handling on device)

langitbiru · on Oct 10, 2017

I think that in a year or two someone will build a crossword puzzle solver using AR, ML, and computer vision. Granted, it is more difficult because we need to recognize letters, and solving crossword puzzles is much harder than solving sudoku. At least the crossword solver could suggest words if it cannot solve the puzzle completely.

dfan · on Oct 10, 2017

Crosswords are pretty tough, partially because many puzzles (such as the Tuesday, Thursday, and Sunday puzzles in the New York Times) have enough wordplay in their answers (not just the clues) that they break normal crossword rules, in a specific way that the solver has to determine. I think Dr. Fill (https://arxiv.org/abs/1401.4597) is still the state of the art.

rmorey · on Oct 10, 2017

I found this very interesting, regarding crowdsourcing the training data:

"After the first pass I had enough verified data that I was able to add an automatic accuracy checker into both tools for future data runs (it would periodically show the user known images and check their work to determine how much to trust their answers going forward)."

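One way such a check could be scored, as a minimal Python sketch. The article only says the tool periodically showed known images and used the results to decide how much to trust a user. The `LabelerTrust` class and the smoothed-fraction score below are assumptions for illustration, not the author's implementation.

```python
class LabelerTrust:
    """Tracks how often a labeler agrees with already-verified images."""

    def __init__(self) -> None:
        self.correct = 0
        self.total = 0

    def record_probe(self, given_label: int, verified_label: int) -> None:
        """Record one answer on an image whose label is already known."""
        self.total += 1
        if given_label == verified_label:
            self.correct += 1

    @property
    def score(self) -> float:
        """Smoothed accuracy in (0, 1); a brand-new labeler starts at 0.5."""
        # Add-one smoothing keeps one lucky or unlucky answer from dominating.
        return (self.correct + 1) / (self.total + 2)


trust = LabelerTrust()
for given, verified in [(5, 5), (3, 3), (8, 8), (1, 7)]:
    trust.record_probe(given, verified)
# Three of the four probes matched, so the score is (3 + 1) / (4 + 2).
```

Labels from users whose score falls below some chosen threshold could then be sent to a review screen instead of being accepted directly.
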
lingz · on Oct 10, 2017

Seems like the challenge of applying AR is more in smart design than advanced ML.

yeldarb · on Oct 10, 2017

Good observation! Be sure to check out part 1 where I talked about the design decisions behind the app: https://blog.prototypr.io/why-we-built-magic-sudoku-the-arki...

gtm1260 · on Oct 10, 2017

Does anyone know how the detection of sudokus on vertical planes can be achieved? Great article, especially on the crowdsourcing and machine vision fronts, but the author's explanation of this aspect left a lot to be desired.

yeldarb · on Oct 10, 2017

Sorry, I was pretty hand-wavy with that because it was basically just trial and error until it worked sufficiently well.

The data I had available to mess with was the difference in width of the top of the puzzle and the bottom (with some trig you can determine its angle relative to the camera) and the projection matrix of the camera relative to the scene origin.

It's not perfect but it works better than having nothing at all.
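
For readers wondering what "some trig" could look like, here is a hedged Python sketch under a simple pinhole-camera assumption. It is not the author's trial-and-error code. It assumes a square puzzle of known side length whose center sits at a known distance on the optical axis, tilted about a horizontal axis. Apparent width is then inversely proportional to depth, so the top/bottom width ratio r satisfies sin(theta) = (2 * distance / side) * (1 - r) / (1 + r). The function name and parameters are hypothetical.

```python
import math


def tilt_from_widths(
    top_width: float, bottom_width: float, distance: float, side: float
) -> float:
    """Estimate the puzzle's tilt in radians from its apparent edge widths.

    top_width, bottom_width: apparent widths of the top and bottom edges
        (any unit, only their ratio matters).
    distance: assumed distance from the camera to the puzzle center.
    side: assumed real side length of the puzzle, same unit as distance.
    A positive result means the top edge is tilted away from the camera.
    """
    ratio = top_width / bottom_width
    sine = (2.0 * distance / side) * (1.0 - ratio) / (1.0 + ratio)
    # Clamp so that measurement noise cannot push asin out of its domain.
    sine = max(-1.0, min(1.0, sine))
    return math.asin(sine)


# A 0.2 m puzzle, 1 m away, tilted 30 degrees: the edge depths are 1.05 m
# and 0.95 m, so the apparent widths are in the ratio 0.95 : 1.05.
angle = math.degrees(tilt_from_widths(0.95, 1.05, distance=1.0, side=0.2))
```

In practice the distance and the puzzle size are not known exactly, which may be part of why heuristics and the camera's projection matrix were needed as well.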

elsurudo · on Oct 10, 2017

Definitely cool, and I like the application of ARKit. But using ML to solve a sudoku seems like overkill. I remember writing a constraint-based solver as the first assignment for an undergrad-level AI course back in uni. Surely this implementation is less efficient? Someone let me know if I am wrong.

If the aim was to simply learn new tech, though, then I get it. I am just wary of ML being a hammer used on anything even remotely resembling a nail.

deafcalculus · on Oct 10, 2017

The author says ML wasn't used for solving Sudoku. Vision was only used for transducing the image of a sudoku puzzle to a puzzle structure in memory.

elsurudo · on Oct 10, 2017

Oops, must have missed it. Thanks for the clarification.

dingo_bat · on Oct 10, 2017

It says they used a "traditional recursive algorithm", probably referring to the backtracking solution. In my experience it's fast enough to not matter for this sort of application (the other things that are going on are 1000x more complex).
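
For reference, a plain recursive backtracking solver fits in a few lines. This Python sketch shows the general technique only. The app's actual solver is not shown in the article, so nothing here should be read as its implementation.

```python
from typing import List

Grid = List[List[int]]  # 9x9, with 0 marking an empty cell


def parse(rows: List[str]) -> Grid:
    """Turn nine strings such as '53..7....' into a grid of ints."""
    return [[0 if ch == "." else int(ch) for ch in row] for row in rows]


def candidates(grid: Grid, r: int, c: int) -> List[int]:
    """Digits not already used in the row, column, or 3x3 box of (r, c)."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [d for d in range(1, 10) if d not in used]


def solve(grid: Grid) -> bool:
    """Fill the grid in place; return True if a solution was found."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for d in candidates(grid, r, c):
                    grid[r][c] = d
                    if solve(grid):
                        return True
                # No digit worked here: undo and let the caller try another.
                grid[r][c] = 0
                return False
    return True  # no empty cells left
```

This version always fills the first empty cell it finds. It is simple, and it leaves the search order entirely up to the layout of the puzzle.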

eutectic · on Oct 10, 2017

There is a small fraction of puzzles for which simple backtracking gets stuck on unproductive branches and essentially never finishes.

yeldarb · on Oct 10, 2017

Do you have any examples? I'd love to improve the algorithm.

Someone on /r/programming pointed me here: http://apollon.issp.u-tokyo.ac.jp/~watanabe/sample/sudoku/in...

But the app seems to already handle those OK without doing anything special: https://www.dropbox.com/s/arfd03kr8ieczk5/recursive-solver-k...

gitgud · on Oct 11, 2017

Did they really acquire 600,000 images for training data by scanning books by hand? How long did that take?

yeldarb · on Oct 11, 2017

A couple of hours. I made a tool to do it automatically. Flip page, hold up phone, move it around to a few different angles, flip page, repeat.

I should note that’s 600k small squares so each full puzzle scan yields 81 small images.