How We Built the ARKit Sudoku Solver (prototypr.io)
151 points by anielsen on Oct 10, 2017 | 37 comments



Since the author spent so much time on the optical character recognition step, it's worth mentioning that you don't even really need OCR for this task.

You can just find the squares, group the characters in the squares into visual equivalence classes, assign each class an arbitrary number, solve the puzzle in terms of those numbers, then fill in each empty square with the (average?) image of the equivalence class it matches.

This would allow you to solve a Sudoku puzzle with letters or WingDings instead of numbers, and the output font would naturally match that of the original puzzle.
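A minimal sketch of the grouping step, assuming each square has already been cropped to a same-sized, flattened grayscale image (the `Image` representation, the distance measure, and the `threshold` value are all hypothetical choices for illustration, not anything from the article):

```python
from typing import List, Optional, Sequence, Tuple

Image = Sequence[float]  # flattened grayscale pixels in [0, 1] (assumed format)


def distance(a: Image, b: Image) -> float:
    """Mean absolute pixel difference between two same-sized images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def label_cells(
    cells: List[Optional[Image]], threshold: float = 0.1
) -> Tuple[List[int], List[Image]]:
    """Greedily group glyph images into visual equivalence classes.

    Returns one label per cell (0 = empty, 1..k = class id) plus one
    representative image per class, in label order.
    """
    representatives: List[Image] = []
    labels: List[int] = []
    for cell in cells:
        if cell is None:  # blank square, to be filled in by the solver
            labels.append(0)
            continue
        for idx, rep in enumerate(representatives):
            if distance(cell, rep) < threshold:
                labels.append(idx + 1)
                break
        else:  # no existing class was close enough, so start a new one
            representatives.append(cell)
            labels.append(len(representatives))
    return labels, representatives
```

The labels can go straight into any ordinary solver, and the representative image for each class is what you would paint back into the empty squares.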


If the app still wanted to support puzzles that had notes scribbled in them then it would need some kind of OCR to tell the difference between a "starter" cell with known-good data and "puzzle" cells it needs to solve.

But treating the symbols in the starter cells as arbitrary is ingenious, imo!


Using OCR means you can solve a Sudoku with some printed and some hand-written numbers though.


That's an interesting idea! What happens if the source puzzle doesn't have all of the digits present though?


Take random characters from WingDings.


This! Still, admittedly, this is a drawback of my strategy.


An earlier thread about the same app where the dev promised to write this Medium post: https://news.ycombinator.com/item?id=15299822


> By the time we launched the app it was trained on over a million images of Sudoku squares.

This is super cool, but I can't help but think that something is missing if it takes hundreds of thousands of examples of digits for a machine learning algorithm to be able to differentiate them. It wouldn't take a human child this many. The available machine learning algos are not using nearly the amount of information available.


I'll venture to say that one missing piece is the hundreds of millions of years of nervous system evolution to distinguish signal from noise.


FWIW the MNIST dataset they talked about (handwritten digits) only has 60,000 training samples - and that dataset surely has more variance than printed puzzles from magazines.


I think today's machine learning algorithms mimic evolution more than they mimic how humans actually learn. However, I think current machine learning techniques could one day teach a machine how to learn, by building a fixed neural net with memory that is itself the learning algorithm.


You'd think recognising a square (containing squares and more squares if need be) would be relatively simple and not require advanced machine learning / training. Or that recognising a square doesn't take as much. The demo also indicates the sudoku needs to be fairly accurately scanned, similar to a QR code.


I think he meant for character recognition -- it's explained in the post that they went for an in-house dataset (from sudoku magazines I understand) instead of MNIST. They ran into some issues, found the way to solve them, and improved their training set. This allowed them to reach 98.6% accuracy, and after a few updates to the app over 99%.


I don't think the square recognition used any machine learning.

> We use iOS11’s Vision Library to detect rectangles in the image.

Looking at https://github.com/gunapandianraj/iOS11-VisionFrameWork - this definitely doesn't touch CoreML


It's unclear whether Vision uses machine learning behind the scenes though. Their docs kind of imply that it uses CoreML under the hood (which makes sense given the other things it does, like face recognition and object tracking).

The nice thing is it detects "projected rectangular regions" so even if the puzzle isn't aligned with the camera it still works.

I do wish I had more control though; it runs into trouble sometimes and there's not much I can do other than apply heuristics afterwards to determine whether I should throw out the sample or continue.

Example of a bad read from Vision Rectangle Detection: https://imgur.com/a/RSpTG


> Example of a bad read from Vision Rectangle Detection: https://imgur.com/a/RSpTG

Well, it's technically correct - it did find a rectangle :)


Really enjoyed reading this. The process is explained really well, including all the fun rabbit holes and unexpected pitfalls on launch day, and the technical steps taken to overcome them.

Interesting limitations to work around such as vertical planes vs horizontal, and focal length.

Not at all surprised they saw better performance with almost immediate payback by training models on their own $1,200 hardware than running in the cloud.

Very interesting they trained their own character recognition model and not only that but built their own custom crowd-sourced image labeling system complete with accuracy checks and review screens.

Overall, fantastic write-up!


"They" haha :P

I was surprised to read that IKEA had 70 employees working on their ARKit app! (https://twitter.com/DanielZarick/status/917472837295837185)

This whole thing (including the backend tools) took me about the equivalent of 1 month of full-time work (I was doing it mostly nights & weekends though since our games are what pay the bills).

I brought in one of my (excellent) designers from Hatchlings a couple days before launch to make the cool grid "scanning" animation and to do our branding and logo.


Only 70? I can see lots of work creating accurate (in size and in colors) 3D models of every item in their catalog that look good, and maybe even more discussing with management whether the current model accurately portrays the product.

I don’t know whether the functionality is present (last time I checked, the app wasn’t available in ‘my’ App Store), but integrating the app with their inventory system(s) and translating it also can’t be free.


They have some in-store kiosks for building mocked up rooms already, which would plausibly have some usable assets there.


The machine vision part of the ARKit project is definitely "wow"!

I had fun writing a Sudoku solver in Julia.

https://github.com/sgt101/simons-silly-sudoko-in-julia


So, 6 years ago, Google Goggles could solve sudoku puzzles for you from pictures.

It's interesting to compare how far the interface/speed has come in 6 years if you watch the videos of how it was done then: http://googlemobile.blogspot.com/2011/01/google-goggles-gets...

(Goggles did not do the image handling on device)


I think, in a year or two, someone will build a crossword puzzle solver using AR, ML, and computer vision. Granted, it is more difficult because we need to recognize letters, and solving crossword puzzles is much harder than solving sudoku. At the least, the crossword puzzle solver can give word recommendations if it cannot solve the puzzle completely.


Crosswords are pretty tough, partially because many puzzles (such as the Tuesday, Thursday, and Sunday puzzles in the New York Times) have enough wordplay in their answers (not just the clues) that they break normal crossword rules, in a specific way that the solver has to determine. I think Dr. Fill (https://arxiv.org/abs/1401.4597) is still the state of the art.


I found this very interesting, regarding crowdsourcing the training data:

"After the first pass I had enough verified data that I was able to add an automatic accuracy checker into both tools for future data runs (it would periodically show the user known images and check their work to determine how much to trust their answers going forward)."
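A rough sketch of how such an accuracy checker might score a labeler, assuming a simple smoothed hit rate on the known images (the article doesn't say how trust was actually computed; the `LabelerTrust` class and its scoring rule are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class LabelerTrust:
    """Tracks how often one labeler agrees with already-verified images."""

    correct: int = 0
    checked: int = 0

    def record_check(self, answer: int, known_label: int) -> None:
        """Record the labeler's answer for an image whose label is known."""
        self.checked += 1
        if answer == known_label:
            self.correct += 1

    @property
    def trust(self) -> float:
        """Smoothed accuracy: a brand-new labeler starts at 0.5."""
        return (self.correct + 1) / (self.checked + 2)
```

Answers from labelers whose trust falls below some cutoff could then be discarded or sent to a review screen.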


Seems like the challenge of applying AR is more in smart design than advanced ML.


Good observation! Be sure to check out part 1 where I talked about the design decisions behind the app: https://blog.prototypr.io/why-we-built-magic-sudoku-the-arki...


Does anyone know how the detection of sudokus on vertical planes can be achieved? Great article, especially on the crowdsourcing and machine vision fronts, but the author's explanation of this aspect left a lot to be desired.


Sorry, I was pretty hand-wavy with that because it was basically just trial and error until it worked sufficiently well.

The data I had available to mess with was the difference in width of the top of the puzzle and the bottom (with some trig you can determine its angle relative to the camera) and the projection matrix of the camera relative to the scene origin.

It's not perfect but it works better than having nothing at all.
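For anyone curious, the width-difference trig can be sketched like this under an idealized pinhole-camera model. This is not the app's code: it assumes the puzzle is a square tilted only about its horizontal axis, and that the ratio of its distance to its real side length is known from elsewhere (the `distance_over_side` parameter is a hypothetical input).

```python
import math


def tilt_angle_degrees(
    top_width: float, bottom_width: float, distance_over_side: float
) -> float:
    """Estimate how far a square is tilted away from facing the camera.

    top_width / bottom_width: apparent widths of the top and bottom edges
    (any unit, only their ratio matters).
    distance_over_side: distance to the square's center divided by its real
    side length (assumed known, e.g. from a depth estimate).
    Returns 0 when the square faces the camera; positive when the top edge
    is farther away than the bottom edge.
    """
    ratio = top_width / bottom_width
    # Pinhole model: ratio = (d - a) / (d + a), with a = (side / 2) * sin(tilt)
    sin_tilt = 2.0 * distance_over_side * (1.0 - ratio) / (1.0 + ratio)
    sin_tilt = max(-1.0, min(1.0, sin_tilt))  # guard against noisy measurements
    return math.degrees(math.asin(sin_tilt))
```

For example, a unit square 5 units away and tilted 30 degrees has its top edge at depth 5.25 and its bottom edge at 4.75, so its apparent widths are proportional to 1/5.25 and 1/4.75, and the function recovers the 30 degrees from those.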


Definitely cool, and I like the application of ARKit. But using ML to solve a sudoku seems like overkill. I remember writing a constraint-based solver as the first assignment for an undergrad-level AI course back in uni. Surely this implementation is less efficient? Someone let me know if I am wrong.

If the aim was to simply learn new tech, though, then I get it. I am just wary of ML being a hammer used on anything even remotely resembling a nail.


The author says ML wasn't used for solving Sudoku. Vision was only used for transducing the image of a sudoku puzzle to a puzzle structure in memory.


Oops, must have missed it. Thanks for the clarification.


It says they used a "traditional recursive algorithm", probably referring to the backtracking solution. In my experience it's fast enough to not matter for this sort of application (the other things that are going on are 1000x more complex).
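For reference, a minimal sketch of the plain backtracking approach (my guess at what "traditional recursive algorithm" means, not the app's actual code; the function names are made up):

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]  # 9x9 grid, 0 marks an empty cell


def is_allowed(grid: Grid, row: int, col: int, digit: int) -> bool:
    """True if digit does not already appear in the row, column, or 3x3 box."""
    if any(grid[row][c] == digit for c in range(9)):
        return False
    if any(grid[r][col] == digit for r in range(9)):
        return False
    box_r, box_c = 3 * (row // 3), 3 * (col // 3)
    return all(
        grid[r][c] != digit
        for r in range(box_r, box_r + 3)
        for c in range(box_c, box_c + 3)
    )


def find_empty(grid: Grid) -> Optional[Tuple[int, int]]:
    """Return the first empty cell in row-major order, or None if full."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                return r, c
    return None


def solve(grid: Grid) -> bool:
    """Fill the grid in place; return False if no solution exists."""
    spot = find_empty(grid)
    if spot is None:
        return True  # no empty cells left, so the grid is solved
    row, col = spot
    for digit in range(1, 10):
        if is_allowed(grid, row, col, digit):
            grid[row][col] = digit
            if solve(grid):
                return True
            grid[row][col] = 0  # undo and try the next digit
    return False
```

It assumes the starting grid has no conflicting givens.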


There is a small fraction of puzzles for which simple backtracking gets stuck on unproductive branches and essentially never finishes.


Do you have any examples? I'd love to improve the algorithm.

Someone on /r/programming pointed me here: http://apollon.issp.u-tokyo.ac.jp/~watanabe/sample/sudoku/in...

But the app seems to already handle those Ok without doing anything special: https://www.dropbox.com/s/arfd03kr8ieczk5/recursive-solver-k...


Did they really acquire 600,000 images for training data by scanning books by hand? How long did that take?


A couple of hours. I made a tool to do it automatically. Flip page, hold up phone, move it around to a few different angles, flip page, repeat.

I should note that’s 600k small squares; each full puzzle scan yields 81 small images.



