
> "We are seeing better than human-level performance in some visual tasks," in particular, for the problem of extracting house numbers in photos taken by Google's Street View cars

Can someone explain to me how this is news, given that the handwritten addresses on snail mail envelopes in the US have been OCR'd by neural networks for more than twenty years now?




The USPS address recognition technology (at least as of 2 years ago, when I was working on it) is not human-level. A fraction of the images cannot be resolved and are still sent to human keyers at remote encoding centers (REC sites). This fraction has been decreasing steadily over the years, but it has not yet fallen to zero.

It's important to remember that accuracy is critical when talking about machine perception. OCR, handwriting recognition, face recognition, etc. can all be done, but at what level of accuracy? At least until very recently, machine performance on these tasks has fallen well short of human-level abilities.


Handwritten addresses aren't fully OCRed for the system to work. I worked on the first systems that were released (in the 90s), and the basic algorithm was as follows: first, try to read all the numbers in the address, and identify the ZIP and the street number. Now, given the ZIP and the street number, the number of possible street names is very small (on average, 4 or 5); this is done via the USPS's address database. Now the problem becomes one of matching the handwritten street name with one of these 4-5 names. (Of course, there's more to it, but this is the gist of it).
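The gist of that pipeline can be sketched in a few lines. This is a toy illustration, not the USPS system: the address database, the candidate lookup, and the similarity threshold are all hypothetical stand-ins, and the digit-recognition step is simulated by hard-coded values.

```python
import difflib

# Hypothetical stand-in for the USPS address database:
# (ZIP, street number) -> the handful of plausible street names.
ADDRESS_DB = {
    ("14623", "160"): ["Lomb Memorial Dr", "Lomb Dr", "Jefferson Rd", "John St"],
}

def candidate_streets(zip_code, street_number):
    """Given the recognized digits, return the small candidate list
    (on average 4-5 names, per the comment above)."""
    return ADDRESS_DB.get((zip_code, street_number), [])

def match_street(noisy_reading, candidates):
    """Match a noisy handwritten street-name reading against the short
    candidate list, instead of recognizing it from scratch."""
    scored = [
        (difflib.SequenceMatcher(None, noisy_reading.lower(), c.lower()).ratio(), c)
        for c in candidates
    ]
    score, best = max(scored)
    return best if score > 0.5 else None  # below threshold -> send to a human keyer

# Digits are the easier part; suppose the digit recognizer read these:
zip_code, street_number = "14623", "160"
noisy_street = "Lomb Memoril Dr"  # simulated noisy handwriting read
print(match_street(noisy_street, candidate_streets(zip_code, street_number)))
# -> Lomb Memorial Dr
```

The key design point is that constraining recognition to a few database-backed candidates turns an open-ended handwriting problem into a small closed-set matching problem, which is far more forgiving of sloppy writing.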

Last I heard, the percentage of handwritten mail successfully sorted by machine had reached the low 80s. The Russian company Parasoft (who also worked on the Newton's online handwriting recognition) has been the leader in this field.


The difference in complexity between recognising figures on plain paper, held perpendicular to a scanning device in a controlled environment, and doing the same thing on huge amounts of non-standard, chaotic data is why.


Virtually all house numbers are either painted from a stencil or composed of mass-produced shapes on a background of uniform color, whereas addresses on envelopes are handwritten by doctors, six-year-olds and people with Parkinson's disease. I'm not convinced it's a harder problem.


What don't you get? One is on a white background. The other is in random orientations, placed in complex scenes, with random fonts, sizes, shapes, and positions, and you don't even know where the numbers are.

It's like a game of "Where's Waldo" on freaking crack.

You have literally no idea how complex this stuff is, do you?


Compare the addresses at http://www.realsimple.com/home-organizing/decorating/eye-cat... and http://mandydouglass.blogspot.com/2010/10/addressing-envelop... . Those are two representative images I picked from the first Google hits for "house number" and "handwritten address" respectively; all the others were comparable. Are you seriously going to claim that the house number is harder to recognize than the handwriting?

I do have an idea (literally, even) that there are additional problems in extracting the house-number images themselves from full-motion video, but that's an image registration problem, not an object recognition problem.


Those aren't the pictures Google works off - cf. maps.



