Convert to grayscale > find the tracking markers > rotate to turn it upright > normalize for variable brightness > threshold at the median pixel value > overlay a grid and have pixel values come to consensus on the value of each cell.
All pretty cheap, so it's probably do-able depending on your definition of underpowered. This is assuming they take an orthogonal shot and there's no need for homographic projection (which would make it a lot more difficult to find the tracking markers).
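A minimal sketch of that pipeline (assuming an orthogonal, already-upright shot, and that the symbol's bounding box and module count are known from the tracking markers; names and the numpy/Pillow choice are mine, just for illustration):

    # grayscale -> normalize -> threshold at the median -> grid consensus
    import numpy as np
    from PIL import Image

    def read_modules(path, box, n_modules):
        # box = (left, top, right, bottom) of the symbol in pixels (assumed known)
        # n_modules = modules per side (e.g. 21 for a version 1 QR code)
        gray = np.asarray(Image.open(path).convert("L"), dtype=float)
        gray = (gray - gray.min()) / max(gray.max() - gray.min(), 1e-9)  # normalize brightness
        dark = gray < np.median(gray)                                    # threshold at the median

        left, top, right, bottom = box
        cell_w = (right - left) / n_modules
        cell_h = (bottom - top) / n_modules

        modules = np.zeros((n_modules, n_modules), dtype=bool)
        for r in range(n_modules):
            for c in range(n_modules):
                y0, y1 = int(top + r * cell_h), int(top + (r + 1) * cell_h)
                x0, x1 = int(left + c * cell_w), int(left + (c + 1) * cell_w)
                cell = dark[y0:y1, x0:x1]
                modules[r, c] = cell.mean() > 0.5   # pixels vote on each cell's value
        return modules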
In fact, the distinct tracking markers are designed so that at any angle there is some line that contains a sequence of black-white segments in the 1:1:3:1:1 length ratio. Find three such markers, look at the alignment and timing patterns in the expected places, and voila, you can infer a matrix of pixels ("modules" in the jargon). And the masks are designed so that at least one mask will obscure naturally occurring patterns that look like tracking markers (the evaluation process even mentions the aforementioned 1:1:3:1:1 ratio and penalizes it).
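To make that concrete, checking a single scanline for the 1:1:3:1:1 ratio looks roughly like this (a sketch, not taken from any particular decoder; the tolerance value is a guess):

    def run_lengths(row):
        # row: 1D sequence of booleans (True = dark pixel) along one scanline.
        # Returns (value, length) pairs for each run of equal pixels.
        runs, count = [], 1
        for prev, cur in zip(row, row[1:]):
            if cur == prev:
                count += 1
            else:
                runs.append((prev, count))
                count = 1
        runs.append((row[-1], count))
        return runs

    def looks_like_finder(runs, tolerance=0.5):
        # Slide a window of five runs along the scanline and test for the
        # dark-light-dark-light-dark 1:1:3:1:1 length ratio.
        hits = []
        for i in range(len(runs) - 4):
            window = runs[i:i + 5]
            if not window[0][0]:            # must start on a dark run
                continue
            lengths = [l for _, l in window]
            module = sum(lengths) / 7.0     # the whole pattern spans 7 modules
            expected = [1, 1, 3, 1, 1]
            if all(abs(l - e * module) <= tolerance * module
                   for l, e in zip(lengths, expected)):
                hits.append(i)
        return hits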
I would like more detail on this. I think for most people, once you know what the light and dark patches are, the rest follows (Reed-Solomon, etc). But how do you take an image that contains an imperfect QR and identify those light and dark patches in the first place?
As I alluded to in a sibling comment, it might help to look at one-dimensional barcodes first, specifically the pervasive Universal Product Code (UPC). Maybe the reader's sensor sees only a single pixel and the user sweeps it across, or the reader has a row of sensors, but either way the input is a scanline's worth of pixels. The barcode starts with an on-off-on-off pattern and ends with an off-on-off-on pattern, so we first look for two 1:1:1:1 patterns. Since we know each segment of these patterns is a single unit ("module") in the barcode, we can extrapolate and infer where all the other units would be. This will never be perfectly accurate (people rarely move their hand that steadily), but those patterns give a good deal of information about the barcode.
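For UPC, once the two guard patterns pin down where the symbol starts and ends, locating every other module is just linear interpolation. A sketch under those assumptions (UPC-A is 95 modules wide; the function names are mine):

    def module_centers(start_px, end_px, n_modules=95):
        # start_px / end_px: pixel x-coordinates where the symbol begins and ends,
        # inferred from the guard patterns. Returns the estimated pixel center
        # of every module by interpolation.
        width = (end_px - start_px) / n_modules
        return [start_px + (i + 0.5) * width for i in range(n_modules)]

    def sample_modules(dark_row, start_px, end_px):
        # dark_row: booleans per pixel (True = dark) along the scanline.
        return [dark_row[int(round(x))] for x in module_centers(start_px, end_px)]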
Finder patterns in the QR code can be recognized in the same way, as long as your scanline actually hits them. Since it's two-dimensional you need multiple scanlines to catch all of them, but once you've got three finder patterns and they are not collinear, there is a good chance of finding the alignment pattern near the fourth corner; again, it can be recognized by a 1:1:1:1:1 pattern, and you can make a good guess about its size. This gives the orientation [1], and the timing patterns between two pairs of finder patterns finally give the transformation you need to apply to convert pixels into a matrix of modules. You read the version and format info (which has its own error correction code), unapply the mask, read the actual data and ECC, correct any recoverable errors, and you are done. The very purpose of masking is to prevent those patterns accidentally occurring in the data section, and the standard defines a set of scoring criteria for masks; if a 1:1:3:1:1 pattern occurs in the data after masking, the mask is penalized so heavily that it won't be chosen.
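As a rough illustration of that last transformation step, here is an affine-only version (i.e. ignoring perspective warp) that maps module coordinates to pixels from the three finder-pattern centers, which are assumed to have been found already:

    import numpy as np

    def sample_matrix(dark, tl, tr, bl, n_modules):
        # dark: 2D boolean image (True = dark pixel).
        # tl, tr, bl: pixel (x, y) centers of the top-left, top-right and
        # bottom-left finder patterns. Finder centers sit 3.5 modules in from
        # the symbol edge, so they are (n_modules - 7) modules apart.
        tl, tr, bl = (np.asarray(p, dtype=float) for p in (tl, tr, bl))
        span = n_modules - 7
        dx = (tr - tl) / span               # pixel step per module along x
        dy = (bl - tl) / span               # pixel step per module along y
        origin = tl - 3.5 * dx - 3.5 * dy   # pixel position of the symbol's corner

        modules = np.zeros((n_modules, n_modules), dtype=bool)
        for r in range(n_modules):
            for c in range(n_modules):
                p = origin + (c + 0.5) * dx + (r + 0.5) * dy
                modules[r, c] = dark[int(round(p[1])), int(round(p[0]))]
        return modules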
As you can imagine, while the concept is fairly simple, the actual implementation can be complex. I should note that a large enough QR code can have multiple alignment patterns, because a larger code has a larger chance of being warped, and they help detect where the warping has occurred. Also, most barcodes, including QR codes, have the concept of a quiet zone that clearly separates the barcode from its surroundings, mostly because otherwise you might be unable to recognize the beginning and end of the barcode (or, in 2D barcodes, its bounding rectangle).
[1] Exercise: now that you're thinking along these lines, how can a UPC reader tell whether the barcode is upside down or not?
I would pay to read a more in-depth explanation from you (and pay more if there were an annotated code example) - your comment gave me a better understanding than I have managed in the past. In all seriousness, where can I pre-order a copy of "Reading QR code: a programmatic approach" by lifthrasiir?
I probably know this only because, back in the day, I wrote a QR code generator in JavaScript [1], which was one of the libraries reviewed by Nayuki [2]; as I haven't actually implemented a decoder, my knowledge stops there. Sorry :-)
I see, fair enough. I've used your library in the past (I think, it seems familiar anyway), so thanks for that, and also thanks for the insight that writing my own toy QR generator might help me understand the reading process. :)
> Or do they just throw down several lines until they find the 1:1:3:1:1 ratio?
Essentially yes. If you are aware of how one-dimensional barcodes are read, it's the same idea. You would need to check multiple scanlines, but you don't need full image analysis. It might be the case that modern QR decoders actually look at the full image, but that's hardly necessary.
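In other words, something like sweeping a handful of evenly spaced rows and keeping wherever a finder-looking run shows up (reusing run-length helpers like the run_lengths / looks_like_finder functions sketched earlier in the thread; a toy approach, real decoders are more careful about clustering and verifying candidates):

    def find_finder_candidates(dark, n_scanlines=40):
        # dark: 2D boolean image (True = dark). Sweep evenly spaced rows and
        # record (row, x_start) wherever a 1:1:3:1:1 run sequence appears.
        height = dark.shape[0]
        candidates = []
        for y in range(0, height, max(height // n_scanlines, 1)):
            runs = run_lengths(list(dark[y]))
            for i in looks_like_finder(runs):
                x_start = sum(l for _, l in runs[:i])   # pixel where the pattern begins
                candidates.append((y, x_start))
        return candidates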
The rest of this makes abstract sense, but not any sense wrt "I can write code to do it". As another poster said - it's "draw the rest of the owl". It would be nice if someone knows of a nice annotated bit of code that starts from a picture and ends at "return data;".
Here's an idea - instead of assuming I'm being lazy and looking for a magic bullet, why don't you consider this: I think this QR code thing might be a great example of a tractable, relatively small problem that lets me sink my teeth into the computer vision stuff that is usually presented in a very abstract way.
In order to do that efficiently, a nice annotated codebase would help a lot. So back to the question:
Sure, you can write your own QR and other barcode processing code, but it makes as much sense as rolling your own crypto, and for many of the same reasons.