Tesseract only does well if the input is a clean, aligned, two-tone image that's already cropped down -- but the LPR images you get from a moving camera are anything but.
Machine learning is a fine tool for this job. The specific machine learning setup is overkill, though.
Most of these problems are much easier to solve by simply picking a better camera. Moreover, license plates are specifically made to be easier to read (i.e. reflective).
No, they are not. Converting a semi-shaded black-on-yellow plate to clean black-and-white is not a matter of picking a better camera. Aligning it into an axis-aligned rectangle (from whatever angle you managed to capture it) is not a matter of picking a better camera. Tesseract is really bad unless your input is really clean.
Reflectivity helps you only if you actually shine light from the direction of the camera, which you can't practically do from within a car (the setup described in the post), and in most jurisdictions cannot legally do at all.
It's more like a "learnèd machine" approach. No need to train the model, but a pre-trained model may be (or not be, depending on circumstances!) more efficient than "hand-written" OCR approaches.
The solution described in the blog uses 12 Nvidia T4s, which are basically specialized RTX 2080s. Depending on what vCPUs etc. get used, it's running 1000-1500 W of computing power continuously.
If their car was electric, this project would be increasing the power consumption of the vehicle by around 10%.
To clarify, I mean the power consumption indirectly. I guess this will become an important point of discussion soon-ish, as we get more and more electric cars with ML stuff in them.
My view is, if your car uses electrons, and some of those are for compute, and you just offload the compute to the cloud, you haven't actually reduced the total electricity consumption.
Similarly here, the total "footprint" of the car is increased by adding this feature.
I'm the creator here. Yeah, 20 K80s is a bit excessive. That's because Cortex (cortexlabs), the ML-model-deployment platform I use, didn't initially support multiprocessing on each of its replicas - so I was bound to using just one CPU per GPU. AWS has instances with 4, 8, 16 vCPUs and so on.
Once Cortex started supporting gunicorn (still unreleased, but present on their master branch), I was able to reduce the number of GPUs significantly. Now the detector only needs 2 T4 GPUs, and the identification part is the most expensive one (10 T4 GPUs), for a grand total of 12.
Converting the models to mixed precision could further reduce the need to about 1.5 GPUs for T4s, or just 1.2 GPUs for V100s - which in both cases would still mean using 2 GPUs.
Consider using a smaller, lighter-weight network (e.g. TinyYOLO) and object tracking (instead of running inference on every camera frame) for faster throughput -- I imagine you should be able to get by with <1/4 of a V100 and still be real-time for all practical cases.
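Rough sketch of the detect-then-track pattern (everything here is a placeholder -- run_detector stands in for whatever model you use, the file name is made up, and any of OpenCV's contrib trackers would do instead of KCF):

    import cv2

    DETECT_EVERY = 10                       # run the expensive detector every Nth frame
    cap = cv2.VideoCapture("dashcam.mp4")   # hypothetical input clip

    def run_detector(frame):
        # stand-in for YOLO/TinyYOLO inference; returns [(x, y, w, h), ...]
        return []

    trackers, frame_idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % DETECT_EVERY == 0:
            trackers = []
            for box in run_detector(frame):
                t = cv2.TrackerKCF_create()  # needs opencv-contrib-python
                t.init(frame, box)
                trackers.append(t)
        else:
            boxes = [t.update(frame)[1] for t in trackers]  # cheap per-frame updates
        frame_idx += 1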
You can also customize the network to your use case, e.g. you don't need YOLO's default 5 anchor box sizes if you know the thing you're detecting is a license plate.
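The usual trick for anchor tuning is to k-means-cluster the (width, height) pairs of your labelled boxes using a 1 - IoU distance. A sketch, assuming you've collected the box sizes into a numpy array:

    import numpy as np

    def kmeans_anchors(wh, k=2, iters=100):
        # wh: (N, 2) array of labelled box (width, height) pairs
        anchors = wh[np.random.choice(len(wh), k, replace=False)].astype(float)
        for _ in range(iters):
            inter = np.minimum(wh[:, None, :], anchors[None, :, :]).prod(-1)
            iou = inter / (wh.prod(-1)[:, None] + anchors.prod(-1)[None, :] - inter)
            assign = iou.argmax(1)                 # nearest anchor by IoU
            for i in range(k):
                if (assign == i).any():
                    anchors[i] = np.median(wh[assign == i], axis=0)
        return anchors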
Also, profile your code and see where your bottleneck is. If your bottleneck is at NMS, for example, there are things you can do to speed it up. I've seen a lot of cases where the neural network runs fast but there's a lot of Python bloat in pre/post-processing -- I can't say for yours without seeing the code.
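Even the stdlib profiler will tell you a lot (process_frame here is a stand-in for whatever your per-frame pipeline is):

    import cProfile
    import pstats

    def process_frame(frame):
        # stand-in for the real pipeline: detect -> crop -> recognize
        pass

    cProfile.run("process_frame(None)", "pipeline.prof")
    pstats.Stats("pipeline.prof").sort_stats("cumtime").print_stats(20)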
You really should be able to run a license plate detector/reader on something a lot smaller than a V100. A Xavier or quite possibly even a Jetson Nano would very likely be good enough if you use it well.
I like all of your suggestions. I've been thinking about using TinyYOLOv3 as well. Provided the training set is considerably bigger than my own (I created ~550 samples and fine-tuned the model with them), you could end up with a very capable detection system that uses very few resources.
Object tracking is also a very good idea; I will consider it. Anchor-box tuning is another very good idea.
Also, the CRAFT text detector that I'm using should IMHO be removed. Instead, just use a very well trained text recognizer (like the CRNN I'm using). The text detector is computationally expensive, since it's based on the VGG-16 model.
Then convert the models to use mixed-precision.
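In PyTorch terms (assuming that's the framework; the resnet18 below is only a stand-in for the actual detector/recognizer), mixed-precision inference is roughly:

    import torch
    import torchvision

    # resnet18 is just a stand-in; the real candidates are the YOLOv3/CRAFT/CRNN models
    model = torchvision.models.resnet18(pretrained=True).eval().cuda()
    x = torch.randn(1, 3, 224, 224, device="cuda")

    with torch.no_grad(), torch.cuda.amp.autocast():
        y = model(x)   # convs/matmuls run in FP16 on the T4/V100 tensor cores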
All in all, I think the performance improvements can be anywhere between 1 and 2 orders of magnitude.
I believe you could achieve the same result locally using https://github.com/openalpr/openalpr and cut your AWS and cell bills to exactly zero. It has Tesseract and OpenCV inside. Would love to see it as part two of the article!
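Something like this, if I remember the Python bindings right (the paths are the Debian-package defaults and the frame name is made up):

    from openalpr import Alpr

    alpr = Alpr("eu", "/etc/openalpr/openalpr.conf", "/usr/share/openalpr/runtime_data")
    if not alpr.is_loaded():
        raise RuntimeError("failed to load OpenALPR")
    alpr.set_top_n(3)

    results = alpr.recognize_file("frame.jpg")   # a captured dashcam frame
    for plate in results["results"]:
        print(plate["plate"], plate["confidence"])
    alpr.unload()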
The obvious question is why not use a local accelerator?
Both the Neural Compute Stick and the Google Coral have more than enough grunt to run real-time object detection models, and both will run on USB2 power. I don't know the overhead of good OCR, but license plates are a very standard format, so perhaps you could train a second detector to extract the letters?
Even if you do OCR in the cloud, local bounding box extraction would save a huge amount of bandwidth.
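On the Coral side it's basically just loading a TPU-compiled .tflite model with the Edge TPU delegate (the model filename here is hypothetical):

    import tflite_runtime.interpreter as tflite

    # "plate_detector_edgetpu.tflite" is a placeholder - any detector compiled
    # with the edgetpu_compiler would do
    interpreter = tflite.Interpreter(
        model_path="plate_detector_edgetpu.tflite",
        experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")])
    interpreter.allocate_tensors()
    # per frame: interpreter.set_tensor(...), interpreter.invoke(),
    # then read boxes back with interpreter.get_tensor(...)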
Hey, founder of a relevant startup here. Just want to chime in on OCR + bounding-box performance. We offer text recognition with bounding boxes as a service. Our average processing duration, between reading bytes off the wire and writing the JSON response, is just under 3 seconds. Obviously that throws it out the window for frame-by-frame applications, but I think it's still worth mentioning. The recognition is just as accurate as Google Cloud Vision -- it can handle human handwriting and even cursive, in most cases.
Probably yeah, but the potential of the cloud was much more appealing to me.
Detecting the license plates is really cheap computationally speaking, but not on the RPi. The most computationally expensive part was identifying the words (letters) - that's because detecting the text within the bounding boxes obtained from YOLOv3 is based on a VGG-16 model. Running that multiple times in a single frame (for multiple license plates) is expensive.
Surprisingly, the bandwidth was the least of my concerns. I was very surprised to see I didn't need much at all. For YOLOv3 at 416p and 30 FPS I need about 3 Mbps. I wouldn't consider that much.
Now, this is a demo of what a production system could theoretically look like. I know it could be much better optimized.
3 Mbps doesn't sound like much, but that's constant bandwidth. I have a 15-minute commute, and 3 Mbps x 900 s works out to roughly 340 MB of data per journey. That adds up quite fast, especially on a mobile contract.
I really do appreciate the effort and the quality of the writeup, but this approach is a very inefficient and roundabout way of going about things.
You can do real-time number-plate recognition on a Pi 3/4 with a Coral TPU or Movidius 2, both of which cost ~US$80. It probably has a lot to do with hammers and nails. Not everything needs to be 'web-centric' or 'cloud-based'.
Surprised by the negativity in the comments. This is an extremely impressive demonstration of the ability to put together current technologies and get the thing working from start to finish.
The reason for the negativity is that this demonstrates the "Design by StackOverflow" mentality, where the solution is like swatting a fly with a sledgehammer, with no real domain knowledge. Plus the author didn't even train the neural nets: it's just a LEGO project. I'd hire this person as a lab intern, but nothing above that. The fact that the author couldn't solve it locally and had to invoke the CLOUD is... laughable. This problem has been solved for over two decades on lesser hardware.
TL;DR: reinventing wheels is a good way to learn a lot.
For trying out something in a few hours, of course you don't want to spend hundreds of hours setting it up, by definition. Yes, the result is "just about works, but doesn't scale" - but that's the point of experimenting. Sure, this is a LEGO-style experiment in reinventing the wheel, but exactly for that, an excellent way to start learning about this problem domain: power consumption? Latency? ML basics? Sure. That's hacking at its core - even though the project is rudimentary.
> Sure, this is a LEGO-style experiment in reinventing the wheel
If I'm not mistaken, Larry Page was praised some time ago for building a printer out of LEGO pieces (that may just be an urban legend; I admit I never verified it).
Ok, I have a use case for this. If you could figure out how to get it off the cloud and low power, it would be awesome to have one of these on the gates to my house, so when I drive up the plates are automatically recognized and the gates open to my car.
Don't rely on garage openers for security, either: the door relies on a rather simple password scheme. In other words, treat it as a garden fence, not a blast door.
Better to use the tire pressure sensors and a software defined radio module to pick up the unique identifiers of your wheels. Way lower power than OCR.
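Sketch of the idea with rtl_433, which already decodes a bunch of TPMS protocols (the sensor IDs are placeholders, and 315 MHz is the US band -- most EU sensors sit around 433 MHz):

    import json
    import subprocess

    KNOWN_SENSORS = {"1a2b3c4d", "5e6f7a8b"}   # hypothetical IDs of your four wheels

    proc = subprocess.Popen(["rtl_433", "-f", "315M", "-F", "json"],
                            stdout=subprocess.PIPE, text=True)
    for line in proc.stdout:
        msg = json.loads(line)
        if str(msg.get("id")) in KNOWN_SENSORS:
            print("recognized wheel", msg["id"])   # trigger the gate relay here
            break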
This is nice from an academic standpoint, since it can scale to track cars, infer their speed, track lanes and signs, and maybe experiment and try to do some predictions (...)
But if the target is just to find plates and recognize the letters, I've seen this done with only an RPi.
(I am actually considering to do it just for fun :))
What are the privacy implications of this? For example, in the EU under GDPR, would you really be allowed to collect information on, for example, license plates, time and location? It seems to me that you could argue that license plates are "personally identifiable information" and collecting this type of data could, with or without intention, mean that you are collecting data on people's whereabouts.
Not every plate may identify a person (cars can be owned by companies and used by different employees, and so on), but a significant number of cars are owned and driven by a single private owner. You can't easily discern whether a license plate maps to one individual, so if you can't prove that a license plate does not map to a specific person, you should assume it does.
Thus it can easily be argued that license plates are PII.
Would be great if we could have built-in Outline support on HN, or some kind of bot commenting a link automatically when someone posts a story with a paywall. Just throwing an idea out there. :)
That's a great idea actually!
@dang are such bots allowed in HN? I can roll it out pretty easily with Monitoro (https://monitoro.xyz) and would be happy to cover the costs.
No, they've changed that. They probably received funding and are now desperate to earn that money back. There should be a site like Medium that wasn't tainted by Vulture Capitalists.
I use Firefox Focus, which stores no sessions or cookies, and I never have a problem with Medium articles. Do they save the read articles locally in normal browsers and block the user after a certain threshold?
The article works fine for me, and I do not have an account for that. (I don't like the formatting, but it is readable.) (I don't even see any mention of login.)
It's just a question of time before people can hack together their own similar facial recognition system. The new glasses from Bosch, with a projector straight onto the retina, are an obvious choice for displaying personal info about everybody in reach of the camera. With 5G, everybody has reliable access to all the computing power needed.
It is easy to imagine someone with these Bosch glasses walking around in a bar, with personal information about other people projected right onto their retina.
Last I knew, Romania was in the EU, so posting other people's personally identifiable information on the internet should be covered by GDPR and thus technically be illegal.
Creator here. I get about 30FPS. The more compute power you throw in, the higher the framerate.
It's really buttery smooth if I disable the recognition part and just leave in the detection. Since it's a demo project (something I just wanted to experiment with), my focus hasn't been on optimizing it. Lots of improvements could be made.
Tesseract OCR can do this, using only the Raspberry Pi, at a "good enough" framerate for any real driving situation.
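Something along these lines (a minimal sketch; assumes a pre-cropped, roughly deskewed plate image, with hypothetical file names):

    import cv2
    import pytesseract

    img = cv2.imread("plate_crop.jpg", cv2.IMREAD_GRAYSCALE)
    _, bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # --psm 7 treats the image as a single line of text; the whitelist keeps
    # Tesseract from hallucinating punctuation
    text = pytesseract.image_to_string(
        bw, config="--psm 7 -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")
    print(text.strip())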