Nice post, OP! I was super impressed with Apple's Vision framework. I used it on a personal project that involved OCRing tens of thousands of spreadsheet screenshots and ingesting them into a Postgres database. I tried other CPU-based OCR methods (since macOS and Nvidia still don't play nice together) such as Tesseract, but found the output to be incorrect too often. The Vision framework not only produced the highest-quality output I had seen, it also used the least amount of compute. It was fairly unstable, but I can chalk that up to user error w/ my implementation.
I used a combination of RhetTbull's vision.py (for the actual implementation) [1] + ocrmac (for experimentation) [2] and was pleasantly surprised by the performance on my i7 6700k hackintosh.
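Roughly, the approach looks like this. This is a minimal sketch, not the gist's exact code; it assumes pyobjc with the Vision and Quartz bindings installed, and the file path is just an example:

    import Quartz
    import Vision
    from Cocoa import NSURL

    def ocr_image(path):
        # Load the image and hand it to a Vision text-recognition request.
        image = Quartz.CIImage.imageWithContentsOfURL_(NSURL.fileURLWithPath_(path))
        handler = Vision.VNImageRequestHandler.alloc().initWithCIImage_options_(image, {})

        results = []

        def completion(request, error):
            # Each observation carries ranked candidates; take the top one.
            for observation in request.results() or []:
                candidate = observation.topCandidates_(1)[0]
                results.append((candidate.string(), candidate.confidence()))

        request = Vision.VNRecognizeTextRequest.alloc().initWithCompletionHandler_(completion)
        request.setRecognitionLevel_(Vision.VNRequestTextRecognitionLevelAccurate)
        handler.performRequests_error_([request], None)
        return results

    for text, confidence in ocr_image("screenshot.png"):
        print(f"{confidence:.2f}  {text}")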
I wouldn't call myself a programmer, but I can generally troubleshoot anything given enough time; it did cost time, though.
Notably, Apple seems to attach some very unfriendly restrictions to some of the built-in stuff, such as the voices. When I researched it, it appeared you can't use those commercially.
Tesseract alone is widely known to be "meh" at this point.
If you look at RAG frameworks as one example, they'll typically use/support a variety of OCR implementations. Tesseract is almost always supported, but it's rarely ideal, with projects like Unstructured[0] and DocTR[1] being preferred. By leveraging more-or-less SOTA vision models[2][3] they embarrass Tesseract (rough DocTR sketch below).
I haven't compared them to the Apple Vision framework but they're absolutely better than Tesseract and potentially even Apple Vision.
There are also various approaches to use these in conjunction but that gets involved.
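For a sense of what the DocTR path looks like in practice, here is a minimal sketch. It assumes the python-doctr package; the pretrained weights are downloaded on first use, and the result is nested pages -> blocks -> lines -> words:

    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    # Detection + recognition with pretrained weights (downloaded on first run).
    model = ocr_predictor(pretrained=True)
    doc = DocumentFile.from_images(["page.png"])   # from_pdf("file.pdf") also works
    result = model(doc)

    # Walk the nested structure: pages -> blocks -> lines -> words.
    for page in result.pages:
        for block in page.blocks:
            for line in block.lines:
                print(" ".join(word.value for word in line.words))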
Happy to see OCR is advancing lately, but I really need HWR.
I am looking for something this polished and reliable for handwriting, does anyone have any pointers? I want to integrate it in a workflow with my eink tablet I take notes on. A few years ago, I tried various models, but they performed poorly (around 80% accuracy) on my handwriting, which I can read almost 90% of the time.
How well it works on your handwriting is for you to test, but if you can't read it well yourself, with all the contextual information you have, I doubt a model will either.
I have found Tesseract to be both better than I expect (it feels great when it works most of the time) and worse than I expect (not quite enough correct data to fully rely on).
Does anyone know which languages Apple supports? The docs don't have a list. Tesseract might be "meh", but it is probably the best open source option available for Devanagari scripts or Persian, for example.
I've used it on a number of Cyrillic languages (Russian, Bulgarian, etc.), Hungarian, and Turkish, along with the typical ones (Spanish, German, French, Italian, Portuguese). I've heard it supports Chinese. I just tried Persian and Devanagari samples on my Mac and it could not do either.
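If you want to check programmatically what your OS build supports, Vision will tell you at runtime. A small pyobjc sketch, assuming macOS 12 or later (where VNRecognizeTextRequest gained supportedRecognitionLanguagesAndReturnError_):

    import Vision

    # Ask the installed Vision framework which recognition languages it supports.
    request = Vision.VNRecognizeTextRequest.alloc().init()
    request.setRecognitionLevel_(Vision.VNRequestTextRecognitionLevelAccurate)
    languages, error = request.supportedRecognitionLanguagesAndReturnError_(None)
    print(list(languages))  # e.g. ['en-US', 'fr-FR', ...]; the list varies by macOS version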
Is there a tutorial on how to extract tables from a PDF or image with the Apple Vision framework? I tried the two links in your post, and they just extract the text without maintaining the table structure.
AWS Textract provides sample Python code to extract tables into CSV, which works great.
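For reference, the table analysis itself is a single boto3 call; a minimal sketch is below (Amazon's full sample then walks the TABLE -> CELL -> WORD relationships to rebuild each row into CSV, which is omitted here, and the file name is just an example):

    import boto3

    # Ask Textract to analyze an image for tables (synchronous API, single page).
    client = boto3.client("textract")
    with open("scan.png", "rb") as f:
        response = client.analyze_document(
            Document={"Bytes": f.read()},
            FeatureTypes=["TABLES"],
        )

    # The response is a flat list of blocks; TABLE blocks reference CELL blocks,
    # which reference the WORD blocks that make up each cell.
    cells = [b for b in response["Blocks"] if b["BlockType"] == "CELL"]
    print(f"{len(cells)} table cells detected")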
I tried doing something similar on Windows, and realized that PowerToys[1], a Microsoft project I already had installed, actually contains a very good OCR tool[2]. Just press Win+Shift+T and select the area to scan, and the text will be copied to the clipboard.
It's not so well known that one of the original rationales for "offside rule" programming languages is that it works just as easily for handwritten code as it does for typed.
Will we ever have programming languages that are primarily designed to take input from whiteboard grabs? (ie where not only handwriting, but also placement, connectivity, and maybe shape are meaningful?)
I did notice that many Mac apps, including Safari and Preview and Notes, do OCR on images automatically. It's pretty neat that I can easily select text in an image and copy and paste it somewhere else.
It’s kinda ridiculous how good it is, you can even select text from inside a YouTube video while it’s playing (or pause if needed).
Also if it’s text of a URL/domain or a QR code (eg in a photo of a poster, or in a video) you can hold-press/hold-click to open the link directly from the image.
The Photos app too. It's just so good at conferences or when you need a long string digitised (ISP default router password!). Photo > select > copy > then paste on phone or Mac (via that actually awesome Handoff feature).
PyXA uses the Vision framework to extract text from one or more images at a time. It's only a small part of the package, so it might be overkill for a one-off operation, but it's an option.
macOS Ventura and newer actually have basic OCR functionality integrated into the Image Capture UI. When using an AirPrint-compatible scanner and scanning to PDF, the checkbox "OCR" is shown in the right pane.
Awesome! Is there a similar technique for the Apple vision ‘Copy Subject’ feature? I’ve become extremely reliant on it, but it feels very limited in access.
I had to Google this, do you mean the feature in Photos on mobile where you can "extract" items from a picture and make them into stickers? Apple seems to call it "lifting subjects" [0] [1].
I've always had good results from Preview.app. I wonder how this engine compares, in terms of number of errors on a difficult source, versus free alternatives.
I would really love an `ocrmypdf` like tool which uses Apple Vision to create searchable PDFs from scanned images. I've been searching every week or so for some kind of project but so far haven't found anything. Perhaps it's time to make it myself...
I have played around with the OCR on my Mac and have been very impressed. It has been consistently better than Tesseract for my purposes.
However, when creating a PDF from images using Preview and exporting using ‘Embed Text’ option to OCR, I have noticed the text is worse than if you OCR the exact same images using the shortcut above or using a script. Presumably Preview is using the Vision framework’s less accurate fast path when preparing the PDF. https://developer.apple.com/documentation/vision/recognizing...
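If you want to test that theory, you can run the same image through both recognition levels and compare the output. A rough pyobjc sketch (the two constants are the framework's own fast/accurate levels; the image path is just an example):

    import Quartz
    import Vision
    from Cocoa import NSURL

    def recognize(path, level):
        # Run a single VNRecognizeTextRequest at the given recognition level.
        image = Quartz.CIImage.imageWithContentsOfURL_(NSURL.fileURLWithPath_(path))
        handler = Vision.VNImageRequestHandler.alloc().initWithCIImage_options_(image, {})
        request = Vision.VNRecognizeTextRequest.alloc().init()
        request.setRecognitionLevel_(level)
        handler.performRequests_error_([request], None)
        return [o.topCandidates_(1)[0].string() for o in request.results() or []]

    fast = recognize("page.png", Vision.VNRequestTextRecognitionLevelFast)
    accurate = recognize("page.png", Vision.VNRequestTextRecognitionLevelAccurate)
    print("fast:    ", fast)
    print("accurate:", accurate)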
Oddly enough if you enable it as a "quick action", when you run it, Finder creates a file in the same directory as the image containing the OCRed text (and named according to the first line of OCRed text).
I went back into my shortcut, and Shortcuts had added a pseudo-action "Stop and output <copy to clipboard>; if there's nowhere to output: <Do Nothing>". I would think "Do Nothing" would mean don't create a file, but I guess Quick Actions treats it specially, given that all the other actions seem to be intransitive, implying that the user wants a file as the output.
The article was posted... yesterday, and the entire reason given for not using the built-in Shortcuts sharing feature is... an article from 2 years ago, about a bug in the shortcuts hosting service, which has obviously been fixed.
I get that some people will want to create it from scratch themselves or incorporate the actual meat of it into a larger shortcut... but not sharing one that does what the article says, because of a bug 2 years ago, is a bit of a weird take.
sorry, that link may have been a cheap shot... but I did try to export the shortcut I created, and kept getting an error about not being signed in to iCloud...! and I am signed in to iCloud. it's just so confusing.
why can't shortcuts be exported as ... shortcut files?
it's not ideal to have people recreate the shortcut step by step (which is what I ended up describing in my post) but... I couldn't find a better way..! :-)
if you'd be able to recreate the shortcut and share it, and post the link here (and/or email it to me), I'd love to place that in the blog article! thank you
Raycast (macOS only) is also nice as it's able to search images by text. It also allows you to copy text from those images. Quick official demo here: https://www.youtube.com/watch?v=c96IXGOo6E4
How do you interact with the built-in OCR via the CLI? "Doing" something is (to me) choosing which OCR tooling, what fonts it recognises, and all the associated package management and tuning, not "how I configure the GUI and UI to let me use the tool they shipped with the OS".
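One route, since macOS Monterey: the OS ships a `shortcuts` command, so once the Extract Text shortcut exists you can drive it from any script. A rough sketch follows; the shortcut name is whatever you saved yours as, and the exact flags are worth confirming with `shortcuts help run`:

    import pathlib
    import subprocess
    import tempfile

    def ocr_via_shortcut(image_path, shortcut_name="Extract Text"):
        # Run the saved shortcut against an image file and read its text output back.
        out_path = pathlib.Path(tempfile.mkdtemp()) / "ocr.txt"
        subprocess.run(
            ["shortcuts", "run", shortcut_name,
             "--input-path", str(image_path), "--output-path", str(out_path)],
            check=True,
        )
        return out_path.read_text()

    print(ocr_via_shortcut("screenshot.png"))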
I made a Shortcut + PHP to get text from a screenshot, ask ChatGPT to make a task name from the text, and create a new task in ClickUp with the screenshot attached. I use it often.
Are iOS and macOS shortcuts cross-compatible? I didn't know there were Shortcuts for the Mac; it seems pretty powerful to be able to run them from the terminal too. Thanks OP
Yes, they are compatible as long as you use actions available on both platforms. For example, you can use AppleScript or shell on macOS, but those will not work on iOS. However, if you stick to cross-platform app actions it works, even when you write files into the iCloud folder. For example, I made a shortcut that takes today's events from the Calendar and appends the list to a Markdown file in an Obsidian vault on iCloud. I use it to scaffold meeting notes, and it works on my phone too.
Surprisingly, the Extract Text from Image action is available on Intel Macs: normally, features like automatic image OCR are limited to Apple Silicon Macs.
[1]: https://gist.github.com/RhetTbull/1c34fc07c95733642cffcd1ac5...
[2]: https://github.com/straussmaximilian/ocrmac