I use the same script as Dibby053, copied from Stack Overflow, but with some tweaks to work on KDE, GNOME, and Wayland as well as X11, and with some notifications about what state it is in.
I didn't test the x11/wayland check yet, but feel free to use it and report back.
#!/bin/bash
# Dependencies: tesseract-ocr imagemagick
# on gnome: gnome-screenshot
# on kde: spectacle
# on x11: xsel
# on wayland: wl-clipboard

# Show an error notification and abort.
die(){
    notify-send "$1"
    exit 1
}
# Remove the temporary file passed as $1.
cleanup(){
    [[ -n $1 ]] && rm -rf "$1"
}
SCR_IMG=$(mktemp) || die "failed to create temporary file"
# shellcheck disable=SC2064
trap "cleanup '$SCR_IMG'" EXIT
notify-send "Select the area of the text"
# Prefer spectacle (KDE) when it is installed, otherwise use gnome-screenshot.
if command -v spectacle &> /dev/null
then
    spectacle -r -o "$SCR_IMG.png" || die "failed to take screenshot"
else
    gnome-screenshot -a -f "$SCR_IMG.png" || die "failed to take screenshot"
fi
# desaturate (saturation 0) and upscale to 400%; this should increase the detection rate
mogrify -modulate 100,0 -resize 400% "$SCR_IMG.png" || die "failed to convert image"
# tesseract appends .txt to the output base name, so the text lands in "$SCR_IMG.txt"
tesseract "$SCR_IMG.png" "$SCR_IMG" &> /dev/null || die "failed to extract text"
if [[ "$XDG_SESSION_TYPE" == "wayland" ]]
then
    wl-copy < "$SCR_IMG.txt" || die "failed to copy text to clipboard"
else
    xsel -b -i < "$SCR_IMG.txt" || die "failed to copy text to clipboard"
fi
notify-send "Text extracted"
exit
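Since the x11/wayland check above is untested, here is a minimal sketch of a slightly more defensive variant. It assumes that a Wayland session sets `XDG_SESSION_TYPE=wayland` and/or a non-empty `WAYLAND_DISPLAY`; `clipboard_tool` is a hypothetical helper name, not part of the script above.

```bash
#!/bin/bash
# Hypothetical helper: print the clipboard command to use for this session.
# Assumption: Wayland sessions set XDG_SESSION_TYPE=wayland and/or WAYLAND_DISPLAY.
clipboard_tool() {
    if [[ "${XDG_SESSION_TYPE:-}" == "wayland" || -n "${WAYLAND_DISPLAY:-}" ]]; then
        echo "wl-copy"
    else
        echo "xsel -b -i"
    fi
}

# Possible use in the script (left unquoted on purpose so "xsel -b -i" splits into words):
#   $(clipboard_tool) < "$SCR_IMG.txt" || die "failed to copy text to clipboard"
```

The `${VAR:-}` form keeps the check working even if the script is later run with `set -o nounset`.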
I slightly modified your script to:
1. Clean up properly
2. Run spectacle in BG mode, so the window does not pop up after screenshotting.
#!/bin/bash
# Dependencies: tesseract-ocr imagemagick
# on gnome: gnome-screenshot
# on kde: spectacle
# on x11: xsel
# on wayland: wl-clipboard

# Show an error notification and abort.
die(){
    notify-send "$1"
    exit 1
}
# Remove the temporary directory passed as $1, including everything in it.
cleanup(){
    [[ -n $1 ]] && rm -r "$1"
}
SCR_IMG=$(mktemp -d) || die "failed to create temporary directory"
# shellcheck disable=SC2064
trap "cleanup '$SCR_IMG'" EXIT
#notify-send "Select the area of the text"
# Prefer spectacle (KDE) when it is installed, otherwise use gnome-screenshot.
if command -v spectacle &> /dev/null
then
    # -b runs spectacle in background mode, so no window pops up afterwards
    spectacle -b -r -o "$SCR_IMG/scr.png" || die "failed to take screenshot"
else
    gnome-screenshot -a -f "$SCR_IMG/scr.png" || die "failed to take screenshot"
fi
# desaturate (saturation 0) and upscale to 400%; this should increase the detection rate
mogrify -modulate 100,0 -resize 400% "$SCR_IMG/scr.png" || die "failed to convert image"
# tesseract appends .txt to the output base name, so the text lands in scr.txt
tesseract "$SCR_IMG/scr.png" "$SCR_IMG/scr" &> /dev/null || die "failed to extract text"
if [[ "$XDG_SESSION_TYPE" == "wayland" ]]
then
    wl-copy < "$SCR_IMG/scr.txt" || die "failed to copy text to clipboard"
else
    xsel -b -i < "$SCR_IMG/scr.txt" || die "failed to copy text to clipboard"
fi
notify-send "Text extracted"
exit
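To illustrate why the `mktemp -d` variant cleans up properly, here is a small self-contained sketch: everything the script writes lives in one scratch directory, and a single EXIT trap removes it. `make_and_clean` is a hypothetical name used only for this demo.

```bash
#!/bin/bash
# Hypothetical demo: create a scratch directory, write files into it, and
# let the EXIT trap remove everything when the subshell exits.
make_and_clean() (
    dir=$(mktemp -d) || exit 1
    # Single quotes: "$dir" is expanded when the trap runs, not when it is set.
    trap 'rm -r "$dir"' EXIT
    touch "$dir/scr.png" "$dir/scr.txt"
    echo "$dir"   # report the path so the caller can check it afterwards
)

scratch=$(make_and_clean)
[[ -e "$scratch" ]] || echo "scratch directory was removed on exit"
```

Quoting the trap body with single quotes is one way to avoid the `SC2064` suppression; the double-quoted form used in the script works too as long as the path contains no single quote.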
I also made some minor modifications: replaced `xsel` with `xclip` and added a truncated version of the copied text to the `notify-send` message:
#!/bin/bash
# Dependencies: tesseract-ocr imagemagick
# on gnome: gnome-screenshot
# on kde: spectacle
# on x11: xclip
# on wayland: wl-clipboard

# Show an error notification and abort.
die(){
    notify-send "$1"
    exit 1
}
# Remove the temporary directory passed as $1, including everything in it.
cleanup(){
    [[ -n $1 ]] && rm -r "$1"
}
SCR_IMG=$(mktemp -d) || die "failed to create temporary directory"
# shellcheck disable=SC2064
trap "cleanup '$SCR_IMG'" EXIT
#notify-send "Select the area of the text"
# Prefer spectacle (KDE) when it is installed, otherwise use gnome-screenshot.
if command -v spectacle &> /dev/null
then
    spectacle -n -b -r -o "$SCR_IMG/scr.png" || die "failed to take screenshot"
else
    gnome-screenshot -a -f "$SCR_IMG/scr.png" || die "failed to take screenshot"
fi
# desaturate (saturation 0) and upscale to 400%; this should increase the detection rate
mogrify -modulate 100,0 -resize 400% "$SCR_IMG/scr.png" || die "failed to convert image"
# tesseract appends .txt to the output base name, so the text lands in scr.txt
tesseract "$SCR_IMG/scr.png" "$SCR_IMG/scr" &> /dev/null || die "failed to extract text"
if [[ "$XDG_SESSION_TYPE" == "wayland" ]]
then
    wl-copy < "$SCR_IMG/scr.txt" || die "failed to copy text to clipboard"
else
    # xsel -b -i < "$SCR_IMG/scr.txt" || die "failed to copy text to clipboard"
    xclip -selection clipboard -i < "$SCR_IMG/scr.txt" || die "failed to copy text to clipboard"
fi
# Notify the user what was copied, but truncate the text to 100 bytes
notify-send "Text extracted from image" "$(head -c 100 "$SCR_IMG/scr.txt")" || die "failed to send notification"
exit
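One caveat with `head -c 100`: it counts bytes, so it can cut a multibyte UTF-8 character in half. A possible alternative, sketched below, uses bash substring expansion, which counts characters in the current locale. `truncate_text` and the `...` suffix are my own choices for illustration, not part of the script above.

```bash
#!/bin/bash
# Hypothetical helper: shorten $1 to at most $2 characters (default 100)
# and mark the cut with "..." when something was dropped.
truncate_text() {
    local text=$1 max=${2:-100}
    if (( ${#text} > max )); then
        printf '%s...\n' "${text:0:max}"
    else
        printf '%s\n' "$text"
    fi
}

# Possible use in the script:
#   notify-send "Text extracted from image" "$(truncate_text "$(< "$SCR_IMG/scr.txt")")"
```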
I just frankenstein'd a few people's versions into my own MATE-based flavor.
For anyone running into barriers, mate-screenshot has no outfile `-f` option, so I worked around that by outputting through the clipboard and capturing that with `xclip` (note, this is earlier in the script than the xsel/xclip line in the parent and gp comments):
mate-screenshot -a -c && xclip -selection clipboard -t image/png -o > "$SCR_IMG/scr.png" || die "failed to take screenshot"
The other hiccup is that the dumped text file has two extraneous bytes '\x0a\x0c', so I truncated them with `head`:
(xclip -selection clipboard -i < <(head -c -2 "$SCR_IMG/scr.txt")) || die "failed to copy text to clipboard"
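A variation on the same idea, as a sketch: instead of cutting a fixed number of bytes, delete the form feed (`\x0c`) by value, so the text survives even if the output ever ends differently. This assumes the OCR text itself never needs a form feed; `strip_form_feed` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: drop every form feed (\f, 0x0c) from stdin.
strip_form_feed() {
    tr -d '\f'
}

# Possible use in the script:
#   strip_form_feed < "$SCR_IMG/scr.txt" | xclip -selection clipboard -i
```

Unlike `head -c -2`, this keeps the final newline, so it is not byte-for-byte equivalent.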
Might not be pretty, but it looks like this will work for me. Thank you all for this!
If you just put `set -o errexit -o pipefail -o nounset` on the first line after the shebang, your script will have proper error handling as well. Currently, if any command fails, notify-send will still be triggered.
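A small sketch of what two of those options change. Each snippet runs in a fresh `bash -c` so the options cannot leak into the calling shell; `status_of` is a hypothetical helper that just prints the snippet's exit status.

```bash
#!/bin/bash
# Hypothetical helper: run a snippet in a fresh bash and print its exit status.
status_of() {
    bash -c "$1" && echo 0 || echo "$?"
}

# Without pipefail, a pipeline reports only the status of its last command.
status_of 'false | true'                    # prints 0
status_of 'set -o pipefail; false | true'   # prints 1

# Without errexit, the script carries on after a failing command.
status_of 'false; true'                     # prints 0
status_of 'set -o errexit; false; true'     # prints 1
```

Note that errexit is ignored for commands on the left-hand side of `||`, so lines written as `cmd || die "..."` keep reporting through `die` either way.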
This version looks nice and short, any thoughts on proper error reporting to the end user?
My version has more feedback for the user which was important because the user was somebody not familiar with linux/bash, but even my version "swallows" errors.
I added the `set -o pipefail ...` line suggested below, but I think mogrify only fails if the screenshot fails. Tesseract never fails if there is a valid input image, so realistically you only need one error message for the screenshot generation, unless you want to check whether the user is missing any of the tools.
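For the "missing tools" case, a minimal sketch of an up-front check. `missing_cmds` is a hypothetical helper, and the tool list in the usage comment is only an example taken from the scripts in this thread.

```bash
#!/bin/bash
# Hypothetical helper: print every command from the argument list
# that cannot be found on PATH, one per line.
missing_cmds() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" > /dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Possible use at the top of the script, with die() as defined there:
#   missing=$(missing_cmds tesseract mogrify notify-send)
#   [[ -z $missing ]] || die "missing tools: $missing"
```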
In the spirit of sharing, cuz I think this is a great script (thank you), I prefer using maim over scrot simply because it has a --nodrag option. Personally feels better when making selections from a trackpad. Click once, move cursor, click again.
I was using something like this for awhile, but I found tesseract did poorly quite often. That resize trick didn't seem to affect much. I'm not sure what pre-processing would make it better.
I'd love to know if TextSnatcher does anything to improve on this. The GitHub page is opaque.
Having used Tesseract for OCR for other things, getting the right PSM helps but it's still rather terrible, especially for sans-serif fonts, which are common in UIs.
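For reference, a sketch of how the page segmentation mode (PSM) can be set from the command line. This assumes Tesseract 4 or newer, where the flag is spelled `--psm`; `ocr_with_psm` is a hypothetical wrapper, and defaulting to mode 6 is just my choice for the example.

```bash
#!/bin/bash
# Hypothetical wrapper: OCR an image with an explicit page segmentation
# mode and write the recognized text to stdout.
ocr_with_psm() {
    local image=$1 psm=${2:-6}
    tesseract "$image" stdout --psm "$psm"
}

# Examples (not run here):
#   ocr_with_psm scr.png 6    # 6: assume a single uniform block of text
#   ocr_with_psm scr.png 7    # 7: treat the image as a single text line
#   ocr_with_psm scr.png 11   # 11: sparse text, find as much text as possible
```

For small screenshot selections, trying 6 or 7 instead of the default automatic segmentation is a cheap experiment.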
Granted there's a lot of ambiguity in sans serif fonts, lower-case "L", vertical bar, and upper-case "i" can even be pixel-identical, but I've seen tesseract turn
Chapter III
into
Chapter |l1
which really surprises me. In fact, for books, I run it through sed to replace vertical bar with upper-case "i" and it significantly improved recognition.
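The sed step mentioned above could look roughly like this; it is a sketch of the idea rather than the exact command used, and `fix_bars` is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical post-processing step for book text: a vertical bar almost
# never appears in prose, so treat every "|" as an upper-case "I".
fix_bars() {
    sed 's/|/I/g'
}

printf 'Chapter |l1\n' | fix_bars   # prints: Chapter Il1
```

As the output shows, this only repairs the bar; the `l1` part would need its own rule.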
For my fellow Windows-using plebeians, the official Microsoft PowerToys add-in [0] has a feature that does this (it's also been added to the stock screenshot tool, but I personally find the one keyboard shortcut in PowerToys more pleasant to use).
The Snipping Tool's built-in OCR works for multiple languages (English, Russian, Chinese, Japanese, etc.) without the need to install any OCR language packs, though.
It's been bugging me for a long time now. Is tesseract actually the state-of-the-art solution here?
I just really don't know, it feels like it's, uh, subpar. Isn't it? I never seriously worked in that domain, but it somehow felt to me in 2019 that, with all the recent advancements in computer vision, text recognition must be essentially a solved problem. I'd expect it to be better than a human. Yet I still cannot accurately convert a low-res scan (a scan! not even a photo!) of a receipt with tesseract, especially if it isn't in English. Maybe I just cannot use it properly?
I use Tesseract semi-regularly and only rarely have recognition issues, including with receipt scans (or even photos). How are you specifically using it?
Myself I tried it probably 10-15 years ago on scanned scientific papers (decent scanning quality). The results were disappointing. The manual postprocessing required was not much less than typing it directly. So tesseract became a synonym of "not worth trying" to me.
Maybe things have improved over the years, so I should give it a new try. (No particular use case at the moment, but those tend to appear occasionally.)
It’s good now _if_ you OCR only scanned documents or otherwise have a lot of control over how you prepare the images before they’re OCR’ed. For more general-purpose recognition with weird fonts and bad image quality, EasyOCR gave me much better results.
It's way better now. I used it 15 years ago and had to do quite a bit of preprocessing to get not-entirely-terrible results, but now I use it with great success and no preprocessing.
Being a Flatpak app, it will require desktop portals to fully work. That said, it worked absolutely fine out of the box for me with my existing xdg-desktop-portal-wlr setup. So, it should work fine in any X11 or Wayland setup where you have an xdg-desktop-portal setup that supports the Screenshot API.
The results are mixed, but not bad by any means. Cleanly readable text comes out mostly fine with maybe only whitespace issues and the occasional error, which makes this still potentially very useful for copying text out of error dialogs and whatnot. (Though, I've found that on Linux, error dialogs are far more likely to have selectable text in the first place. And on Windows, standard MessageBox responds to Ctrl+C.)
There is a utility available for macOS that extends beyond simply opening a document in Preview and attempting to select the text: https://github.com/schappim/macOCR
FWIW one can skip Preview and just do Cmd-Shift-3, click the thumbnail, and interact with the text in the quicklook. Then, delete the image (trashcan in top right). Cmd-A works, too. Here's me using it on that comment: https://imgur.com/a/q0NvcS6
I have a similar solution on iOS as a shortcut connected to the action button. It's for when an app doesn't allow easy text copying, or when the text is in a foreign language. It does the following:
- take screenshot
- extract text from it
- translate the text to english. Auto detects source language
- show both original and translated text in quick view where you can select and copy if desired.
This is fantastic thanks for sharing.
I have used it in the share sheet when tapping share on an image and it works but given I am already providing the image, the screenshot is redundant.
Caveat: This is a Flatpak and not all Linux distros ship with Flatpak. But I'll give it a whirl in my Fedora virtual machine. I've seen many flavors of this type of tool floating about, most of them leveraging Tesseract[0], and I've tried a few of them. It fails badly on grainy / noisy images or where the text is warped or skewed in some way. It will not solve CAPTCHAs for you!
...it's a pile of Vala code. What you probably mean is that the author did not make a package for your distribution, and there is no one else who had the time and inspiration to package it. You can be the maintainer you seek.
Surprises me to see I'm the first comment here to say: I just use GPT-4 for this. Works perfectly, even for getting the LaTeX source of a formula you only have a screenshot of.
Probably quite the overkill in terms of energy efficiency for just image to text, but I only need this like once every two weeks or so.
Why is this posted now? The repo has seen no activity in the past two years, and the HTTPS certificate on the website has also been expired since 2022, so I'm not sure it is still alive…
Sure, but at the same time if it's not included on distributions and not updated by upstream it's likely to have compatibility issues relatively quickly (GTK is particularly bad at maintaining compatibility between versions, even point releases).
Also, code not being updated is one thing; a TLS certificate not being renewed is in another league in terms of lack of support for the project.
For macOS users, I'm the author of Textinator [0] a similar utility for macOS that uses Apple's Vision framework [1] for doing the OCR natively. Modern versions of macOS (since Sonoma) have a similar ability to copy text from images using the Live Text feature [2] but Textinator works on macOS Catalina and later and simplifies the "take screenshot, copy text to clipboard" workflow. It's also an example of how to build a native macOS app entirely in Python.
Is there anything that could handle indentation? I already use a very similar tool on Linux (also available on Windows and Mac): https://dynobo.github.io/normcap/
That's pretty cool, but it definitely has a ways to go (the example on the GitHub page even shows a few discrepancies between the original and pasted text, though they seem to be mostly punctuation).
I didn't see anything of the sort looking through the source code. I see it uses portals (or scrot) to take a screenshot, and spawns Tesseract as an external process.
How is OCR these days? Lately I'm seeing more deeplearning-based OCR, and it gives you significantly different results just by cropping the image differently.
Preview is one of the most underrated apps in macOS, and the one I miss the most when I use Linux or Windows. It's a great little toolbox for quick editing and convenience features.
I didn't realize it was underrated, but it's probably the best bundled macOS software. If I could get a Linux equivalent, that would be fantastic. Viewing, editing, PDF filling, PDF signing (so useful), all in a fast and responsive tool is just incredible.
If anyone has anything near a Linux equivalent, please let me know.
If you're on a Pixel 7 and upwards or the latest Samsung phones there's also circle to search by holding the home button down. The OCR works quite well including English, Russian, Arabic, Japanese and I'm sure it works on other languages too.
If you're on Android 14 you can also copy text through the recents/overview menu simply by highlighting the text. And finally there's Google Photos if you don't have any of these features.
There's also Google Lens if you're trying to copy text that isn't on your screen.
Another neat thing is that when you copy the text on your iOS device, it appears in your clipboard on the Mac, so you can just paste it. (Assuming both devices are on the same Wi-Fi/local network.)
Just three hours ago I switched back to Linux after a few years on macOS. The only thing missing was the amazing text copy tool I was using, "Rex" [1]. What a coincidence to see this post on the front page a few hours later!
Side note, what a breath of fresh air Gnome on Fedora is!
It's obviously personal opinion, but I think you made the best choice! (Gnome on Fedora). Welcome back!
It's remarkable how much more polished Gnome is from a few years ago. If you use 2FA TOTP, make sure to install Gnome Authenticator if you haven't already. If you use Aegis on Android (or a handful of other formats) it can import/export your seeds. It is downright luxurious having this on my laptop/desktop:
# If you haven't setup flathub yet
sudo flatpak remote-add --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo
# Install Authenticator from flathub. Source: https://gitlab.gnome.org/World/Authenticator
sudo flatpak install flathub com.belmoussaoui.Authenticator
I'm running Monterey and have the feature, but it's only inside Preview, which means I need to either open the image in Preview or take a screenshot and then open a new image in Preview to paste the screenshot before getting the text.
Check out Textinator [0] which is an open source macOS app that watches for screenshots and automatically does text detection then copies text to clipboard. (Disclaimer: I'm the author). It works on macOS 10.15+