TextSnatcher: Copy text from images, for the Linux Desktop (github.com/rajsolai)
418 points by nateb2022 9 months ago | 102 comments



I use the same script as Dibby053, copied from Stack Overflow, but with some tweaks so it works on KDE, GNOME, and Wayland as well as X11, plus notifications showing what state it is in.

I didn't test the x11/wayland check yet, but feel free to use it and report back.

  #!/bin/bash 
  # Dependencies: tesseract-ocr imagemagick 
  # on gnome: gnome-screenshot 
  # on kde: spectacle
  # on x11: xsel
  # on wayland: wl-clipboard

  die(){
  notify-send "$1"
  exit 1
  }
  cleanup(){
  [[ -n $1 ]] &&  rm -rf "$1"
  }

  SCR_IMG=$(mktemp)  || die "failed to create temp file"

  # shellcheck disable=SC2064
  trap "cleanup '$SCR_IMG'" EXIT

  notify-send "Select the area of the text" 
  if  which "spectacle" &> /dev/null
  then
    spectacle -r -o "$SCR_IMG.png" || die "failed to take screenshot"
  else
    gnome-screenshot -a -f "$SCR_IMG.png" || die "failed to take screenshot"
  fi

  # convert to grayscale and upscale 4x, which should increase the detection rate
  mogrify -modulate 100,0 -resize 400% "$SCR_IMG.png"  || die "failed to convert image"

  tesseract "$SCR_IMG.png" "$SCR_IMG" &> /dev/null || die "failed to extract text"
  if [ "$XDG_SESSION_TYPE" == "wayland" ]
  then 
  wl-copy < "$SCR_IMG.txt" || die "failed to copy text to clipboard"
  else
  xsel -b -i  < "$SCR_IMG.txt" || die "failed to copy text to clipboard"
  fi
  notify-send "Text extracted"
  exit

edit:

Formatting


I slightly modified your script to: 1. Clean up properly 2. Run spectacle in BG mode, so the window does not pop up after screenshotting.

  #!/bin/bash 
  # Dependencies: tesseract-ocr imagemagick 
  # on gnome: gnome-screenshot 
  # on kde: spectacle
  # on x11: xsel
  # on wayland: wl-clipboard
  
  die(){
    notify-send "$1"
    exit 1
  }
  cleanup(){
    [[ -n $1 ]] && rm -r "$1"
  }
  
  SCR_IMG=$(mktemp -d) || die "failed to create temp directory"
  
  # shellcheck disable=SC2064
  trap "cleanup '$SCR_IMG'" EXIT
  
  #notify-send "Select the area of the text" 
  if  which "spectacle" &> /dev/null
  then
    spectacle -b -r -o "$SCR_IMG/scr.png" || die "failed to take screenshot"
  else
    gnome-screenshot -a -f "$SCR_IMG/scr.png" || die "failed to take screenshot"
  fi
  
  # convert to grayscale and upscale 4x, which should increase the detection rate
  mogrify -modulate 100,0 -resize 400% "$SCR_IMG/scr.png"  || die "failed to convert image"
  
  tesseract "$SCR_IMG/scr.png" "$SCR_IMG/scr" &> /dev/null || die "failed to extract text"
  if [ "$XDG_SESSION_TYPE" == "wayland" ]
  then 
    wl-copy < "$SCR_IMG/scr.txt" || die "failed to copy text to clipboard"
  else
    xsel -b -i  < "$SCR_IMG/scr.txt" || die "failed to copy text to clipboard"
  fi
  notify-send "Text extracted"
  exit


This is great!

Also made some minor modifications: replaced `xsel` with `xclip` and added truncated version of the copied text to the `notify-send`:

  #!/bin/bash 
  # Dependencies: tesseract-ocr imagemagick 
  # on gnome: gnome-screenshot 
  # on kde: spectacle
  # on x11: xclip
  # on wayland: wl-clipboard

  die(){
    notify-send "$1"
    exit 1
  }
  cleanup(){
    [[ -n $1 ]] && rm -r "$1"
  }

  SCR_IMG=$(mktemp -d) || die "failed to create temp directory"

  # shellcheck disable=SC2064
  trap "cleanup '$SCR_IMG'" EXIT

  #notify-send "Select the area of the text" 
  if  which "spectacle" &> /dev/null
  then
    spectacle -n -b -r -o "$SCR_IMG/scr.png" || die "failed to take screenshot"
  else
    gnome-screenshot -a -f "$SCR_IMG/scr.png" || die "failed to take screenshot"
  fi

  # convert to grayscale and upscale 4x, which should increase the detection rate
  mogrify -modulate 100,0 -resize 400% "$SCR_IMG/scr.png"  || die "failed to convert image"

  tesseract "$SCR_IMG/scr.png" "$SCR_IMG/scr" &> /dev/null || die "failed to extract text"
  if [ "$XDG_SESSION_TYPE" == "wayland" ]
  then 
    wl-copy < "$SCR_IMG/scr.txt" || die "failed to copy text to clipboard"
  else
    # xsel -b -i  < "$SCR_IMG/scr.txt" || die "failed to copy text to clipboard"
    xclip -selection clipboard -i < "$SCR_IMG/scr.txt" || die "failed to copy text to clipboard"  
  fi
  # Notify the user what was copied but truncate the text to 100 characters
  notify-send "Text extracted from image" "$(head -c 100 "$SCR_IMG/scr.txt")" || die "failed to send notification"
  exit


I just frankenstein'd a few people's versions into my own MATE-based flavor.

For anyone running into barriers, mate-screenshot has no outfile `-f` option, so I worked around that by outputting through the clipboard and capturing that with `xclip` (note, this is earlier in the script than the xsel/xclip line in the parent and gp comments):

  mate-screenshot -a -c && xclip -selection clipboard -t image/png -o > "$SCR_IMG/scr.png" || die "failed to take screenshot"
The other hiccup is that the dumped text file has two extraneous bytes '\x0a\x0c', so I truncated them with `head`:

  (xclip -selection clipboard -i < <(head -c -2 "$SCR_IMG/scr.txt")) || die "failed to copy text to clipboard"
Might not be pretty, but it looks like this will work for me. Thank you all for this!
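
As far as I know, a negative count for `head -c` is a GNU extension, so here is a more portable sketch of the same cleanup. It assumes the stray bytes are always a trailing newline plus a form feed (`\x0c`); `strip_ocr_tail` is a made-up helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove form feeds (\x0c) and trailing newlines
# from OCR text read on stdin.
strip_ocr_tail() {
  local text
  # tr deletes every form feed; $(...) then drops all trailing newlines.
  text=$(tr -d '\f')
  printf '%s' "$text"
}

# Possible use in the script above (not run here, needs xclip):
# strip_ocr_tail < "$SCR_IMG/scr.txt" | xclip -selection clipboard -i
```

Note that this also removes form feeds in the middle of the text, which may or may not be what you want for multi-page input.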


Good catch with spectacle, I thought I fixed that already.

Why did you remove the -f parameter?


I like all the error handling, but you could skip the temp files if you just pipe it through

    #!/usr/bin/env bash
    langs=(eng ara fas chi_sim chi_tra deu ell fin heb hun jpn kor nld rus tur)
    lang=$(printf '%s\n' "${langs[@]}" | fuzzel -d "$@")
    grim -g "$(slurp)" - | mogrify -modulate 100,0 -resize 400% png:- | tesseract -l eng+${lang} - - | wl-copy
    notify-send "Text extracted"


If you just put `set -o errexit -o pipefail -o nounset` in the first line after the shebang, your script will have proper error handling as well. Currently, if any command fails, notify-send will still be triggered.
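
To see what `pipefail` changes, here is a minimal sketch (the variable names are made up for illustration). By default a pipeline reports only the exit status of its last command, which is why a piped script can reach `notify-send` even when an earlier stage fails.

```shell
#!/usr/bin/env bash
# Without pipefail: the pipeline's status is that of the last command (true).
default_status=0
false | true || default_status=$?

# With pipefail: the pipeline fails if any command in it fails.
set -o pipefail
strict_status=0
false | true || strict_status=$?

echo "without pipefail: $default_status, with pipefail: $strict_status"
# prints: without pipefail: 0, with pipefail: 1
```

With `errexit` also set, that non-zero pipeline status would stop the script before the final notification runs.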


This version looks nice and short, any thoughts on proper error reporting to the end user?

My version has more feedback for the user which was important because the user was somebody not familiar with linux/bash, but even my version "swallows" errors.


I added the `set -o pipefail ...` suggested below, but I think mogrify only fails if the screenshot fails. Tesseract never fails if there is a valid input image, so realistically you only need one error message for the screenshot generation, unless you want to check whether the user is missing any of the tools.
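
If you do want to check for missing tools up front, a small sketch along these lines should do. `missing_tools` is a hypothetical helper name, and the tool list in the usage comment is simply taken from the piped script above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of every given command that is
# not found on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" > /dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Possible use at the top of the script:
# missing=$(missing_tools grim slurp mogrify tesseract wl-copy notify-send)
# [ -z "$missing" ] || { echo "missing tools: $missing" >&2; exit 1; }
```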


I also used the very same script until I stumbled upon this on hn [0].

    #!/usr/bin/env bash
    langs=(eng ara fas chi_sim chi_tra deu ell fin heb hun jpn kor nld rus tur)
    lang=$(printf '%s\n' "${langs[@]}" | dmenu "$@")
    maim -us | tesseract --dpi 145 -l eng+${lang} - - | xsel -bi

[0]: https://news.ycombinator.com/item?id=33704483#33705272


Ah just saw rjzzleep posted an updated version here. Happy to steal this one again :)


Looks nice


    # shellcheck disable=SC2064
    trap "cleanup '$SCR_IMG'" EXIT
While shellcheck can have false positives, and SCR_IMG probably doesn't have any characters which need escaping, it's not exactly wrong in this case.

The command passed to `trap` is evaluated normally, so variable expansions do take place.

    trap 'cleanup "$SCR_IMG"' EXIT
Will behave correctly, and the expansion of SCR_IMG won't be susceptible to issues relating to unquoted shell characters.

Alternatively, if you're using a modern bash (this probably won't work on a mac by default), then this is an option too:

    trap "cleanup ${SCR_IMG@Q}" EXIT
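
The single-quoted form can be checked with a throwaway directory whose name contains a space and a single quote, which is the kind of value that would break `trap "cleanup '$SCR_IMG'"`. This is only a sketch; the variable names are invented for the demo.

```shell
#!/usr/bin/env bash
# A path with a space and a single quote in it.
parent=$(mktemp -d)
dir="$parent/it's a test"
mkdir "$dir"

(
  # Single quotes defer expansion until the trap fires, so "$dir" is
  # expanded and quoted correctly when this subshell exits.
  trap 'rm -r "$dir"' EXIT
)

if [ -e "$dir" ]; then
  echo "still there"
else
  echo "removed"
fi
rmdir "$parent"
```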


thanks for fixing and explaining that, I thought '' would work and forgot about escaping characters.


Binding a hotkey to `bash -c 'flameshot gui -s -r | tesseract - - | gxmessage -title "Decoded Data" -fn "Consolas 12" -wrap -geometry 640x480 -file -'` does the job for me.

I just press the hotkey (Super+O), drag the selection over whatever I want to OCR, then immediately get a popup dialog containing the captured text.


The Wayland leg works fine for me on gnome+wayland.


thanks!


A while back I copied from somewhere this script that does the job nicely.

  #!/bin/bash
  # Dependencies: tesseract-ocr imagemagick scrot xsel

  IMG=`mktemp`
  trap "rm $IMG*" EXIT

  scrot -s $IMG.png -q 100
  # increase image quality with option -q from default 75 to 100

  mogrify -modulate 100,0 -resize 400% $IMG.png
  #should increase detection rate

  tesseract $IMG.png $IMG &> /dev/null
  cat $IMG.txt | xsel -bi
  notify-send "Text copied" "$(cat $IMG.txt)"

  exit


In the spirit of sharing, cuz I think this is a great script (thank you), I prefer using maim over scrot simply because it has a --nodrag option. Personally feels better when making selections from a trackpad. Click once, move cursor, click again.

    maim -s --nodrag --quality=10 $IMG.png
10 is scrot's 100


Yet another variation I have been using for ages, using ImageMagick's `import` tool (which probably only works on X11)

    import "$tempfile"
    TEXT=`tesseract -l eng+deu "$tempfile" stdout`
    echo "$TEXT" | xsel -i -b


I was using something like this for awhile, but I found tesseract did poorly quite often. That resize trick didn't seem to affect much. I'm not sure what pre-processing would make it better.

I'd love to know if TextSnatcher does anything to improve on this. The GitHub page is opaque.


The source is pretty straightforward - it's calling `scrot -s -o` to a temp file, and then `tesseract` with no further preprocessing.

https://github.com/RajSolai/TextSnatcher/blob/master/src/ser...


> I found tesseract did poorly quite often

The script calls Tesseract in default page segmentation mode (PSM 3). [1]

Depending on the input text, PSM mode 11 for disconnected text would probably work much better. That uses the flag "--psm 11".

[1] From the original repo: string tess_command = "tesseract " + file_path + " " + out_path + @" -l $lang" ;
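
A minimal sketch of passing that flag from a script, assuming a recent Tesseract where the option is spelled `--psm` and where `stdout` is accepted as the output base. `ocr_psm` is a hypothetical wrapper name, not something from TextSnatcher.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: OCR an image to stdout with a chosen page
# segmentation mode. Usage: ocr_psm IMAGE [PSM] [LANG]
ocr_psm() {
  local image=$1 psm=${2:-11} lang=${3:-eng}
  # PSM defaults to 11 (sparse text) and the language to English.
  tesseract "$image" stdout --psm "$psm" -l "$lang"
}

# Example (not run here, needs tesseract and an image):
# ocr_psm screenshot.png 11 eng | xclip -selection clipboard -i
```

Making the mode an argument makes it easy to compare a few values on the same screenshot, since which one works best depends on the layout.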


Having used Tesseract for OCR for other things, getting the right PSM helps but it's still rather terrible, especially for sans-serif fonts, which are common in UIs.

Granted there's a lot of ambiguity in sans serif fonts, lower-case "L", vertical bar, and upper-case "i" can even be pixel-identical, but I've seen tesseract turn

  Chapter III
into

  Chapter |l1
which really surprises me. In fact, for books, I run it through sed to replace vertical bar with upper-case "i" and it significantly improved recognition.
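
For reference, a sed call of that kind might look like the sketch below (`fix_bars` is a made-up name). It only handles the vertical bar, so the lower-case "l" and the digit "1" from the example are left as they are.

```shell
#!/usr/bin/env bash
# Replace every vertical bar in the OCR output with an upper-case "I".
# In sed's basic regex syntax an unescaped | is a literal character.
fix_bars() {
  sed 's/|/I/g'
}

printf 'Chapter |l1\n' | fix_bars
# prints: Chapter Il1
```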


I had a PowerShell script which did this as well, but alas, it was lost to time with the rest of my little scripts from my last job.

Apologies to all of my fellow Unix-Windows borderers.


  trap "rm $IMG*" EXIT
see https://www.shellcheck.net/wiki/SC2064

also, use mktemp -d and recursively delete the directory


This is perfect for me! Having a window with a button that I need to click is much worse than just binding a script to a hotkey.


For my fellow Windows-using plebeians, the official Microsoft PowerToys add-in [0] has a feature that does this (it's also been added to the stock screenshot tool, but I personally find the one keyboard shortcut in PowerToys more pleasant to use).

[0] https://github.com/microsoft/PowerToys


Snipping Tool's built-in OCR works for multiple languages (English, Russian, Chinese, Japanese, etc.) without the need to install any OCR language packs, though.


Inbuilt snip tool does that too.

WIN+SHIFT+S

If it doesn't have the "Text actions" icon (dashed square with paragraph lines in it), you can update it via windows store to get the latest version.


It's been bugging me for a long time now. Is tesseract actually the state-of-the-art solution here?

I just really don't know, it feels like it's, uh, subpar. Isn't it? I never seriously worked in that domain, but it somehow felt to me in 2019 that, with all the recent advancements in computer vision, text recognition must be essentially a solved problem. I'd expect it to be better than human. Yet I still cannot accurately convert a low-res scan (scan! not even a photo!) of a receipt with tesseract, especially if it isn't in English. Maybe I just cannot properly use it?


I use Tesseract semi-regularly and only rarely have recognition issues, including with receipt scans (or even photos). How are you specifically using it?


I see tesseract mentioned more and more.

Myself I tried it probably 10-15 years ago on scanned scientific papers (decent scanning quality). The results were disappointing. The manual postprocessing required was not much less than typing it directly. So tesseract became a synonym of "not worth trying" to me.

Maybe things have improved over the years, so I should give it a new try. (No particular use case at the moment, but those tend to appear occasionally.)


It’s good now _if_ you OCR only scanned documents or otherwise have a lot of control over how you prepare the images before it’s OCR’ed. For more general purpose recognition with weird fonts and bad image quality EasyOCR gave me much better results


This project bundles Tesseract 4.1.1, which is at least a couple of years old.


Try https://github.com/ocrmypdf/OCRmyPDF - it uses Tesseract behind the scenes and it is absolutely brilliant.


It's way better now. I used it 15 years ago and had to do quite a bit of preprocessing to get not-entirely-terrible results, but now I use it with great success and no preprocessing.


First time I used it 3 to 4 years ago, it was good.


I gave it a try. Works pretty good.

Being a Flatpak app, it will require desktop portals to fully work. That said, it worked absolutely fine out of the box for me with my existing xdg-desktop-portal-wlr setup. So, it should work fine in any X11 or Wayland setup where you have an xdg-desktop-portal setup that supports the Screenshot API.

The results are mixed, but not bad by any means. Cleanly readable text comes out mostly fine with maybe only whitespace issues and the occasional error, which makes this still potentially very useful for copying text out of error dialogs and whatnot. (Though, I've found that on Linux, error dialogs are far more likely to have selectable text in the first place. And on Windows, standard MessageBox responds to Ctrl+C.)


The similar app I am using, with great success, is Frog (https://getfrog.app).


No AppImage, no .deb, not even brew.


It's on nixpkgs under name `gnome-frog` (for nix users)


There is a utility available for macOS that extends beyond simply opening a document in Preview and attempting to select the text: https://github.com/schappim/macOCR

I like the author.


FWIW one can skip Preview and just do Cmd-Shift-3, click the thumbnail, and interact with the text in the quicklook. Then, delete the image (trashcan in top right). Cmd-A works, too. Here's me using it on that comment: https://imgur.com/a/q0NvcS6


Thank You!


I have a similar solution on iOS as a shortcut connected to the action button. It's useful when an app doesn't allow easy text copying, or when the text is in a foreign language. It does the following:

- take screenshot

- extract text from it

- translate the text to english. Auto detects source language

- show both original and translated text in quick view where you can select and copy if desired.

Here is an implementation you can try:

https://www.icloud.com/shortcuts/f420d24e4960415da1a43f230ab...

While on the subject of iOS. In recent versions when you open a photo in the photos app you can also select the text in the photo by hand and copy it.


This is fantastic thanks for sharing. I have used it in the share sheet when tapping share on an image and it works but given I am already providing the image, the screenshot is redundant.


> for the Linux Desktop

Caveat: This is a Flatpak and not all Linux distros ship with Flatpak. But I'll give it a whirl in my Fedora virtual machine. I've seen many flavors of this type of tool floating about, most of them leveraging Tesseract[0], and I've tried a few of them. It fails badly on grainy / noisy images or where the text is warped or skewed in some way. It will not solve CAPTCHAs for you!

[0] https://tesseract-ocr.github.io/tessdoc/Home.html


Which distros does flatpak not work on?


https://flatpak.org/setup/

Flatpak should work on every distro. However, it may not be included by default, so you need to install flatpak before installing this application.


Hannah Montana.


I'm sure it would work if you built it from source.


...it's a pile of Vala code. What you probably mean is that the author did not make a package for your distribution, and there is no one else who had the time and inspiration to package it. You can be the maintainer you seek.


Not really a caveat. If it only had a Deb, you could argue it doesn't work on non-Ubuntu/Debian, which is a far bigger caveat.


Interesting, I've always resorted to using Google Lens via the phone for this purpose. And then using the "Copy to another device" feature of Chrome.


Would be great if flameshot had this feature. It's otherwise the best screenshot tool I've ever come across


Surprises me to see I'm the first comment here to say: I just use GPT4 for this. Works perfectly, even for getting the Latex source of a formula you only have a screenshot of.

Probably quite the overkill in terms of energy efficiency for just image to text, but I only need this like once every two weeks or so.


I'm using normcap[1] for this. The workflow feels a bit more polished (Though also not perfect) and the repo is still active.

[1] https://github.com/dynobo/normcap


Another variant of the scripts floating around that I've been using to scratch the same itch:

  #!/bin/bash
  # Performs Optical Character Recognition (OCR) on a freely chosen
  # screen area and copies the recognized text to the clipboard.
  #
  # Dependencies: sudo apt install gnome-screenshot tesseract-ocr xclip
  
  IMAGE_FILE="/tmp/ocr.png"
  
  gnome-screenshot --area --file "$IMAGE_FILE"
  tesseract "$IMAGE_FILE" - | xclip -rmlastnl -selection clipboard
  
  rm -f "$IMAGE_FILE"


Why is this posted now? The repo has seen no activity in the past two years, and the HTTPS certificate on the website has also been expired since 2022, so I'm not sure it is still alive…


there's probably a lot of software that wasn't updated in the last 10 years that could still be really useful


Sure, but at the same time if it's not included on distributions and not updated by upstream it's likely to have compatibility issues relatively quickly (GTK is particularly bad at maintaining compatibility between versions, even point releases).

Also, code not being updated is one thing; a TLS certificate not being renewed is in another league in terms of lack of support for the project.


fair points... but I feel like those are problems that need to be solved. Emulators and VMs do fix some of those.


Been using this little script for mac os to copy text out of images without going through Preview.app: https://github.com/nburns/utilities/blob/main/ocr

(would definitely appreciate feedback/critiques from any swifties out there)


I wrote a script a while back gluing wofi (for dispatching several screenshot related tasks), grim, and tesseract together.

https://robbyzambito.me/posts/tips-and-tricks-for-taking-scr...


Suggestion: You could just

    mkdir -p ~/Pictures/Screenshots
and not have to warn the user to create it.


For macOS users, I'm the author of Textinator [0] a similar utility for macOS that uses Apple's Vision framework [1] for doing the OCR natively. Modern versions of macOS (since Sonoma) have a similar ability to copy text from images using the Live Text feature [2] but Textinator works on macOS Catalina and later and simplifies the "take screenshot, copy text to clipboard" workflow. It's also an example of how to build a native macOS app entirely in Python.

[0]: https://github.com/RhetTbull/textinator

[1]: https://developer.apple.com/documentation/vision?language=ob...

[2]: https://support.apple.com/guide/preview/interact-with-text-i...


Is there anything that could handle indentation? I already use a very similar tool on Linux (also available on Windows and Mac): https://dynobo.github.io/normcap/


was hoping to find a replacement for https://translate-image.com/

which promises to grab text from an image and then translate and make new image with same style..

But I've tried it many times with different images this week and keep getting same error.

Could be that Canva has this and I'm not aware of it yet.

I've needed this this week, but may just open up affinity designer and make one manually at this point.


that's pretty cool but definitely has a ways to go (the example on the github page even shows a few discrepancies between the original and pasted text - seems to be mostly punctuation though)

very nice though thanks for sharing!


Looks like it hasn't been updated in a couple years.


A great tool, but it only works on Linux. I found the Xclippy (https://xclippy.com/) tool. It is available on Windows and macOS.


Note that this is already a (non-obvious) built-in feature on Mac and iPhone, called "Live Text". See these articles for examples:

https://support.apple.com/guide/preview/interact-with-text-i...

https://support.apple.com/guide/photos/interact-with-text-in...

https://support.apple.com/en-us/HT212630


On iPhone you can even search your images by text content.


For completeness sake, Samsung phones with the S-Pen can also OCR. That would be, the old Note series and now the S-Ultra phones.


very bad UI


It seems like this tool sends your screenshot to some sort of web service.

If that's really the case then obviously don't use it for personal data (invoices, love letters, legal proceedings, ...)


I didn't see anything of the sort looking through the source code. I see it uses portals (or scrot) to take a screenshot, and spawns Tesseract as an external process.

https://github.com/RajSolai/TextSnatcher/blob/9e67760d6c16ea...

Tesseract itself seems to be included in the Flatpak as you'd expect:

https://github.com/RajSolai/TextSnatcher/blob/master/manifes...

Where did you get that?


Why would it be using Tesseract if it also uses an external service? And who's paying for the service?


How is OCR these days? Lately I'm seeing more deeplearning-based OCR, and it gives you significantly different results just by cropping the image differently.


Preview on macOS does it automatically. No tools needed.


Preview is one of the most underrated apps in macOS, and the one I miss the most when I use Linux or Windows. It's a great little toolbox for quick editing and convenience features.


I didn't realize it was underrated, but it's probably the best MacOS bundled software. If I could get a Linux equivalent, that would be fantastic. Viewing, editing, PDF filling, PDF signing (so useful), all in a fast and responsive tool is just incredible.

If anyone has anything near a Linux equivalent, please let me know.


Except for the color and font picker. Omg so slow


Within Google Photos mobile on iOS, you get OCR.

I take a photo, grab the text and send it via WhatsApp web app.

Not easy as a clip to clipboard but I haven’t found any on windows.


If you're on a Pixel 7 and upwards or the latest Samsung phones there's also circle to search by holding the home button down. The OCR works quite well including English, Russian, Arabic, Japanese and I'm sure it works on other languages too.

If you're on Android 14 you can also copy text through the recents/overview menu simply by highlighting the text. And finally there's Google Photos if you don't have any of these features.

There's also Google lens if you're trying to copy text that isn't on your screen.


You can take a screenshot. Open that image file with Chrome and then do "Search Images with Google" . There you can grab the text.


On iOS it's system wide. It's an iOS and also a macOS feature


Another neat thing is when you copy the text on your iOS device, it appears in your clipboard on Mac, so you can just paste it. (Assuming both devices are on the same wifi/local network.)


And you need to select the check box somewhere in the settings app for it to work.

Source: helped a friend with the feature recently, he didn’t know it exists.


It's a shame this is for elementaryOS as those apps typically do not work correctly on other basic Gnome systems.


Excellent, just like text extractor in windows powertoys! Love this!


Compiled on Deb 12.5.x - pretty cool. Thank you.


Does this work in languages besides english?


Vala, first time I hear about it!!!


Just three hours ago I switched back to Linux after a few years on MacOS. The only thing missing was the amazing text copy tool I was using, "TRex" [1]. What a coincidence to see this post on the front page a few hours later!

Side note, what a breath of fresh air Gnome on Fedora is!

[1] https://github.com/amebalabs/TRex


It's obviously personal opinion, but I think you made the best choice! (Gnome on Fedora). Welcome back!

It's remarkable how much more polished Gnome is from a few years ago. If you use 2FA TOTP, make sure to install Gnome Authenticator if you haven't already. If you use Aegis on Android (or a handful of other formats) it can import/export your seeds. It is downright luxurious having this on my laptop/desktop:

    # If you haven't setup flathub yet
    sudo flatpak remote-add --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo
    
    # Install Authenticator from flathub.  Source:  https://gitlab.gnome.org/World/Authenticator
    sudo flatpak install flathub com.belmoussaoui.Authenticator


You can also add TOTP secrets to entries in KeePassXC and generate/copy codes there (Ctrl+T).


On a side note, Sonoma and iOS have this functionality built in now.


I'm running Monterey and have the feature, but it's only inside Preview, which means I need to either open the image in Preview or take a screenshot and then open a new image in Preview to paste the screenshot, before getting the text


Check out Textinator [0] which is an open source macOS app that watches for screenshots and automatically does text detection then copies text to clipboard. (Disclaimer: I'm the author). It works on macOS 10.15+

[0]: https://github.com/RhetTbull/textinator


should work in quicklook and safari at least since Ventura. Does for me



