Show HN: Digitizing photos of whiteboards using the command line (gist.github.com)
186 points by lelandbatey on April 3, 2014 | 36 comments



Great work, thanks heaps for this! This'll help me clean up files quickly and easily, as we do the same thing a lot of the time too :)

NB: This works fine on the Windows version of ImageMagick. However, if (like me) you want to wrap it in a batch file, you must escape the -level argument by changing "60%,91%,0.1" to "60%%,91%%,0.1", or else the batch file will misinterpret it, since % introduces parameters and variables in batch files (much as $ does in bash scripts). If you don't, you'll just get a black PNG file.
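
For anyone generating the batch wrapper from a bash prompt, the escaping rule above can be sketched as a small helper. The function name escape_for_batch is made up for illustration; it only doubles percent signs and does nothing else a .bat file might need.

```shell
#!/bin/bash
# Double every % so the arguments survive inside a Windows .bat file,
# where a single % starts a parameter or variable reference.
# (escape_for_batch is a hypothetical helper name, not part of any tool.)
escape_for_batch() {
    printf '%s\n' "$1" | sed 's/%/%%/g'
}

# The -level argument quoted in the comment above.
escape_for_batch '-level 60%,91%,0.1'
# prints: -level 60%%,91%%,0.1
```

The escaped text is what gets pasted into the .bat file; at a normal bash prompt the original single-% form is the right one.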


I got excited when I saw the results, since they looked awesome, so much so that I thought I could replace my quick & dirty transparent GIFs script[1], but apparently it has issues with low-resolution images.

[1] : convert -fill none -draw "matte 0,0 floodfill" -type optimize -colors 64 +dither -trim -fuzz 3% +repage -strip $1 $2

Nevertheless, it's a great technique which I've just saved for such specific cases. Thanks!


For those of you who would like to use this on smaller images, play around with "DoG:15,100,0". This string refers to implementing a difference of Gaussians filter with inner radius of 15 and outer radius of 100 pixels. This is obviously tied to image size. The first number should always be smaller than the second, and my guess is that it should be on the order of the average width of the text. Just an estimate; I don't have the time to test it right now.
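
A rough sketch of that idea: scale both numbers in proportion to the image width. The reference width of 3000 pixels is purely an assumption (the thread does not say what size the defaults were tuned for), and scaled_dog is a made-up name. Note also that ImageMagick's kernel documentation lists the DoG arguments as radius, sigma1, sigma2, so the two numbers may be better read as a kernel radius and a sigma; either way they are in pixels and shrink with the image.

```shell
#!/bin/bash
# Hypothetical reference width: the photo width for which 15 and 100 are
# assumed to have been tuned. This value is a guess, not from the thread.
REF_WIDTH=3000

# Print a DoG kernel spec with both numbers scaled to the given image width.
scaled_dog() {
    local width=$1
    local first=$(( 15 * width / REF_WIDTH ))
    local second=$(( 100 * width / REF_WIDTH ))
    # Keep both values positive and the first smaller than the second.
    if (( first < 1 )); then first=1; fi
    if (( second <= first )); then second=$(( first + 1 )); fi
    echo "DoG:${first},${second},0"
}

scaled_dog 1500   # half the assumed reference width
# prints: DoG:7,50,0

# With ImageMagick installed, the width can be read from the file itself:
#   scaled_dog "$(identify -format '%w' "$1")"
```

The result is only a starting point for experimenting; linear scaling is the simplest guess, not something anyone here has tested.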


Heh, I remember this from when you posted it on reddit months ago [1]. Glad to see you're the same person and you didn't just take it from them. Very nice script, I've used it before with great success.

[1] http://www.reddit.com/r/commandline/comments/1weqnn/cli_onel...


Nice idea. It would be fantastic if this could be done on the phone during the image capture


There's a whole heap of iOS 'scanning' apps that do this sort of thing device-side.


...and Windows 8.1 phone.


Or Android.


Cool! I did something similar just recently for cleaning up scans of my laboratory notes (in a Moleskine). I have an Automator action which watches the scan folder, automatically crops/splits the pages and cleans them up, then uploads them into Evernote. Gives me a searchable cloud backup of my lab notes!

Anyway, I cheated by using some of these actions: http://www.fmwconcepts.com/imagemagick/

In particular the "textcleaner" script proved useful.

I'm definitely going to have a look to see if this script offers advantages over my method.


Take a look at unpaper instead. It was designed for postprocessing scanned documents. The only thing it does not do is the Evernote upload:

https://github.com/Flameeyes/unpaper


This seems similar to what I've had to do with OpenCV in various iOS apps, but I think your results are more impressive than mine.

I had to sacrifice some quality in order to be able to do this with realtime video, but still, I should probably work out exactly what processing that command actually does and see if I can improve quality while still meeting the realtime requirement.

Thanks for sharing.


Looks much better than the original photos.

On a slightly related note, CamScanner, an app for Android (and iOS?), does something similar, but can also correct for the angle at which the photo was taken (I think it has to detect squares/rectangles in the photo). Does anyone know if this is possible using the command line too?



Looks like the "whiteboard" script from the collection referenced by heisenzombie in this thread can apply the relevant transformations: http://www.fmwconcepts.com/imagemagick/whiteboard/index.php
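
If you would rather do the angle correction by hand, ImageMagick's -distort Perspective takes pairs of source and destination points. The sketch below only builds that point list; it does not detect the board, so the corner coordinates are hypothetical values you would pick yourself in an image viewer, and perspective_points is a made-up helper name.

```shell
#!/bin/bash
# Build the control-point string for "-distort Perspective".
# Arguments: output width and height, then the four board corners in the
# photo as x,y pairs: top-left, top-right, bottom-right, bottom-left.
perspective_points() {
    local w=$1 h=$2 tl=$3 tr=$4 br=$5 bl=$6
    echo "$tl 0,0 $tr $w,0 $br $w,$h $bl 0,$h"
}

# Hypothetical corner coordinates, chosen by hand for illustration.
points=$(perspective_points 1600 900 "212,148" "1710,96" "1795,1010" "160,1080")
echo "$points"
# prints: 212,148 0,0 1710,96 1600,0 1795,1010 1600,900 160,1080 0,900

# With ImageMagick installed, something along these lines should flatten
# the board into the top-left 1600x900 region and crop to it:
#   convert input.jpg -distort Perspective "$points" -crop 1600x900+0+0 +repage flat.png
```

The crop assumes the target size fits inside the original photo's dimensions.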


If you want to handle input file names including spaces, just wrap the $1 into parens:

#!/bin/bash

convert "$1" -morphology ...


Those look more like double-quotes than parens.
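
To see why the double quotes matter, here is a minimal sketch that counts how many arguments a command receives with and without them. count_args and the file name are made up for the demonstration; no ImageMagick is needed.

```shell
#!/bin/bash
# Print how many arguments were received.
count_args() {
    echo "$#"
}

file="white board.jpg"   # hypothetical file name containing a space

# Unquoted: the shell splits the value at the space into two words.
unquoted=$(count_args $file)
# Quoted: the value stays one argument, which is what convert needs.
quoted=$(count_args "$file")

echo "unquoted: $unquoted, quoted: $quoted"
# prints: unquoted: 2, quoted: 1
```

The same applies to $1 and $2 inside the script: unquoted, a name with a space reaches convert as two separate file names.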


I'm building an online whiteboard (spacedeck.com) and I'm always wondering how to bridge the analog/physical world of whiteboard scribbles and sticky notes with software. How would you continue to work with these images? Attach them to a task? What is the workflow?


Over here - http://www.infoq.com/presentations/j-language - there is discussion in the video of removal of the background.



Here's the comment I posted on the Gist, copy-pasted to here:

@molven, @vibragiel: Wow, I feel more than a bit ridiculous after looking into this. Turns out I made those original examples using GIMP (same basic process, with the parameters tuned to make it look nice). I'd originally put this gist together several months ago, and I hadn't done a thorough enough check before submitting to HN.

HOWEVER, I've since created new examples actually using the bash script, and the gist has been updated accordingly.

You can get these images here:

> Input1 - http://i.imgur.com/27aDJ6b.jpg

> Input2 - http://i.imgur.com/LaRWFT4.jpg

> Output1 - http://i.imgur.com/xMxM8P2.png

> Output2 - http://i.imgur.com/E3XoM3e.png


GitHub compressed the upload; get the originals from the links in his response: https://gist.github.com/lelandbatey/8677901#comment-1204480


It's because you did not use the full resolution pictures, apparently.


Nice idea, would it be possible to do some OCR at the end?


I did experiment with using this to bring out the text in photos of books. Here's an example:

Input image - http://i.imgur.com/6o5FwxG.jpg

Output image - http://i.imgur.com/7OIOxfO.png

I tried to use vanilla Tesseract on it, but I had no luck getting anything usable out of it.


You have to compensate for the 3D deformation present in the captured image. There has been some impressive work in this area recently.

Some commercial OCR engines such as Nuance Capture SDK have built in functionality for this.


Take a look at Tesseract, which last time I looked was the best codebase to use in this area. It's part of Google's open source multilingual OCR suite, which is in two parts (layout analysis and actual OCR code being segregated): https://code.google.com/p/tesseract-ocr/
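
For reference, the basic command-line call is "tesseract <image> <outputbase>", which writes the recognized text to <outputbase>.txt. A minimal sketch of wiring that onto a cleaned-up image follows; ocr_base and ocr_image are made-up helper names, and nothing here addresses the accuracy problems mentioned elsewhere in the thread.

```shell
#!/bin/bash
# Tesseract appends ".txt" to the output base itself, so derive the base
# by dropping the image's extension.
ocr_base() {
    echo "${1%.*}"
}

# Run OCR on one image; returns 127 if tesseract is not installed.
ocr_image() {
    local image=$1
    if ! command -v tesseract >/dev/null; then
        echo "tesseract not installed" >&2
        return 127
    fi
    tesseract "$image" "$(ocr_base "$image")"
}

ocr_base "notes/page 1.png"   # the base ocr_image would hand to tesseract
# prints: notes/page 1
```

Calling ocr_image "notes/page 1.png" would then leave the text in "notes/page 1.txt", assuming tesseract can read the image at all.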


Is there a guide to using tesseract? When I last used it, I had trouble accurately recovering text from an image I just created with Times New Roman. There are probably some settings I'd need to change to get it to work properly.


Not sure. I looked at it years ago, during its early open source stages, and similarly concluded that it was nontrivial to get going. Should be easier now. IIRC in those days it had a very high volume mailing list for user support.


At the end the script needs to upload the image to mechanical turk for analysis.


Nice. Works well for drawings/comics.

http://imgur.com/a/Xvu7K


Very nice. I had never experimented with modifying images via the command line, but I was able to understand the process from the shell parameters.

Also, photos of whiteboards always come out terrible; this cleans them up nicely.


I'd like to see a similar script for blackboards (greenboards?).


If you have white-on-black then try adding -negate after the source image; it'll turn it into black-on-white.

Another approach I took recently was to combine this with -threshold (90 worked well in my case) and the Dilate morphology - I had a black and white set of plans printed using varying-sized dots.
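
A small sketch of where that -negate would go for a board with light writing on a dark background. It only assembles the argument list; the rest of the gist's options are quoted in this thread only in part, so they are deliberately left out, and negate_args is a made-up name.

```shell
#!/bin/bash
# Print convert arguments, one per line, with -negate placed right after
# the source image when the board is dark.
negate_args() {
    local board=$1 input=$2 output=$3
    local args=("$input")
    if [ "$board" = "dark" ]; then
        args+=(-negate)
    fi
    # The gist's remaining options would be appended here; this sketch
    # does not reproduce them.
    args+=("$output")
    printf '%s\n' "${args[@]}"
}

negate_args dark "chalk board.jpg" clean.png

# To run it for real with ImageMagick installed:
#   mapfile -t args < <(negate_args dark in.jpg out.png)
#   convert "${args[@]}"
```

Printing one argument per line keeps file names with spaces intact when they are read back with mapfile.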


Awesome, thanks!


This is actually neat!

Great job!


Very cool, I converted one and it worked well.

On a side note, I don't like gists. I didn't want to stray off topic so I put my complaints about gists here: https://news.ycombinator.com/item?id=7521600



