Page Dewarping (2016) (mzucker.github.io)
205 points by Tomte 5 months ago | 27 comments



I'd be wary of applying such a harsh threshold to the output as the author has done here. While it captures the page well enough for typical text, I have encountered many pages from Google Books where the threshold has mutilated illustrations or small footnotes to the point of making them unreadable. And if the Google Books scan is the only one available, you're completely out of luck.


Isn't the thresholding just for identifying landmarks to be used for choosing dewarping parameters? Presumably, once found, those parameters could be applied to the original image.


It’s clearly described as the last step, so it should be easy to omit it if desired.


It's 2024. Why does my freaking document scanner app still not have this built in?


We use GeniusScan at school and it does that: https://blog.thegrizzlylabs.com/2024/03/genius-scan-7.16.htm....


It only does linear projection-matrix correction, and it needs the user to mark the page corners on screen.
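For anyone wondering what that kind of correction amounts to: four marked corners determine a single 3x3 homography, and nothing more. Below is a minimal dependency-free sketch, assuming the corners are supplied by hand. The names `solve_linear`, `solve_homography`, and `apply_homography` and the corner coordinates are made up for illustration; a real app would more likely call a library routine such as OpenCV's `getPerspectiveTransform`.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def solve_homography(src, dst):
    """Return the 3x3 matrix mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), and likewise for v,
        # rearranged so that the eight unknowns appear linearly.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(hm, point):
    """Map one (x, y) point through the homography."""
    x, y = point
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    return ((hm[0][0] * x + hm[0][1] * y + hm[0][2]) / w,
            (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / w)


# Hypothetical page corners in the photo (pixels), clockwise from top-left.
corners = [(120, 80), (900, 130), (860, 1150), (90, 1100)]
flat = [(0, 0), (800, 0), (800, 1000), (0, 1000)]
H = solve_homography(corners, flat)
```

A homography maps straight lines to straight lines, so text lines that are curved in the photo stay curved after the correction. That is why this approach works for a flat sheet but not for a bent book page.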


Ah, I didn't realize (I don't use that feature often enough). Thanks for pointing that out!


This is very nice. Having a low-dimension page warping model to optimize seems to be what makes this work.

This is a good YC-sized problem - weeks to market, a few hundred thousand dollars to launch. Apple has a phone app for this which requires too much manual adjustment, and Microsoft has Office Lens / Microsoft Lens, which has comments such as "The edges just eventually go crazy and look horrendous". So there's a market for one that Just Works. Exit by selling to the usual suspects.
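To make the "low-dimension model" point concrete, here is a rough sketch of one such model: a cubic page profile pinned to zero at both edges and described only by its two end slopes. Whether this matches the article's exact parameterization is an assumption on my part, and the function names `page_height` and `fit_slopes` are hypothetical. The sketch fits only the curve, not the camera pose or anything else the article optimizes.

```python
def page_height(x, alpha, beta):
    """Cubic with f(0) = f(1) = 0, f'(0) = alpha, f'(1) = beta."""
    a = alpha + beta
    b = -2.0 * alpha - beta
    return ((a * x + b) * x + alpha) * x


def fit_slopes(samples):
    """Least-squares fit of (alpha, beta) to (x, z) samples.

    The model is linear in both slopes:
    z = alpha * p(x) + beta * q(x), with p = x^3 - 2x^2 + x and q = x^3 - x^2,
    so the 2x2 normal equations can be solved directly.
    """
    spp = spq = sqq = spz = sqz = 0.0
    for x, z in samples:
        p = x**3 - 2 * x**2 + x
        q = x**3 - x**2
        spp += p * p
        spq += p * q
        sqq += q * q
        spz += p * z
        sqz += q * z
    det = spp * sqq - spq * spq
    alpha = (spz * sqq - sqz * spq) / det
    beta = (sqz * spp - spz * spq) / det
    return alpha, beta


# Made-up example: sample a known profile, then recover its two slopes.
xs = [i / 10 for i in range(11)]
samples = [(x, page_height(x, 0.3, -0.2)) for x in xs]
alpha, beta = fit_slopes(samples)
```

With only two shape parameters there is very little room for the fit to go wrong, which is probably also why wrinkles and other local distortions cannot be represented.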


I’ve scanned probably thousands of pages with MS Lens over the years and it works quite reliably in my experience, at least on iPhone.


SwiftScan does pretty well in my experience.


I swear Google Drive used to do this properly but has gotten considerably worse over the past few years


I remember being fairly happy with the Google Drive app, but I don't recall details of the old functionality either. Today, it is limited to using a quadrilateral envelope to frame the page, which is inadequate for anything other than flat single pages. Its shadow removal filter is pretty good though.


Product management felt it was not worth the tech risk (seems too complicated & mathy), and they would get better user metrics by instead building a more sophisticated model for timing notifications by trawling users’ social media activity. The decision makers decided to be rigorously data driven, in their effort to reduce churn.


vFlat is good for this.


After John Warnock stepped down as Adobe CEO, he ramped up his involvement at Octavo, a company dedicated to preserving rare historical books. A challenge they faced was de-curling scanned pages that could not be flattened.

https://en.m.wikipedia.org/wiki/Rare_Book_Room


This writeup was great. I might reference this at work as an example of how technical projects (and decisions) can be effectively documented.


I ran into a different issue while trying to make an app to scan color-coded notes in college. The colors skewed from the top to the bottom of the page, making it hard to reliably tell blue pens from green ones. At some point I should look at it again.


Assuming the white background skews in the same way, a good trick is to make a copy of the image, blur it VERY widely, and divide the original image by the blurred version. This essentially removes any low-frequency color/brightness shifts. I'll often use it to get rid of shadows etc. on photographed paper, but it should work for a color gradient all the same.
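A rough sketch of the blur-and-divide trick, for one channel stored as a list of rows of floats. It uses a box blur via a summed-area table in place of whatever blur you would normally reach for (a wide Gaussian from an image library would be the usual choice); the function names and the `eps` guard are my own additions. For a color image, the same thing would be applied to each channel separately.

```python
def box_blur(img, radius):
    """Mean filter over a (2*radius+1) square window, clipped at the borders."""
    h, w = len(img), len(img[0])
    # Summed-area table with an extra zero row and column at the start.
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
            out[y][x] = total / ((y1 - y0) * (x1 - x0))
    return out


def flatten_illumination(img, radius, eps=1e-6):
    """Divide each pixel by its wide local mean; paper background lands near 1.0."""
    blurred = box_blur(img, radius)
    return [[p / max(b, eps) for p, b in zip(row, brow)]
            for row, brow in zip(img, blurred)]


# Made-up test image: brightness ramps from 0.4 (top) to 0.8 (bottom),
# with a single dark "ink" pixel in the middle.
page = [[0.4 + 0.02 * y for _ in range(21)] for y in range(21)]
page[10][10] = 0.06
corrected = flatten_illumination(page, radius=3)
```

The blur radius needs to be much larger than the pen strokes, otherwise the ink itself leaks into the background estimate and gets washed out. Near the image borders the clipped window biases the estimate a little, so expect the edges to be less even than the interior.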


Looks adequate. Seems, though, that the warp model is a little too global, in the sense that some of the more involved distortions of the paper are not captured by the model and this can be seen in the final result as a residual distortion.


Error in installation:

    ERROR: Could not find a version that satisfies the requirement cv2>=3.0 (from versions: none)
    ERROR: No matching distribution found for cv2>=3.0
Raised a GitHub issue.


Very cool stuff. I wish there was a good document scanning (dewarping, thresholding, pdf generation) app available on mobile.

Currently stuck with Adobe Scan, as it gives the best results for me, though it is still quite bad at dewarping.


I have heard good things about Microsoft Lens, although when I try to open it on my phone it just freezes.


I have been using Microsoft Lens for ages and it never disappoints.


This is really interesting to read imho, thanks for sharing, as I missed it in 2016 I guess. It's so nice to get the full story: I had this problem, applied smart techniques, and got a nice working solution. Fun and interesting read :-). Don't think I'll ever need anything like this, but it's a great example of tackling a problem with some good methods, and cutting some corners where the output results and expectations allow it. Well written and explained.


1. If you don't need to visualize the book and merely need to perform OCR, I bet you could skip this step.

2. Google solved this problem over ten years earlier. https://hardware.slashdot.org/story/09/05/15/1834246/how-goo...

3. If your manuscript is really valuable, you can dewarp it without contact using x-ray tomography. https://scrollprize.org/tutorial1


1) So try it and recommend some software.

2) Yeah, they used hardware.

3) ChatGPT strong here?

4) this is nice and simple from 2016


Cool writeup!

Next, please do de-wrinkling of receipts...



