You are right that the quality of the scans is paramount! Unfortunately I don't have access to the physical books and have to work with the scans as they are (they're not good). But I will look at ScanTailor, it looks interesting.

For now I reconstruct paragraphs in HTML, but I could do Markdown just as well (where paragraph breaks are marked by double line breaks, and single line breaks don't count).
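
To make the paragraph rule concrete, here is a minimal sketch in Python, not my actual pipeline. It assumes the OCR text is hard-wrapped with blank lines between paragraphs; the function names (`split_paragraphs`, `to_html`, `to_markdown`) are made up for illustration, and words hyphenated across line breaks are not rejoined.

```python
import html
import re


def split_paragraphs(text: str) -> list[str]:
    """Split hard-wrapped text into paragraphs.

    Assumption: one or more blank lines separate paragraphs, and single
    line breaks inside a paragraph carry no meaning.
    """
    # Normalize Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A newline followed by one or more blank (or whitespace-only) lines.
    blocks = re.split(r"\n(?:[ \t]*\n)+", text.strip())
    # Collapse every run of whitespace, including single line breaks,
    # into one space, and drop empty blocks.
    paragraphs = [" ".join(block.split()) for block in blocks]
    return [p for p in paragraphs if p]


def to_html(paragraphs: list[str]) -> str:
    """Wrap each paragraph in <p>...</p>, escaping special characters."""
    return "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)


def to_markdown(paragraphs: list[str]) -> str:
    """Join paragraphs with a blank line between them."""
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    sample = "First line of a scan\ncontinues here.\n\nA new paragraph."
    paras = split_paragraphs(sample)
    print(to_html(paras))
    print(to_markdown(paras))
```

Since both outputs come from the same list of paragraphs, switching between HTML and Markdown is only a matter of which function gets called at the end.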

Collaborative proofreading would be cool but it would require some way of properly tracking who wrote what, and I'm not sure what to use or if I should build a simple system from scratch. Do you have recommendations?






I got a copy of the 30-year-old book from eBay or Amazon for $20, chopped the spine off, and fed it through a scanner. Doing that to a century-old book feels wrong!

ScanTailor was tricky to start with; dunno if there's a manual. I remember belatedly realizing that there's automation at each step, whose results one can then quickly skim and manually adjust.

For collaborative editing, git via GitHub worked for us. Tracking who did what, and when, is easy. It also allowed for sweeping edits covering multiple chapters. Building some porcelain on top of that, for less technical folks, could be good.


> Pour obtenir un document de Gallica en haute définition, contacter utilisation.commerciale@bnf.fr.

roughly:

> To obtain a Gallica document in high definition, contact utilisation.commerciale@bnf.fr.

My expectations would be very low, but I'd reach out to them anyway.


Because you're creating webpages from the text, one option for collaborative notes/corrections is to use a Web Annotation system like Hypothes.is.


