Company Scans Your Books For a Dollar – Ship ‘Em In, Get a PDF via Email (singularityhub.com)
138 points by ph0rque on Aug 18, 2011 | 94 comments



My grandmother, who passed away last week, was an author. She self-published two novels through a printer who typeset her books and printed a few hundred copies. I don't know what happened to the original text files she gave her printer and I know she never got a digital typeset copy.

I've been wanting to re-release her novels as ebooks, but haven't had a way to digitize them. This is perfect for me.


By the way, if anyone is interested, I'm also republishing my grandmother's short, inspirational writings on this blog: http://goldenwriter.caryme.com/.


Ideally you'd get something besides a PDF, which is a pain in the neck to turn into a 'real' eBook.


I'll probably OCR the PDF and see where I can go from there. It's a big jump start that I couldn't have easily done on my own.


The 1DollarScan website says they do the OCR for you. They also claim that the result is searchable, which of course it couldn't be if they didn't OCR it.


I may still want to OCR it myself for more control over the process (particularly in identifying and correcting OCR errors). In my experience, there's a difference in expectation of quality between OCR to make a PDF searchable and OCR to generate a standalone text file.


Why is that? .PDF isn't proprietary enough?


PDF is really a display format, whereas epub/mobi are more HTMLish in that they are not quite so specific in how they want things displayed, and thus can be flowed into different screen sizes pretty easily.


PDF is not proprietary.


I was really hoping that the Google Books project would solve this problem for regular people. I know they take collections from public libraries and OCR them. Why not do the same for private collections?


What a great thing to do.


I wonder how well they do with math books?

Even the original publishers have trouble with those. I was going to buy the ebook version of "Proofs from the Book", but downloaded the sample first from Amazon, and it was completely worthless. You'd have a line that would say something like (S in the following represents a capital Greek Sigma):

   S( ) <= S( )
where there were supposed to be things inside the parentheses.

I checked out a few other serious math books for Kindle, and although most weren't as bad as the above, I'd usually find some deal-breaking errors in the handling of math symbols. For instance, I seem to recall one that would lose exponents of -1 if they were attached to a small letter. I didn't see anything else wrong...but it was an abstract algebra book, and it is customary to use multiplicative notation for groups, and so a^-1 is the customary notation for the inverse of a. Group theory gets very confusing when a^-1 gets replaced by a.

At this point, I don't think I'd buy a math book for Kindle even if the preview was flawless unless I was very sure that the preview included every mathematical symbol that would be used in the book, and in all the sizes they would be used in.


If it's a raw image with little to no OCR, you won't have that problem.


I've been reading quite a lot of math books on a Kindle recently, but I didn't bother to convert them to a native format -- I've been using PDFs.

Because of that, I needed to convert a DjVu file to PDF a few times. The thing is, I have yet to find a good djvu2pdf converter -- they create PDF files with ridiculous sizes, like 100 or even 500 MB, and they are terribly slow. I got better results in terms of size when I tried printing from a djvu viewer with a PDF printer, and while the file size was good (10-40 MB) and quality was acceptable most of the time (depending on the initial quality of the djvu), it was so goddamn slow and resource intensive that I stopped doing it altogether and went back to dead tree books -- converting a ~400-page djvu this way took about 3-4 hours on a 2-year-old laptop.

Also, my Kindle was really, really slow showing them -- turning pages took about 10 seconds. Everything else was fine, though -- math books aren't really meant to be read fast, so this was not a big problem.

Anyway, does anybody know a better way of doing that?


What I noticed about this company, after checking their website:

1) They list an "Accept direct shipment from Amazon" option (coming soon). Think about it: this means a book could be printed, sold, shipped to this company, scanned and destroyed without ever being read. Something is very odd here.

2) The business is located four blocks from my apartment. Maybe I'll try them out...


Re 1), what's odd is that current ebook pricing makes this an attractive option.


I can imagine a group of 20 students buying a book, shipping it directly from Amazon to that company, and getting a copy for a very small fraction of the cost.


While that's true, the post you are replying to was basically saying "I can imagine a single person buying a book, shipping directly from amazon to that company, and getting a copy for less than the ebook cost" which I think is a more extreme version of your scenario.


How ridiculous is that?


We live near each other -- according to their website, their business is right next to where I live.

Yet, the address is of a parking lot... (10 E 3rd Street in San Jose). Strange.


printed, sold, shipped to this company, scanned, and then resold


Not resold as physical books, no...since they cut the spine off the book so that they can scan it.

Resold as PDF files? Not by this company, anyway...I don't think that business model will work out legally.

Resold by shady third parties? Maybe.

Copied and bittorrented? Definitely.


> Resold as PDF files? Not by this company, anyway...I don't think that business model will work out legally.

I wonder. If the company holds onto the PDF and just sends out that PDF whenever someone mails in a book already scanned, isn't this the equivalent of those music services which simply make you prove through uploading music that you already have a copy and then give you access to their one stored version of that song? I think this has been done with DVDs as well. So why not books?


Would be interesting to try sending in a book with notes on the margins, see if they get preserved.


If only this service had been around in Fermat's time.


I hope not, because books, unlike DVDs and music, can be modified by the readers. What if I want my handwritten notes in the margins also scanned?


Weren't all those music lockers sued into oblivion? If they're smart, this company will take the hint.


You mean like Apple iCloud?


Me too, I live very close, so I e-mailed them and asked if I could drop off and then pick up when they are complete. I have a lot of programming books that I would love scanned.


> and then pick up

Would you be okay with them no longer being bound? They saw off the binding so they can use them in a sheetfed scanner.


Yeah I was thinking about it and I am happy to part with most of the books as they are programming books.


How about sending those unwanted programming books here to Kenya, as a donation or at a throwaway price? We could do with them.


Re 1), it's very convenient for me, as shipping often costs more than the book if you don't live in the US or UK, not to mention the time factor.


Article doesn't mention a big audience that would be very interested in cheap book scanning: the print disabled. Blind and partially-sighted people are the most obvious members of this group but dyslexics and others who cannot read normal print would also benefit. Many books are of course already available through libraries for the blind but if you need less-popular or specialised texts you are out of luck.

With an aging population it seems this is a very natural niche.

Now if it were only OnePoundScan instead of OneDollarScan...


I think this is one of the hidden benefits of going digital with books. The selection of large print books is very weak, but with a Kindle (or similar) suddenly every book is a large print book and many can be played as automated audio books as well.


I came here to say the same thing. I can't read printed books at all. Even large print ones are a struggle. I just have to have the back light of my iPad or computer screen to see what I'm reading, in addition to the increased font size. I have a lot of books that I love that aren't available in digital format, mostly due to their age. I think I may well be taking advantage of this service.


This doesn't OCR them, it's just an image, so the blind are still out of luck.


Well, it still helps - you can OCR a book file yourself, even if you can't see; just a few clicks and then go do something more interesting. Scanning your own book is possible, but requires sitting there throughout.

Edit: http://1dollarscan.com/pricing.php says "OCR stands for Optical Character Recognition. The PDFs will contain the OCR text layer behind the images to make the text searchable and selectable." This service is labelled free so I guess you get it (or can get it) for each scan.


The video indicated that the binding for the book was being cut off. I suspect that will prevent students from borrowing books from the library, getting them scanned via this service, and returning them.

It is sad that many hundreds of thousands of hours have been wasted by students photocopying books :(


OTOH, it does make it fairly attractive for 5 or 10 or 20 students to buy one copy and have it scanned.


Yes. But you might as well get your book off some torrent site, then.


> I suspect that will prevent students from borrowing books from the library, getting them scanned via this service, and returning them.

Now I'm curious, how much does a library charge for a lost book?


Don't go down that road - please. If you "borrow" someone's book you rob one person of the privilege of reading it in the future. If you pull the same stunt with a library book you potentially rob many people well into the future. For some reason libraries are very slow to replace lost books. I guess just knowing with good certainty that the book is actually gone for good rather than placed in the wrong section would be a tough nut to crack.


I wouldn't. But it's not like I'm the only person who would think of this. So, really - are lost book fines low enough to incentivize students to steal them, ship them to get scanned, and pay the fine?

Especially in the case of required textbooks.


The normal fee is the cost of the replacement and a processing fee (University of Maryland and most of the public libraries I have dealt with). When I was at the university they would accept a replacement that you bought, but still charge the processing fee to get it back on the shelf.


My alma mater charges a minimum of $40; standard books look to be $104: http://www.lib.uwaterloo.ca/borrowing/sanctions.html


At the university library I work at, it's a minimum $65 replacement charge, depending on the actual replacement cost, plus a $15 processing fee.


From my experience replacement value plus some extra fixed fee is common. So you might as well just purchase the book. That's cheaper.


It's a lot quicker and less error prone to scan a stack of pages vs turning pages in a book.


I run a photo scanning company in Canada (http://photoscanning.ca) and I was seriously surprised at the number of people asking for this service. I get a ton of teachers that are looking to have their material on their laptop instead of lugging everything around, students wanting their texts on their iPads, etc. We haven't officially done it yet as I hadn't looked into the copyright issue as much as I should, but may in the future.

The digital photography method is certainly less destructive, but hugely more expensive and time consuming. A $15,000 scanner can do 120+ sheets per minute... The pricing for book pages seems very solid (keep in mind that scanning is quoted in impressions, so your 100 pages is 50 sheets double sided, if I'm not mistaken).


The way the current rules are written for fair-use, if two people send in an identical book, are they allowed to send the same PDF to both rather than scanning twice? I wonder how the governing bodies would view that. Side-note, I think it'd be interesting if they collected/published data about what books are sent in, and from where.


> if two people send in an identical book, are they allowed to send the same PDF to both rather than scanning twice?

In somewhat-analogous circumstances almost 30 years ago, a court said "no." The case was Micro-Sparc, Inc. v. Amtype Corp., 592 F.Supp. 33, 34-35, 223 USPQ (BNA) 1210 (D. Mass. 1984). The defendant offered a keyboarding service: It typed in the source code of programs published in a hobbyist magazine, then sold disks to purchasers of the magazine. The court rejected a fair-use defense and held that this infringed the copyright in the programs. (Adapted from a chapter in a treatise I published long ago.)


After reading this case, it seems to me that a different section of the copyright code was tested here. I believe space shifting of media other than computer programs has a solid history...

That the subject was a computer program is what seemed to trip up the case in my opinion.


To be very accurate, they'd have to keep track of slip streamed versions. I had a maddening email debate with some friends some years ago over what the ARM said. We were all using the same "edition," but we finally worked out that they were actually different. Stroustrup or his publisher wanted to correct something without actually changing the edition number.

So for your scheme to work accurately they'd have to try to keep track of that.

I think it's too risky, and would go with "scan the exact book in."


I want to do this with all my books and paperwork (old bills, receipts, notes from college, photographs (crappy snapshots, really)). This was my plan:

Buy a ScanSnap s1500m for $420. It does double sided scans at 20 ppm but that doesn't include OCR time or paper jams. Let's guess it averages 500 pages an hour.

Pay a neighbor kid $10 an hour to cut and scan.

Even if I sold the scanner when I was done, this is still way cheaper. It's a shame they don't scan things besides books. Although, I'm not sure I want to send my old tax forms to a sweatshop...
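
A rough sketch of that comparison in Python, using the figures above ($420 scanner, a guessed 500 pages an hour, $10 an hour) and the $1 per 100 pages rate quoted elsewhere in this thread. The function names, the rounding up to whole 100-page sets, and the resale value are my own assumptions, and shipping is ignored:

```python
import math


def service_cost(pages, price_per_set=1.00, pages_per_set=100):
    """Service price, assuming every started set of 100 pages is billed in full."""
    return math.ceil(pages / pages_per_set) * price_per_set


def diy_cost(pages, pages_per_hour=500, wage_per_hour=10.00,
             scanner_price=420.00, resale_value=0.00):
    """Do-it-yourself price: hourly labor plus the scanner, minus any resale."""
    labor = pages / pages_per_hour * wage_per_hour
    return labor + scanner_price - resale_value


if __name__ == "__main__":
    pages = 5000  # hypothetical pile of books and paperwork
    print(f"service: ${service_cost(pages):.2f}")
    print(f"DIY, scanner kept: ${diy_cost(pages):.2f}")
    print(f"DIY, scanner resold at full price: ${diy_cost(pages, resale_value=420.00):.2f}")
```

With these guesses the labor alone works out to $2 per 100 pages, so whether the plan comes out cheaper depends on shipping costs and on how much of the pile the service would accept, not only on the per-page rate.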


> It's a shame they don't scan things besides books

It sounds like they do. Homepage says books/photos/business cards/documents/greeting cards.


This is a cool idea but it is a bit sad that it is necessary. Sad because it is wasteful. Books are written in a digital format. So we are using natural resources to make them, turning them into digital (which they already were, and probably in a better format than PDF, like LaTeX, especially for technical books), and then destroying the physical copies after they consumed resources.

Of course it's necessary because not all publishers release digital versions, or they release them with a ton of DRM, etc.

Still, it's a bit sad.


There are generations of books that were printed before the original book was in anything but analog form.


A lot of us are doing the same thing with CDs and DVDs. You can choose waste or DRM; so far you can't have neither.


Most music available for purchase these days doesn't have DRM but in the case of video, you are correct.


Would something similar make sense for, say, music CDs, DVDs, etc.? I for one would love to send someone my media collection, and have it ripped into a lossless format and put on the cloud. In order to get around piracy issues etc. maybe the entire collection can be digitally signed by my public key...


cd-to-mp3 services have been around for a long time. i'm sure most of them have lossless encoding as an option.

googling "service to convert cd to mp3" just found a bunch of them.



Hmm, now what about that box of tapes and mix-tapes I have in storage?


This is a repeat article about a business born in Japan, fueled by iPads and a lack of home shelf space in big cities. http://www.wired.com/gadgetlab/2011/02/japanese-book-scannin...

I think they scan new books, once they have one scan in inventory, they don't have to scan it again. When you send them your book, it's proof you owned one and they send you a PDF. That is, they don't scan all individual books --so if yours had margin notes, that would be lost, I believe.


Book scanning is actually a great tool for University students. I know that a lot of the actual University libraries are trying to go digital, but in the meantime being able to quickly find specific passages by utilizing the find function makes for a much easier day at the old study hall. This sort of idea could certainly pick up traction around schools for the next few years.


I was all there, until it said 'PDF'. For my purposes I need a more accessible format.

Other than that, I could definitely see this helping with my foreign language studies. Especially Japanese, as the company appears to do a lot of books in that language.


What kind of format were you looking for? Would it be better to receive it in all three: PDF (w/ OCR performed), TIFF, and ASCII?


UTF8 preferred, since I'd be doing mostly Japanese. Shift-JIS would be acceptable. Basically just plain text. For books, anyhow. If I sent any comics, I'd want image files.

Though, after thinking about the cost of shipping, book, etc, I'm not sure I'd send much... It'd be only things that I really, really want to read and just haven't learned the vocab for yet. And there really isn't much of that.


They OCR it, so with the PDF you get whatever text data they are able to extract from it (don't know about the encoding, but that is easy to convert), plus additional information.
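
For the encoding part, a minimal sketch in Python of re-encoding Shift-JIS text as UTF-8, assuming the extracted text really is Shift-JIS (I don't know what the service actually produces). The function name and the sample string are made up for illustration:

```python
def sjis_to_utf8(data: bytes) -> bytes:
    """Re-encode Shift-JIS bytes as UTF-8; raises UnicodeDecodeError on invalid input."""
    text = data.decode("shift_jis")
    return text.encode("utf-8")


if __name__ == "__main__":
    sample = "日本語の本".encode("shift_jis")  # stand-in for text pulled out of a PDF
    print(sjis_to_utf8(sample).decode("utf-8"))
```

Decoding is strict here on purpose, so a wrong guess about the source encoding fails loudly instead of silently producing garbage.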


djvu + OCR and epub would be my preferred formats. The original tiff along with it would be nice too, in case there's a better lossy format in the future.


It appears that the service returns plain PDF files which are fairly easy to convert into more open formats.


Note that PDF is an open standard...


They chose a pretty bad name for their website.

What happens when they need to charge more (or less) than one dollar? They're also locking themselves into the "scanning" market with the word 'scan' in their domain name.


> What happens when they need to charge more (or less) than one dollar?

Why would they need to charge more? And the nice thing about computer tech is that with dropping costs, you don't have to change the price, just the quantity each $1 buys you...

(Right now $1 only buys you 100 pages, which is an awful small book.)


Reminds me of another company with a (misspelled) number in its name. See https://secure.wikimedia.org/wikipedia/en/wiki/Googol


Seems a safe bet that scanning equipment and methods will decline in cost faster than the US dollar.


This would be a great business if you got to keep your book. I.e., instead of $1, pay $5 for the first 100 sheets but no spine-cutting and you get the book back intact after. Much more worth it!


They should have a student plan. I see this being very useful for college and university students like me. Your one tablet could contain all of your books and manuals!


Do they OCR before converting to PDF? Having a PDF that is just a 'picture' isn't very helpful (can't search for text, change text size, etc).


Yes, the pricing page on their site says OCR is included free.


That depends though. Isn't there a way for the PDF page to be an image, but for OCR to produce an index of keywords, so that you are viewing the image, but the index is searchable? This doesn't solve the increase/decrease text size issue (though it does solve the searching issue).


Yes, searchable PDF puts the OCRed text behind the image so that it's not visible but is still searchable.


Wow, maybe we'll finally get a digital edition of "The Four Steps to the Epiphany".


They're not scanning your books for a dollar. They're letting you pay them to give them books. Then they resell them at some discount. Speculation...but that's what I would do.


You'd have a real hard time defending a copyright lawsuit under that model.

The legal key to their business (and to similar businesses such as CD ripping services) is that it is arguably fair use for the owner of a particular copy to make additional copies for his own personal use.

Given that, it is not a copyright violation for a third party (them) to provide a service to help the owner of the copy to make those fair use copies.


The 1DollarScan site says they cut the spines off the books to scan them, so unless that isn't accurate, they aren't reselling the books.


If they get the same book to scan twice, why actually do the second scan? Why not just use the scan from the first copy of the book? And if that is the case, now they have a fresh copy of the book with its spine intact.

It would be a waste of the resources invested in the production of that physical book to not find some productive use for that copy.


Besides the possible copyright infringement problems mentioned elsewhere on this page, I occasionally write notes in my books. If I sent them an older book with a bunch of notes in it and got back a PDF that didn't have those notes, I'd be upset.

A potential compromise, if simply giving out the same PDF is legal, is them offering a big option saying "Our records indicate we have this book on file! Do you want us to immediately send you a PDF copy when we receive your book [potentially discounted pricing or even a percentage of resell profits], or would you like to wait for a scan?" So yeah, now they have to do something with the hard copies, either resell or donate.

Edit: No need to speculate, they claim to shred and recycle everything after processing.


Some people write notes on the margins of the page and underline key-phrases when reading. I don't think they would check each book to see if it was "clean".


How about you just let the customer indicate whether he has a clean book or not? Just charge more for `dirty' books.


You might be able to automate that.


Good point. I assume they would only rescan a book if it was a different edition than what they already had on file.


They have to be doing something with those books. If not then they should be.



