Hacker News
How to Make a Full Auto Book Scanner (geocities.jp)
87 points by llambda on May 18, 2013 | 47 comments



I've been scanning my books (several thousand of them) for over a year now. Using a flatbed scanner is completely impractical at 1 minute per page. I:

1. Use a stack slicer to cut the spine off
2. Feed it through a sheet feeding scanner
3. Check results
4. Do any necessary rescans and stitch the result together with pdftk
5. Chuck the (now destroyed) book into the recycling bin
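A minimal sketch of what step 4 could look like, assuming the rescanned pages sit in a second PDF in the same order as the bad pages they replace. The helper name `pdftk_replace_pages` and the file names are hypothetical; only the `pdftk A=... B=... cat ... output ...` handle syntax comes from pdftk itself. The function only builds the argument list, so it can be inspected before anything is run.

```python
from typing import List


def _span(handle: str, first: int, last: int) -> str:
    """Format a pdftk page range such as 'A1-11', or 'A13' for a single page."""
    return f"{handle}{first}" if first == last else f"{handle}{first}-{last}"


def pdftk_replace_pages(original: str, rescan: str, bad_pages: List[int],
                        total_pages: int, output: str) -> List[str]:
    """Build a pdftk command that swaps bad pages of `original` for rescans.

    Assumption: page 1 of `rescan` replaces the lowest bad page number,
    page 2 replaces the next one, and so on.
    """
    ranges = []
    start = 1  # first page of the current run of good pages
    for i, page in enumerate(sorted(set(bad_pages)), start=1):
        if page > start:
            ranges.append(_span("A", start, page - 1))  # good pages before the bad one
        ranges.append(f"B{i}")                          # the rescanned replacement
        start = page + 1
    if start <= total_pages:
        ranges.append(_span("A", start, total_pages))   # remaining good pages
    return ["pdftk", f"A={original}", f"B={rescan}", "cat", *ranges, "output", output]


cmd = pdftk_replace_pages("book.pdf", "rescans.pdf", [12, 40], 300, "book_fixed.pdf")
print(" ".join(cmd))
```

With pdftk installed, the list could be handed to `subprocess.run(cmd, check=True)`.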

I have around 60 GB of books now, and have emptied many, many shelves and boxes. It's nice that I can now put my entire library onto my laptop.

It doesn't take that long per book, maybe 5 minutes. The main time-wasters are when a book has damaged pages so the hopper feeder doesn't work right on it, or when the bookbinding glue has spread too far and pages are stuck together beyond where the spine cut was made.
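For a sense of scale, here is a rough back-of-the-envelope comparison of the two rates mentioned in this comment (1 minute per page on a flatbed versus about 5 minutes per book with the slicer and feeder). The library size of 3,000 books and the average of 300 pages per book are assumptions for illustration, not figures from the comment.

```python
def total_hours(books: int, minutes_per_book: float) -> float:
    """Total scanning time in hours for a library of `books` books."""
    return books * minutes_per_book / 60


BOOKS = 3000           # assumption: "several thousand" books
PAGES_PER_BOOK = 300   # assumption: average book length

# Flatbed: 1 minute per page, so minutes per book equals the page count.
flatbed_hours = total_hours(BOOKS, PAGES_PER_BOOK * 1.0)
# Slice-and-feed: about 5 minutes per book.
feeder_hours = total_hours(BOOKS, 5.0)

print(f"flatbed: {flatbed_hours:.0f} h, feeder: {feeder_hours:.0f} h")
```

Under these assumptions the flatbed route comes to 15,000 hours against 250 hours for the feeder.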

The hopper scanner will scan both sides at once in a couple seconds.

I run the scanner at 400dpi, which is far more resolution than any current ereader can display, but I figure it's future-proofing the scans. My current ereader of choice is the Kobo Aura that has the retina e-ink display. It's well suited to reading scanned books because:

1. the retina display eliminates the jaggies
2. the glow screen works well in low-light conditions
3. the larger screen size is suited to scanned book pages
4. the 32 GB microSD slot enables me to carry around at least half of my library :-)
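To put the 400dpi figure in numbers, the sketch below computes the pixel dimensions of a scanned page and how much it must be shrunk to fit a screen. The 6 x 9 inch page and the 1080 x 1440 pixel screen are hypothetical values chosen for illustration, not the specifications of any particular book or ereader.

```python
from typing import Tuple


def scan_pixels(width_in: float, height_in: float, dpi: int) -> Tuple[int, int]:
    """Pixel dimensions of a page scanned at `dpi` dots per inch."""
    return round(width_in * dpi), round(height_in * dpi)


def downscale_factor(scan_px: Tuple[int, int], screen_px: Tuple[int, int]) -> float:
    """Factor by which the scan must shrink to fit entirely on the screen."""
    return max(scan_px[0] / screen_px[0], scan_px[1] / screen_px[1])


page = scan_pixels(6, 9, 400)                   # hypothetical 6 x 9 inch page
factor = downscale_factor(page, (1080, 1440))   # hypothetical screen size
print(page, factor)
```

A factor above 1 means the scan holds more detail than the screen can show, which is the headroom the "future proofing" remark refers to.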


> 1. Use a stack slicer to cut the spine off

Ouch. This really seems like a shame. I imagine you're not destroying any books of 'value', but still... Something about willingly destroying thousands of books... :/


I did the same thing, it is depressing, but even more so that it's often the only way to get a decent digital copy of a book:

http://41j.com/blog/2012/02/how-to-get-a-digital-copy-of-the...


One of the books was self-published, so I wrote the author and offered to "kindle-ize" it and we'd split the proceeds. He agreed, and we've both made a tiny profit.

Only 2 or 3% of my books are available in digital format.


No need to cut off the spine. Here's how Google Books did it:

http://hackaday.com/2012/11/16/google-books-team-open-source...

Hackaday's first image tells the whole story, but direct link here: https://code.google.com/p/linear-book-scanner/


Most business level copy machines will quickly scan hundreds or thousands of pages, both sides, and assemble into a PDF. If I was going to do this, yes, I would slice the spine off and either scan at work or rent a machine. I don't have any rare books. Just books I've collected over the years.

Unfortunately most of them are art/architecture books. I'd really like to have very high-res original-source ebooks. By very high-res I mean I'd like to have, say, 300dpi or higher at the size of the book or larger. I'm imagining having full-color e-ink or some other awesome tech in the next 10 years that lets me view large coffee-table art books at their original size or larger, and unfortunately I think scanning them, even in high-res, would not really do them justice.

Imagining full wall displays it would be awesome to be able to view architectural pictures wall size and not have them look lo-res and blurry.
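The enlargement concern can be made concrete with a little arithmetic: blowing a scan up spreads the same pixels over more inches. The sketch below is illustrative only; the 12 inch page width, 96 inch wall width and 100 ppi display density are hypothetical numbers, not values from the comment.

```python
def effective_ppi(scan_dpi: float, original_width_in: float, display_width_in: float) -> float:
    """Pixels per inch seen on the display after enlarging a scan."""
    return scan_dpi * original_width_in / display_width_in


def required_scan_dpi(display_ppi: float, original_width_in: float, display_width_in: float) -> float:
    """Scan resolution needed to reach `display_ppi` at the enlarged size."""
    return display_ppi * display_width_in / original_width_in


# Hypothetical: a 12 inch wide page shown 96 inches wide (8x enlargement).
print(effective_ppi(300, 12, 96))       # density of a 300dpi scan on the wall
print(required_scan_dpi(100, 12, 96))   # scan dpi needed for a 100 ppi wall
```

At some point the print itself (halftone dots, paper grain) limits what a higher scan resolution can recover, which is presumably the worry about scans not doing the books justice.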


Are you in the US and if so, on/near the East coast by any chance? I might have a use for a large batch of sliced pages...


Might be easier calling your local schools and asking if they're getting any new textbooks, and whether you can have the old ones.


Thanks, I haven't had much luck with that here in NYC but maybe I've been talking to the wrong people. Also I would guess that people on HN might have titles a little closer to what I'm looking for.


Probably not the place to ask, but what do you need them for? I think there's still places that sell books by the yard. You can also try estate/yard sales, and just offer ten bucks to haul all of their books away.


That's how I buy a lot of my books currently (estate sales, library clearance, etc) but regarding the sliced pages: I have a very large personal library that can't be damaged or marked up for a variety of reasons. It just so happens that I have three uses for well-sliced pages of (preferably) interesting books: for sculpture, for rebinding practice (what better source for testing out different glue mixtures or techniques), and lastly to spiral bind a few into markable workbooks (I used to do this with tattered textbooks and then give them away). I can't really go into it from my phone, but suffice to say I have a soft spot for recycling old books and see it as a possible opportunity to find a non-standard selection.

Edit: Also, if someone is just throwing them out, that saves me money and therefore time trying to decide what to take.


Many second hand book shops have too much stock, to the extent that it causes them storage problems. If you ask them they will give you lots of (shitty) books for a low price.


I prefer the OP's method of digitising the books. I'd feel bad about destroying the books and throwing them away instead of giving them to some charity once I had my digital copy.


At first I felt terrible about destroying the books. All I can say is one can get used to anything :-)

What helps is noticing that the vast majority go used on Amazon for less than a buck. Not even charities want to deal with those.


It's also sobering to ask the buyer at Half Price books how many of the books they actually plan to keep after they give you an estimate. A lot of them get recycled.


As an experiment, I took a carton of books to Half Price. They went through them, and offered to buy 3 of them for a few dollars. The rest they said they'd take for free, but wouldn't pay anything for them.

They had limited shelf space, and the books that didn't sell quickly went to the recycler.

The problem with selling books on Amazon is if it, plus packaging, is over 13 oz, you have to take it to the post office. This, of course, destroys any economic or environmental benefit to selling them for less than 15 or 20 bucks.

Very, very few used books are worth that much. I do have some, and they've sat for sale on Amazon for over a year. Books less than 13 oz I'll sell for a few bucks on Amazon, but they rarely sell, either.

So I cut & scan with no guilt about destroying the books. It's sad, but they are worthless.


> The problem with selling books on Amazon is if it, plus packaging, is over 13 oz, you have to take it to the post office. This, of course, destroys any economic or environmental benefit to selling them for less than 15 or 20 bucks.

It's clear that a lot of Amazon book sellers don't value their time as highly as you do. They're sitting on a lot of warehoused books and basically making their money off the difference between media mail shipping charges and the standard cost of shipping on Amazon.


Using FBA you can ship books in bulk to Amazon, which might make it more economical to sell some of the lower priced books without the overhead of individual shipping.


They turn a lot of them into books by the yard to sell for decoration too.


Which scanner are you using, if I may ask?


5 min per book, scanning both sides at the same time, makes for an impressive scanner. Is it a professional one?


Fujitsu fi-5120C

Newer models are available.

For the volume I am doing, professional grade equipment is a must. I also paid around $300 for a used pro stack slicer. Trying to cut the spine off with anything less is just impractical.


FWIW I do the same thing with my older, more destroyable textbooks, but I use a table saw to remove the spine and it works great.


I have a table saw, but I'm afraid of it. Though I did manage to seriously gash a finger with the stack slicer anyway. The blade on that is incredibly sharp.


I've read that the rollers will eventually wear, but that Fujitsu sells replacement roller kits that, while not inexpensive, thoroughly refurbish and solve this problem.

Any experience with this and/or observations on durability?


I've found that regularly wiping down the rollers with denatured alcohol seems to "restore" them, so no, I've never replaced the consumables, and I've run hundreds of thousands of pages through it.

The real problem is dirt gets behind the glass window and makes streaks on the scan. It's all sealed up and I can't figure out how to get behind it and clean it.

The other thing is, you can often find lightly used models on eBay that are cheap because they are missing things like the tray, hopper, and power brick. Just get one of those, move your other parts over to it, and you've essentially got a new one ready to rock.


Is there any benefit to cutting the spine vs using heat to loosen the glue and remove the pages manually?

I have been doing this with old video game strategy guides for about 6 months now. To me it provides a better solution because I still have the spine and all the content which is printed very close to the book's seam. It does take a LOT longer to scan page-by-page, but I also get to keep the single pages instead of throwing them away.

I have also tried using an X-Acto knife but the results aren't as clean. I am still waiting for a better solution and your comment certainly does make me think.


Cutting the spine is quick and easy. Even if you remove the glue by some method, the pages are still sewn together. I have thousands of books to go through, so being fast is important.

Yes, it can be a problem with some pages that are printed very close to the spine. I keep an eye out for those and handle them separately. That's usually only an issue for paperbacks from the 50's and earlier.


I just love having all my books searchable, so I use http://1dollarscan.com/ quite regularly for this purpose. It keeps things really simple, especially because with regular book scanners (the non-fully-automated ones) it still takes me about half an hour to scan a 300-page book.


Wish I knew about these guys a year ago, when I got rid of most of my books to make room for a nursery.

My book search solution is to upload a list of ISBNs to Google Books and do "search within my library". It doesn't have full coverage, but is better than nothing.
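If the ISBN list is typed or scanned by hand, it may be worth weeding out mistyped entries before uploading. The sketch below checks the standard ISBN-10 and ISBN-13 check digits; the function name `is_valid_isbn` is made up for this example, and nothing here reflects how Google Books itself validates or imports a list.

```python
def is_valid_isbn(raw: str) -> bool:
    """Check the ISBN-10 or ISBN-13 check digit, ignoring hyphens and spaces."""
    s = raw.replace("-", "").replace(" ", "").upper()
    digits = "0123456789"
    if len(s) == 10:
        # ISBN-10: weights 10..1, sum must be divisible by 11; 'X' means 10
        # and is only allowed in the last position.
        if not all(c in digits for c in s[:9]) or s[9] not in digits + "X":
            return False
        values = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
        return sum(v * w for v, w in zip(values, range(10, 0, -1))) % 11 == 0
    if len(s) == 13 and all(c in digits for c in s):
        # ISBN-13: weights alternate 1, 3; sum must be divisible by 10.
        return sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s)) % 10 == 0
    return False


candidates = ["978-0-306-40615-7", "0-306-40615-2", "9780306406158"]
print([isbn for isbn in candidates if is_valid_isbn(isbn)])
```

A check digit only catches typos; it says nothing about whether the book is actually in any catalogue.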


I'd never heard of them before. I wonder if they keep the old scans and resell duplicate books. They could make a bunch of extra money on half.com or similar. Depending on how they're scanning the books it'd also save time. Then again, it could be that they're required to destroy the scanned book in order for it to be legal.


Having run a textbook scanning service (http://www.ptrfy.com/) for a while now, I can say assuredly that keeping master digital copies or the physical processed books is a substantial legal liability.


Would it be possible to avoid the problem by storing scanned books in an encrypted form where the key is some words from a random page?
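As an illustration of the idea only, here is a sketch of turning a few words from a page into a fixed-length key with PBKDF2 from Python's standard library. The function name, the salt (here the bytes of a made-up identifier), and the iteration count are all assumptions; the sketch covers key derivation only, not the encryption step, and words taken from a book are guessable enough that this should not be read as a sound scheme.

```python
import hashlib
from typing import List


def key_from_page_words(words: List[str], salt: bytes, iterations: int = 200_000) -> bytes:
    """Derive a 32-byte key from words copied off a page of the book.

    Words are lower-cased and joined with single spaces so that small
    differences in how the reader types them do not change the key.
    """
    passphrase = " ".join(w.strip().lower() for w in words).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", passphrase, salt, iterations, dklen=32)


# Hypothetical usage: three words from some page, salted with a made-up book id.
key = key_from_page_words(["spine", "Hopper", "glue"], salt=b"example-book-id")
print(key.hex())
```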


The process they use to digitise it ends up partially destroying the book; at the least, it removes the spine ("step 2" of http://1dollarscan.com/works.php). They do mention that they keep the book around for 2 weeks (to allow rescans) but then they recycle it.

However, doing some more research, it seems that for certain items they can return it:

  Q. Are books ever returned?

  A. First of all, please understand that we do NOT return 
  any books after they are scanned. The books are recycled 
  after they are cut and this is part of our operation  
  practice.

  The exceptions for returns are for photos, material you 
  wrote yourself, material you own the copyright to and 
  only a few more. If one of these exceptions or a similar 
  exception applies to your order and you want the items 
  returned, then please purchase a return option($5) at the 
  time of your order.
(http://1dollarscan.com/faq.php)


Really wish this was around elsewhere. Apparently no such service exists in the UK, so when I purged all my dead trees, I just scanned the bar codes as I didn't have the time to scan the whole books (or figure out how to automate it).


I often buy books directly from amazon.com (even used ones) and get the seller to ship them to 1$scan. By creating an account you get a personal ID, and as long as that is visible on the address label the book is filed under your account.


Most of my books are far too old for bar codes.


For fully automated, non-destructive scanning of rare books and the like, check out the APT BookScan 2400: http://www.youtube.com/watch?v=gjm6dBNlPug

These guys (no affiliation) offer scanning services using the 2400: http://www.merrittgraphics.com/services/scanning/bookscan.ph...


That's awesome, how much ... wait, 'call us... download our credit application'?? Ouch, must be pricey.


According to some documents I found with Google, one of those machines costs something like 100,000-120,000€.


Ha, who knew. Geocities lives on in Japan.


That was the first thing I noticed about this post! :)


Partially OT: For people using Canon compact cameras together with the CHDK firmware in DIY imagers/scanners, which models are preferred these days? My A640 was stolen and I've been wondering what to replace it with.


Hi! You want to look at the Canon PowerShot A810.

That is what we currently ship our kits with at http://diybookscanner.eu .


This scanner design from last year is pretty neat too. https://code.google.com/p/linear-book-scanner/


Talking of scanning machines, this is pretty cool:

http://youtu.be/W1-2DmDmZgI


Spooky, by pure coincidence that video uses the same book (different edition) for their demo that my blog post does for 1DollarScan:

http://thomaspark.me/2012/12/digitizing-books-on-the-cheap-a...



