Court rules book scanning is fair use, suggesting Google Books victory

bookworm_ · on Oct 11, 2012

"text mining"

Hmmm. It's not so easy to do this with ebooks viewed in a graphical ebook reader. Note he didn't say "text search". He said mining.

Does this suggest noncommercial library books will offer research capabilities that commercial ebooks will not?

I want my ebooks in ASCII format. And that certainly goes for textbooks.

less(1) is my "ebook reader".

magicalist · on Oct 11, 2012

> Does this suggest noncommercial library books will offer research capabilities that commercial ebooks will not?

If they are scanned noncommercial library books, I guess.

I'm not sure how this is really relevant though. Scanning for this use may be fair use (hooray! some sanity), but the publishers aren't providing an ASCII file to the libraries either, the libraries are making a copy...hence the need for a fair use decision.

pyre · on Oct 11, 2012

Are you suggesting that ebooks should be formatted in groff and devoid of page-breaks?

bookworm_ · on Oct 11, 2012

Should be a format that can be grep'd, made into PostScript, DjVu, etc. Versatile. ASCII works well for that. Open, non-proprietary format.

Not only that, working with raw text is so much faster than anything else. less(1) is the most responsive "ebook reader" I've ever used. It never chokes on huge files. less -n

Ever try reading 100 or more PDFs in one sitting?

But with 100 ASCII docs, it's possible to skim them fast using less, or mine some text with some UNIX utility.

I like what this judge said. Digitizing books is not just about putting the book on a "paper-like" screen with beautiful fonts, it's about enabling new utility. Working with the text.
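As a rough illustration of the "mine some text" idea above, here is a minimal sketch that counts regex matches per file across a directory of plain-text documents. The function name `count_matches`, the `*.txt` layout, and the UTF-8 assumption are all illustrative choices, not anything from the ruling or a particular tool.

```python
import re
from collections import Counter
from pathlib import Path


def count_matches(directory, pattern, glob="*.txt"):
    """Count case-insensitive regex matches per file under `directory`.

    The directory layout and the *.txt extension are assumptions
    made for this sketch.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    counts = Counter()
    for path in sorted(Path(directory).glob(glob)):
        # errors="replace" keeps the scan going if a file is not valid UTF-8.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                # Matches are counted line by line, so a phrase split
                # across a line break is not found.
                counts[path.name] += len(regex.findall(line))
    return counts
```

Something like `count_matches("books", r"fair use").most_common(5)` would then list the five files with the most hits, which is roughly what a `grep -c` loop does from the shell.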

TeMPOraL · on Oct 11, 2012

> Should be a format that can be grep'd, made into PostScript, DejaVu, etc. Versatile. ASCII works well for that. Open, non-proprietary format.

Org Mode.

http://orgmode.org/

hellrich · on Oct 11, 2012

I'm in for plaintext, but ASCII is ill suited for most languages.

pyre · on Oct 11, 2012

> It never chokes on huge files

You've never tried to reach the end of a 200MB log file with a cold cache... :P

bookworm_ · on Oct 11, 2012

It can handle files that big no problem. Use the -n switch to suppress line numbers.

But with a 200MB file, chances are you will only want to view part of it. Unless you have superhuman reading powers. ;)

Dissect the part you want with some filter (sed, awk, tail, whatever) and feed the result to less.
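For anyone who wants the same "take only the part you need" approach outside the shell, here is a minimal sketch that returns the last lines of a large file by reading backwards in blocks, so the whole 200MB never has to be loaded. The name `tail_lines` and the 64 KiB block size are assumptions made for illustration.

```python
import os


def tail_lines(path, n=100, block_size=64 * 1024):
    """Return the last `n` lines of `path`, reading backwards in blocks."""
    if n <= 0:
        return []
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        position = handle.tell()
        data = b""
        # Keep reading earlier blocks until more than n newlines have been
        # seen (so the first kept line is complete) or the file is exhausted.
        while position > 0 and data.count(b"\n") <= n:
            step = min(block_size, position)
            position -= step
            handle.seek(position)
            data = handle.read(step) + data
    lines = data.splitlines()[-n:]
    return [line.decode("utf-8", errors="replace") for line in lines]
```

From the shell, the equivalent is along the lines of `tail -n 1000 big.log | less -n`.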

chris_mahan · on Oct 11, 2012

epub is XHTML + a wrapper

pyre · on Oct 11, 2012

If you're targeting grep as a tool to operate on your data, then HTML isn't the best 'raw' format as grep isn't HTML-aware. It can't ignore markup, or translate encoded values (e.g. &amp; => &).
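To make the point concrete, here is a minimal sketch of the preprocessing step a grep-style search over HTML would need: drop the tags and decode entities first, then match against the remaining text. It uses Python's standard `html.parser`; the names `TextExtractor`, `html_to_text`, and `grep_html` are made up for this example.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes only; tags are dropped, entities are decoded."""

    def __init__(self):
        # convert_charrefs=True turns references such as &amp; into "&"
        # before handle_data sees them.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


def grep_html(markup, pattern):
    """Return the text lines of `markup` that match `pattern`."""
    text = html_to_text(markup)
    return [line for line in text.splitlines() if re.search(pattern, line)]
```

With this, `grep_html("<p>fair <em>use</em></p>", r"fair use")` finds the phrase even though a plain grep on the raw markup would miss it. The sketch keeps the contents of script and style elements as text, so a real pipeline would probably want to skip those.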

lucian1900 · on Oct 11, 2012

But ePub as a format is open and well understood, and there are XHTML parsers in every language you might care about.
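A minimal sketch of what that looks like in practice, assuming the usual ePub layout of a ZIP archive containing XHTML documents. The function name `epub_text` is hypothetical, and the sketch makes two simplifying assumptions: it walks files in archive order rather than the reading order declared in the package manifest, and it expects well-formed XHTML without named HTML entities such as `&nbsp;`, which the standard XML parser rejects.

```python
import zipfile
import xml.etree.ElementTree as ET


def epub_text(path):
    """Yield (member_name, text) for each XHTML document in an ePub.

    `path` may be a filename or a file-like object.
    """
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith((".xhtml", ".html", ".htm")):
                continue
            # Parse the document as XML and keep only its text nodes.
            root = ET.fromstring(archive.read(name))
            yield name, "".join(root.itertext())
```

The extracted text can then be piped to whatever plain-text tooling you prefer.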

bduerst · on Oct 11, 2012

I thought we were talking about viewing ebooks.

lucian1900 · on Oct 11, 2012

Yes, I don't want page-breaks in my ebooks. My reader is unlikely to be precisely the size the publisher put breaks on.

lallysingh · on Oct 11, 2012

Technology doesn't make this an either-or situation, and some time reading RFCs really makes the pain of ascii obvious.

Just ask for reflowable pdf. Figures, equations, cross-references (with hyperlinks), actual typography and design, full-text search, and the ability to adapt to your viewer.

If you don't like that implementation, then may I suggest looking into improvements to the algorithm, instead of brutally stripping away content from books to fit an unnecessarily crude model?

lucian1900 · on Oct 14, 2012

I don't understand how PDF could be reflowable.

I'd like a nice ePub, but again, no page-breaks. They'll be wrong for most devices anyway.

kalleboo · on Oct 11, 2012

Textbooks in pure ASCII sounds like a pain. No hyperlinks? No index?

TheGateKeeper · on Oct 11, 2012

As plain-Jane as can be sounds absolutely great for text piping etc.

drivebyacct2 · on Oct 11, 2012

Which [major] eBook formats keep their content in something other than ASCII (or Unicode or text of some kind)?

waterlesscloud · on Oct 11, 2012

Would copying a VHS tape to a digital format similarly be fair use? Why or why not?

danshapiro · on Oct 11, 2012

Based on reading the article (and no legal knowledge whatsoever), it would appear to depend entirely on who was doing it and why. If libraries were doing it so they could automatically subtitle the work and make it available to deaf viewers, then probably so. If you are doing it so you can hawk DVD versions on street corners for a buck apiece, likely not.

bruceboughton · on Oct 11, 2012

I think what waterlesscloud is getting at is: What about if you want to watch it yourself on your iPad/computer/TV without a VCR?

waterlesscloud · on Oct 11, 2012

Well, that certainly. But also more. If it's legal to do this on an industrial scale with books, why not with movies or music?

I think there's some complicated reasoning in the decision, and I'm curious what it will mean in the bigger picture.

Why can't I (or a company) take any analog media and transform them to digital? I'm adding transformative functionality, after all. Digital search capabilities, for example.

Not for resale of the work, of course, but for my own vast database that I sell research access to.

And even if the work has already been translated to a digital format (DVD or CD for example), the decision seems to say there's still nothing wrong with me transferring from analog to digital. If there was, point 4 would be circular as the decision says.

So it seems like this says Google could go digitize a bunch of LPs, VHS tapes, and even celluloid film for similar projects. Google Film. Google Albums. Etc.

I don't really think this holds up, but I can't pin down why it wouldn't.

Bonus thought: what if I took software and made it run on a new platform? Is that also fair use under this decision? Wouldn't that also be transformative use?

Hmmm.

njharman · on Oct 11, 2012

I thought that (media shifting) along with time shifting was already decided to be fair use.

fluxon · on Oct 12, 2012

Dear Google,

Please resume scanning books, old magazines and newspapers, which you "paused" some time back. It was sensible to take a wait-and-see approach until this decision was reached.

Thank you.

mtgx · on Oct 11, 2012

I thought Google and the authors settled? What is this ruling for? Is this in another trial?

Avenger42 · on Oct 11, 2012

Google settled with the publishers [1]. There's still a case between Google and the Authors Guild.

[1]: http://www.washingtonpost.com/blogs/post-tech/post/google-se...

rada · on Oct 11, 2012

Different lawsuit against the libraries that gave those books to Google to scan.