Hacker News
Court rules book scanning is fair use, suggesting Google Books victory (arstechnica.com)
167 points by abraham on Oct 11, 2012 | 27 comments



"text mining"

Hmmm. It's not so easy to do this with ebooks viewed in a graphical ebook reader. Note he didn't say "text search". He said mining.

Does this suggest noncommercial library books will offer research capabilities that commercial ebooks will not?

I want my ebooks in ASCII format. And that certainly goes for textbooks.

less(1) is my "ebook reader".


> Does this suggest noncommercial library books will offer research capabilities that commercial ebooks will not?

If they are scanned noncommercial library books, I guess.

I'm not sure how this is really relevant, though. Scanning for this use may be fair use (hooray! some sanity), but the publishers aren't providing an ASCII file to the libraries either; the libraries are making a copy, hence the need for a fair use decision.


Are you suggesting that ebooks should be formatted in groff and devoid of page-breaks?


Should be a format that can be grep'd, made into PostScript, DjVu, etc. Versatile. ASCII works well for that. Open, non-proprietary format.

Not only that, working with raw text is so much faster than anything else. less(1) is the most responsive "ebook reader" I've ever used. It never chokes on huge files, especially with less -n.

Ever try reading 100 or more PDFs in one sitting?

But with 100 ASCII docs, it's possible to skim them fast using less, or mine some text with some UNIX utility.

I like what this judge said. Digitizing books is not just about putting the book on a "paper-like" screen with beautiful fonts; it's about enabling new utility: working with the text.
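As a rough illustration of "mining" a pile of plain-text books rather than just searching them, here is a minimal Python sketch that counts word frequencies across every `*.txt` file in a directory, similar in spirit to piping the files through `sort | uniq -c`. The function name `word_frequencies` and the `*.txt` naming convention are hypothetical choices for this example, not anything the judge or the libraries described.

```python
import re
import sys
from collections import Counter
from pathlib import Path


def word_frequencies(directory, pattern="*.txt"):
    """Count word occurrences across every plain-text file matching `pattern`.

    `word_frequencies` and the "*.txt" default are hypothetical, for illustration.
    """
    counts = Counter()
    for path in sorted(Path(directory).glob(pattern)):
        text = path.read_text(encoding="utf-8", errors="replace")
        # Lowercase so "The" and "the" land in the same bucket.
        # \w+ is Unicode-aware in Python 3, so non-ASCII words are counted too.
        counts.update(re.findall(r"\w+", text.lower()))
    return counts


if __name__ == "__main__":
    # Usage: python wordfreq.py ~/books
    for word, n in word_frequencies(sys.argv[1]).most_common(20):
        print(f"{n:8d} {word}")
```

The same loop could just as easily collect concordances or co-occurrence counts; the point is that plain text makes this a few lines of code.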


> Should be a format that can be grep'd, made into PostScript, DjVu, etc. Versatile. ASCII works well for that. Open, non-proprietary format.

Org Mode.

http://orgmode.org/


I'm all for plain text, but ASCII is ill-suited to most languages.


> It never chokes on huge files
You've never tried to reach the end of a 200MB log file with a cold cache... :P


It can handle files that big, no problem. Use the -n switch to suppress line numbering.

But with a 200MB file, chances are you will only want to view part of it. Unless you have superhuman reading powers. ;)

Dissect the part you want with some filter (sed, awk, tail, whatever) and feed the result to less.
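To illustrate why a filter like `tail` is a good first step for a huge log, here is a minimal Python sketch that fetches the last `n` lines by seeking to the end of the file and reading backwards in blocks, so only the final few blocks are ever read. It is an illustrative stand-in, not a description of how any particular `tail` implementation works; `tail_lines` and its `block_size` parameter are hypothetical, and decoding assumes UTF-8.

```python
import os
import sys


def tail_lines(path, n=10, block_size=8192):
    """Return the last `n` lines of a file, reading backwards from the end.

    Hypothetical helper: only the last few blocks are read, so a cold cache
    costs a handful of reads instead of a scan of the whole file.
    """
    if n <= 0:
        return []
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        # Keep prepending blocks until we hold more than n newlines
        # (enough for n complete lines) or reach the start of the file.
        while pos > 0 and data.count(b"\n") <= n:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    # Decode per line; a partial first line is discarded by the [-n:] slice.
    return [line.decode("utf-8", errors="replace") for line in data.splitlines()[-n:]]


if __name__ == "__main__":
    # Usage: python tail_lines.py big.log 500 | less -n
    for line in tail_lines(sys.argv[1], int(sys.argv[2])):
        print(line)
```

As with `tail`, the output is plain text on stdout, so it can be fed straight into `less` or any other filter.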


ePub is XHTML plus a wrapper.


If you're targeting grep as a tool to operate on your data, then HTML isn't the best 'raw' format, as grep isn't HTML-aware. It can't ignore markup or translate encoded values (e.g. &amp; => &).
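One workaround is to run the markup through a parser before grep ever sees it. A minimal sketch using Python's standard-library `html.parser`: tags are dropped, and with `convert_charrefs=True` the parser decodes entities such as `&amp;` before the text reaches `handle_data`. The names `TextExtractor` and `html_to_text` are hypothetical, and skipping `<script>`/`<style>` contents is an assumption about what should count as readable text.

```python
import sys
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect character data only: tags are dropped, entities are decoded."""

    def __init__(self):
        # convert_charrefs=True turns &amp;, &lt;, &#233; etc. into the
        # characters they stand for before handle_data is called.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


if __name__ == "__main__":
    # Usage: python html2text.py chapter.xhtml | grep -n 'fair use'
    with open(sys.argv[1], encoding="utf-8") as f:
        sys.stdout.write(html_to_text(f.read()))
```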


But ePub as a format is open and well understood, and there are XHTML parsers in every language you might care about.
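To make "XHTML plus a wrapper" concrete: an EPUB is a ZIP archive whose `META-INF/container.xml` points to an OPF package file, and the OPF's `<spine>` lists the XHTML content documents in reading order. A minimal sketch using only Python's `zipfile` and `xml.etree.ElementTree`; `spine_documents` is a hypothetical helper, and it ignores details such as percent-encoded `href` values and books with more than one rootfile.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def spine_documents(epub_path):
    """Return the archive paths of an EPUB's content documents in reading order."""
    with zipfile.ZipFile(epub_path) as z:
        # container.xml names the package (.opf) file; take the first rootfile.
        container = ET.fromstring(z.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", CONTAINER_NS).get("full-path")
        opf = ET.fromstring(z.read(opf_path))

    # Manifest hrefs are relative to the directory holding the .opf file.
    base = posixpath.dirname(opf_path)
    manifest = {
        item.get("id"): posixpath.join(base, item.get("href"))
        for item in opf.find("opf:manifest", OPF_NS).findall("opf:item", OPF_NS)
    }
    # The spine lists manifest ids in the order the book is meant to be read.
    return [
        manifest[ref.get("idref")]
        for ref in opf.find("opf:spine", OPF_NS).findall("opf:itemref", OPF_NS)
    ]
```

Each returned path can be read with `zipfile.ZipFile(...).read(path)` and handed to any XHTML or HTML parser.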


I thought we were talking about viewing ebooks.


Yes, I don't want page-breaks in my ebooks. My reader is unlikely to be precisely the size the publisher put breaks on.


Technology doesn't make this an either-or situation, and some time spent reading RFCs really makes the pain of ASCII obvious.

Just ask for reflowable PDF. Figures, equations, cross-references (with hyperlinks), actual typography and design, full-text search, and the ability to adapt to your viewer.

If you don't like that implementation, then may I suggest looking into improvements to the algorithm, instead of brutally stripping content out of books to fit an unnecessarily crude model?


I don't understand how PDF could be reflowable.

I'd like a nice ePub, but again, no page-breaks. They'll be wrong for most devices anyway.


Textbooks in pure ASCII sounds like a pain. No hyperlinks? No index?


As plain-Jane as can be sounds absolutely great for text piping etc.


What [major] eBook format keeps its content in something other than ASCII (or Unicode, or text of some kind)?


Would copying a VHS tape to a digital format similarly be fair use? Why or why not?


Based on reading the article (and no legal knowledge whatsoever), it would appear to depend entirely on who was doing it and why. If libraries were doing it so they could automatically subtitle the work and make it available to deaf viewers, then probably so. If you are doing it so you can hawk DVD versions on street corners for a buck apiece, likely not.


I think what waterlesscloud is getting at is: What about if you want to watch it yourself on your iPad/computer/TV without VCR?


Well, that certainly. But also more. If it's legal to do this on an industrial scale with books, why not with movies or music?

I think there's some complicated reasoning in the decision, and I'm curious what it will mean in the bigger picture.

Why can't I (or a company) take any analog media and transform them to digital? I'm adding transformative functionality, after all. Digital search capabilities, for example.

Not for resale of the work, of course, but for my own vast database that I sell research access to.

And even if the work has already been translated to a digital format (DVD or CD, for example), the decision seems to say there's still nothing wrong with me transferring from analog to digital. If there were, point 4 would be circular, as the decision says.

So it seems like this says Google could go digitize a bunch of LPs, VHS tapes, and even celluloid film for similar projects. Google Film. Google Albums. Etc.

I don't really think this holds up, but I can't pin down why it wouldn't.

Bonus thought: what if I took software and made it run on a new platform? Is that also fair use under this decision? Wouldn't that also be transformative use?

Hmmm.


I thought that (media shifting) along with time shifting was already decided to be fair use.


Dear Google,

Please resume scanning books, old magazines and newspapers, which you "paused" some time back. It was sensible to take a wait-and-see approach until this decision was reached.

Thank you.


I thought Google and the authors settled? What is this ruling for? Is this in another trial?


Google settled with the publishers [1]. There's still a case between Google and the Authors Guild.

[1]: http://www.washingtonpost.com/blogs/post-tech/post/google-se...


Different lawsuit against the libraries that gave those books to Google to scan.



