
This is an extremely important effort. The LibGen archive contains around 32 TB of books (by far the most common being scientific books and textbooks, with a healthy dose of non-STEM). The SciMag archive, backing up Sci-Hub, clocks in at around 67 TB [0]. This is invaluable data that should not be lost. If you want to contribute, here are a few ways to do so.

If you wish to donate bandwidth or storage, I personally know of at least a few mirroring efforts. Please get in touch with me over at legatusR(at)protonmail(dot)com and I can help direct you towards those behind this effort.

If you don't have storage or bandwidth available, you can still help. Bookwarrior has requested help [1] in developing an HTTP-based decentralization mechanism for LibGen's various forks. Those with software experience can help make sure those invaluable archives are never lost.
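To give a concrete (and entirely hypothetical) sketch of what such a mechanism might look like: a client could periodically fetch a mirror list over plain HTTP and fall back between forks when resolving a file by its MD5 hash. The endpoint names and JSON layout below are my invention for illustration, not anything Bookwarrior has specified:

```python
# Hypothetical sketch only: the /mirrors.json endpoint and its layout are
# invented for illustration; the real proposal may look nothing like this.
import urllib.request, json

SEED_MIRROR = "https://example-libgen-fork.org"  # placeholder, not a real fork

def fetch_mirror_list(seed=SEED_MIRROR):
    """Ask a known mirror for the list of other mirrors it knows about."""
    with urllib.request.urlopen(f"{seed}/mirrors.json", timeout=10) as resp:
        return json.load(resp)  # e.g. ["https://mirror-a...", "https://mirror-b..."]

def resolve(md5, mirrors):
    """Return a URL on the first mirror that claims to have this MD5."""
    for base in mirrors:
        try:
            req = urllib.request.Request(f"{base}/file/{md5}", method="HEAD")
            if urllib.request.urlopen(req, timeout=10).status == 200:
                return f"{base}/file/{md5}"
        except OSError:
            continue  # mirror down or file missing; try the next fork
    return None
```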

Another way of contributing is by donating bitcoin, as both LibGen [2] and The-Eye [3] accept donations.

Lastly, you can always contribute books. If you buy a textbook or book, consider uploading it (and scanning it, should it be a physical book) in case it isn't already present in the database.

In any case, this effort has a noble goal, and I believe the people of this community can contribute.

P.S. The "Pirate Bay of Science" is actually LibGen, and I favor a title change (I posted it this way as to comply with HN guidelines).

[0] http://185.39.10.101/stat.php

[1] https://imgur.com/a/gmLB5pm

[2] bitcoin:12hQANsSHXxyPPgkhoBMSyHpXmzgVbdDGd?label=libgen, as found at http://185.39.10.101/, listed in https://it.wikipedia.org/wiki/Library_Genesis

[3] Bitcoin address 3Mem5B2o3Qd2zAWEthJxUH28f7itbRttxM, as found at https://the-eye.eu/donate/. You can also buy merchandise from them at https://56k.pizza/.




Sounds like anyone with a seed box could donate some bandwidth and storage by leeching and then seeding part of it? It would be nice if there were a list of seeder/leecher counts (like TPB) or, better yet, a priority list of parts that need more seeders.

Edit: Found the other comment where you link to the seeding stats: https://docs.google.com/spreadsheets/d/1hqT7dVe8u09eatT93V2x...


Or better yet, an RSS feed that plays nicely with auto-retention and quota settings. It just delivers a bunch of parts that are in need of seeders, and you use your existing mechanism to help with them.
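Something like that could be generated with a few lines of scripting. This is only a sketch: the CSV of seeder counts and the torrent URL column are stand-ins for whatever the real seeding-stats spreadsheet exports, not its actual format:

```python
# Sketch: build an RSS feed of the N most under-seeded LibGen torrent parts.
# The input CSV columns (name, seeders, url) are assumptions for illustration.
import csv
from xml.sax.saxutils import escape

def underseeded_rss(csv_path, limit=20):
    with open(csv_path, newline="") as f:
        rows = sorted(csv.DictReader(f), key=lambda r: int(r["seeders"]))
    items = "\n".join(
        f"  <item><title>{escape(r['name'])} ({r['seeders']} seeders)</title>"
        f"<link>{escape(r['url'])}</link></item>"
        for r in rows[:limit]
    )
    return (
        '<?xml version="1.0"?>\n<rss version="2.0"><channel>\n'
        "  <title>LibGen parts needing seeders</title>\n"
        f"{items}\n</channel></rss>"
    )

# A torrent client's RSS auto-downloader could then pick these up and seed
# them within whatever quota/retention rules are already configured.
print(underseeded_rss("seeding_stats.csv"))
```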


For important archives like this, maybe we need some sort of turn-key solution for the masses? Like a Raspberry Pi image that maintains a partial mirror. Imagine if one could buy an RPi and an external HD, burn the image, and connect it to some random Wi-Fi network (at home, at work, at the library, etc.).
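As a rough sketch of how the "partial mirror" part could work (torrent names, sizes, and the quota are placeholders; a real image would need a lot more care around updates and peer discovery), each device could deterministically pick a different slice of the archive that fits its disk:

```python
# Sketch: deterministically pick a subset of archive torrents that fits the
# attached disk, so each Pi ends up seeding a different slice of the archive.
import hashlib

def pick_slice(torrents, quota_bytes, node_id):
    """torrents: list of (name, size_bytes). node_id: any stable per-device string."""
    # Rank torrents by a hash of (node_id, name) so different devices
    # prefer different parts, then greedily fill the quota.
    ranked = sorted(
        torrents,
        key=lambda t: hashlib.sha256(f"{node_id}:{t[0]}".encode()).hexdigest(),
    )
    chosen, used = [], 0
    for name, size in ranked:
        if used + size <= quota_bytes:
            chosen.append(name)
            used += size
    return chosen

# Example with made-up part names/sizes and a 1 TB external drive:
parts = [("libgen_part_001.torrent", 40 * 10**9), ("libgen_part_002.torrent", 38 * 10**9)]
print(pick_slice(parts, quota_bytes=10**12, node_id="raspberrypi-serial-0000"))
```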


I'm not hosting a copy of this at work (where we easily have 32TB on old hardware) since distributing it is copyright infringement. The same goes for my home connection.


Most people don't care. The chance anything at all bad will happen is so incredibly low.


This isn't even like movies, where large studios can afford to send notices. I don't think publishing houses have the funds to send that many legal notices.

Books are a safe bet to pirate.


That's what people said about music and films too. You don't want to be the next Jammie Thomas.

This is an existential threat to the deep-pocketed likes of Elsevier et al. They will use the law to make an example of anyone too close to their sphere of influence. So if you are in the US or the EU: support the efforts of LibGen vocally and loudly, and contribute anonymously, but don't risk your neck to the point where they can get hold of you.

There are plenty of ways to support the effort safely, though. Make sure people who wish to access scientific papers and books know where to go, and make sure your elected officials know about the need for publicly funded science to be published free of charge and open access (retroactively, too).


I'm guessing a pretty significant minority of HN's users maintain offshore seed boxes for other copyrighted content, and for them it might be pretty trivial to add partial seeding of LibGen content.


I think a turn-key solution for people living outside the US/EU would still help the general health of the archive.


At least the large academic publishers are sitting on enormous stacks of cash, so that argument doesn't fly.


I just read the article and your comments here, and I'm a bit unsure what the difference from the Internet Archive is. Is it that the IA can archive them but not make them public for legal reasons, while The-Eye is more focused on keeping them online and accessible no matter what?


Yes. It is extremely likely IA has the LibGen corpus archived, but darked (inaccessible), to prevent litigation.


There are quite a few such copies, on the 'just in case' principle.


> Lastly, you can always contribute books. If you buy a textbook or book, consider uploading it (and scanning it, should it be a physical book) in case it isn't already present in the database.

There's no easy solution for scanning physical books, is there?


There are providers [1] that will destructively scan the book for you and return a PDF. If you want to preserve the book, you're stuck using a scanning rig [2]. The Internet Archive will also non-destructively scan as part of Open Library [3], but they only permit one checkout at a time of scanned works, and the latency can be high between sending them a book and it becoming available. FYI, 600 DPI is preferred for archival purposes.

[1] http://1dollarscan.com/ (no affiliation, just a satisfied customer; they can't scan certain textbooks due to publisher threats of litigation)

[2] https://www.diybookscanner.org/

[3] https://openlibrary.org/help/faq


A big +1 for 1dollarscan.com. They've scanned many hundreds of books for me. The quality of the resulting PDFs is uniformly excellent, their turnaround time is fast, and their prices are cheap ($1 per 100 pages).

I've visited their office -- located in an inexpensive industrial district of San Jose -- on multiple occasions. They have a convenient process for receiving books in person.

I believe the owners are Japanese and the operation reminds me of the businesses I visited in Tokyo: quiet, neat, and über-efficient.


> quiet, neat, and über-efficient

I wish the same could be said for the Tokyo office I work in!


I will add a vote for bookscan.us, which I have been using since 2013 or so. Very reasonable prices and great service.


There are DIY book scanners (http://diybookscanner.org) and products such as the Fujitsu ScanSnap SV600. The SV600 has decent features like page-detection and finger-removal (I recommend using a pencil's eraser tip). I have personally used it to scan dozens of books, with satisfactory results.


Just saw a father who had to do it fully manually for his blind daughter. I shall show your comment to him.


Scanning with your phone is getting easier. At a minimum you can take a pic of each of the pages. Software can clean up the images, sorta. It's not ideal but it's better than nothing.
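For instance, a crude clean-up pass over phone photos can already be done with off-the-shelf tools. This is only a sketch using OpenCV: the deskew estimate is naive and the file names are placeholders:

```python
# Sketch: grayscale, deskew, and binarize a phone photo of a book page.
# File names are placeholders; a real pipeline would also crop and dewarp.
import cv2
import numpy as np

def clean_page(in_path, out_path):
    gray = cv2.cvtColor(cv2.imread(in_path), cv2.COLOR_BGR2GRAY)

    # Estimate skew from the minimum-area rectangle around dark (text) pixels.
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    coords = np.column_stack(np.where(thresh > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    if angle > 45:           # minAreaRect angles need normalising
        angle -= 90

    h, w = gray.shape
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    rotated = cv2.warpAffine(gray, m, (w, h), flags=cv2.INTER_CUBIC,
                             borderMode=cv2.BORDER_REPLICATE)

    # Final black-and-white page, good enough for OCR or a readable PDF.
    page = cv2.threshold(rotated, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    cv2.imwrite(out_path, page)

clean_page("page_photo.jpg", "page_clean.png")
```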


I remember when "cammed" books were bottom-tier and basically limited to things like 0day releases, even when done with an expensive DSLR. It's amazing how much camera technology has progressed since then; in less than a second you can get a high-resolution, extremely readable image of each page.

I used to participate in the "bookz scene", well over a decade ago. Raiding the local public libraries --- borrowing as many books as we could --- and having "scanparties" to digitise and upload them was incredibly fun, and we did it for the thrill, never thinking that one day almost all of our releases would end up in LibGen.


I found vFlat to be magical at cleaning up book scan images taken with your phone.

https://play.google.com/store/apps/details?id=com.voyagerx.s...


>This app is incompatible with your device.

my disappointment is immeasurable and my day is ruined


I use bookscan.us for this purpose: I mail the physical book to them and they send me a file a few days later for a very reasonable price.


Unfortunately it’s a destructive process.


Your local physical library may make a book scanner available. Mine does, with a posted 60-pages-at-a-time limit (though I don't know how this is enforced).


Mind explaining the origin of your 32 TB figure? I must be missing something enormous, but as far as I can tell the SciMag database dump is 9.3 GB, the LibGen non-fiction dump is 3.2 GB, and the LibGen fiction dump is 757 MB. That's a pretty huge divergence.

Source: http://gen.lib.rus.ec/dbdumps/


Oh, wait. I'm dumb. I see that your first link is a citation.

Continuing to be dense, why is there a difference between their "database dump" and the total of all the files they have?


The databases contain the metadata (authors, edition, ISBN, etc.) for the books.

Thus, 32 TB of books (over 2 million titles), 3.2 GB database.
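Which also means the dump alone is enough to build a local search index, with the actual files fetched by MD5 only when you want them. Roughly like this (assuming you've imported the MySQL dump into something queryable like SQLite; the table/column names and the mirror URL pattern are guesses from memory, so treat this as a sketch):

```python
# Sketch: search a locally imported LibGen metadata dump and build a
# download URL from the MD5. Table/column names and the mirror URL pattern
# are illustrative guesses, not guaranteed to match the live schema.
import sqlite3

MIRROR = "https://example-libgen-mirror.org/main/{md5}"  # placeholder pattern

def search(db_path, query):
    con = sqlite3.connect(db_path)
    rows = con.execute(
        "SELECT Title, Author, MD5, Filesize FROM updated "
        "WHERE Title LIKE ? LIMIT 20",
        (f"%{query}%",),
    ).fetchall()
    con.close()
    return [(title, author, MIRROR.format(md5=md5), size)
            for title, author, md5, size in rows]

for hit in search("libgen_metadata.sqlite", "linear algebra"):
    print(hit)
```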


Ah, that makes sense.

To make sure I'm understanding this correctly:

The Libgen Desktop application (which requires only a copy of the database) would then use the DB metadata to make LibGen locally searchable, and would only retrieve the individual books/papers on request?


I guess it's stunningly obvious to everyone else, but how are you certain the replacement isn't worse than the original system? I already see comments about the curation problem, for example. What's the point in making bad information (duplicate information, etc.) highly available? Why put so much faith in this donation strategy, i.e. donating bandwidth or money?



