
This is an extremely important effort. The LibGen archive contains around 32 TB of books (by far the most common being scientific books and textbooks, with a healthy dose of non-STEM). The SciMag archive, backing up Sci-Hub, clocks in at around 67 TB [0]. This is invaluable data that should not be lost. If you want to contribute, here are a few ways to do so.

If you wish to donate bandwidth or storage, I personally know of at least a few mirroring efforts. Please get in touch with me over at legatusR(at)protonmail(dot)com and I can help direct you towards those behind this effort.

If you don't have storage or bandwidth available, you can still help. Bookwarrior has requested help [1] in developing an HTTP-based decentralization mechanism for LibGen's various forks. Those with software experience can help make sure those invaluable archives are never lost.
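To give a concrete (and entirely hypothetical) sketch of what such a mechanism might look like: a client could periodically fetch a mirror list over plain HTTP and fall back between forks when resolving a file by its MD5 hash. The endpoint names and JSON layout below are my invention for illustration, not anything Bookwarrior has specified:

```python
# Hypothetical sketch only: the /mirrors.json endpoint and its layout are
# invented for illustration; the real proposal may look nothing like this.
import urllib.request, json

SEED_MIRROR = "https://example-libgen-fork.org"  # placeholder, not a real fork

def fetch_mirror_list(seed=SEED_MIRROR):
    """Ask a known mirror for the list of other mirrors it knows about."""
    with urllib.request.urlopen(f"{seed}/mirrors.json", timeout=10) as resp:
        return json.load(resp)  # e.g. ["https://mirror-a...", "https://mirror-b..."]

def resolve(md5, mirrors):
    """Return a URL on the first mirror that claims to have this MD5."""
    for base in mirrors:
        try:
            req = urllib.request.Request(f"{base}/file/{md5}", method="HEAD")
            if urllib.request.urlopen(req, timeout=10).status == 200:
                return f"{base}/file/{md5}"
        except OSError:
            continue  # mirror down or file missing; try the next fork
    return None
```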

Another way of contributing is by donating bitcoin, as both LibGen [2] and The-Eye [3] accept donations.

Lastly, you can always contribute books. If you buy a textbook or book, consider uploading it (and scanning it, should it be a physical book) in case it isn't already present in the database.

In any case, this effort has a noble goal, and I believe the people of this community can contribute.

P.S. The "Pirate Bay of Science" is actually LibGen, and I favor a title change (I posted it this way as to comply with HN guidelines).

[0] http://185.39.10.101/stat.php

[1] https://imgur.com/a/gmLB5pm

[2] bitcoin:12hQANsSHXxyPPgkhoBMSyHpXmzgVbdDGd?label=libgen, as found at http://185.39.10.101/, listed in https://it.wikipedia.org/wiki/Library_Genesis

[3] Bitcoin address 3Mem5B2o3Qd2zAWEthJxUH28f7itbRttxM, as found at https://the-eye.eu/donate/. You can also buy merchandise from them at https://56k.pizza/.




Sounds like anyone with a seed box could donate some bandwidth and storage by leeching and then seeding part of it? It would be nice if there were a list of seeder/leecher counts (like TPB) or, better yet, a priority list of parts that need more seeders.

Edit: Found the other comment where you link to the seeding stats: https://docs.google.com/spreadsheets/d/1hqT7dVe8u09eatT93V2x...


Or better yet, an RSS feed that plays nicely with auto-retention and quota settings. It just delivers a bunch of parts that are in need of seeders, and you use your existing mechanism to help with them.
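Something like that could be generated with a few lines of scripting. This is only a sketch: the CSV of seeder counts and the torrent URL column are stand-ins for whatever the real seeding-stats spreadsheet exports, not its actual format:

```python
# Sketch: build an RSS feed of the N most under-seeded LibGen torrent parts.
# The input CSV columns (name, seeders, url) are assumptions for illustration.
import csv
from xml.sax.saxutils import escape

def underseeded_rss(csv_path, limit=20):
    with open(csv_path, newline="") as f:
        rows = sorted(csv.DictReader(f), key=lambda r: int(r["seeders"]))
    items = "\n".join(
        f"  <item><title>{escape(r['name'])} ({r['seeders']} seeders)</title>"
        f"<link>{escape(r['url'])}</link></item>"
        for r in rows[:limit]
    )
    return (
        '<?xml version="1.0"?>\n<rss version="2.0"><channel>\n'
        "  <title>LibGen parts needing seeders</title>\n"
        f"{items}\n</channel></rss>"
    )

# A torrent client's RSS auto-downloader could then pick these up and seed
# them within whatever quota/retention rules are already configured.
print(underseeded_rss("seeding_stats.csv"))
```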


For important archives like this, maybe we need some sort of turn-key solution for the masses? Like a Raspberry Pi image that maintains a partial mirror. Imagine if one could buy an RPi and an external HD, burn the image, and connect it to some random Wi-Fi network (at home, at work, at the library, etc.).
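As a rough sketch of how the "partial mirror" part could work (torrent names, sizes, and the quota are placeholders; a real image would need a lot more care around updates and peer discovery), each device could deterministically pick a different slice of the archive that fits its disk:

```python
# Sketch: deterministically pick a subset of archive torrents that fits the
# attached disk, so each Pi ends up seeding a different slice of the archive.
import hashlib

def pick_slice(torrents, quota_bytes, node_id):
    """torrents: list of (name, size_bytes). node_id: any stable per-device string."""
    # Rank torrents by a hash of (node_id, name) so different devices
    # prefer different parts, then greedily fill the quota.
    ranked = sorted(
        torrents,
        key=lambda t: hashlib.sha256(f"{node_id}:{t[0]}".encode()).hexdigest(),
    )
    chosen, used = [], 0
    for name, size in ranked:
        if used + size <= quota_bytes:
            chosen.append(name)
            used += size
    return chosen

# Example with made-up part names/sizes and a 1 TB external drive:
parts = [("libgen_part_001.torrent", 40 * 10**9), ("libgen_part_002.torrent", 38 * 10**9)]
print(pick_slice(parts, quota_bytes=10**12, node_id="raspberrypi-serial-0000"))
```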


I'm not hosting a copy of this at work (where we easily have 32TB on old hardware) since distributing it is copyright infringement. The same goes for my home connection.


Most people don't care. The chance anything at all bad will happen is so incredibly low.


This isn't even like movies, where large studios can afford to send notices. I don't think publishing houses have the funds to send that many legal notices.

Books are a safe bet to pirate.


That's what people said about music and films too. You don't want to be the next Jammie Thomas.

This is an existential threat to the deep-pocketed likes of Elsevier et al. They will use the law to make an example of anyone too close to their sphere of influence. So if you are in the US or the EU: support the efforts of LibGen vocally and loudly, and contribute anonymously, but don't risk your neck to the point where they can get hold of you.

There are plenty of ways to support the effort safely, though. Make sure people who wish to access scientific papers and books know where to go, and make sure your elected officials know about the need for publicly funded science to be published free of charge and open access (retroactively, too).


I'm guessing a pretty significant minority of HN's users maintain offshore seed boxes for other copyrighted content, and for them it might be pretty trivial to add partial seeding of LibGen content.


I think a turn-key solution for people living outside the US/EU would still help the general health of the archive.


At least the large academic publishers are sitting on enormous stacks of cash, so that argument doesn't fly.


I just read the article and your comments here, and I'm a bit unsure what the difference from the Internet Archive is. Is it that the IA can archive them but not make them public for legal reasons, while The-Eye is more focused on keeping them online and accessible no matter what?


Yes. It is extremely likely IA has the LibGen corpus archived, but darked (inaccessible), to prevent litigation.


There are quite a few such copies, on the 'just in case' principle.


> Lastly, you can always contribute books. If you buy a textbook or book, consider uploading it (and scanning it, should it be a physical book) in case it isn't already present in the database.

There's no easy solution for scanning physical books, is there?


There are providers [1] that will destructively scan the book for you and return a PDF. If you want to preserve the book, you're stuck using a scanning rig [2]. The Internet Archive will also non-destructively scan as part of Open Library [3], but they only permit one checkout at a time of scanned works, and the latency can be high between sending them a book and it becoming available. FYI, 600 DPI is preferred for archival purposes.

[1] http://1dollarscan.com/ (no affiliation, just a satisfied customer; they can't scan certain textbooks due to publisher threats of litigation)

[2] https://www.diybookscanner.org/

[3] https://openlibrary.org/help/faq


A big +1 for 1dollarscan.com. They've scanned many hundreds of books for me. The quality of the resulting PDFs is uniformly excellent, their turnaround time is fast, and their prices are cheap ($1 per 100 pages).

I've visited their office -- located in an inexpensive industrial district of San Jose -- on multiple occasions. They have a convenient process for receiving books in person.

I believe the owners are Japanese and the operation reminds me of the businesses I visited in Tokyo: quiet, neat, and über-efficient.


> quiet, neat, and über-efficient

I wish the same could be said for the Tokyo office I work in!


I will add a vote for bookscan.us, which I have been using since 2013 or so. Very reasonable prices and great service.


There are DIY book scanners (http://diybookscanner.org) and products such as the Fujitsu ScanSnap SV600. The SV600 has decent features like page-detection and finger-removal (I recommend using a pencil's eraser tip). I have personally used it to scan dozens of books, with satisfactory results.


Just saw a father who had to do it fully manually for his blind daughter. I shall show your comment to him.


Scanning with your phone is getting easier. At a minimum you can take a pic of each of the pages. Software can clean up the images, sorta. It's not ideal but it's better than nothing.
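For instance, a crude clean-up pass over phone photos can already be done with off-the-shelf tools. This is only a sketch using OpenCV: the deskew estimate is naive and the file names are placeholders:

```python
# Sketch: grayscale, deskew, and binarize a phone photo of a book page.
# File names are placeholders; a real pipeline would also crop and dewarp.
import cv2
import numpy as np

def clean_page(in_path, out_path):
    gray = cv2.cvtColor(cv2.imread(in_path), cv2.COLOR_BGR2GRAY)

    # Estimate skew from the minimum-area rectangle around dark (text) pixels.
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    coords = np.column_stack(np.where(thresh > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    if angle > 45:           # minAreaRect angles need normalising
        angle -= 90

    h, w = gray.shape
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    rotated = cv2.warpAffine(gray, m, (w, h), flags=cv2.INTER_CUBIC,
                             borderMode=cv2.BORDER_REPLICATE)

    # Final black-and-white page, good enough for OCR or a readable PDF.
    page = cv2.threshold(rotated, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    cv2.imwrite(out_path, page)

clean_page("page_photo.jpg", "page_clean.png")
```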


I remember when "cammed" books were bottom-tier and basically limited to things like 0day releases, even when done with an expensive DSLR. It's amazing how much camera technology has progressed since then; in less than a second you can get a high-resolution, extremely readable image of each page.

I used to participate in the "bookz scene", well over a decade ago. Raiding the local public libraries --- borrowing as many books as we could --- and having "scanparties" to digitise and upload them was incredibly fun, and we did it for the thrill, never thinking that one day almost all of our releases would end up in LibGen.


I found vFlat to be magical at cleaning up book scan images taken with your phone.

https://play.google.com/store/apps/details?id=com.voyagerx.s...


>This app is incompatible with your device.

my disappointment is immeasurable and my day is ruined


I use bookscan.us for this purpose: I mail the physical book to them and they send me a file a few days later for a very reasonable price.


Unfortunately it’s a destructive process.


Your local physical library may make a book scanner available. Mine does, with a posted 60-pages-at-a-time limit (though I don't know how this is enforced).


Mind explaining the origin of your 32 TB figure? I must be missing something enormous, but as far as I can tell the SciMag database dump is 9.3 GB, the LibGen non-fiction dump is 3.2 GB, and the LibGen fiction dump is 757 MB. That's a pretty huge divergence.

Source: http://gen.lib.rus.ec/dbdumps/


Oh, wait. I'm dumb. I see that your first link is a citation.

Continuing to be dense, why is there a difference between their "database dump" and the total of all the files they have?


The databases contain the metadata (authors, edition, ISBN, etc.) for the books.

Thus, 32 TB of books (over 2 million titles), 3.2 GB database.
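Which also means the dump alone is enough to build a local search index, with the actual files fetched by MD5 only when you want them. Roughly like this (assuming you've imported the MySQL dump into something queryable like SQLite; the table/column names and the mirror URL pattern are guesses from memory, so treat this as a sketch):

```python
# Sketch: search a locally imported LibGen metadata dump and build a
# download URL from the MD5. Table/column names and the mirror URL pattern
# are illustrative guesses, not guaranteed to match the live schema.
import sqlite3

MIRROR = "https://example-libgen-mirror.org/main/{md5}"  # placeholder pattern

def search(db_path, query):
    con = sqlite3.connect(db_path)
    rows = con.execute(
        "SELECT Title, Author, MD5, Filesize FROM updated "
        "WHERE Title LIKE ? LIMIT 20",
        (f"%{query}%",),
    ).fetchall()
    con.close()
    return [(title, author, MIRROR.format(md5=md5), size)
            for title, author, md5, size in rows]

for hit in search("libgen_metadata.sqlite", "linear algebra"):
    print(hit)
```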


Ah, that makes sense.

To make sure I'm understanding this correctly:

The Libgen Desktop application (which requires only a copy of the database) would then use the DB metadata to make LibGen locally searchable, and would only retrieve the individual books/papers on request?


I guess it's stunningly obvious to everyone else, but how are you certain the replacement isn't worse than the original system? I already see comments about the curation problem, for example. What's the point in making bad information (duplicate information, etc.) highly available? Why put so much faith in this donation strategy, i.e. donating bandwidth or money?



