ArchiveTeam has saved cached Reader feed data for 37.3M feeds so far, and even though this seems like a lot, it still doesn't include many of the feeds people are subscribed to. Hence the request for OPMLs/subscriptions.xml files.
If you're interested in being able to read old posts in some future feed reading software, or just like having the data preserved, you can upload your OPMLs and ArchiveTeam will make its best effort to grab the feeds.
Also, if anyone has billions of URLs that I can query, I could use them to infer feed URLs and save an incredible amount of stuff. See email in profile if you do.
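To give an idea of what I mean by "infer": here's the sort of quick-and-dirty heuristic I'd run over such a list. It's only a sketch - the patterns and the file handling are illustrative, not what would actually be used.

  # feed_guess.py - keep only the URLs that look like feeds (heuristic).
  import re
  import sys

  FEED_HINTS = re.compile(
      r'(/feed/?$|/rss/?$|/atom/?$|\.rss$|\.atom$|'
      r'atom\.xml$|rss\.xml$|index\.xml$|feeds?\.feedburner\.com)',
      re.IGNORECASE)

  def looks_like_feed(url):
      # Cheap guess: does the URL match a common feed path or host pattern?
      return bool(FEED_HINTS.search(url))

  if __name__ == '__main__':
      for line in sys.stdin:
          url = line.strip()
          if url and looks_like_feed(url):
              print(url)

Something like: zcat urls.gz | python feed_guess.py > candidates.txt, then queue whatever survives the filter for grabbing.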
Cool, installing the applications in Docker on my dedi.
I have a couple of questions though:
Will the data remain archived on my system after it is uploaded? And what format will that be in?
Will there be a public API to access this data once uploaded, or for services such as Feedly to import back entries from feeds? (I would hope they would support that, but the public API would be enough for me.)
After the greader*-grab programs upload data to the target server, it is removed from your machine. All of the data eventually ends up in WARCs at https://archive.org/details/archiveteam_greader
As for an API, someone will hopefully write one to directly seek into a megawarc in that archive.org collection, or import everything into their feed reading service.
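In the meantime, anyone who wants to poke at the raw data can iterate over a WARC from that collection with something like the following. This is just a sketch - it assumes the warcio Python library, and the file name is a placeholder, not a real item name.

  from warcio.archiveiterator import ArchiveIterator

  # Placeholder file name: substitute a .warc.gz downloaded from the
  # archiveteam_greader collection on archive.org.
  with open('greader-example.warc.gz', 'rb') as stream:
      for record in ArchiveIterator(stream):
          if record.rec_type != 'response':
              continue
          url = record.rec_headers.get_header('WARC-Target-URI')
          payload = record.content_stream().read()  # the HTTP body, i.e. the feed payload
          print(url, len(payload))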
No, the data will be uploaded - first to a staging server run by ivank/"ArchiveTeam". Then it will be uploaded to the Internet Archive (some has already been uploaded: https://archive.org/details/archiveteam_greader)
No, not currently. But the Internet Archive will provide the raw data and anyone is free to set up such an API :-)
This isn't relevant (except tangentially to the Internet Archive's Wayback machine), but I'm curious about the ethics (or legal standing) of rehosting the Google Reader app (client side portions) with a re-implementation of the (internal) Google Reader API so that the app remains usable in an unchanging state.
I am sorry but I don't think people should upload their OPML like this.
Your feed collection is like your personal life. It should be private (even if the individual URLs are public and generic). Disclosing your collection is like living in a glass house.
Unless I'm missing something, I don't see a single helpful reason for this service.
Feel free to remove any feeds you're not comfortable submitting.
If you'd like to submit them but still hide, I don't know, the fact that some of your feeds might be relevant to each other, or something like that: you could split up your list of feeds and submit it in chunks that make sense to you, from different IPs or whatnot.
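For example, here's a rough sketch of doing both in one go - the file names, the "skip" keywords and the chunk size are only placeholders:

  import xml.etree.ElementTree as ET

  SKIP = ('health', 'dating', 'example-employer')  # whatever you'd rather keep private
  CHUNK = 50                                       # feeds per submitted file

  tree = ET.parse('subscriptions.xml')             # your exported OPML
  keep = [o for o in tree.iter('outline')
          if o.get('xmlUrl')
          and not any(s in o.get('xmlUrl').lower() for s in SKIP)]

  for i in range(0, len(keep), CHUNK):
      opml = ET.Element('opml', version='1.0')
      body = ET.SubElement(opml, 'body')
      body.extend(keep[i:i + CHUNK])
      ET.ElementTree(opml).write('chunk_%02d.opml' % (i // CHUNK),
                                 xml_declaration=True, encoding='utf-8')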
All of the historical data gathered by pulling the feeds through Google Reader will be uploaded to the Internet Archive.
That means it will be available to everyone.
This is a non-profit service run by volunteers who believe in saving data - because there are smart and creative people around the world (high concentration on HN) who can do good things with data.
I can think of one example: all of the new RSS reader services could slurp this data in and provide you with a better service (and they won't know the feed URLs came from you).
The point is most feeds do not have all of their historical posts. Google Reader preserved old posts in the feeds. ArchiveTeam is taking our OPMLs and getting a copy of Reader's archive before it is taken down.
If you plug a feed into InoReader, you get thousands of items from the past. It seems they are fetching the historical feeds (I can't tell how far back they go).
But why would ArchiveTeam want to preserve the historical items in a feed if the feed does not belong to them in the first place (neither did it belong to Google)?
No, there is a difference. An old book on the brink of extinction still belongs to you. You can get third-party services to preserve the book for you with the stipulation that your preservation work and the book carry decent privacy protections (what you're doing won't be broadcast to the world). Remember the old days when you would go to a store to develop your camera roll? The service implied that your picture content was between you and the developer of the film.
It's fine if people don't see any privacy implication in submitting their reading collection. But as far as my single individual point is concerned, I don't see why I should upload my OPML for the sake of preservation. I have a hard time uploading it to any of the other Google replacements out there trying to compete.
Oh, so you're not really worried about people downloading the content of the blogs, you just don't want anyone to connect the list of blogs to you? Well, you can just take the URLs out of the OPML file, and submit them one at a time. You can even use different IP addresses if you're that paranoid.
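Pulling the URLs out is a few lines of Python, for example (just a sketch; the file name is an example):

  import xml.etree.ElementTree as ET

  # Print one feed URL per line from an exported OPML.
  for outline in ET.parse('subscriptions.xml').iter('outline'):
      url = outline.get('xmlUrl')
      if url:
          print(url)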
Several people think ArchiveTeam has grabbed a lot more historical data than what's available at, for example, InoReader. I personally think so as well.
There's also no telling whether InoReader would be open to letting others grab what they've already grabbed, meaning that data is potentially behind closed doors.
ArchiveTeam submits the data to the Internet Archive, which anyone can upload to and download from. This data is continuously being uploaded and made public and free. See https://archive.org/details/archiveteam_greader for example.
Anyone can do anything with that data. Your OPML files are not part of what's submitted, though. That's also stated on the page linked for this item.
> I am sorry but I don't think people should upload their OPML like this.
It's not all or nothing. OPML is easy to edit, and I did just that before I uploaded my OPML - deleted anything which might be a security issue. It took like 10 seconds.
More details: http://archiveteam.org/index.php?title=Google_Reader
7TB+ of compressed feed text: http://archive.org/details/archiveteam_greader