All the documents that Google deleted from Google Groups, saved by Archive Team (archive.org)
151 points by sp332 on Feb 13, 2012 | 21 comments



These guys and gals really do a great job of stepping up to help save our history. Oftentimes they have very little notice, making their job incredibly difficult.

If you ever run a service that contains user created data, please be responsible when shutting it down by providing a way to archive it.

Congrats


>If you ever run a service that contains user created data, please be responsible when shutting it down by providing a way to archive it.

I actually find this to be an interesting question. I certainly feel that the Google Groups stuff is likely worth saving. Project histories, answers to obscure technical problems, etc.

But what about something like grouphug.us? Or even Facebook? I've got to imagine there are things put out there in the irresponsibility of youth or the first blushes of adulthood that someone is practically aching to have disappear from the Internet. To have the Internet Archive come in and save it all might be... unfortunate for some.


  To have the Internet Archive come in and save it all might 
  be... unfortunate for some.
Say you write something at the age of 14 that's regrettable at the age of 24, acutely embarrassing at 34, and career-ending at 44.

That sucks. I feel you.

How about fifty years later? A hundred? Two hundred? A thousand?

Archive.org archives for the ages. There's a brief band of time when you're alive to be embarrassed by a statement, then the vast gulf of eternity where that archived stupidity may be the only evidence that you were ever alive at all.


When I think about this it makes me think about Pompeii. The sheer delight we have at discovering the preserved minutiae of everyday life -- but 1000 years old. Of course, if you asked those people what they’d rather have, I bet they would have taken being forgotten.

For most, embarrassing Facebook posts are not life and death or even career ending. But I’ve got to think that most of us want better control over our legacy, even if that’s an unrealistic aim. I’m not saying that archiving it is a bad thing, I’m just wondering what level of control we’re allowed over what the future thinks of us.


> But what about something like grouphug.us? Or even Facebook? I've got to imagine there are things put out there in the irresponsibility of youth or the first blushes of adulthood that someone is practically aching to have disappear from the Internet.

Of course. And there's tons of awesome stuff in those sites - just like Google Groups. You're wrong just like all the people mocking Archive Team for caring about Geocities or Friendster are wrong; you just draw the line around something you have experience with, is all.

At these sizes, your intuitive impressions of average quality are completely irrelevant. It's all worth saving.


It's not about the quality of the content on sites like grouphug.us, but the nature of the content - things that the poster might be happy to be forgotten in 20 years' time.

Previous generations' youthful indiscretions tended not to be preserved for all time, for the most part.


Even better. All the stupid stuff you did on the Internet that you've forgotten that you did. There are no doubt numerous examples that could come to haunt me in later years. Stuff I don't even remember.

Part of the problem with putting your thoughts down in writing is that if you change your mind later the writing is still there. I think that in the years to come we're going to see more scandals arise from this sort of thing.


My point is that collective "worth" is not the only value that people may wish to judge these actions by. People may have strong individual reactions to what's being saved. If an Interstate is built through my backyard, or a cemetery, or whatever, it might be worth it, but that doesn't mean the people directly affected won't argue that the worth of a project does not necessarily justify it.

If -- as a teenager -- I wrote something that I'm now embarrassed by (which I did), or if -- totally unbeknownst to me -- some unflattering pictures of me show up on Facebook (which have), I may not wish for them to be saved, but for them to disappear, along with the day, week or month of my life that produced them.

And frankly, I'm not mocking anyone. I'm trying to ask a question which I think is totally fair to ask: who "owns" user-generated data? Does the act of putting it out there make it effectively public domain, or can I, when cooler heads and soberer minds prevail, choose to recall it?


> But what about something like grouphug.us? Or even Facebook?

This may sound a bit enraged, but I HOPE that someday Facebook shuts down and all 850M users lose their information, messaging histories, etc. I think it's the only way to teach them not to trust the only copy of such things to an external entity.

PS. I _really_ archive all my emails offline (with backups, of course), except for some mailing lists. It's 15 years' worth of emails now.


If you hold the copyright to something, you can write to Archive and have them delete it.


Maybe you can have the Internet Archive delete it. But I seriously doubt that the Archive Team would respect your request. They work together sometimes, but they are not the same group.


I missed it -- why did Google decide to purge the documents in the first place? They can't be that short on disk space, can they?


Yeah, what happened to organizing the world's information? I guess that's archive.org's job.


I'm sure they have copies lying around if it's only a TB. But I'm sure Google's goal was to shut down a service to cut costs and complexity.


Indeed. The price to keep the information is peanuts.

Take Amazon S3 (at its highest price point, <= 1 TiB):

http://aws.amazon.com/s3/pricing/

($0.125 per GB) * (1 terabyte = 1024 GB) = 128 U.S. dollars per month.
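
The same back-of-the-envelope arithmetic as a rough Python sketch. The $0.125 per GB-month rate is just the figure quoted above, the 1024 GB per terabyte conversion is an assumption (binary units), and `monthly_storage_cost` is a made-up helper name, not anything from AWS:

```python
# Rate quoted in this comment (USD per GB per month); not fetched from AWS.
PRICE_PER_GB_MONTH = 0.125

# Assumption: treat one terabyte as 1024 GB (binary units).
GB_PER_TB = 1024


def monthly_storage_cost(terabytes, price_per_gb=PRICE_PER_GB_MONTH):
    """Return the estimated monthly storage cost in USD for the given size."""
    return terabytes * GB_PER_TB * price_per_gb


if __name__ == "__main__":
    # One terabyte at the quoted rate.
    print(monthly_storage_cost(1))  # 128.0
```

This ignores request and bandwidth charges, so it only covers the cost of keeping the bytes on disk.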

I'm surprised Google doesn't want to keep the data available in order to data mine.


They likely do not want to maintain the interface, or the integration of the interface into the overall Google Groups interface.



Google Groups used to be DejaNews, which itself was an archive of Usenet. So this is a hugely important preservation of Internet history. Congratulations and thank you, Archive Team!


This archive isn't of all the Google Groups messages, but instead of the separate file upload area that each discussion group had.


Wonder when someone will create a torrent of this, so it can be stored in a distributed way


Did they get some kind of permission from Google or from the authors, or do they just hope that no one will sue them for copyright infringement?



