Historical programming-language groups disappearing from Google (lwn.net)
737 points by beachwood23 on July 28, 2020 | 332 comments



It's funny, when I took a tour of the US Geological Survey, the curator of the collection hated Google (which was just a few blocks away). He said Google is great now, with all their maps, which are far more accurate and have better coverage than the USGS's.

But what happens when they get bored with map data and get rid of it?

He had been ordered to turn over all of their historical arial archives for scanning by Google, and then told the USGS would no longer do arial scanning since Google was doing it. But there was no agreement for Google to turn over their arial scans back to the USGS.

At the time we all told him not to worry, Google would never remove data it had collected. Looks like he was a lot smarter than us.


Well, that's the problem with the whole internet. Remember those pages created in the 90s/early 2000s? People thought they were sharing information with the whole world. It turns out that most pages created in the 90s are now inaccessible or have been siloed by big corporations. The fact that we allowed corporations to take over the internet made it an inhospitable place for everyone without corporate backing.


I don't think it's any harder to create a website than it ever was. The problem seems to be that corporations have made it so easy to do it within their silos that people aren't willing to spend ten hours on something they could do in ten minutes, not realizing that they're going to spend a lot more than ten hours creating content which the company will then vaporize at random whenever they feel like it.


A decade ago, there used to be celebrity websites with forums, galleries, and blogs; now it's just Instagram. Hell, so many prominent celebrities don't even own a domain name in their own name. And it's not like the content has improved: those celeb galleries used to have HQ images, while now the highest-resolution image is 1200x1200. The only thing that has improved is how easily a celebrity can reach millions; everything else (discussions, forums, galleries, blogs) has gone downhill. Most of these have been replaced by poor comments sections.

It's not just celebrities: so many independent artists are putting their work up on Instagram, and I don't have access to any of it because I need an Instagram account for that. The Instagram web version forces you to sign up if you scroll one page down on a profile.

Sometimes I feel like we need to build cutting edge decentralized applications that will burn these walled gardens to the ground. /rant


  The only thing that has improved is how easily a celebrity can reach millions
From Wikipedia:

  Celebrity is a reference to the fame and wide public recognition of an individual or a group
I'd posit that celebrities are celebrities by connecting with millions. A platform that offers a "celebrity" the ability to connect easily with millions seems to be worth more than a list of any other features.


What has changed is the meaning of the word connected.


If celebrities (with their fame, reach and money) can't be bothered to own their own domains, what chance does a normal guy/gal have? It would be trivial for celebrities to set up their own websites and share whatever stuff they are sharing. At minimum, they could do this in addition to whatever social media they are on.

It is as if we are all becoming lazy, and/or many of us don't realize the harm in giving all our info to half a dozen super mega corps. Most of these mega corps aren't even distributed around the world; they are all American (except TikTok), which is another interesting angle.

This is going to happen (is already happening?) in the webapps/apps world too. There are so many no-code tools popping up; most will die, and the rest will get acquired by the mega corps. Made a great webapp that is successful? Now you are stuck with Bubble/Airtable/Shopify/whatever. I cannot name many no-code tools that let you export your application to be hosted independently.

I feel like we are on a path where in a few decades, a dozen or two corporations will control every single aspect of our lives - online especially, and probably offline too.


> If celebrities (with their fame, reach and money) can't be bothered to own their own domains, what chance does a normal guy/gal have?

This is a matter of demand, not capability. It seems most celebs just don't care about setting up their own stuff, and, really, why would they? There are free platforms out there that give them huge amounts of reach. Most of these people just don't need their own website. They may come to regret that decision later, but it's their decision to make.

If a normal guy/gal wants to set up their own domain and website, it's not hard for them to do so, certainly no harder than it was in the 90s/00s, and probably a lot easier. The "no-code" stuff certainly has lock-in disadvantages, but you can simply choose not to use them if you want. Yes, it's more work, but it was always more work to do it yourself, and always will be.


A lot of ordinary people don't post in public, or if they do it's not under their real name, and there is a growing trend of deleting it after a few days. They don't want you to have any data about them at all.

The modern alternative to Usenet is private Facebook groups that never get indexed.


>I don't have access to any of it because I need an Instagram account for that.

Check out https://bibliogram.art



502 Bad Gateway



For what it’s worth, the original link is back up again.


These celebrities will be forgotten as soon as their Instagram account is gone or their followers turn out to be mostly dead accounts. The artistically influential ones will make it into the mainstream thanks to their fans hoarding and sharing the data, like all those last-century music/film stars who left tons of material in every medium, now digitized and shared/pirated. Although I get a bit worried about p2p sharing, as multimedia-rental walled gardens have gotten too popular.


The Venn diagram of "celebs who are influential" and "fans who are data hoarders" doesn't overlap as much as you think. The latter tend to be a much nerdier group than the general population.


Yeah, you may be right here. Also, Instagram has a low entry barrier, whereas superstars used to have enormous promotional budgets. Now it is a full race to the bottom, I would say: commoditization of popularity, massive celebritism :)


There are many projects to make the network decentralized, such as IPFS (the InterPlanetary File System). When exposing these services to the public, legality is a big issue.

Plenty of rights would be involved (copyright, privacy, and so on). Also, all kinds of crimes are another issue. It is hard for people to keep monitoring whether content is safe or not.

I think we need a network version of the War of Independence.


Counterpoint: does the average internet user want to download a new app or go visit a different website each time they want to get these updates?


If the websites bother to export RSS/ATOM/ActivityPub feeds, they won't need to. They'll just "subscribe" to the stream in their preferred app/webservice, and get aggregated updates about everyone they care about.
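
For illustration, a minimal sketch of what the subscriber side looks like, using the third-party feedparser library (the feed URLs here are made up):

  # Aggregate a few RSS/Atom feeds with feedparser (pip install feedparser).
  # The feed URLs are hypothetical.
  import feedparser

  feeds = [
      "https://example-artist.com/atom.xml",
      "https://example-band.org/feed.rss",
  ]

  for url in feeds:
      parsed = feedparser.parse(url)    # fetches and parses RSS and Atom alike
      for entry in parsed.entries[:5]:  # the newest few entries per feed
          print(entry.get("title"), "->", entry.get("link"))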


Sure, but what about commenting on the post? No one likes creating accounts for different websites in order to post one comment.

It's obvious that things like Twitter and Instagram provide value to celebrities and people who follow them. It's just that there are some serious externalities not factored in.


ActivityPub lets you mark a comment as a "reply" to a post (or any web resource), then all you need is a trackback-like facility to link from the post to comments on it. It doesn't need to be centralized.
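
Concretely, a reply in ActivityPub is just an ActivityStreams object with an inReplyTo field. A sketch, shown here as a Python dict (all URLs and names hypothetical):

  # An ActivityStreams 2.0 "Note" marked as a reply to a blog post.
  reply = {
      "@context": "https://www.w3.org/ns/activitystreams",
      "type": "Note",
      "attributedTo": "https://social.example/users/alice",  # the commenter
      "inReplyTo": "https://blog.example/posts/42",          # the post replied to
      "content": "Great post!",
      "to": ["https://www.w3.org/ns/activitystreams#Public"],
  }
  # A server receiving this can notify blog.example via its inbox, and
  # blog.example can then list the reply under the post, trackback-style.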


How do you despam that?


Or de-nazi it for that matter?


> like we need to build cutting edge decentralized applications

Yes, please! More and faster.

Centralized services are easy to build, because they offer an obvious location to do some of the things that are tricky to do in a distributed fashion. They are also, by the way, far easier for small numbers of coordinating people to control, which makes them popular with corporations, authoritarians and sociopaths.

Decentralized services will rarely be the New Shiny that attracts all the 14 year olds for a few minutes. But, unlike email, you never hear anyone whining that Myspace won't go away.


We have tons of decentralized platforms. The amount is not the problem. The problem is the user experience. For users who don't care about anything behind the scenes and only care about the experience, why would they go out of their way to figure out Mastodon when Facebook has a one-step signup process?

The reason central platforms win is because they have to be dead simple to use in order to attract any users. Decentralized platforms get their initial users because of how cool the technology is, but those people (people like me and you) aren’t UX experts and don’t prioritize UX.

It has to be easier than the central platform, and the central platform has the benefit of millions/billions of dollars to throw at it. Which means the decentralized platforms have to work even harder to overcome that. It's not impossible, but it does require engineers to overcome their desire to build cool things and instead focus on building a user experience that's better than what Facebook/Twitter/etc can provide.


Joining a Mastodon server is just as simple as signing up to Facebook. The difference is that no single Mastodon instance has centralized control over their users; you always get the option of signing up elsewhere, or using your own instance.


This is what I would have thought, but I've heard from more than one friend who was frustrated by having to choose an instance in the first place. What makes one better or worse than another? What if I choose wrong? What if I need to move?

This isn't difficult, per se, but it's not as easy as signing up for Twitter for the simple dumb reason that you don't need to make that choice. And the concerns about "wrong choices" aren't entirely unfounded; the first Mastodon instance I signed up for was effectively abandoned by its sysadmin. The decentralization still makes it hard to search for new users to follow compared to Twitter (checking right this moment, when I click on someone I follow to see who they follow, it only shows me people they follow on the same instance); depending on the Mastodon client I'm using, it can actually be a little hard to follow someone even when I find them if they're not on the same instance I am. Again, technically none of this is super difficult, but for a user who isn't philosophically committed to the fediverse, tiny little frustrations start to add up quickly.


>What makes one better or worse than another?

Most public Mastodon instances have an about page that describes their intended audience. You can also look through the public timeline to see what users are saying before signing up. If you aren't sure, pick a larger general instance.

Despite all that, in my opinion you'll probably get more mileage out of joining an instance run by someone you know and trust.

>What if I choose wrong? What if I need to move?

On newer versions of Mastodon there is already a migration option to import/export your data between servers. https://blog.joinmastodon.org/2019/06/how-to-migrate-from-on...

Regarding your second paragraph: If you'd like to fix bugs in your chosen mastodon client, I'm sure that would be welcomed.


> you'll probably get more mileage out of joining an instance run by someone you know and trust.

What if you don't know or trust anyone that runs a Mastodon instance? And don't have the time/means/expertise/motivation to run one yourself?


Find someone who does and make friends with them?


Well, you can see how that's a bit of a bigger ask than "go to Twitter.com and click SIGN UP," right? :)

People really, really want to argue that decentralization of social networks doesn't make things harder, but eventually the defense always shifts to "well, you have to be willing to jump through a few hoops if you believe in the advantages of a decentralized/indie internet, which you totally should," because the truth is that the decentralized way does make things harder. Personally, I do believe in the advantages of the IndieWeb, and I do think it's worth jumping through those hoops. I just think we need to acknowledge those hoops exist, and always be thinking about ways we can reduce the friction for people who say "I like all those ideas in theory, but in practice it's too frustrating."


It’s the “signing up elsewhere or using your own instance” that’s the problem. You’re joining a Mastodon server and if you want to go to another server you have to actually move things. It takes actual effort.

It seems like the Mastodon developers look at email and think “if it works for email it’ll work here” and don’t understand that people deal with email because they have to, not because they want to. I don’t want to have to change my email address when I switch providers and I don’t want to have to move all my stuff if my favorite Mastodon server decides to shut down.

That’s not a solution that’s just another problem. It’s bad user experience.


> and if you want to go to another server you have to actually move things. It takes actual effort.

Not sure how that's worse than something like Facebook, where you literally don't get that option. If you want your asserted identity to be reasonably secure and easy to assess for other users, you have to find a trusted host or do your own hosting; that's no different from any other service.


This is what I’m talking about with developers not understanding users. I even said “users don’t care what’s behind the scenes”. Any normal user who is looking to leave Facebook wants Facebook The Product but not Facebook The Company. They don’t care about hosting or asserting identity. They don’t want options, they want a product.


People use email because it's the best communication system in existence. If someone doesn't want to change his address, he creates his own self-hosted mail server, or just uses his own domain name with a third-party email provider. The email system is amazing; the issues with UX are in fact minimal, almost non-existent.


Is your dad, aunt, or grandma going to create a self hosted mail server?

Seriously this is exactly what I’m talking about. This is a textbook example of what I’m talking about.

No end user hosts their own email service.


Sadly no one has made this easy and possible.


Even if self-hosted email was easy, I don't find it to be a great idea.

I ran my own for a while using mail-in-a-box but ended up moving to fastmail because I didn't trust myself to maintain the setup. I need my email system to just work and that's likely the same for the majority of people.

(I validated that lack of trust in myself even with fastmail by not realizing for almost 3 days that I let my domain expire, thus causing emails to bounce with no way of me knowing that was happening)


OK, so make Mastodon as easy as Gmail: I can use gmail.com or my own domain. Let me have the same flexibility with my internet identity. I wish the USPS had gotten into the official internet identity game: one place to receive legal emails, store my private keys, public keys, etc., protected by law.


Too bad spammers and big corps broke that one as well. Maybe if it worked as a publish-subscribe system (MQTT) instead, where the sender was responsible for storing and distributing content, then the spam problem would be somewhat fixed?


A lot of people, clubs and businesses publish their content on Facebooks and Instagrams because those platforms are better for getting your content out to your followers and more people. They are being rational.

Where's the non-proprietary decentralized platform that lets me reach as many people as I can on Facebook? There isn't one.

Why aren't the social functionality of identity / friends / followers / newsfeed / etc. built into browsers in a standardized way?

Facebook is 16 years old. That was a lot of time to figure out an alternative solution, but all we have are experimental projects that rely on adoption that they don't have to be useful.

Corporations aren't going to change how they behave, but it's annoying that us techies are apparently incapable of beating them at our own game.


> A lot of people, clubs and businesses publish their content on Facebooks and Instagrams because those platforms are better for getting your content out to your followers and more people. They are being rational.

I like trains, and I started a website back in 2001 for people to share their photos. It was reasonably popular. One of my drivers was taxonomy and archiving of images for future enthusiasts.

Today, it's dead. People post their photos on Facebook groups. They get attention, likes - all the stuff that matters to a human. A week later the photos are lost in the group, hard for anyone to find, no indexing, no exposure. The comments - from people who worked on the railways, knew people involved - useful to historians of the future, are fantastic. But if you can't find them, what point?

I get why Facebooks succeeded. For my site, I was a total geek: why would I dirty the site with anything social? Well, look who's laughing now.


>People post their photos on Facebook groups. They get attention, likes - all the stuff that matters to a human. A week later the photos are lost in the group, hard for anyone to find, no indexing, no exposure.

Not even a week, if you consider the target audience for what's posted as opposed to the poster. Algorithmic sorting and infinite scrolling have pretty much eliminated the ability to go back and look at something you saw a few days ago (unless the algorithm decides to boost it back into your feed).


I haven’t seen your train site, but the kind of content I imagine you produce would be, in my mind, akin to a reference book.

By contrast, Facebook is at best like a magazine, at worst a radio phone-in about trains.

Reference works in the form of websites have amazing value in and of themselves. I don’t think they need to be measured by social eyeballs when they attain an outright high level of quality.

I happen to be particularly fond of a reference website that is a taxonomy and history of British traffic lights:

https://beno.uk/trafficlight/


> Why aren't the social functionality of identity / friends / followers / newsfeed / etc. built into browsers in a standardized way?

Newsfeed is RSS/Atom.

Identity / friends / followers are really one package, and it isn't a thing browsers can solve on their own, because people want the ability to do password resets etc. Also, decentralized identity is somewhat the opposite of this anyway -- people don't want to use the same "identity" for their parents and their friends and their boss.

The best way to do this is for sites to use email as identity, because it's common and gives you password resets, but people can create more than one and separate them as they like.

The technology to do this already exists, but Facebook and Google made it easy while the free software equivalent takes several hours to get running. Which we could fix, but haven't (yet).


Sure RSS exists, and I use it, but it's not even built-in to (most) browsers anymore. You open an RSS link in the browser and it spits out XML garbage. Wat.

RSS is sadly not enough on its own without the other puzzle pieces. Private feeds are not really a thing, it doesn't let you comment on or like or share the article to your friends, etc.


That's because Google Reader existed, then they killed Google Reader.

ActivityPub solves most RSS limitations.


> Why aren't the social functionality of identity / friends / followers / newsfeed / etc. built into browsers in a standardized way?

Because these compete with the interests of browser vendors, interests which finance a degree of development that dominates and ultimately stifles independent efforts.

Remember that Google pitched Google+ as an "identity service". They're now accomplishing this through Android, Doubleclick, GA, Gmail, and ReCaptcha, far more effectively. And sell ads on it.

Facebook isn't going to pay for social integration development by Mozilla: Zuck wants that pie to himself.

Channel monopolies would prefer RSS died and browsers (or apps) served their specific feed directly and exclusively.


More often than not, the "game" trends toward market capture and acquire + kill or absorb business strategies. At a certain size that's hard for anyone to beat.


That's part of the problem: readers became followers. Don't forget to like and subscribe.


I went through exactly this thinking recently when I wanted to set up a blog for myself (and migrate an existing one off of WordPress). I tried my best (and I think I succeeded) to ensure that I am not locked into one vendor, and it was pretty much free.

Someone else mentioned that you can't reach as many people from your silo-ed website as you can if you go through social networks. I found one way you could get best of both worlds - through Medium's import feature[1]. But I don't yet know how effective that is.

Here's a short write-up in case anyone's interested: [2]

[1]: https://help.medium.com/hc/en-us/articles/214550207-Import-a...

[2]: https://ketanvijayvargiya.com/58-setup-blog-and-email-on-cus...


I think that's only half of it. The other half is that it's easier for consumers to find content in the silos of the large corporations than content that exists outside of them.


Yep, it's even worse. There are some things that don't have a definitive answer, like many aspects of COVID. Some pages with what I would categorize as "inquiry" get removed just because they don't line up with the WHO. This isn't about questioning vaccines, but rather the unsettled questions around this new disease. It just gets banned... it's not like I have a stake in that fight _other_ than dismay at private censorship of opinions that don't toe a given line. It's rather frightening.

But... many of the same companies will fill your search results or fill affiliate pages with quackery ads just fine...


Why do you feel that it was the job of "corporations" to preserve and archive every page forever?

In my country, all physical books and magazines which are published must be submitted to the government in X copies. The government then keeps an archive.

With webpages, the problem of obtaining X copies never existed. Why couldn't the government have archived webpages like it always did with books?


It is not that it is their job. The problem is the mismatch between the broader public understanding of the lifetime of "a webpage" and reality, when said "webpage" is inside a walled silo (and maybe even when it isn't).


I guess I don't see why anyone would expect their webpage to keep being published indefinitely with no contract or ongoing payment. Not many other things in the world work like that.

Services provided by a company typically don't survive the end of the contract to provide that service. If the company itself goes bankrupt, all services cease to be provided immediately.

Typically the only organisations which can credibly commit to providing a service for more than a few years/decades is the government of a country, a well-funded foundation with a clearly specified mission, or similar.


In the UK, this job is handled by the British Library. They have a legal duty to collect annual snapshots of all websites using the .uk TLD https://www.bl.uk/collection-guides/uk-web-archive


I believe you are misrepresenting the situation. No one expects corporations to archive and preserve all data, especially not data that they are not associated with.

However, if they create a monopoly on that data, they have an obligation to preserve it, especially in the case of a corporation outright acquiring data instead of simply "out-competing" for data. And as everyone mentions, of course they are in no way legally obligated to do so, but they are by any reasonable standard ethically obligated.

I do think that the government could and should archive data, but there is currently no system in place for doing so and likely will not be for a long time, if ever. Corporations would simply have to maintain the data that they already have.


I'd argue this is a feature not a bug. The internet is a protocol for communication, not archival or retention. Any notion of persistence is owned by nodes in the network. Retrieval from an "archive" over the internet comes in the form of communication. The web introduced hypertext, and a protocol for exchanging hypertext, on top of this communication protocol. But again any notion of persistence of hypertext, and the "links" between hypertext documents, is the responsibility of the nodes in the network.

They were sharing information with the whole world, but in an ephemeral medium.

The web, and the internet, is not an inhospitable place for anyone without corporate backing. You can host a somewhat reliable service on a Raspberry Pi over your home internet connection.


You just can't be found. You can self host but unless someone finds you some other way you are excluded.


> It turns out that most pages created in the 90s are now inaccessible

Some of that is because search engines have simply stopped returning them in results even though they're still online.


The issue is that a web page only lasts as long as its funding does: private sites are great, but someone still has to pay for the server and, when they die, it's probably going to just vanish, unless the Internet Archive got it.


How are big corporations preventing a web server from serving content over HTTP in old-school HTML?


Seems like a fundamental truth of capitalism: privatization and ultimate destruction of anything that can be monetized. Certain things are impossible without money, and to make money you have to generate or consume something, which leads to a never-ending cycle.


I seem to have had a distrust of corps imprinted in my brain since birth and never fell for their candies/propaganda. All my stuff is always on my own servers, with shadowing of course.


> He had been ordered to turn over all of their historical arial archives for scanning by Google, and then told the USGS would no longer do arial scanning since Google was doing it. But there was no agreement for Google to turn over their arial scans back to the USGS.

Jeez, that's horrifying. Literally just giving public assets to private corporations.


Publicly funded data should be publicly available, which includes use by private corporations.


Agreed. You seem to be missing the part where this is no longer publicly available because the USGS no longer has the data.


I believe the original author meant that Google doesn't have to turn over aerial scans that Google commissioned (not the scans they took of the USGS files).

Furthermore, I'd be shocked if Google just kept the original copies that the USGS gave them.


If getting it takes connections or prestige, then yes.

If any entity with a plausible use case could and still can get that data at the cost of the copy, I don't see why not. The whole "copying does not deprive the original owner" meme applies particularly to such public assets.


> If any entity with a plausible use case could and still can get that data at the cost of the copy, I don't see why not.

Can you point me to where I can download this data for the cost of a copy? Didn't think so.


Did you try asking the USGS?

Google presumably also didn't have a website with a download button.


At a fundamental level, my complaint isn't about money or access, it's about ownership.

Yes, money and access are important. Yes, ownership has money and access implications. But ownership in itself is fundamental, and the money and access problems usually don't immediately follow the ownership problems, because if they did, nobody would give up ownership in the first place.

I've said it before and I'll say it again: corporations don't have the right to "innocent until proven guilty". We don't have to wait for corporations to do something wrong to do something about it. A lot of people on Hacker News seem to have this idea that we should wait to regulate until businesses follow incentives to the point of doing great harm, and then, when they do great harm, we should just say, "Oh, it's not their fault, they were just following incentives."

I don't accept this. It's obvious that taking publicly owned data and making it privately owned will lead to Google placing that data behind a paywall, an ad-wall, or a simple loss of access if Google feels they can't monetize access. Even if Google maintains a relationship with the USGS such that we always have access, there's no reason for the USGS to spend our tax dollars to pay rent to Google as a middleman. We don't have to wait for Google to follow incentives to that point--we can see where this is going and we don't have to pretend we don't.


I assume the USGS still holds the same data, no less accessible than before.

So, if Google has put their copy of the data behind a wall, or deleted it, it's no less accessible than if Google had never been allowed to make that copy.


But... it's not current data. The historical data loses value as it ages. Aerial scans of "one month ago" are now less accessible.


I don't like this but if a corporation is a person, they have the same right to it that the rest of the public has.

If the effort to the USGS could be quantified as a cost, I'd expect Google to pay the USGS to make the public data available?

It does sound awful. I don't know what the right answer is.


> I don't like this but if a corporation is a person, they have the same right to it that the rest of the public has.

1. A corporation is not a person. Corporations don't have rights, except inasmuch as the people within the corporation have rights.

2. The problem isn't that Google has access to the data, it's that USGS and the rest of the world no longer have access to the data, except on Google's terms.


Wasn't there a seminal supreme court case that has been used as precedent to show that corporations do have rights? Something tax related?


The supreme court also decided in Dred Scott v. Sandford that people of African descent imported into the United States and held as slaves were not included under the constitution and could never be citizens.

The supreme court was wrong on racism, and it's wrong on corporate personhood.


What's right and wrong is a purely human construct and changes over time. The point was that, currently, in the eyes of the law, corporations hold many of the same rights as individuals. This could change, but would require special circumstances not considered previously or a change to the law by congress.


Corporations aren't people. I can't get married to Google. If you can point to specific precedent of corporations being given access to certain data on the grounds of their personhood, then your argument makes sense, but just because corporations are considered like people in the context of speech doesn't mean that applies literally everywhere else.


Corporations aren’t people, but from a legal perspective, they have some rights in common with persons, which is why you often hear statements saying they are.

https://en.wikipedia.org/wiki/Corporate_personhood


The law is wrong.


I never said otherwise.


Okay, but it's pretty tiresome to hear the supreme court opinion of corporate personhood brought up constantly as if it has any validity whatsoever.


I can sympathize with that. Unfortunately, the opinion of the supreme court is a force to be reckoned with, whether we agree or not.


> I can't get married to Google.

But they can still fuck you over, like when they banned my GMail account for no reason, with no warning nor explanation.


Yea, most of the wind about "taxpayer dollars being wasted" is just flatulence, but this is straight-up robbery.


> But there was no agreement for Google to turn over their arial scans back to the USGS.

That was poor negotiation by the USGS Solicitor's Office. Libraries participating in Google digitization programs negotiated to keep copies of their scanned materials in the HathiTrust Digital Library https://www.hathitrust.org


You act like the Director of the USGS was acting in good faith. It's pretty likely that Eric Schmidt, or similar, already worked things out with high-level officials within the government and the USGS Director was not given any real decision making capabilities.


There are laws for book publishers, requiring that they send copies to your government's central library. In the US it's the Library of Congress. Some of the books they don't keep, but they do filter them by which books are important and which aren't. Maybe the same should be done for "viral" posts, such as aerial scans, and other data deemed important.


In Germany the national library is also required by law to take care of digital ("körperlose Werke") media.

They are still figuring out a good way to archive those and are quite selective in what they archive, but they plan to expand on that.

German page: https://www.dnb.de/DE/Sammlungen/DigitaleSammlungen/dgitaleS... English page: https://www.dnb.de/EN/Sammlungen/DigitaleSammlungen/dgitaleS...


Electronic works in the U.S. fall within the mandatory deposit statute, but then are excused by Copyright Office administrative rules. However, it seems they've slightly narrowed the electronic works exception (first in 2010, and tentatively for 2020) since the last time I looked: https://www.copyright.gov/rulemaking/ebookdeposit/


I didn't know that LoC has any discretion regarding keeping items mandatorily deposited for copyright registration. Do you have more information on this?


The USGS is currently in the middle of an 8-year 1.1-billion-dollar program to develop a nationwide digital elevation model from aerial lidar. The data, which is freely accessible, is hosted on AWS. Cute story though. The hackernewses are going to eat it up.

https://www.usgs.gov/core-science-systems/ngp/3dep/3dep-data...


USGS is in the process of collecting that data right now, it's not from the archives, and DEM is different from USGS aerials (which are photographs) and run out of a different USGS office. This is sort of irrelevant.

Making digital data publicly available is pretty new for USGS. Just a few years ago, archived aerial imagery had to be ordered by mail and it was a pretty lengthy process. Topo maps (the earlier equivalent of the DEM data to which you refer) were generally ordered on paper as well up to five or so years ago, but they're in a lot more popular use, so more third parties got into the business of distributing them. I've relied moderately heavily on both for some of my research, and it was a very painful process until just recently to get anything older than current. In the meantime, yes, Google had it all at some point, but mostly stopped using or providing it because they obtained better-quality imagery.

Fortunately USGS now has a slippy map for topo and an admittedly rather clunky ESRI query service for aerials.


USGS has been providing free public access to DEM data for ages. The SRTM has been available via FTP since at least 15 years ago when I first started using it to render hillshaded maps. There's not a secret handshake needed.

https://dds.cr.usgs.gov/srtm/version2_1/
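
If you want to play with it: each SRTM3 tile is a zipped .hgt file named for its south-west corner, and the format is trivially parseable. A minimal sketch (the tile name here is just an example):

  # Read an SRTM3 elevation tile: 1201x1201 big-endian signed 16-bit
  # samples in meters above sea level; -32768 marks voids.
  # The file name is hypothetical.
  import numpy as np

  def read_srtm3(path):
      tile = np.fromfile(path, dtype=">i2").reshape(1201, 1201)
      return np.where(tile == -32768, np.nan, tile.astype(float))

  elev = read_srtm3("N37W123.hgt")    # tile named for its SW corner (37N, 123W)
  print(elev.shape, np.nanmax(elev))  # (1201, 1201) and the tile's high point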


and the DEM has always been in a native digital format. The whole problem here is that the aerials and conventional maps are not, they're on paper and film and fiche. It takes a lot of time and money to get it digitized and available and USGS was not able to do that for a long time. You could argue that Google's generous offer to digitize the EROS archives contributed to the delays on this.

Keep in mind that when we talk about the EROS archives we're talking about data that goes back to the 1930s and earlier for some product types.

For a long time I got the topo maps from the website of a state government bureau that had conveniently run them through their own large-format scanner and posted the TIFFs - USGS didn't get around to it for years after. It's hard to blame them too much as they had a shoestring budget.

Actually, for amusement value, that state agency appears to have removed the TIFFs from their website and now says that you can order the topo maps by mail for $8 a piece, which is what I used to have to do. I wonder if USGS got mad at them, which is a bit ironic since they don't mention that USGS themselves only recently started offering them online for free. For additional amusement value EarthExplorer, the fairly new service that lets you retrieve aerials online, has a banner up that downloads are intermittently broken and indeed I can't get it to work at the moment.


Really I'm struggling with your statement about digital distribution being new for the USGS. We're talking about an agency that ran a finger service to inform people about recent earthquakes!


I had to order some maps of Antarctica by fax a year ago. The USGS had a functional webshop, but it only served the US; everybody else had to fill out this form, including debit card data, and send it over. It turned out my uni actually still had one (1) functional fax machine.


I took the tour 10 years ago. Obviously his objections were heard. I’m glad they listened to the guy.


Companies are necessarily managed for the quarter and countries should be managed for the century.


The planet should be managed for the centuries, plural.

But we're just not smart enough to understand that, never mind make it happen.

Instead we prefer to cling to the bizarre delusion that billions of individuals with competing interests will somehow spontaneously self-organise into the best of all possible worlds.


That is indeed pretty much what it says on the tin.

However, to be fair, it has always been the most greedy and self-interested, with already the most disproportionate power to rig the game in their favor, that have been most vocally advocating this system. No surprise there, of course.

What fascinates me is how a majority of people, who certainly do not personally benefit from that system, have been made to believe that they do. Sure: political corruption, cultural indoctrination/propaganda, horrendous general education, and I can think of a few more... but still, I've always been amazed at how it appears to have canceled even basic logical reasoning among so many.

Who knows, maybe one day it will turn out to not just "correlate" with an addiction to carbs/sugars, which the country has plenty of problems with too. Junkies have always been easy to manipulate.

Until then, at least it still gives some hope that a growing number of people now realize that this system just doesn't work as it is advertised.


And it's not even true. At least here in Europe (I can't comment on the US as I've never visited), Google Maps is really poor.

It's fine when you travel by car, but when I'm hiking through the hills I'm just walking through an empty square on Google Maps. Volunteer-driven OpenStreetMap is MUCH better. And there the data is actually open and safeguarded.

Governments should support that kind of project instead of corporate privacy-invading playtoys like Google Maps.


In another life, I was a land surveyor and I did a lot of LiDAR work as well as heavy use of USGS data. Almost anybody except the most blinkered in that industry would have seen this coming, I think. It's just one more data point that convinces me that Google should be broken up or at least not allowed to silo previously public categories of data.


I had a similar almost to the letter conversation when I did some web work for a much smaller GIS firm back in the day, but wanted to add that in my experience this isn't just a google thing but an issue with governments and outsourcing in general.

Anecdotally, a close relative (and many others in her institute) designed entire curricula of learning modules for a government-owned nationwide technical college, back when online learning was newish, ~20 years ago (I think back when SCORM was fresh). These were tightly integrated into the traditional in-class offerings. A couple of years later a "trim the fat" government slashed internal capabilities and outsourced all "IT" hosting, management, etc.

All of the online learning modules (which would have cost millions in man-hours to develop) were literally handed over as "content" to a company who to this day offers them back to her institute under per-student licenses (that far exceed any "hosting" costs of these basically static resources) over a decade later. This company also profits off licensing to an array of pop-up online "institutes" that don't even approach the pedagogical context needed to ensure quality education outcomes from these resources.

Like a comedy of errors, from time to time some lecturer at her college will want to ask a question about the materials. Their boss directs them to the company's support (which is a paid service); after the issue escalates through the support tiers and they realise they need the expert knowledge of the author, she'll get an email with the question, a process that can take days or weeks, when the lecturer could have walked into the office next door and asked her directly, if the company hadn't stripped all author credits from the materials.

If the company decides to shift business models, or goes out of business, or is acquired and scuttled, these assets get blown to the winds.

There's a lot I could say about this situation, but essentially governments in general seem to devalue their assets at taxpayer expense, the IP of these assets could have been better handled rather than just giving it directly to the first company to win the contract all those years ago.


> historical arial archives

A font of knowledge


Is this a valid reason for not using Go?

I am a holdout.

(Not suggesting I am "smarter" than Go users, but I can foresee issues with Go being controlled by Google.)


I will probably never use Go in its current situation.

https://news.ycombinator.com/item?id=8733705


I think most of us are young enough that we'll live to the point where we look into the mirror asking ourselves what we did 20 years ago, and nobody will really remember, because, you know, a few bits here, a few bits there, and it all disappeared...


Not smarter. Wiser.

It sounds awful that Google has the best mapping data in the US. In the UK Google's data is awful, worse than OpenStreetMap and much worse than Ordnance Survey, the national mapping agency.


The funny thing is that this happened already when Google bought DejaNews and broke the interface after a year.


The underlying problem here might as well be considered a fundamental shortcoming of pure/fundamental capitalism. I make no claims about the value of alternatives, or even if there are any (better ones, that is).

Anything that is (no longer) of commercial value will be "phased out" and dismantled/destroyed. One might still stretch it a bit by arguing that the commercial value of something can include its future potential value. But I personally know of not a single commercial company that ever chose that over short-term cost reductions and "profit optimizations".

Luckily, there are governments who acknowledge this shortcoming and build structures to compensate for it. But when governments decide to leave (almost) everything to commercial markets, then the importance of anything and everything can and will only be measured by its commercial (contemporary) value/profitability.

People have every right to vote for and support such a system. But then don't complain, when all that you will get is only what such system supports/provides.


Isn't killing projects Google's key strength?


Like we have whitewashing and greenwashing, I propose the term:

Googlewashing - to proclaim “Google would never ...”


> He said Google is great now, with all their maps, which are far more accurate and have better coverage than the USGS's... But what happens when they get bored with map data and get rid of it?

> Looks like he was a lot smarter than us.

If you would've asked me back when Google was new, and we all believed in "Don't be Evil," I would never have thought that Big Tech would end up being the Ministry of Truth and The Memory Hole.


Just recently I collected all of the archives of comp.lang.ada I could find and imported them into a public-inbox repository. There's a gap around 1992 that I couldn't find a copy of, but it's otherwise complete. It took a few days to get everything into the right format and get SpamAssassin dialed in, but it would certainly be possible to do this for the other comp.* groups if one had the patience.

https://archive.legitdata.co/

https://archive.legitdata.co/comp.lang.ada/

https://public-inbox.org/README.html


I would personally very much appreciate it if the Ada resources could be placed or archived again on the internet. Lately I have had the feeling that even books were a better option for finding information about the language.


The vast majority of the spam content is injected into these newsgroups via Google Groups itself, and is not even seen on other NNTP servers.

Blocking posting access to these newsgroups from GG is generally a good thing for those newsgroups.

Not being able to search the archive is the unfortunate collateral damage though. Google is not obliged to provide a Usenet archive, I suppose.

Formerly obtained deep links to the content also do not work!

If you formerly cited a comp.lang.lisp article by giving a direct link into Google Groups, people navigating it now get a permission error.


What would be a good free NNTP server or NNTP archive?


The D programming language forums work as an NNTP server as well as web forums. I have in the past downloaded all content from the forum, allowing me to have fully offline archives of threads. This is so underrated. I think NNTP could make forums far superior, although it feels like there aren't many clients springing up, AFAICT.
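
To give a feel for how simple this is, here's a sketch using Python's nntplib (in the standard library up to 3.11). The server and group names are the D forums' NNTP host as I remember them, so treat them as assumptions:

  # Pull an overview of recent posts from an NNTP-backed forum.
  from nntplib import NNTP

  with NNTP("news.digitalmars.com") as news:  # server name from memory
      resp, count, first, last, name = news.group("digitalmars.D")
      print(f"{name}: {count} articles")
      resp, overviews = news.over((last - 9, last))  # the last 10 articles
      for artnum, over in overviews:
          print(artnum, over.get("subject"), "-", over.get("from"))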


Adding some new NNTP features to Thunderbird was my introduction to open-source software and ultimately led me to being one of the primary maintainers.

NNTP is a wonderful protocol, arguably the simplest of the 4 mailnews protocols (IMAP, POP, SMTP, and NNTP). While it seems to share the same basic format as RFC822 messages, it actually tends to avoid some of the more inane issues with the RFC822 formatting (generally prohibiting comments and whitespace folding).

Unfortunately, the internet by the early 2000s started turning more and more into an HTTP(S)-only zone. Usenet itself hemorrhaged its population base, especially as ISPs shut down their instances (e.g., because someone found one child porn instance somewhere in alt.binaries.*).


We periodically hear calls to replace the D NNTP forums with "modern" forum software, but naaah, nobody does it better than NNTP!

Vladimir Panteleev did, though, write a web interface to NNTP:

https://forum.dlang.org/

which is also freely available:

https://github.com/CyberShadow/DFeed


The only thing I would ever modernize about the D forums, if I were to ever bother, is the CSS style or something. But honestly, they work, they're faster than most forums out there, and they aren't crashing all the time (I'm looking at you, vBulletin!), so it's fantastic.


I'm a broken record on this, so you may have seen me point it out before, but NNTP started dying in the mid-late 1990s, when binaries took over. It was extraordinarily difficult to keep reliable full-feed binaries (NNTP is the dumbest conceivable way to share large binaries), and if you couldn't do that, customers would yell and ultimately abandon your service for a cheaper one, while opting for more centralized Internet NNTP services.

Ultimately I think the web would have eaten Usenet anyways, but it's a shame; we were Freenix-competitive (I think I independently invented the INN history cache), and that was some of the most fun I've had doing systems engineering work.


> I'm a broken record on this, so you may have seen me point it out before, but NNTP started dying in the mid-late 1990s,

I didn't start posting to Usenet before 1999 and was a regular poster in a few groups from that time up until around 2014. Excluding spam, what was the activity level of groups before and after Eternal September?

> It was extraordinarily difficult to keep reliable full-feed binaries

I don't understand why ISPs wouldn't just limit their newsfeed to the text-only newsgroups. Did peering arrangements require one to also provide binary newsgroup access? IME, ISP and university news servers had mediocre binary completion rates at best. If someone wanted binaries, they could always subscribe to one of the paid newsfeeds that provided better completion. So I don't really see the incentive for providing binary access at all at the ISP/educational-institution level.


That was exactly what universities were doing here in the 90s. We didn't have the physical disk space or bandwidth for a complete feed, but were able to provide all the non-binary groups in the main top levels (including alt.*).

The ISPs needed the binaries though, because that was all they were used for. People read the text on their Uni systems, where they got free dialup, rather than chew up their ISP connection time quotas.


> People read the text on their Uni systems, where they got free dialup, rather than chew up their ISP connection time quotas.

At dialup speeds, the only practical binaries one could download were images. MP3 files were practically the upper limit of the file size one could download; beyond that, articles would expire off the server before they could be downloaded.

Without broadband and better completion rates, which commercial ISPs didn't really provide (especially the latter), customers probably wouldn't really try using their usenet feed for that purpose when they had other alternatives for binaries.

I just used my ISP's usenet access for text groups until they discontinued it.


If you disabled binaries, customers would get angrier than if you simply didn't provide Usenet at all. Nobody that cared about Usenet would sign up to a provider that didn't provide full feeds. I agree that it's irrational, but it also destroyed Usenet, a couple years earlier than I think it would have otherwise.


For what it's worth, I never called my ISP's customer support to complain about bad completion rates on Usenet. Logically, people would have found other ways to download what they wanted, either by finding an alternate source, or another usenet feed to replace or combine with their ISP's.

What really took down usenet was when Andrew Cuomo, back when he was the state attorney general of New York, made a deal with several major ISPs to restrict access to child porn via usenet.

This led to many ISPs discontinuing their usenet service, which in turn decreased the number of people posting to text groups. Within a few years of that happening, practically all the regular posters in the groups I used to frequent just stopped posting. Those same groups now only have spam posted every several weeks, based on what I've seen via Google Groups. Prior to that, these groups had plenty of active discussions going back to the mid '90s and earlier.


I was the tech lead at the most popular ISP in Chicago in the mid-late 1990s and I assure you that people complained, on Usenet, in email, and in phone calls. And we kept a full feed!


Ah, I see. I'm now more curious, given that you say it's really simple; I haven't read the spec much personally. I loved the concept of the D forums so much that I intended to attempt to set up my own NNTP daemon from scratch, but it's in a bucket list of projects I want to try out. The only resources I could think of reading are the RFCs; not sure if anybody else has documented Usenet otherwise.


Whatever happened to GroupLens? That was a protocol/system for collaborative filtering that used Usenet as a guinea pig.

Probably a bad idea, according to what we know now about echo chambers in social networking.

http://ccs.mit.edu/papers/CCSWP165.html


What you can do with NNTP is run a local NNTP caching server. Then connect to that server instead of the real one. Your caching server can retain articles as long as you want; much longer than the upstream server.

(Though mere long article retention is not necessarily the best archive interface, of course.)

Disclaimer: I'm not well-versed in the solutions in this space. Maybe there is some NNTP cacher out there that also has a web archive interface into it or whatever.


Yes, and I have 100% of the D newsgroups archived back to the very first post. Anyone can get them from the D NNTP server. I also wrote a program to create static web pages from them:

https://github.com/DigitalMars/ngArchiver

and the generated pages:

https://digitalmars.com/d/archives/digitalmars/D/index.html

When we were working on the history of the D programming language paper, this was an invaluable resource.


Back when I was first learning about D (over 10 years ago), I crawled those archives and sorted all the posts based on the comment count. It revealed many interesting topics and the history of how D progressed as a language.

I wish forum.dlang.org had a quick way of browsing just the top list of the most commented posts.


Well, it is open-sourced on GitHub; you are welcome to file tickets. Maybe that's a small type of enhancement the D forums could benefit from: filters that produce informative pages/results.


Good idea. Never thought of that!


I've been using the NNTP server provided by https://www.aioe.org/ for quite a few years.

There is also https://www.eternal-september.org/ which I used.

Aioe requires no authentication. The Eternal September server requires account registration via the web site; then you use an authenticated NNTP connection.

There are other servers out there.

These sites do not provide any archive.


Google's handling of these critical archives they were given is pretty abhorrent. The usenet archives should really be made public since there is no business value to them and they don't care about usenet.


When Google started, there was maybe an overall altruistic, visionary, principled culture among many pre-Web Internet-y people, and it looked like Google was of that same school of thought.

(This was at the same time that there was a gold rush of IPO plays, hiring anyone who could spell "HTML", and plopping them down in slick office space, Aerons for everyone, and lavish launch parties, with tons of oblivious posturing and self-congratulating. But Google stood out as looking technically smart, at least I believed the "Don't Be Evil", since that was the OG culture, and it seemed a savvy reference to behaviors in industry and awareness of the power that it was clear they would probably have.)

That might be why it wasn't surprising to hear of things like someone entrusting a bunch of old university backup tapes to Google's stewardship.

This has played out with mixed results, and I think Google could be doing much better for humanity and for techie culture.


Google didn’t kill Usenet; it was already pretty much dead. Web forums had all but taken their place (and where are their archives now? So much is lost).

If you look at the history, Google basically rescued the data from a collapsing Deja News, and made it available again. A nice gesture, which didn’t serve to benefit Google much in the long term.

If we want to preserve history then we can’t rely on for-profit companies. We need to instead fund non-profits whose specific charter is archival and preservation, like the Internet Archive.


> The usenet archives should really be made public

Given the nature of Usenet, they were if anyone wanted them.


Various people sent their old tape reels and other backups to Deja News, which compiled everything. But Deja News never made freely available the individual archives or the collection, nor did Google. The oldest stuff is locked away by Google because the only hard copy was destroyed when sent to Deja News. As time wore on most of the remaining fragments that at one point could have been recompiled independently also disappeared.

What Google is doing by refusing to publish the archive or even share it with parties like the Internet Archive is completely unjustifiable and anathema to everything they once stood for.


> What Google is doing by refusing to publish the archive or even share it with parties like the Internet Archive is completely unjustifiable and anathema to everything they once stood for.

Couldn't a copyright claim (or something under the GDPR or UK's DPA) be used to regain access to those though?

Just because something is published to a public forum doesn't mean you relinquish your rights.


Copyright is a legal mechanism for restricting others from making copies, not for demanding they make copies for you. Off hand I'm unaware of any general legal mechanism to accomplish that outside of a contract or promise.


That’s why I suggested the DPA which does allow for rightsholders to request copies of data pertinent to themselves - I’d argue that usenet postings would fall under that scope.


Doesn't that just create an incentive to destroy the archive before GDPR authorities can shake them down over it?


Perhaps - but it also creates an incentive for companies to destroy inappropriately-held and collected personal data they have no business possessing.

The DPA isn’t new - the first version dates back to 1984 - and UK ISPs had Usenet/NNTP servers long after that.


Google acquired probably the biggest searchable archive, Deja News. What we needed was some kind of self-sustaining org with a strict charter to preserve the archive no matter what.


Archive.org?


Maybe, though making themselves a target of book publishers may have risked their other responsibilities.


They were until they were not.


> they don't care about usenet.

They cared about it enough to kill it.


Controversial question: Why should we preserve code that no one uses anymore? Why should we not allow some information to be simply lost?


Because it's a cultural artifact, of its time. It's history. And some people would like to be able to read it, or do other things with it.

Personally I'd like to be able to link to my own posts from that time, for when people asked me what I used to do. But I can't find them any more.

These groups are mostly not code. They are conversations, design discussions, ideological discussions, jokes, that sort of thing.

Like what we have now in social media, except back then there was pretty much only Usenet, and it had a very different feel than the current social networks.

They are where ideas like the smiley, free and open source software, and utopian visions of internet culture were developed. All the early internet memes. And, of course, all the knowledge people shared.

Conducted in public at the time and thought to be archived for the long term.


Wonder what people will think in a hundred years when they read that everyone believed the universe was made up almost entirely of invisible and intangible matter? It'll be some future generation's flat earth joke.


This past Sunday's New York Times noted that until the 1860's, almost all reputable scientists insisted that pandas were a myth.


As someone else pointed out, losing information is bad because we can't know what value it might have in the future, only what value it has to us today. A lot of things from the past that we are certain had no value to people at the time (such as literal garbage heaps) are of immense value to historians today in understanding the past and the context within which those "worthless" things existed.

You're right though that a decision will probably have to be made at some point about what to keep and what to toss (how big is YouTube, exactly? Are we really going to keep every video, in its original resolution, forever?), but this is just plaintext, it takes up almost no space. The decision doesn't even have to be made, since it's easy to find the means to store this, so why bother making it? Kicking the can down the road is actually the best decision in this case, since the people of the future will (hopefully) have a clearer understanding about what was important in our own past than we do currently.


Why should we preserve old websites that no one uses? Why bother with historical documentation at all?

It's because, at the time, you don't know what information is going to be important and what is just garbage. Documents that are apparently useless today could become fascinating tomorrow.


No, it's a reasonable question. We're not going to preserve, certainly not in a findable way, every piece of digital flotsam that has ever been summoned into existence. In general, we probably should save what we can of Usenet for historical value as balanced against the fact that the archives are tiny in the scheme of things. They're probably also messy but that's probably OK.

Interestingly, when some people saved a great deal of the Usenet archives pre-Deja News, one of them said something to the effect that they wished they had prioritized saving the social discussions and so forth because, by and large, saving discussions about a bug in a long ago version of SunOS probably wasn't very interesting.


> saving discussions about a bug in a long ago version of SunOS probably wasn't very interesting.

Honestly even that sounds pretty fascinating:

It could help someone gather stats on the nature, frequency, and severity of bugs over time and across companies from another angle.

It could provide a fresh perspective on modern OSes by showing how historic OSes did things.

And it might be good material for a course on the history of software engineering practices, showing classes of bugs that have been eliminated, and styles of development and customer support that worked or didn't work.


I suspect the information would be too fragmentary to extract anything statistically useful in it. But, yes, there are possibly historically interesting nuggets in those sorts of topics.

Here's the article I was thinking of by the way. https://www.salon.com/2002/01/08/saving_usenet/


Why not? Our capacity for storage has been increasing exponentially such that yesterday’s data is basically of negligible size compared to what we are producing today. There’s no reason to delete history.


So no one is keeping you from doing so. There's no reason to hope someone else will do it.


Indeed! That's why I regularly donate to the Internet Archive. :-)


Which is a very laudable response! With the caveat that pack-ratting everything is going to be an endless treadmill. I certainly favor preservation but at some point you do have to consider what you're saving and why.


Your assumption "no one uses anymore" is glaringly wrong in this case.

Those archives are full of useful information.

Not everything changes fast. Common Lisp has been around for 30 years basically unchanged, so the discussions from back then can be truly informative today.

It does take time to wade through it all, but people have been collecting curated lists (via the Google archive, when it existed, sigh).

https://www.xach.com/naggum/articles/ https://www.xach.com/rpw3/articles/


For the same reason we don't just tear down the pyramids and build condos there.

There are still interesting things to be learned from ancient artifacts.


But we do tear down old condos to build new ones. Should we also endeavor to retain every geocities and myspace page?

And if not, what makes comp.lang more like the pyramids than geocities?


> Should we also endeavor to retain every geocities and myspace page?

Yes: https://www.archiveteam.org/index.php?title=GeoCities

Digital data is not exclusionary in physical space like condos. And even random myspace pages with hacked stylesheets show the common culture of an era.


Do you know about cuneiform? Lots of what is known are just ledgers and exercise books...

Never forget that we do not know the future.


Future digital tourism.

That or risk future archaeologists thinking COBOL was some God of the time and the natives built large metal obelisks in dedicated worship temples.


why do mennonites and other such groups use low/deprecated technologies? partially due to religious creed, but also because when the electricity is gone, oil lamps still function, and horses don't need a petrol pump to keep running.

likewise many people are clinging to the local operating system rather than moving to the SaaS model.

so what happens if we lose the oldschool languages and platforms entirely, for whatever reason?

if TBTF corporations are somehow hobbled or neutralized, we need old hand tools to build a tech newtopia from the rubble. if those tools are destroyed then we are beholden to a system that stands on very thin ice.


I would add to this that not all forward progress is necessarily good or well thought out. If there is value in an old thing that hasn't been unlocked yet, and it is lost to history, we become collectively worse for wear. Things like Lisp are old and pretty darn cool to have as an option.

I second that the need to rebuild from the rubble is often overlooked, especially by corporations driven by profit-centered goals.


The thought process and conversations that produced the code give insight into how to more generally produce code of that kind. Typically code currently in use is in continuity with code that was previously used, either as a system dependency or conceptual dependency. So it's still useful to have history around, like it would be to have comments in current code.


Well I think it’s ok in general for some information to be lost, but I think a lot of HN users value this specific information.


I’m sad to see that this was downvoted; it contains the key questions. I think they have good answers.

1) Eventually, everything will be lost anyway. The original print of King Kong is gone. A fire at Universal Studios wiped out the masters for a lot of music at once https://en.wikipedia.org/wiki/2008_Universal_fire . Floods destroy family photos all the time. But those are examples of the forces of decay, of natural entropy, of error. The Library of Alexandria probably contained a lot of useless crap but also nuggets we’d want to know today. Information is memories, useful information is useful memories, and there’s no compelling REASON to lose it. Other sections of usenet history were wiped out when Google acquired it (a lot of comp.database.olap content I had a hand in) and groups of people just lost a knowledge base.

2) It’s not simply code that no one uses anymore. It’s a knowledge base on how and why, debates over constructs and usage that are useful beyond code-sharing snippets a la Stack Overflow.

3) There is an argument for letting some information get lost or at least super-obscure, but it’s hard to see this being a good example. Tide Pod Challenge videos come to mind. GDPR and right to be forgotten mandate something akin to information loss.

4) I posted this elsewhere but I’ll share here too: there was a comment made on the original article about preserving prior art for IP (patent) purposes. That alone is in the public interest. Irrelevant to your questions in general, but pertinent to each of them in this case.


It belongs in a museum!


The fact that nobody had enough fucks to give to archive these groups tells you everything you need to know about decentralized peer-to-peer proof-of-work blockchain nerd hobbies. This content exists on a completely open peer-to-peer content distribution network and here you are whining that one company -- the company that already rescued this archive in a midnight U-Haul run 20 years ago -- failed to archive it.


Seriously! I have the same issue with a lot of modern online communities/projects too. They all assume whatever platform they're currently publishing on will be there forever.

Brb archiving my Twitter posts


>The fact that nobody had enough fucks to give to archive these groups

Well, you assume. Maybe it was just decentralized enough that you haven't heard about it.


Google bought Deja News and has profited immensely from open source and open information.

So I do think they have an obligation either to a) make the whole archive available to anyone, or b) maintain it properly.

Properly means restoring the fast UI from around 2004.


If you found a human at Google instead of a bot, it would probably say their only obligation is to their shareholders.

It's probably not a good idea to depend on a public company to steward an important community.

Does the Internet Archive have copies of all the old stuff at least?


Their only obligation, if we take for granted that there are any humans left at Google, is keeping the aforementioned bots powered.

Which is sad, but expected.


There are quite a few humans at Google, both on HN and on Twitter. Sadly, all of those I talked with seemed like people I would not want to interact with again.


[flagged]


Warning: contains snark. Not all Googlers are bad, but I am grumpy. There's no one you despise so much as the one you loved that betrayed you, and all that.

-------------

Busily making sure it feels even more lame to try to give them feedback? Or wondering what thriving ecosystem they can destroy next after

- destroying RSS,

- participating in destroying federated messaging,

- trying to kill all independent browser engines and replace them with a nerfed one that "sadly" can't block ads.

Some ideas:

Maybe they can come up with an even more opaque way to shut down people's accounts?

Or maybe a more sneaky way to befriend the Chinese government?


And that whole AMP thing.

And don't forget about that whole extension of the government thing. https://wikileaks.org/google-is-not-what-it-seems/


[flagged]


> It is also annoying how all google employees on HN suddenly disappear on threads that talk about the new cool unethical thing that google decided to do.

We're here and do pay attention. Plenty of Googlers write comments criticizing the company here.


Which amount to little more than nothing. Google is destroying the open Internet and everyone that works there is complicit.


Like it or not, Google is a part of the Open Internet.

Example: AMP was created to protect the free flow of information which was being silo'd into apps. Consumers of news were abandoning the open (mobile) web due to the godawful performance and advertising issues. Google saw the existential threat of everything being locked down within Facebook and Reddit. Is this preferable?


Wait, Google feels any obligations at all? I thought they only made decisions based on what's most likely to maximize their growth?


"... their only obligation is to their shareholders."

That'd be an improvement.

Page & Brin retain controlling interest, despite their minority stake.


How did it profit from the Usenet archives? Genuinely curious.


> How did it profit from the Usenet archives? Genuinely curious.

Dejanews was the seed material for Google Groups, any profit derived from that (ads) was from content posted to Usenet by people who never intended for it to be used for that.


Groups doesn't (and didn't ever?) show ads as far as I know. So you're reaching for second or third order effects at best.


You think that realtime ad impressions is all they get from you reading granular forum posts?

Sadly, even in 2020, nothing has yet replaced what Deja was at the time it got acquired and destroyed.


> So you're reaching for second or third order effects at best.

I'm curious what second or third order effects you think a usenet archive had on GG.


On Google's side, they gained users interacting with 50K+ topics and occasionally posting views, sentiment, etc. (and likely all of Deja News's historic interactions).

That's in addition to the search history, email content, geolocation, etc. that Google has for many people.


Deja did a fairly good job at destroying itself beforehand - not just financially.


I remember how awesome the initial version of the Google usenet archive was. It's horrifying how much they have let the UX deteriorate.


This type of behavior is why I can never consider GCP. How many people have been burned at this point by Google randomly shutting down something they rely on?


I've had two Google accounts shut down in the last six months with no explanation. There is no appeal. The consumer services I've used (Feed Reader, Play Music) have been shut down, and the cloud service I was most interested in was luckily shut down before I was able to use it. (They used to have a service to resize & manipulate images in Blob Storage. I found a good AWS alternative[1] instead). I cannot rely on Google for anything at all, and definitely not for something as important as cloud services.

[1] https://github.com/awslabs/serverless-image-handler


Are there any indications as to why your accounts got shut down? Any pattern you noticed?

I - like most of us - have a personal Google account, and our company uses a Google business account. While I'm following the news regarding Google cancelling accounts at will, I fail to notice a reliable pattern: (alleged) fraud and other illegal activity seems to comprise a good part of it, but at most 30-50%.


No, there is no pattern. The last one happened when I got a new Android phone. I logged in on my work account and my personal account, and the work account got suspended. It said "suspicious app", but the only app I used it with was Google Meet. The personal account was used for much more, but didn't get suspended. I half suspect that they deliberately have false alarms so they can act like they're more secure, but it's more likely just a horrible, unaccountable AI.

I treat all Google accounts as throwaways now and don't use the work email at all because I want to know that I can actually receive emails that are sent to me. That's a huge problem even without randomly losing access, because their spam filter has a ton of false positives and those emails don't get forwarded to my real address.


>Their spam filter has a ton of false positives and those emails don't get forwarded to my real address.

This is very interesting to me. I've used Gmail for 10 years now and I've found the spam filter to be nearly impeccable. I can't recall a single false positive. I can't even recall a single false negative, though I am moderately careful about who I provide my email to.

Now I'm left wondering if most people think about Gmail more like me or more like you...


Anyone like me who sends maybe 10 emails a month to gmail.

I'm in their spam sin bin since a spammer managed to find an old test account on my SMTP server with a weak password and spammed the world for a day or so a few years ago. The problem is that I don't send enough emails to get out of the bin.

This isn't a problem google cares about, small senders with no reputation are basically screwed. I can deliver to gmail hosted accounts I've got a relationship with (personal & my own work address) but I can't reliably send to other email addresses at work.


You won't necessarily be able to know if there was a false positive. Google might reject it or, worse, accept the message and not deliver it to you.


I get a ton of false positives. Probably more false positives than actual spam.


I check the spam folder weekly and always find something important in it.


My experience mirrors yours.


Something like this allowed me to disable gmail’s spam filter when I set up forwarding:

https://c-command.com/spamsieve/help/turning-off-the-gmail-s

It is an awful hack. If they can’t be bothered to maintain their spam filters, they should at least let people opt out.


Thank you!


How do you sign up for new accounts given the mobile number requirement? Do you just reuse the same number?

I tried to make a Google account for work use the other day, and got stuck at that point. Given Google's history it seems silly to use my personal account for work, or to connect the two accounts in any way.


I had two personal accounts that I registered before GMail had a phone requirement. I don't remember setting up the work account, but I probably just used my normal phone number.


> Play Music

Play Music has not been shut down (yet), and you can transfer everything to YouTube Music, which is available at the same price (and is, in my opinion, a superior product).


Some randomly selected people can transfer everything to YouTube Music. I can't, and it may be months before Google would allow me to. It's exactly that kind of treatment that makes me feel like Google has zero respect for its customers.

Spotify is generally better than Play Music though, so it was for the best in the end.


Google's Achilles heel is that they have two businesses:

a) Spy on people and sell the data to advertisers.

b) Use that data to directly push ads

That's basically incompatible with B2B services. Or consumer services, for that matter. As a customer you're judged by how valuable the data they are collecting on you is - which is less than a support call costs. That bleeds into every facet of their business. As such, even if you pay them money you get the same treatment, because they can't think any differently.


> sell the data to advertisers

Do they?


They don't sell the golden goose. They rent a limited form of access to it.


Which Google service rents user data?


Advertisers can choose to advertise to customers that fall into various buckets.

Income, gender, location, recent history of viewing specific types of pages, etc.

This is what I mean by limited form of access. Advertisers do not receive user information, but are granted the ability to use it when setting up ad campaigns.


They don't sell people's personal data, at least not publicly. But that makes them care even less about end users, even paying end users. Which means a business is foolish to rely on them.


One thing that's become extremely clear to me over the last decade or so is that almost all tech companies simply do not care about the past, and I suspect at least part of that is so their narrative of progress can be subjected to fewer challenges from those who look back and compare.

Also, and this may be a bit of a tangential point, but the "deny the past because it contains something bad" move that Google has effectively made here is uncomfortably close to certain recent, far more political events.


> do not care about the past...

You just reminded me of a quote from an electronic music documentary 25 years ago. One of the Detroit techno artists insisted on taking the filmmakers to a historic theatre that had been left to crumble & turned into a car park:

"In America especially, nobody tends to care about these kinds of things. People in America tend to let this shit just die, let it go. No respect for the history. I, being a techno, electronic, high-tech futurist musician, I totally believe in the future! But as well, I believe in a historic and well kept past. I believe there are some things that are important. Now, maybe this is more important like this, because in this atmosphere, you can realize how much people don't care, how much they don't respect. And it can make you realize how much you should respect."

- Derrick May, DJ/Composer, Universal Techno (1996)

https://youtube.com/watch?v=tdox6H7FJBU&t=955s

The segment starts at 16:00 in the video and is about 2 minutes long.


I don't think it's quite as simple as "Americans don't care about the past" when discussing cities like Detroit. The actual reason those places were left to rot is a lot worse imo, and it's the same reason that led to San Francisco becoming such a (cheap) haven for LGBT people / artists / etc in the 1970s and '80s: http://cornersideyard.blogspot.com/2020/06/repost-personal-s...


> almost all tech companies simply do not care about the past

You may be surprised that it's not just companies. It's not hard to find people who think it's better for old stuff to just be deleted.


"He who controls the present controls the past. He who controls the past controls the future" - Orwell, "1984"


> Usenet predates Google's spam handling tools

In fact Usenet predates spam itself, since the first spam (Canter & Siegel) was on Usenet itself in 1994 (I was there).


Does anyone know if anyone other than Google has newsgroup archives publicly accessible (the Internet Archive, maybe)?


I found this Usenet Historical Collection link - https://archive.org/details/usenethistorical - in a previous HN thread (https://news.ycombinator.com/item?id=16667796).

I have no idea how useful the collection may prove to be. I found 'comp' but it doesn't offer a webpage view, just a link to download a file. https://archive.org/details/usenet-comp


Maybe someone could set up a public inbox[1] instance that allows access to those groups either via HTTP or NNTP.

[1] https://public-inbox.org/README.html


It should be the full archive.


https://www.eternal-september.org/

I think you have to register. Not sure how much history is there.


A lot of posts are missing from this one.


Most free and ISP based usenet feeds had a lot of missing posts, especially since they allowed older posts to expire. Even the commercial usenet providers only started their archives about 12 years ago.



No, no, no. These groups and other Usenet groups archives must be preserved. They're our history.


Anyone looking for a hobby? It is time to become a data hoarder https://www.reddit.com/r/DataHoarder/


Either those Usenet groups are not part of the world, or they don't consist of information, or Google just failed at "organizing the world's information."


Google has definitely failed. Finding anything that's not frecent is basically impossible.


I read the article and I read the threads here, and maybe I missed it—but why did these groups disappear? Were they banned due to bad words or a mistaken spam filter?


Here's what I get:

https://groups.google.com/forum/#!forum/comp.lang.forth

> Banned Content Warning

> The group that you are attempting to view (comp.lang.forth) has been identified as containing spam, malware or other malicious content. Content in this group is now limited to view-only mode for those with access.

> Group owners can request an appeal after they have taken steps to clean up potentially offensive content in the forum. For more information about content policies on Google Groups, please see our Help Centre article on abuse and our Terms of Service.

There's no content available for me.


Forth is pretty grim, but I wouldn't go that far ...


Is there a means to access and archive it, or is it too late?


https://www.lumendatabase.org/notices/search?utf8=%E2%9C%93&...

Looks like there have been mechanical legal complaints (likely automated - nearly all of them are the same Italian phrase), and that probably caused this instance of automated blocking to go wild.

As an engineer I can understand the desire to automate everything, but please at least have some heuristics to detect this kind of easy-to-detect mechanical behavior before giving the model full authority to block anyone it doesn't like.


Okay, I did some research, and I think I figured out what caused these Usenet group bans.

A Genoese lawyer has been the victim of harassment and heavy doxxing for some time; you can find many Twitter accounts accusing him of paedophilia in cahoots with Epstein, Berlusconi, the Pope and so on (no, I'm not kidding; the stalker clearly has serious mental health problems).

The stalker is very prolific and is wallpapering the internet with his copy-paste accusations in every corner, from newspaper comments to ancient forums to Usenet. The lawyer reports them and asks for removal where he can, but he also does not seem very worried, because this has apparently been going on for two years...

I don't think I can name the subjects in question, but in any case I'm archiving the harassment accounts before proceeding with a report; then I'll try to get in touch with the lawyer and see if he can request a new, less "coarse" form of censorship.


I am Italian and this is very interesting: all the requests were made with the topic "Stalking Diffamation Illegal processing of personal data", but this one (https://www.lumendatabase.org/notices/21395773) is simply fantastic. It seems that someone with persecution mania reported half of Usenet and the bots were auto-triggered...


> since there is no other comprehensive archive after Google's purchase of Dejanews around 20 years ago

Was I naive in thinking that The Internet Archive would have long archived this type of thing?


The Internet Archive is younger than Deja News. Someone would have had to provide the data. Did they?

If you want to look, you might start here: https://archive.org/details/usenet


WTF, Google? Are you now so full of young programmers who have no respect for programming history? You’ve lost all geek cred, that’s for sure.


Too many people and companies don’t appreciate culture enough. Maintaining a cultural record should apparently not be left to just one company.

Thanks for posting this, it reminded me to donate again to archive.org, which I just did.

I use ‘culture’ to include anything creative, anything that we experience as humans. Everything should be preserved, schools should be well funded, as should the arts.


Is this something that the internet archive would preserve?


There is a comp.lang.lisp archive published in 2009.

> In 2009, Ron Garret published a 700MB archive file of all of comp.lang.lisp

https://www.xach.com/naggum/articles/notes.html


Ridiculous. They blame missing moderators, but only Google would be able to solve the spam problem. They now own these old forums, and Gmail is mostly spam-free. Yet you cannot even browse the archives. Where is the internet police when you need them?


For a long time I've wanted to revisit some of the old Usenet stuff. I knew someone who ran a commercial Usenet feed service in the early 90s, and their whole setup depended heavily on low-level backplane configuration, the number of spindles, disk rotation speed, etc. - a lot of details that AWS hides from most of us. Using everything I've learned about distributed systems in the last thirty years, I bet I could build a really awesome news feed today.

Of course, the downside of Usenet was that most people expected conversations to disappear after a couple of weeks or a month, but there was always some jerk who kept everything and refused to delete anything.


It's becoming clear to me that Google has become a far, far worse monopoly than Microsoft ever was. Microsoft just controlled our computers; Google controls our access to history.


Google is becoming a worse monopoly through the natural evolution of its core business - in a more offhand way, via network effects and economies of scale. Microsoft's monopoly grew by planning and plotting: Bill Gates had genuinely sinister motivations and used deception and dirty play.

To fix problems caused by Google, you need to change the principles of competition law. Microsoft was knowingly doing lots of stuff that violated laws. It was just very hard to prove it.


Do you have any links to evidence of these statements about Microsoft? I'd be interested to read more.


It's kind of weird to get this question when you lived it and there seems to be relatively little to Google.

I mean, it was all in the news, trade magazines, business journals: blackmailing OEMs, intentionally breaking things and making them incompatible. At least the legal battles are documented somewhere, and Wikipedia has something about them, but they were just the tip of the iceberg.

https://en.wikipedia.org/wiki/Microsoft_litigation

https://en.wikipedia.org/wiki/United_States_v._Microsoft_Cor....

https://en.wikipedia.org/wiki/Browser_wars

There must be a book somewhere.

Dan Gillmor's articles in the San Jose Mercury News from the 90's should be somewhere too.

Basically, small software startups had to have a Microsoft strategy. They had to find a way to stay off Microsoft's radar or MS would steal their work or their developers, or block them. You could sue them, like Stac did, and MS would just stall a few years and pay a few million in damages. Losing in court was worth it to protect the monopoly.

Big OEMs like Dell had to do what MS said or MS would up their price. It was straight blackmail from a monopoly position.


Yeah, that's the main reason that there isn't as much push back to the control that Google is starting to exert over all of us.


How evil you are is a function of how much power you have and for how long.


Potentially far worse - but Google has not yet stopped progress for 10 years in multiple fields the way MS did.

They sucked the air out of advertising (in cooperation with Facebook) leaving none for others. But I consider that a small loss.

Microsoft did that for operating systems, productivity software, stalled the web with IE6, and more.

Google is capable of much more damage, for sure. But they haven’t done that damage just yet.


> Google is capable of much more damage, for sure. But they haven’t done that damage just yet.

That is changing extremely fast.


Disagree in that they’ve already done plenty of damage.

The easiest example is RSS: Google entered the RSS reader market for free and at a loss and effectively killed the competition, because you cannot compete with that. Then they subsequently killed Google Reader. This chain of moves essentially drove RSS into obsolescence, which in turn made everyone far more reliant on Google and social media.

Now extend this to other products that they’ve started for free and subsequently killed. It’s not the same as embrace, extend, extinguish, but the result is the same. You kill off competition and stunt progress.


I don’t think RSS is a good example. Everyone I know who used Google Reader switched to a different RSS reader.

It’s mostly that RSS isn’t monetizable as easily as web pages. I think FB and Twitter dropping their feeds had a more significant effect; regardless, RSS was always niche.


If we’re going to go with anecdotes, I can counter that all of my friends (entirely non-technical) simply stopped consuming via RSS because there were no alternatives at the time that ticked all the boxes.

For me, the key is that Google Reader actually killed off competing products during its existence. Then, when they killed off Google Reader itself, that stunted the ecosystem, because anyone who wanted to provide an alternative was starting from the ground up. All the time that could have been spent driving RSS forward was instead spent on catch-up.

FB and Twitter dropping their feeds may not have happened if the RSS ecosystem had evolved to benefit them in some way.


It now occurs to me that one could clone any given Google service (product), ensure maximum compatibility, wait for Google to biff it, and then welcome all the orphans.


Yeah, they all realized how much tracking and advert impressions you lose with RSS. It seems like that's happened with a lot of web technologies that aren't conducive to that since Google's taken the reins.


> Google has not yet stopped progress for 10 years in multiple fields the way MS did.

I beg to differ: Gmail has not changed much since I signed up, 16-ish years ago.

They are all the same; as soon as competition goes away, this happens.


But Gmail is interoperable with other mail systems and they didn't create incompatible extensions to email (AFAIAA); that's quite different to how IE6 was.

If Gmail required emails themselves to be in a special format that broke other MUAs, and wouldn't render standards-compliant emails in a way you could read, that would be analogous to what IE6 was up to.


> If Gmail required emails themselves to be in a special format that broke other MUAs, and wouldn't render standards-compliant emails in a way you could read.

Gmail is as notorious as IE6 was for its rather poor support of HTML and CSS in email.


take a look at AMP for emails...

https://amp.dev/about/email/


amp4email is open for anyone to support, and per https://blog.amp.dev/2019/03/26/building-the-future-of-email... it's supported by Yahoo Mail, Mail.ru, Outlook, and Gmail. It's not comparable to IE-only features.

(Disclosure: I work for Google, speaking only for myself)


Christ. If that's "the future of email", I'm going back to writing letters.


I'm still upset with them for killing Inbox


I used that as an opportunity to leave most Google services altogether; the product decisions just can't be trusted from a user perspective.

For most services there's a better alternative (e.g. DuckDuckGo with bangs for search, native apps whenever possible, markdown for notes and texts) - in general this was a learning process for me to distrust systems that control me more than the other way around by creating a lock-in to an ecosystem.

I feel it's worth taking the time to understand systems a bit more. Once the learning is there, much of the convenience that Google offers can be replicated through good processes and automation.


Me too. I still don’t understand why it was killed.


I dunno, I look out at the world and think that maybe making journalism unprofitable may have had some negative effects that are a bit bigger than web standards not advancing that fast?


You’re absolutely right.

Though the same people who complain about the loss of profitable journalism also run adblockers and sub/unsub from major newspapers over their op-eds.


Personally I would pay a decent subscription to a journal or newspaper if they offered incredibly high quality content ad-free. Instead what we have are cheap ad-saturated papers competing with each other to the lowest common denominator.

It's a vicious cycle kicked off by Google and co.


To get to be a 'journal or newspaper that offers incredibly high quality content ad-free' you need funding. Arguably you also need to go up against the richest and most powerful people in the World, and you need to protect against someone buying it and shutting it down.

You need a lot of principles and some money; getting those things together in the same place seems hard.

Was news in the past better (more comprehensive? more verified?), or just better presented?


News in the distant past was more sensationalist and overtly political.

News in the more recent past tried to be a little more hands off of politics. The market dynamics were usually such that you wanted to sell both to the left and the right. Some newsmen idealized this and sold the product with the idea of an allegiance to fact, and the idea that the facts would speak for themselves. Opinions were indulged on the editorial page (where advertisers fear to tread) and they could be safely ignored. Opinions were avoided elsewhere. Sensationalism was for tabloids; major papers achieved respect and a reputation and premium prices with their restraint.

News of the present finds the market dynamics have reverted; overt partisanship and sensationalism drive reader engagement again — but we’ve got a ways to go before we have overtly political journalism everywhere and widespread cases of yellow journalism like in the past.


I'm of two minds on this and they aren't totally squared together, but I think they are both fair.

1. I'm somewhat nostalgic for physical newspapers for a couple of reasons. Because they were physical and required a little more effort, they felt more authoritative and verifiable - that's the nostalgia. However, I think there was another benefit on the journalists' side: because they were crafting a physical body of work, I (perhaps naively) believe it instilled a sense of duty in that work. Personally, I feel that if my product (not code) were a fleeting piece of information, easily changeable and forgotten among a deluge of other work, I would not feel as obligated to honor the craft. So, in short: less comprehensive, but better verified and presented.

2. News was concentrated; I don't think this was necessarily a good thing, and it's somewhat contradictory to 1. But this concentration made for an easier-to-accept narrative and, speaking to my point above, presented a level of even-headedness to the whole affair of understanding the world and our place in it. The firehose that was unleashed with Google was good, but it also signaled the death of a lot of rational, measured thought and brought us to where we are today.


Those formats exist. For example, I am subscribed to Krautreporter [1] from Germany. They started with a Kickstarter campaign in 2014 and are now organized as a "Genossenschaft" (a cooperative), entirely supported by memberships and subscriptions.

[1] https://krautreporter.de


Not always, but it had an independent source of funding, and that made things like Woodward and Bernstein possible.


> stalled the web with IE6

By inventing XMLHTTPRequest?


> Microsoft did that for operating systems, productivity software, stalled the web with IE6

Android, GSuite, Chrome


Google are smart enough to maintain a duopoly (iOS, Office/365, Safari), whereas Microsoft tried to kill all competitors (and all too often succeeded). That’s a huge difference.


So far. ¯\_(ツ)_/¯

Though Android's market share today looks a lot like Windows' versus macOS early in the period for which Microsoft is criticized. Similarly, Safari and Firefox combined have a low enough market share (currently estimated around 21%) versus Chrome's estimated 65% (and growing) to invite real, very concerned comparisons to the early parts of the IE6 era, if not (yet) its peak.

The difference is that it is still early and Google hasn't killed its competitors yet. I can't tell you whether or not they are "trying" to, just that I agree with the above poster that Google has the potential to be worse, and possibly to be history repeating itself on some of the exact same product lines.


And each other.


Why are people even relying on Google to keep any product alive? It's a business, not a charity. They don't do a single thing out of goodwill; everything has the goal of making money in the short or long term. Knowing their quarterly obligations to shareholders, that's probably the short term.

These groups should be putting more effort into federation and decentralisation. Make it possible to store all of this data in a distributed fashion and stop relying on a central authority for archiving purposes.


Those groups run on a decentralized system and an open protocol: https://en.wikipedia.org/wiki/Usenet

The problem is that there are no other searchable archives.


I was learning C, once upon a time, and had a bug that I couldn't figure out. It worked fine on Linux/x86 but was wrong on Solaris/sparc64. Deep Google diving found a newsgroup post from 1992 or so with a very similar problem; it was an endianness problem. My search-fu may have been weak, but it was an old newsgroup post that helped me solve my problem, not Stack Overflow or any other site.
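A toy illustration of that class of bug (not the original code, of course, which is long gone): reinterpreting an integer's bytes gives different answers on little-endian x86 and big-endian sparc64.

  import struct

  n = 1
  # Native byte order, which is what a naive C pointer cast picks up.
  native = struct.pack('=I', n)
  print(native[0])  # 1 on little-endian x86, 0 on big-endian sparc64

  # Portable code serializes with an explicit byte order instead:
  print(struct.pack('>I', n)[0])  # always 0: big-endian ("network order")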


Either this archive exists elsewhere, or now is not the proper time for panic -- that time was when Google became the sole owner of this archive.


So Google Groups archives Usenet stuff? Where is the Usenet stuff hosted originally? How do I connect to it without Google Groups?


The Internet is a distributed system. Usenet was never centrally hosted anywhere, AFAIK; it was scattered around lots of individual systems. You'd have to look up the detailed history, but Deja News brought together what it could at one point. It was subsequently purchased by Google and folded into Google Groups.


Back in the old days (mid 90's and earlier), most universities and large corporations had their own Usenet servers. These peered with other servers, either over the Internet with NNTP, or through older protocols like UUCP using modems.

I had a UUCP news feed from a local internet provider when I was in high school, back in 1993 or so.


I think everybody should have learned the lesson by now - do not trust Google, or any other major megacorp, but especially Google, to preserve any data for longer than they are contractually obliged to. If there needs to be historic preservation, it should be done by an independent organization specifically created for that purpose.


Can anyone tell me how Google got hold of the whole of Usenet (I know it was like 15-20 years ago)? It looks to me like a community-service kind of thing.

Like when Google decided it's going to host comp.lang.c, can there be only one comp.lang.c on the internet, or can someone else start hosting comp.lang.c as well?


That isn't how it works; Usenet is distributed, and you can still access it using non-Google servers.


Since when were Forth and Lisp historical programming languages??! People still use them. HARUMPH!


If it makes you feel better, comp.lang.python is also blocked:

https://groups.google.com/g/comp.lang.python/


No, because I like Python too, but it would make me feel much better if comp.lang.perl was blocked.


SELF FOOT SHOOT DUP


Actually, what I saw on comp.lang.forth the last few times I checked it (coincidentally, I tried yesterday) makes the news not really surprising.

Aside from the spam, it gradually switched from passionate but respectful debates to name calling and plain insults from newbies to what remained of the veterans.

One could read very long arguments between Elizabeth D. Rather, at that time CEO of Forth, Inc., which she founded with C. Moore sometime in the '70s, and Jeff Fox (RIP), who was working with Moore at the time; Moore had left his first company to pursue his adventures in hardware, making various "Forth processors", which eventually led to the RTX2000 that powered, notably, the Rosetta probe.


Or Factor style,

[ SELF FOOT SHOOT ] 1000 REP


BEGIN ME FUCK AGAIN


They are really shooting themselves in the foot with such moves. They confirm, validate and strengthen the already existing trend to avoid vendor lock-in at all costs and move to open, possibly self-hosted, export-friendly platforms!

This is really bad marketing.


> Perhaps Google can be convinced to restore the content

The support ticket was deleted, so I guess not.


Thank god. I said some really dumb shit on those lists in my youth that I regret.


This kind of thing makes it really easy to get interested, and stay interested, in decentralization tech.

Once you see things in this light, the new flavor of the month online service just doesn't hold any allure.


(Repeating one of the comments from the post):

> Has anyone (EFF?) considered the aspect of destroying evidence of prior art in the public domain?

I think there’s a case to be made for stewardship of these groups for that reason.


I'm hearing a fair bit of chatter in SEO circles about Google de-indexing pages, so this certainly rings true.

I guess there was this unjustified assumption that Google only adds & never subtracts.


Maybe it is something that a non-profit dedicated towards preserving knowledge and internet content (such as Internet Archive) should be handling anyways.


Maybe these types of historical archives can be turned over to internet archive. I trust them a lot more than google for this.


If an AI decided to shut off comp.lang.lisp, I'd say it's officially too late to solve the Alignment Problem.


Guess comp.lang.lisp has too many posts with (((code))) in them... ;)


alt.sex is still there and you don't get an adult content warning unless you choose the desktop version.


i would like to find the quickbasic archives. anyone know how i can get them?



Is Google sinking? Between their mothballing/deletion of services and the obnoxious signup ads on YouTube, I am wondering what is going on.


It's not doing so hot:

1) Hiring standards have drifted downwards over the past 15 years. Google used to be super-elite, compact, do-no-evil, massive-profit-per-employee. It's now a 140,000 person organization, and at that scale, standards just aren't high. You have a team of dozens of incompetent people doing what one person used to do.

2) With COVID-19, ad revenues have crashed. The impact on Google is not yet clear.

3) The smart, ethical folks at the top (folks like Larry, Sergey, and Eric) are gone, replaced with professional managers. They were smart to pick an internal CEO, but most of the executive team comes from places like Microsoft, Oracle, or Morgan. Having known a number of professional executives, their key skill is climbing executive ladders and moving into positions of power, not running successful companies.

4) Their products are increasingly starting to crash and burn, especially in B2B. Their culture relies on automated systems over people, and those automated systems have taken down tons of mission-critical businesses. Automation works well with 1,000 people supporting 7 billion in B2C (the small elite team model), and not so well for a massive, 100k-person company.

5) I've switched mostly to non-Google products because they're better for what I need. AOL was massive too at one point. Losing the tech edge is not good. I still use gmail.

On the other hand, their revenues have continued to rise exponentially since they started. So perhaps they're doing fine?


The cases of IBM and, more arguably, Microsoft and Oracle tell us that if you reach a certain scale, you can continue to coast for a very, very long time after you lose relevance. Microsoft might be an example of having the glide time to regain it, depending on how Windows 10 and Azure go over the years, but it's far easier for a large corporation to spend billions defending its shrinking turf than to make the big changes usually required to regain relevance. The problem is that defending your shrinking turf can show positive revenue numbers for quite some time, so it all looks great while it happens...


Basically neglect and boredom (I say this not in an accusatory way). They have a huge gusher of cash that comes in pretty much regardless of what they do, so there is no penalty if their focus slips. You see this in other companies with this “problem”, like Valve. This also happens with monopolies; note that Standard Oil made more money after being broken up than before.

In Google’s case you can see this boredom in Android and in the number of products announced and casually killed (they might be excellent standalone products, but they can’t move the needle on earnings for the benefit of Wall Street, so why bother?).

Contrast this to early Intel in the Grove era: they were on top of the world with the memory business so they pivoted to something else. Google has had the same two products for almost 20 years. The later Intel has been more like that.

Another contrast: they don’t know what to do about the advertising downturn, so they are cutting back on hiring and such, while FB is trying to double down.


No they've just secured their kingdom enough that they can do whatever they want.


This is editorialized (actual title: "Some Usenet groups suspended in Google Groups"); on LWN[1] it's "Historical programming-language groups disappearing from Google" (basically the same content).

[1]: https://lwn.net/Articles/827233/


Ok, we've changed to that from https://support.google.com/accounts/thread/61391913?hl=en. Thanks!


It was a terrible idea to entrust ANYTHING to Google.

Time to de-Google the whole Web.


On the plus side, evidence of my awful usenet etiquette from the late 80's is disappearing with some of these groups.



