These columns are an accidental historical source for what the rumors of the day actually were, since rumors would almost never be written down otherwise. At least historians of our era will have terabytes of rumors in Facebook/Twitter to examine. (Or will they?)
> At least historians of our era will have terabytes of rumors in Facebook/Twitter to examine. (Or will they?)
I'm almost certain they won't. IIRC, Facebook and Twitter are very resistant to scraping. Eventually they'll shut down or pivot, and all their existing data will go poof.
Plus, people think about social media differently than they did newspapers. Newspapers were an open public record, and some effort was always made to archive them (e.g. the local library binding them into volumes or microfilming them). Social media is a weird amalgam of public and private that people guard more jealously from the public.
The Library of Congress has an archive of every tweet from 2006 to 2017. In 2018, they began selectively archiving tweets. I can't find how selective they are exactly, but as far as I can tell, the project is still ongoing.
To be fair, the importance of those tweets in shaping the public conversation might also die because of that. In the heyday of the 2010s I could go and read what interesting people were thinking about that day on Twitter. Now if I pick a random account I know is active, I see what they were saying in 2022 for some reason.
If that's not some weird thing that only affects me, they're relying very heavily on their existing network of logged-in users. I'd assume most readers don't create accounts.
That, combined with the proliferation of information being distributed as text-in-pictures or voice/text-in-video form will make it computationally difficult to search.
Optical character recognition and audio transcription are improving at such a pace that I don't think this will be a significant barrier for future historians. Even now, on a computer with modest resources (e.g. a laptop without a dedicated GPU), whisper.cpp makes it practical to transcribe hours of podcast audio or other speech. And the transcription only needs to be done once.
You would probably run into the bigger issue of how to search for what is truly relevant. Will the researchers of the year 3000 be able to tell the difference between AI or content-farm clickbait and decent primary and secondary sources?
Even today we have trouble determining whether what sources from antiquity say is true.
This is not a novel problem, historians already deal with this. Written history is rife with lies, inaccuracies, huge holes (often artificially created), and propaganda. The scale may be larger, but the fundamental issue is the same.
The problem with AI-generated content is that this starts extending into areas that didn't really have much of a reason to lie in the first place.
As a general example, there wasn't much propaganda value in lying about food, so for the most part we can trust those descriptions to be true, particularly in guides written as cookbooks. In 2024, we have legions of accounts publishing fake recipes for the sole purpose of getting clicks for ad dollars.
> Myspace, the once mighty social network, has lost every single piece of content uploaded to its site before 2016, including millions of songs, photos and videos with no other home on the internet.
>
> The company is blaming a faulty server migration for the mass deletion, which appears to have happened more than a year ago, when the first reports appeared of users unable to access older content. The company has confirmed to online archivists that music has been lost permanently, dashing hopes that a backup could be used to permanently protect the collection for future generations.
People stopped using it for regular stuff ages ago, when they migrated to Facebook. However, it became a haven for music lovers, with indie musicians and the like using it to communicate with their fans. They probably should have renamed it "MyMusic" or "MusicSpace" or something like that.
Too bad the Internet Archive alone isn't enough; we also need anarchists like ArchiveTeam to actually mirror all the things, not just the things that agree to be mirrored.
The Internet Archive generally allows websites to control whether they get indexed/mirrored via robots.txt, so websites can decide for themselves.
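For what it's worth, that kind of robots.txt check is simple to reproduce yourself. Here's a minimal sketch using Python's standard-library `urllib.robotparser`; the rules and URLs are made up for illustration, but "ia_archiver" is the user-agent string the Internet Archive's crawler has historically used:

```python
import urllib.robotparser

# Hypothetical robots.txt a site might serve to opt out of archiving
# while still allowing ordinary crawlers.
robots_txt = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Allow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# Under these rules the archiver is blocked, everyone else is not.
print(rp.can_fetch("ia_archiver", "https://example.com/page"))   # False
print(rp.can_fetch("SomeOtherBot", "https://example.com/page"))  # True
```

Which is exactly why this opt-out only works against archivers that choose to honor it.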
Luckily, we have other grassroots movements like ArchiveTeam that don't care and archive anything deemed worth archiving, website owners be damned.
> Too bad the Internet Archive alone isn't enough; we also need anarchists like ArchiveTeam to actually mirror all the things, not just the things that agree to be mirrored.
No? This is wrong? The site owner is the person who morally decides whether their site should get mirrored or not.
It's stuff like this that makes me actively cheer on otherwise-harmful things like Google's web attestation efforts.
> As long as non-profits like the Internet Archive exist, the probability is higher.
No. As good as it is, the Internet Archive is a single point of failure. That was put on stark display when, a few years ago, they decided to pick a legal fight over copyright that they could never win and that put their organization at risk.
Also, I've tried to use the Internet Archive to grab Facebook posts. It doesn't work, even for public ones (all I got was pages and pages of the Facebook login screen).
Content sharing via stories features, like those in Instagram and Snapchat, is more ephemeral than some earlier methods, leaving less of a public trail to analyze.
> These columns are an accidental historical source
Since when were newspapers and magazines considered to be accidental historical sources? They were part of the war effort. It was literally state war propaganda in an ongoing war.