These columns are an accidental historical source for what the rumors of the day actually were, since rumors would almost never be written down otherwise. At least historians of our era will have terabytes of rumors in Facebook/Twitter to examine. (Or will they?)
> At least historians of our era will have terabytes of rumors in Facebook/Twitter to examine. (Or will they?)
I'm almost certain they won't. IIRC, Facebook and Twitter are very resistant to scraping. Eventually they'll shut down or pivot, and all their existing data will go poof.
Plus, people think about social media differently than they did newspapers. Newspapers were an open public record, and some effort was always made to archive them (e.g. the local library binding them into volumes or microfilming them). Social media is a weird amalgam of public and private that people guard more jealously from the public.
The Library of Congress has an archive of every tweet from 2006 to 2017. In 2018, they began selectively archiving tweets. I can't find how selective they are exactly, but as far as I can tell, the project is still ongoing.
To be fair, the importance of those tweets in shaping the public conversation might also die because of that. In the heyday of the 2010s I could go and read what interesting people were thinking about that day on Twitter. Now if I pick a random account I know is active, I see what they were saying in 2022 for some reason.
If that's not some weird thing that only affects me, they're relying very heavily on their existing network of logged-in users. I'd assume most readers don't create accounts.
That, combined with the proliferation of information being distributed as text-in-pictures or voice/text-in-video form will make it computationally difficult to search.
Optical character recognition and audio transcription are improving at such a pace that I don't think this will be a significant barrier for future historians. Even now, on a computer with modest resources (e.g. a laptop without a dedicated GPU), whisper.cpp makes it practical to transcribe hours of podcast audio or other speech. And the transcription only needs to be done once.
You would probably run into the bigger issue of how to search for what is truly relevant. Will the researchers of the year 3000 be able to tell the difference between AI or content-farm clickbait and decent primary and secondary sources?
Even today we have trouble determining whether what sources from antiquity say is true.
This is not a novel problem, historians already deal with this. Written history is rife with lies, inaccuracies, huge holes (often artificially created), and propaganda. The scale may be larger, but the fundamental issue is the same.
The problem with AI-generated content is that this starts extending into areas that didn't really have much of a reason to lie in the first place.
As a general example, there wasn't much propaganda value in lying about food, so for the most part we can trust those descriptions to be true, particularly in guides written as cookbooks. In 2024, we have legions of accounts publishing fake recipes for the sole purpose of getting clicks for ad dollars.
> Myspace, the once mighty social network, has lost every single piece of content uploaded to its site before 2016, including millions of songs, photos and videos with no other home on the internet.
>
> The company is blaming a faulty server migration for the mass deletion, which appears to have happened more than a year ago, when the first reports appeared of users unable to access older content. The company has confirmed to online archivists that music has been lost permanently, dashing hopes that a backup could be used to permanently protect the collection for future generations.
People stopped using it for regular stuff ages ago, when they migrated to Facebook. However, it became a haven for music lovers, with indie musicians and the like using it to communicate with their fans. They probably should have renamed it "MyMusic" or "MusicSpace" or something like that.
Too bad the Internet Archive alone isn't enough; we also need anarchists like ArchiveTeam to actually mirror all the things, not just the things that agree to be mirrored.
The Internet Archive generally allows websites to control whether they get indexed/mirrored via robots.txt, so websites can decide for themselves.
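For what it's worth, that kind of robots.txt check is simple to reproduce yourself. Here's a minimal sketch using Python's standard-library `urllib.robotparser`; the rules and URLs are made up for illustration, but "ia_archiver" is the user-agent string the Internet Archive's crawler has historically used:

```python
import urllib.robotparser

# Hypothetical robots.txt a site might serve to opt out of archiving
# while still allowing ordinary crawlers.
robots_txt = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Allow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# Under these rules the archiver is blocked, everyone else is not.
print(rp.can_fetch("ia_archiver", "https://example.com/page"))   # False
print(rp.can_fetch("SomeOtherBot", "https://example.com/page"))  # True
```

Which is exactly why this opt-out only works against archivers that choose to honor it.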
Luckily, we have other grassroots movements like ArchiveTeam that don't care and archive anything deemed worth archiving, website owners be damned.
> Too bad the Internet Archive alone isn't enough; we also need anarchists like ArchiveTeam to actually mirror all the things, not just the things that agree to be mirrored.
No? This is wrong? The site owner is the person who morally decides whether their site should get mirrored or not.
It's stuff like this that makes me actively cheer on otherwise-harmful things like Google's web attestation efforts.
> As long as non-profits like the Internet Archive exist, the probability is higher.
No. As good as it is, the Internet Archive is a single point of failure. That was put on stark display when, a few years ago, they decided to pick a legal fight over copyright that they could never win and that put their organization at risk.
Also, I've tried to use the Internet Archive to grab Facebook posts. It doesn't work, even for public ones (all I got was pages and pages of the Facebook login screen).
Content sharing via stories features, like those in Instagram and Snapchat, is more ephemeral than some earlier methods, leaving less of a public trail to analyze.
> These columns are an accidental historical source
Since when were newspapers and magazines considered to be accidental historical sources? They were part of the war effort. It was literally state war propaganda in an ongoing war.