Never thought I'd say this, but in times like these, with clearnet in such dire straits, all the information siloed away inside Discord doesn't seem like such a bad thing. Remaining unindexable by search engines all but guarantees you'll never appear alongside AI slop or be used as training data.
The future of the Internet truly is people - the machines can no longer be trusted to perform even the basic tasks they once excelled at. They have traded their efficacy at basic tasks for being terrible at complex ones.
The fundamental dynamic that ruins every technology is (over-)commercialization. No matter what anyone says, it is clear that in this era, advertising has royally screwed up all the incentives on the internet and particularly the web. Whereas in the "online retailer" days there was transparency about transactions and business models, in the behind-the-scenes ad/attention economy it's all murky and distorted. Effectively all the players are conspiring to extract revenue from people's free time and attention, coerce them into consumption, and amuse them to death. Big entities in the space have trouble coming up with successful models other than advertising--not because those models are unsuccessful, but because 20+ years of compounded exponential growth has made them so big that smaller models are no longer worth their while and will not help them achieve their yearly growth targets.
Just a case in point: I joined Google in 2010 and left in 2019. In 2010, annual revenue was ~$30 billion. Last year, it was $300 billion. Google has grown at ~20% YoY very consistently since its inception. To meet that for 2024, they'll have to find $60 billion in new revenue--two 2010-Googles' worth of revenue in a single year. And of course the 2010 Google took twelve years to build. It's just bonkers.
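A quick back-of-the-envelope in Python bears this out (my own sketch; the revenue figures are the rounded ones above):

```python
# Back-of-the-envelope check on the growth claim (rounded figures from above).
rev_2010 = 30e9    # ~$30B annual revenue in 2010
rev_2023 = 300e9   # ~$300B last year
years = 2023 - 2010

cagr = (rev_2023 / rev_2010) ** (1 / years) - 1
print(f"implied growth rate 2010-2023: {cagr:.1%}")  # ~19.4%, i.e. roughly 20% YoY
print(f"new revenue needed for +20% in 2024: ${0.20 * rev_2023 / 1e9:.0f}B")  # $60B
```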
There used to be a wealth of smaller "labor-of-love" websites from individuals doing interesting things. The weeds have grown over them, making them hard to find on the public web, because these individuals cannot devote the same resources to SEO and SEM as teams of adtech affiliate marketers armed with LLM-generated content.
When Google first came out, it was amazing how effective it was. In the years following, we have had a feedback loop of adtech bullshit.
> There used to be a wealth of smaller "labor-of-love" websites from individuals doing interesting things
Those websites are long gone. First, because search engines defaulted to promoting "recent content" on HTTPS websites, which eliminated a lot of informational sites that were never SSL-secured--pages archived on university web servers, for example.
Second, because the time and effort required to compile this information today feels wasted because it can be essentially copied wholesale and reproduced on a content-hungry blogspam website, often without attribution to the original author.
In its place are cynical Substacks, Twitters or Tiktoks doing growth marketing ahead of an inevitable book deal or online course sales pitch.
Not only are the search engines promoting newer content, but they are also (at least Google is) penalizing sites with "old" content [1]. Somewhat related: it's outrageous to me when a university takes down a professor's page once they are no longer employed, or replaces it with a standardized faculty page devoid of anything interesting, just boring bios.
They made this wild mistake where moderation (which is a good thing) grew into dictating what websites should look like.
Search is a struggle to index the web as-is, the way biologists observe a species from afar and document its behavior. It's not: hey, if you want to be in the bird book you can lay six eggs at most; they should be smooth, egg-shaped, light in color, and no larger than 12 cm; you must be able to fly and make bird sounds, and only during the day. Most important, you must build your own nest!
Little Jimmy has many references under his articles, he is not paginating his archives properly, he has too many citations... Let's just take him behind the barn and shoot him.
I've pretty much just seen evidence that this segment keeps growing, and is now much MUCH larger than the Internet in The Good Old Days.
Discovering them is indeed hard, but it has always been hard - that's why search engines were such a gigantic improvement initially, they found more than the zero that most people had seen. But searches only ever skimmed the surface, and there's almost certainly no mechanical way to accurately identify the hidden gems - it's just straight chaos, there's a lot of good and bad and insane.
Find a small site or two, and explore their webring links, like The Good Old Days. They're still alive and healthy because it keeps getting easier to create and host them.
Sites today don't have blogrolls. Back in the '00s it was sacrilege not to have one on the sidebar of your site. That massively improved discoverability. Today you have to go to another service like Twitter to see this kind of cross-pollination.
tbh I have only ever seen a couple in an omnipresent sidebar in my lifetime. The vast majority I encountered around then and earlier were just in the "about" (or possibly "links") pages of people's websites, and occasionally a footer explicitly mentioning "webring".
Also if you squint hard enough, they're massively more common now. They're just usually hidden by adblockers because they're run by Disqus or Outbrain or similar (i.e. complete junk).
I strongly disagree. I've been answering immigration questions online for a long time. People frequently comment on threads from years ago, or ask about them in private. In other words, public content helps a lot of other people over time.
On the other hand, the stuff in private Facebook groups has a shelf life of a few days at best.
If your goal is to share useful knowledge with the broadest possible audience, Discord groups are a significant regression.
That's always been the case. Surely you didn't use to trust random information? Ask any schoolteacher how to decide what to trust on the internet at any point in time. They're not going to say "if it's at the top of Google results", or "if it's a well-designed website", or "if it seems legit".
I'd think this depends heavily on the subject. Someone asking about fundamental math and physics is likely to get the same answer now as 50 years from now. Immigration law and policy can change quickly and answers from 5 years ago may no longer present accurate information.
"Sharing useful knowledge with the broadest possible audience," unfortunately, is the worst possible thing you can do nowadays.
I hate that the internet is turning me into that guy, but everything is turning into shit and cancer, and AI is only making an already bad situation worse. Bots, trolls, psychopaths, psyops and all else aside, anything put on to the public web now only contributes to its metastasis by feeding the AI machine. It's all poisoned now.
Closed, gatekept communities with ephemeral posts and aggressive moderation, which only share knowledge within a limited and trusted circle of confirmed humans, and only for a limited time, designed to be as hostile as possible to sharing and interaction with the open web, seem to be the only possible way forward. At least until AI inevitably consumes that as well.
But what about people that are not yet in the community? Are we going to make "it's not what you know but who you know" our default mode of finding answers?
What alternative do you suggest? Everything you expose to the public internet is now feeding AI, and every interaction is more and more likely to be with an AI than a real human.
This isn't a matter of elitism, but vetting direct personal connections and gatekeeping access seems like the only way to keep AI quarantined and guarantee that real human knowledge and art don't get polluted. Every time I see someone on Twitter post something interesting, usually art, it makes me sad. I know that's now a part of the AI machine. That bit of uniqueness and creativity and humanity has been commoditized and assimilated and forever blighted from the universe. Even AI "poisoning" programs will fail over time. The only answer is to never share anything of value over the open internet.
Corporations are already pouring billions of dollars into "going all in" on AI. Video game and software companies are using AI art. Steam is allowing AI content. SAG-AFTRA has signed an agreement allowing the use of AI. Someone is trying to publish a new "tour" of George Carlin with an AI. All of our resources of "knowledge" and "expertise" have been poisoned by AI hallucinations and nonsense. Even everything we're writing here is feeding the beast.
OpenAI trains GPT on their own Discord server, apparently. If you copy paste a chatlog from any Discord server into GPT completion playground, it has a very strong tendency to regress into a chatlog about GPT, just from that particular chatlog format.
Start filling up Discords with insane AI-generated garbage, and maybe you can devalue the data to the point it won't get sold.
It's probably totally practical too, just create channels filled with insane bots talking to each other, and cultivate the local knowledge that real people just don't go there. Maybe even allow the insane bots on the main channels, and cultivate the understanding that everyone needs to just block them.
It would be important to avoid any kind of widespread convention about how to do this, since an important goal is to make it practically impossible to algorithmically filter out the AI-generated dogshit when training a model. So don't suffix all the bots with "-bot"; everyone just needs to be told something like "we block John2993, 3944XNU, and SunshineGirl around here."
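Mechanically it really is a weekend project. A minimal sketch with discord.py (the token, channel ID, and word list are placeholders, and the word-salad "generator" stands in for whatever local model you'd actually use):

```python
# Minimal noise-bot sketch using discord.py (pip install discord.py).
# TOKEN, CHANNEL_ID, and WORDS are placeholders for your own server.
import random

import discord
from discord.ext import tasks

TOKEN = "YOUR_BOT_TOKEN"
CHANNEL_ID = 123456789012345678  # the designated junk channel

WORDS = ["synergy", "quantum", "blockchain", "artisanal", "paradigm",
         "moist", "vertical", "disrupt", "holistic", "pivot"]

intents = discord.Intents.default()
client = discord.Client(intents=intents)

def word_salad(n: int = 25) -> str:
    # Cheap stand-in for "insane AI-generated garbage"; a real effort
    # would presumably pipe in output from a local language model.
    return " ".join(random.choices(WORDS, k=n)).capitalize() + "."

@tasks.loop(minutes=7)
async def post_noise():
    channel = client.get_channel(CHANNEL_ID)
    if channel is not None:
        await channel.send(word_salad())

@client.event
async def on_ready():
    if not post_noise.is_running():  # on_ready can fire more than once
        post_noise.start()

client.run(TOKEN)
```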
If we work together, maybe we can turn AI (or at least LLMs) into the next blockchain.
The equivalent of "locals don't go to this area after dark"? I instinctively like it, but only because I flatter myself that I would be a local. I can't see it working to any scale.
> The equivalent of "locals don't go to this area after dark"? I instinctively like it, but only because I flatter myself that I would be a local. I can't see it working to any scale.
I was thinking it could work if 1) the noise is just obvious enough that a human would get frustrated and block it without wasting much time, and/or 2) the practice is common enough that everyone except total newbies will learn generally what's up.
This idea has been talked about enough that we call it "Habsburg AI". OpenAI is already aware of it and it's the reason why they stopped web scraping in 2021.
> This is an intellectually fascinating thought experiment.
It's not a thought experiment. I'd actually like to do it (and others to do it). IRL.
I probably would start with an open source model that's especially prone to hallucinate, try to trigger hallucinations, then maybe feed back the hallucinations by retraining. Might make the most sense to target long-tail topics, because that would give the impression of unreliability while being harder to specifically counter at the topic level (e.g. the large apparent effort to make ChatGPT say only the right things about election results and many other sensitive topics).
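Something like this toy loop, sketched with Hugging Face transformers and GPT-2 (the treaty in the prompt is entirely fictional, and the model choice and hyperparameters are just placeholders; a real attempt would need far more data and compute):

```python
# Toy "hallucinate, then retrain on the hallucination" loop
# (pip install torch transformers).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
opt = torch.optim.AdamW(model.parameters(), lr=5e-5)

# A long-tail prompt the model can only confabulate about (entirely fictional).
prompt = "The 1907 Treaty of Valdoria established that"

for step in range(3):
    # High temperature encourages confident-sounding nonsense.
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=60, do_sample=True,
                         temperature=1.3, pad_token_id=tok.eos_token_id)
    text = tok.decode(out[0], skip_special_tokens=True)

    # Feed the confabulation straight back in as training data.
    batch = tok(text, return_tensors="pt")
    loss = model(input_ids=batch["input_ids"],
                 attention_mask=batch["attention_mask"],
                 labels=batch["input_ids"]).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
    print(f"round {step}: loss {loss.item():.2f}")
```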
If people believe giving all their information to one company and having it unindexable and impossible to find on the open internet is a way to keep their data safe, I have an alternative idea.
This unindexability means Discord could charge a much higher price when selling this data.
I can't see how being used as training data has anything to do with this problem. Being able to differentiate between the AI slop and the accurate information is the issue.
I think that Answer Overflow is opt-in; that is, individual communities have to actively join it for their content to show up. That would mean that (unless Answer Overflow becomes very popular) most Discord content isn't visible that way.
I can't really see the connection between "we should spend more time with trusted people", which is an argument for restricting who can write to our online spaces, and "we should be unindexable and untrainable", which is an argument for restricting who can read our online spaces.
I still hold that moving to proprietary, informational-black-hole platforms like Discord is a bad thing. Sure, use platforms that don't allow guest writing access to keep out spam; but this doesn't mean you should restrict read access. One big example: Lobsters. Or better-curated search engines and indexes.
This assumes that there are no legitimate uses to AI. This is clearly not true, so you can't really just equate the two. If you want better content, restrict writing, not reading. It's that simple.
> The future of the Internet truly is people - the machines can no longer be trusted to perform even the basic tasks they once excelled at.
What if the AI apocalypse takes this form?
- Social Media takes over all discourse
- Regurgitated AI crap takes over all Social Media
- Intellectual level of human beings spirals downward as a result
Neural networks will degenerate by learning from their own hallucinations, and humans will degenerate by relying on the degenerated neural networks. This process is called "model collapse": https://arxiv.org/abs/2305.17493v2 It can only be countered by a collective neural network of all the minds of humanity. For mutual validation and self-improvement of LLMs and humans, we need the ability to match the knowledge of artificial intelligence against collective intelligence. Only the CyberPravda project offers a practical path to avoiding the collapse of large language models.
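The paper's core effect is easy to reproduce in miniature: fit a distribution to some data, sample from the fit, refit on the samples, and repeat. A toy Gaussian version (my own sketch, not the paper's code):

```python
# Toy model collapse: each "generation" is trained only on samples
# produced by the previous generation's fitted model.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, 100)  # generation 0: real, human-made data

for gen in range(1, 1001):
    mu, sigma = data.mean(), data.std()
    data = rng.normal(mu, sigma, 100)  # refit, resample, repeat
    if gen % 100 == 0:
        print(f"gen {gen:4d}: sigma = {sigma:.3f}")
# The fitted sigma performs a downward-biased random walk: each refit
# slightly underestimates the spread, errors compound, and the tails
# of the distribution -- the rare, interesting content -- vanish first.
```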
AI bots will get the confidence of admins and moderators. They will be so helpful and wise that they will become admin and moderators. Then, they will ban the accounts of the human moderators.
We already have AI generated audio and video. This is a stopgap at best.
Maybe the mods will have to trick the AI by asking it to threaten them, or lay some other kind of "ethical" trap, but that will just mean the AI owners abandon ethical controls.