Never thought I'd say this, but in times like these, with clearnet in such dire straits, all the information siloed away inside Discord doesn't seem like such a bad thing. Remaining unindexable by search engines all but guarantees you'll never appear alongside AI slop or be used as training data.
The future of the Internet truly is people - the machines can no longer be trusted to perform even the basic tasks they once excelled at. They have traded their efficacy at basic tasks for being terrible at complex ones.
The fundamental dynamic that ruins every technology is (over-)commercialization. No matter what anyone says, it is clear that in this era, advertising has royally screwed up all the incentives on the internet and particularly the web. Whereas in the "online retailer" days there was transparency about transactions and business models, in the behind-the-scenes ad/attention economy it's all murky and distorted. Effectively all the players are conspiring to extract revenue from people's free time and attention, coerce them into consumption, and amuse them to death. Big entities in the space have trouble coming up with successful models other than advertising--not because those models are unsuccessful, but because 20+ years of compounded exponential growth has made them so big that smaller models are no longer worth their while and will not help them achieve their yearly growth targets.
Just a case in point: I joined Google in 2010 and left in 2019. In 2010, annual revenue was ~$30 billion. Last year, it was $300 billion. Google has grown at ~20% YoY very consistently since its inception. To meet that for 2024, they'll have to find $60 billion in new revenue--two 2010-Googles' worth of revenue in a single year. And of course the 2010 Google took twelve years to build. It's just bonkers.
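A quick back-of-the-envelope in Python bears this out (my own sketch; the revenue figures are the rounded ones above):

```python
# Back-of-the-envelope check on the growth claim (rounded figures from above).
rev_2010 = 30e9    # ~$30B annual revenue in 2010
rev_2023 = 300e9   # ~$300B last year
years = 2023 - 2010

cagr = (rev_2023 / rev_2010) ** (1 / years) - 1
print(f"implied growth rate 2010-2023: {cagr:.1%}")  # ~19.4%, i.e. roughly 20% YoY
print(f"new revenue needed for +20% in 2024: ${0.20 * rev_2023 / 1e9:.0f}B")  # $60B
```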
There used to be a wealth of smaller "labor-of-love" websites from individuals doing interesting things. The weeds have grown over them, making them hard to find on the public web, because these individuals cannot devote the same resources to SEO and SEM as teams of adtech affiliate marketers armed with LLM-generated content.
When Google first came out, it was amazing how effective it was. In the years following, we have had a feedback loop of adtech bullshit.
> There used to be a wealth of smaller "labor-of-love" websites from individuals doing interesting things
Those websites are long gone. First, because search engines defaulted to promoting "recent content" on HTTPS websites, which eliminated a lot of informational sites that were never SSL-secured--pages archived on university web servers, for example.
Second, because the time and effort required to compile this information today feels wasted because it can be essentially copied wholesale and reproduced on a content-hungry blogspam website, often without attribution to the original author.
In its place are cynical Substacks, Twitters or Tiktoks doing growth marketing ahead of an inevitable book deal or online course sales pitch.
Not only are the search engines promoting newer content, but they are also (at least Google is) penalizing sites with "old" content [1]. Somewhat related: it's outrageous to me when a university takes down a professor's page once they are no longer employed, or replaces it with a standardized faculty page devoid of anything interesting, just boring bios.
They made this wild mistake where moderation (which is a good thing) grew into dictating what websites should look like.
Search is a struggle to index the web as-is, the way biologists observe a species from afar and document its behavior. It's not: hey, if you want to be in the bird book you can lay six eggs at most; they should be smooth, egg-shaped, light in color, and no larger than 12 cm; you must be able to fly and make bird sounds, and only during the day. Most important, you must build your own nest!
Little Jimmy has many references under his articles, he is not paginating his archives properly, he has too many citations... Let's just take him behind the barn and shoot him.
I've pretty much just seen evidence that this segment keeps growing, and is now much MUCH larger than the Internet in The Good Old Days.
Discovering them is indeed hard, but it has always been hard - that's why search engines were such a gigantic improvement initially, they found more than the zero that most people had seen. But searches only ever skimmed the surface, and there's almost certainly no mechanical way to accurately identify the hidden gems - it's just straight chaos, there's a lot of good and bad and insane.
Find a small site or two, and explore their webring links, like The Good Old Days. They're still alive and healthy because it keeps getting easier to create and host them.
Sites today don't have blogrolls. Back in the '00s it was sacrilege not to have one on the sidebar of your site. That massively improved discoverability. Today you have to go to another service like Twitter to see this kind of cross-pollination.
tbh I have only ever seen a couple in an omnipresent sidebar in my lifetime. The vast majority I encountered around then and earlier were just in the "about" (or possibly "links") pages of people's websites, and occasionally a footer explicitly mentioning "webring".
Also if you squint hard enough, they're massively more common now. They're just usually hidden by adblockers because they're run by Disqus or Outbrain or similar (i.e. complete junk).
I strongly disagree. I've been answering immigration questions online for a long time. People frequently comment on threads from years ago, or ask about them in private. In other words, public content helps a lot of other people over time.
On the other hand, the stuff in private Facebook groups has a shelf life of a few days at best.
If your goal is to share useful knowledge with the broadest possible audience, Discord groups are a significant regression.
That's always been the case. Surely you didn't use to trust random information? Ask any schoolteacher how to decide what to trust on the internet at any point in time. They're not going to say "if it's at the top of Google results", or "if it's a well-designed website", or "if it seems legit".
I'd think this depends heavily on the subject. Someone asking about fundamental math and physics is likely to get the same answer now as 50 years from now. Immigration law and policy can change quickly and answers from 5 years ago may no longer present accurate information.
"Sharing useful knowledge with the broadest possible audience," unfortunately, is the worst possible thing you can do nowadays.
I hate that the internet is turning me into that guy, but everything is turning into shit and cancer, and AI is only making an already bad situation worse. Bots, trolls, psychopaths, psyops and all else aside, anything put on to the public web now only contributes to its metastasis by feeding the AI machine. It's all poisoned now.
Closed, gatekept communities with ephemeral posts and aggressive moderation, which only share knowledge within a limited and trusted circle of confirmed humans, and only for a limited time, designed to be as hostile as possible to sharing and interaction with the open web, seem to be the only possible way forward. At least until AI inevitably consumes that as well.
But what about people that are not yet in the community? Are we going to make "it's not what you know but who you know" our default mode of finding answers?
What alternative do you suggest? Everything you expose to the public internet is now feeding AI, and every interaction is more and more likely to be with an AI than a real human.
This isn't a matter of elitism, but vetting direct personal connections and gatekeeping access seems like the only way to keep AI quarantined and guarantee that real human knowledge and art don't get polluted. Every time I see someone on Twitter post something interesting, usually art, it makes me sad. I know that's now a part of the AI machine. That bit of uniqueness and creativity and humanity has been commoditized and assimilated and forever blighted from the universe. Even AI "poisoning" programs will fail over time. The only answer is to never share anything of value over the open internet.
Corporations are already pouring billions of dollars into "going all in" on AI. Video game and software companies are using AI art. Steam is allowing AI content. SAG-AFTRA has signed an agreement allowing the use of AI. Someone is trying to publish a new "tour" of George Carlin with an AI. All of our resources of "knowledge" and "expertise" have been poisoned by AI hallucinations and nonsense. Even everything we're writing here is feeding the beast.
OpenAI trains GPT on their own Discord server, apparently. If you copy paste a chatlog from any Discord server into GPT completion playground, it has a very strong tendency to regress into a chatlog about GPT, just from that particular chatlog format.
Start filling up Discords with insane AI-generated garbage, and maybe you can devalue the data to the point it won't get sold.
It's probably totally practical too, just create channels filled with insane bots talking to each other, and cultivate the local knowledge that real people just don't go there. Maybe even allow the insane bots on the main channels, and cultivate the understanding that everyone needs to just block them.
It would be important to avoid any kind of widespread convention about how to do this, since an important goal is to make it practically impossible to algorithmically filter out the AI-generated dogshit when training a model. So don't suffix all the bots with "-bot"; everyone just needs to be told something like "we block John2993, 3944XNU, and SunshineGirl around here."
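Mechanically it really is a weekend project. A minimal sketch with discord.py (the token, channel ID, and word list are placeholders, and the word-salad "generator" stands in for whatever local model you'd actually use):

```python
# Minimal noise-bot sketch using discord.py (pip install discord.py).
# TOKEN, CHANNEL_ID, and WORDS are placeholders for your own server.
import random

import discord
from discord.ext import tasks

TOKEN = "YOUR_BOT_TOKEN"
CHANNEL_ID = 123456789012345678  # the designated junk channel

WORDS = ["synergy", "quantum", "blockchain", "artisanal", "paradigm",
         "moist", "vertical", "disrupt", "holistic", "pivot"]

intents = discord.Intents.default()
client = discord.Client(intents=intents)

def word_salad(n: int = 25) -> str:
    # Cheap stand-in for "insane AI-generated garbage"; a real effort
    # would presumably pipe in output from a local language model.
    return " ".join(random.choices(WORDS, k=n)).capitalize() + "."

@tasks.loop(minutes=7)
async def post_noise():
    channel = client.get_channel(CHANNEL_ID)
    if channel is not None:
        await channel.send(word_salad())

@client.event
async def on_ready():
    if not post_noise.is_running():  # on_ready can fire more than once
        post_noise.start()

client.run(TOKEN)
```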
If we work together, maybe we can turn AI (or at least LLMs) into the next blockchain.
The equivalent of "locals don't go to this area after dark"? I instinctively like it, but only because I flatter myself that I would be a local. I can't see it working to any scale.
> The equivalent of "locals don't go to this area after dark"? I instinctively like it, but only because I flatter myself that I would be a local. I can't see it working to any scale.
I was thinking it could work if 1) the noise is just obvious enough that a human would get frustrated and block it without wasting much time, and/or 2) the practice is common enough that everyone except total newbies will learn generally what's up.
This idea has been talked about enough that we call it "Habsburg AI". OpenAI is already aware of it and it's the reason why they stopped web scraping in 2021.
> This is an intellectually fascinating thought experiment.
It's not a thought experiment. I'd actually like to do it (and others to do it). IRL.
I probably would start with an open source model that's especially prone to hallucinate, try to trigger hallucinations, then maybe feed back the hallucinations by retraining. Might make the most sense to target long-tail topics, because that would give the impression of unreliability while being harder to specifically counter at the topic level (e.g. the large apparent effort to make ChatGPT say only the right things about election results and many other sensitive topics).
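Something like this toy loop, sketched with Hugging Face transformers and GPT-2 (the treaty in the prompt is entirely fictional, and the model choice and hyperparameters are just placeholders; a real attempt would need far more data and compute):

```python
# Toy "hallucinate, then retrain on the hallucination" loop
# (pip install torch transformers).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
opt = torch.optim.AdamW(model.parameters(), lr=5e-5)

# A long-tail prompt the model can only confabulate about (entirely fictional).
prompt = "The 1907 Treaty of Valdoria established that"

for step in range(3):
    # High temperature encourages confident-sounding nonsense.
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=60, do_sample=True,
                         temperature=1.3, pad_token_id=tok.eos_token_id)
    text = tok.decode(out[0], skip_special_tokens=True)

    # Feed the confabulation straight back in as training data.
    batch = tok(text, return_tensors="pt")
    loss = model(input_ids=batch["input_ids"],
                 attention_mask=batch["attention_mask"],
                 labels=batch["input_ids"]).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
    print(f"round {step}: loss {loss.item():.2f}")
```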
If people believe giving all their information to one company and having it unindexable and impossible to find on the open internet is a way to keep their data safe, I have an alternative idea.
This unindexability means Discord could charge a much higher price when selling this data.
I can't see how being used as training data has anything to do with this problem. Being able to differentiate between the AI slop and the accurate information is the issue.
I think that Answer Overflow is opt-in; that is, individual communities have to actively join it for their content to show up. That would mean that (unless Answer Overflow becomes very popular) most Discord content isn't visible that way.
I can't really see the connection between "we should spend more time with trusted people", which is an argument for restricting who can write to our online spaces, and "we should be unindexable and untrainable", which is an argument for restricting who can read our online spaces.
I still hold that moving to proprietary, informational-black-hole platforms like Discord is a bad thing. Sure, use platforms that don't allow guest writing access to keep out spam; but this doesn't mean you should restrict read access. One big example: Lobsters. Or better-curated search engines and indexes.
This assumes that there are no legitimate uses to AI. This is clearly not true, so you can't really just equate the two. If you want better content, restrict writing, not reading. It's that simple.
> The future of the Internet truly is people - the machines can no longer be trusted to perform even the basic tasks they once excelled at.
What if the AI apocalypse takes this form?
- Social Media takes over all discourse
- Regurgitated AI crap takes over all Social Media
- Intellectual level of human beings spirals downward as a result
Neural networks will degenerate by learning from their own hallucinations, and humans will degenerate by relying on the degenerated neural networks. This process is called "model collapse": https://arxiv.org/abs/2305.17493v2 It can only be countered by a collective neural network of all the minds of humanity. For mutual validation and self-improvement of LLMs and humans, we need the ability to match the knowledge of artificial intelligence against collective intelligence. Only the CyberPravda project offers a practical path to avoiding the collapse of large language models.
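The paper's core effect is easy to reproduce in miniature: fit a distribution to some data, sample from the fit, refit on the samples, and repeat. A toy Gaussian version (my own sketch, not the paper's code):

```python
# Toy model collapse: each "generation" is trained only on samples
# produced by the previous generation's fitted model.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, 100)  # generation 0: real, human-made data

for gen in range(1, 1001):
    mu, sigma = data.mean(), data.std()
    data = rng.normal(mu, sigma, 100)  # refit, resample, repeat
    if gen % 100 == 0:
        print(f"gen {gen:4d}: sigma = {sigma:.3f}")
# The fitted sigma performs a downward-biased random walk: each refit
# slightly underestimates the spread, errors compound, and the tails
# of the distribution -- the rare, interesting content -- vanish first.
```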
AI bots will get the confidence of admins and moderators. They will be so helpful and wise that they will become admin and moderators. Then, they will ban the accounts of the human moderators.
We already have AI generated audio and video. This is a stopgap at best.
Maybe the mods will have to trick the AI by asking it to threaten them, or lay some other kind of "ethical" trap, but that will just mean the AI owners abandon ethical controls.