One aspect of the spread of LLMs is that we have lost a useful heuristic. Poor spelling and grammar used to be a signal used to quickly filter out worthless posts.
Unfortunately, this doesn't work at all for AI-generated garbage. Its command of the language is perfect - in fact, it's much better than that of most human beings. Anyone can instantly generate superficially coherent posts. You no longer have to hire a copywriter, as many SEO spammers used to do.
We should start donating more heavily to archive.org - the Wayback Machine may soon be the only way to find useful data on the internet, by cutting out anything published after ~2020 or so.
I won't even bet on archive.org to survive. I will soon upgrade my home NAS to ~100TB and fill it up with all kinds of information and media /r/datahoarder style. Gonna archive the usual suspects like Wikipedia and also download some YouTube channels. I think now is the last chance to still get information that hasn't been tainted by LLM crap. The window of opportunity is closing fast.
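For the YouTube piece of that, here is a minimal sketch of what the archiving job could look like using yt-dlp's Python API (the channel URL, NAS paths, and option choices are my own placeholders, not anything the parent specified):

    # Sketch: mirror channels to the NAS, skipping videos already archived.
    import yt_dlp

    channels = ["https://www.youtube.com/@SomeChannel"]  # hypothetical channel

    opts = {
        "outtmpl": "/mnt/nas/youtube/%(channel)s/%(title)s [%(id)s].%(ext)s",
        "download_archive": "/mnt/nas/youtube/archive.txt",  # records finished IDs
        "writeinfojson": True,    # keep metadata next to the media files
        "writesubtitles": True,   # subtitles are tiny and worth hoarding
        "ignoreerrors": True,     # one broken video shouldn't kill the run
    }

    with yt_dlp.YoutubeDL(opts) as ydl:
        ydl.download(channels)

Run from cron every week or so, the download_archive file keeps it incremental.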
That's a lot, compared to mine. How do you organize replication, and do you make backups on any external services? I kinda do want to hoard more, but I find it complicated to deal with at large scale. It gets expensive to back up everything, and HDDs aren't really a solid medium long-term. Now, I can kinda use my judgment of what is important and what is essentially trash I store just in case, but losing 100TB of trash would be pretty devastating too, TBH.
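On the "HDDs aren't a solid medium" worry: even without ZFS scrubbing, bit rot is at least detectable with a checksum manifest you re-verify periodically. A rough sketch (manifest name and layout are just my choices):

    # Sketch: record SHA-256 checksums once, then re-verify on a schedule.
    import hashlib, json, os, sys

    MANIFEST = "checksums.json"

    def file_hash(path, bufsize=1 << 20):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while chunk := f.read(bufsize):
                h.update(chunk)
        return h.hexdigest()

    def build(root):
        manifest = {p: file_hash(p)
                    for dirpath, _, files in os.walk(root)
                    for p in (os.path.join(dirpath, n) for n in files)}
        with open(MANIFEST, "w") as f:
            json.dump(manifest, f, indent=2)

    def verify():
        with open(MANIFEST) as f:
            manifest = json.load(f)
        for path, digest in manifest.items():
            if not os.path.exists(path):
                print("MISSING", path)
            elif file_hash(path) != digest:
                print("CORRUPT", path)  # restore this one from the second copy

    if __name__ == "__main__":
        build(sys.argv[1]) if len(sys.argv) > 1 else verify()

It doesn't replace backups (you still need a second copy to restore from), but it tells you which copy is the good one.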
The last chance to get reliable information that hasn't been tainted by the bullshit of LLM hallucinations is the CyberPravda (dot) com project. The window of opportunity is closing fast.
I think something interesting to note is that once we stopped atmospheric nuclear testing, radiation levels in steel went back down and are almost at normal background levels. So maybe the same thing will happen if we stop using GenAI.
It was quite a limited set of entities that did atmospheric nuclear testing. The same cannot be said about LLMs.
The post-apocalyptic scenario I half-jokingly predicted some 5-10 years ago was that all general-purpose computing hardware, which is common and relatively cheap now, will be abandoned and possibly outlawed in the end. People will use non-rootable thin clients to access Amazon, which will have the general-purpose hardware, but it will be heavily audited by government entities.
Interesting idea. Could there be a market for pre-AI era content?
Or maybe it would be a combination of pre-AI content plus some extra barriers to entry for newer content that would increase the likelihood the content was generated by real people?
I'm in the camp where I want AI and automation to free people from drudgery in the hope that it will encourage the biggest HUMAN artwork renaissance ever in history.
I don't want AI to be at the forefront of all new media and artwork. That's a terrible outcome to me.
And honestly there's already too much "content" in the world, with more being produced every day, and it seems like every time we step further up the "content is easier to produce and deliver" ladder, it actually gets way more difficult to find much of value, and also more difficult for smaller artists to find an audience.
We see this on Steam, where there are thousands of new game releases every week. You only ever hear of one or two. And it's almost never surprising which ones you hear about. Rarely do you get an indie sensation out of nowhere, and even that usually happens only when a big streamer showcases it.
Speaking of streamers, it's hard to find quality small streamers too. Twitch and YouTube are saturated with streams to watch but everyone gravitates to the biggest ones because there's just too much to see.
Everything is drowning in a sea of (mostly mediocre, honestly) content already, AI is going to make this problem much worse.
At least with human generated media, it's a person pursuing their dreams. Those thousands of games per week might not get noticed, but the person who made one of them might launch a career off their indie steam releases and eventually lead a team that makes the next Baldur's Gate 3 (substitute with whatever popular game you like)
I can't imagine the same with AI. Or actually, I can imagine much worse. The AI that generates 1000 games eventually gets bought by a company to replace half their staff and now a bunch of people are out of work and have a much harder uphill battle to pursue their dreams (assuming that working on games at that company was their dream)
I don't know. I am having a hard time seeing a better society growing out of the current AI boom.
> free people from drudgery in the hope that it will encourage the biggest HUMAN artwork renaissance ever in history.
This experiment has been run in most wealthy nations and the artwork renaissance didn't happen.
Most older people don't do arts/sciences when they retire from work.
From what I see of younger people who no longer have to work (for whatever reason), they don't become artists either, given the opportunity.
Or look at what people of working age do with their free time in evenings or weekends after they've done their work for the week. Expect people freed from work to do more of the same as what they currently do in evenings and weekends: don't expect that people will suddenly do something "productive".
You don’t want older folks to generate reams of good art for consumption. Let the youngsters who need to make money do that. And many artistically-oriented youngsters do create art in their off hours from work, at least out here. I don’t think they think of it as “production” though. Why does a bird sing?
What retirees often do, rather, is develop an artist’s eye for images, a musician’s ear for sounds, a philosopher’s perspective, a writer’s voice, etc. This often involves broader exposure to and consumption of the arts, and studying art history. Sometimes it involves producing actual art as well, but less for the final artistic product than to engage in the artistic process itself, so as to develop that way of seeing/feeling/being an artist has. When the work-related chunk of the mind is wholly freed up for other pursuits, there is often such a bit-flip. And since it is a deepening appreciation and greater consumption, there is no risk of overproduction of art and the soul devolution that arises from hyper-competitiveness in the marketplace.
Becoming an artist is difficult. Sure, anyone can pick up a tool of their preference and learn to noodle around. But producing artwork sufficiently engaging to power a renaissance takes years of practice to reach mastery. We think that artists appear out of nowhere, fully formed, an impression we get from how popularity and spread work. Look under the surface, read some biographies of artists, and it turns out that, with few exceptions, they all spent years going through education, apprenticeships, and general obscurity. Many of the artists we respect now weren't known in their lifetimes. The list includes Vincent van Gogh, Paul Cézanne, Claude Monet, Vivian Maier, Emily Dickinson, Edgar Allan Poe, Jeff Buckley, Robert Johnson; you get the idea.
I'll go one further, though I expect to receive mockery for doing so: I think the internet as we conceive of it today is ultimately a failed experiment.
I think that society and humanity would be better off if the internet had remained a simple backbone for vetted organizations' official use. Turning the masses loose on it has effectively ruined so many aspects of our world that we can never get back, and I for one don't think that even the most lofty and oft-touted benefits of the internet are nearly as true as we pretend.
It's just another venue for the oldest of American traditions at this point: Snake Oil Sales.
I won't mock you, I get where you're coming from, but I think you're forgetting just how revolutionary many aspects of the internet have been. The ability to publish to a potentially global audience without a corporate mediator. Commerce without physically going to a store or ordering over a phone. Access to information, culture and education beyond what can fit in one's local library. Banking without an ATM. Even just being able to communicate worldwide without long-distance charges (remember those?) or an envelope and stamp. Even social media, which everyone hates, was a revolution in that it got people easily using the web to network and communicate en masse, whereas before it was just people behind pseudonyms on niche forums. There is a real and tangible improvement in the quality of life for at least millions of people behind each of those.
Reducing the internet to only world-destroying negatives and writing off its positives as "snake oil" seems unnecessarily hyperbolic, as obvious as the negatives are. Although I suppose it's easier to accept the destruction of the internet if you believe that it was never worth anything to begin with. But I disagree that nothing of value is being lost. Much of value is being lost. That's what's tragic.
Humans will use whatever means are available to us to spout bullshit and misinformation and to peddle snake oil.
The Internet has just made it easier for us to communicate; in doing so it has made the bad easier, but it has made the good easier too. And fortunately there's still a lot more good than bad.
So I totally disagree with you there, bettering communication only benefits our species overall.
Gay rights is a great example, we only got them because of the noise and ruckus, protests, parades, individuals being brave and coming out. It's easy to hate a type of person if you've never been exposed to or communicated with them. But sometimes all it took to change the opinion of a homophobic fuck was finding out their best friend, their child, their neighbour who helps out all the time, was gay. Then suddenly it clicks.
Though certainly the Internet is slightly at odds with our species; we didn't evolve to communicate in that way so it's not without its challenges.
> The AI that generates 1000 games eventually gets bought by a company
That seems like only a temporary phenomenon. If we've got AI that can generate any games that people actually want to play then we don't need game companies at all. In the long run I don't see any company being able to build a moat around AI. It's a cat-and-mouse game at best!
I don't think regulation will achieve what they want. Nothing short of a war-on-drugs style blanket prohibition would work. And you can look there to see how ineffective that's been at keeping drugs off the streets.
Another example of this behavior. The war on drugs not working didn't stop alcohol companies from lobbying for it; any effect that suppresses competition is valuable, and it's not like OpenAI and the like will be paying for enforcement, you will be.
I'd be very, very surprised if OpenAI was successful in setting up a war-on-drugs style regime that simultaneously sets them up as one of the sole providers of AI (a guaranteed monopoly on AI in the US). One of the big reasons is that it would put the US at an extreme disadvantage, competitively speaking. OpenAI would not be able to hire every single AI developer, so all of that talent would leave the US for greener pastures.
>> If we've got AI that can generate any games that people actually want to play then we don't need game companies at all.
> Why do you think they are screaming about "the dangers of AI"?
Perhaps it's those of us who enjoy making games or are otherwise invested in producing content, and who are concerned about humanity being reduced to braindead consumers of neverending LLM sludge, who scream the loudest.
Think how many game developers were able to realize their vision because Unity3D was accessible to them but raw C++ programming was not. We may see similar outcomes for other budding artists with the help of AI models. I'm quite excited!
I'm cautiously optimistic, but I also think about things like "Rebel Moon". When I was growing up, movies were constrained by their special effects budget... if some special effects "wizard" couldn't think of a way to make it look like Luke Skywalker got his hand cut off in a light saber battle, he didn't get his hand cut off in a light saber battle. Now, with CGI, the sky is the limit - what we see on screen is whatever the writer can dream up. But what we're getting is... pretty awful. It's almost as if the technical constraints actually forced the writers to focus on crafting a good story to make up for lack of special effects.
Except 'their vision' is practically homogeneous. I can't even think of a dozen Unity games that broke the mould and genuinely stand out, out of the many tens of thousands (?).
There's Genshin Impact, Pokemon Go, Superhot, Beat Saber, Monument Valley, Subnautica, Among Us, Rust, Cities:Skylines (maybe), Ori (maybe), COD:Mobile (maybe) and...?
> Except 'their vision' is practically homogeneous. I can't even think of a dozen Unity games that broke the mould and genuinely stand out, out of the many tens of thousands (?).
You could say the same about books.
Lowering the barriers to entry does mean more content will be generated, and that content won't meet the same bar as when a middleman was the arbiter of who gets published. But at the same time, you'll likely get more hits and new developers, because you're getting more people swinging faster to test the market and hone their eye.
I am doubtful that there are very many people who hit a "Best Seller" 10/10 on their first try. You just used to not see it or ever be able to consume it because their audience was like 7 people at their local club.
Necropolis, Ziggurat... Imo the best games nowadays are often those that no one has heard about. Popularity hasn't been a good metric for a very long while. And thankfully, games like "New World" and "Starfield" are helping the general population finally figure this out.
Angry birds, Slender: The Eight Pages, Kerbal Space Program, Plague Inc, The Room, Rust, Tabletop Simulator, Enter the Gungeon, Totally Accurate Battle Simulator, Clone Hero, Cuphead, Escape from Tarkov, Getting Over It with Bennett Foddy, Hollow Knight, Oxygen Not Included, Among Us, RimWorld, Subnautica, Magic: The Gathering Arena, Outer Wilds, Risk of Rain 2, Subnautica: Below Zero, Superliminal, Untitled Goose Game, Fall Guys, Raft, Slime Rancher, Firewatch, PolyBridge, Mini Metro, Luckslinger, Return of the Obra Dinn, 7 Days to Die, Cult of the Lamb, Punch Club.
Rimworld. Dyson Sphere Program. Cult of the Lamb. Escape from Tarkov. Furi. Getting over it with Bennett Foddy. Hollow Knight. Kerbal Space Program. Oxygen not included. Pillars of Eternity. Risk of Rain 2. Tyranny.
I'd say all of those do some major thing that makes them stand out.
> I'm in the camp where I want AI and automation to free people from drudgery in the hope that it will encourage the biggest HUMAN artwork renaissance ever in history.
That is, to put it bluntly, hoping for a technological solution to a social problem. It won't happen. Ever.
We absolutely, 100% DO NOT have the social or ideological framework necessary to "free people from drudgery." The only options are 1) be rich, 2) drudge, or 3) starve. Even a technology as fantastic as a Star Trek replicator won't really free us from that. If it enables anything, the only new option provided by replicators would be: 4) die from an atom bomb replicated by a nutjob.
Extra barriers! LOL. Everything I have ever submitted that was written by me (a human) to HN, reddit and others in the past 12 months gets rejected as self-promotion or some other BS, even though it is totally original technical content. I am totally over the hurdles to get anything I do noticed, and as I don't have social media, it seems the future is to publish it anywhere and rely on others or AI to scrape it into a publishable story somewhere else at a future date. I feel for the moderators' dilemma, but I am also over the stupid hoops humans have to jump through.
Silly prediction: the only way to get guaranteed non-ai generated content will be to go to live performances of expert speakers. Kind of like going to the theater vs. TV and cinema or attending a live concert vs. listening to Spotify.
Love that sentiment!
The Internet Archive is in many ways one of the best things online right now IMO.
One of the few organisations that I donate regularly to without any second thoughts.
Protect the archive at all costs!
I update my wikipedia copy every few months, but I can't really afford to back up the Internet Archive. I do send them around $10 every Christmas as part of the ~$100 I give to my favorite sites like archive, wikipedia, etc.
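For anyone wanting to start their own copy, the refresh can be a one-file script that streams the latest pages-articles dump from dumps.wikimedia.org. A sketch (the output filename and chunk size are arbitrary choices of mine):

    # Sketch: fetch the current English Wikipedia pages-articles dump
    # (roughly 20 GB compressed, so run it somewhere with room).
    import requests

    URL = ("https://dumps.wikimedia.org/enwiki/latest/"
           "enwiki-latest-pages-articles.xml.bz2")

    with requests.get(URL, stream=True, timeout=60) as r:
        r.raise_for_status()
        with open("enwiki-latest-pages-articles.xml.bz2", "wb") as f:
            for chunk in r.iter_content(chunk_size=1 << 20):  # 1 MiB at a time
                f.write(chunk)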
Things go in cycles. Search engines were so much better at discovering linked websites. Then people played the SEO game, wrote bogus articles, cross-linked this and that; everyone got into writing. Everyone writes the same cliches over and over, and the quality of search engines plummets. But then, since we are regurgitating the same thoughts over and over again, why not automate it? Over time people will forget where the quality posts came from in the first place, e.g. LLMs replace stackoverflow, which replaced technical documentation. When the cost of production is dirt cheap, no one cares about quality. When enough is enough, people will start to curate a web of word-of-mouth for everything again.
What I typed above is extremely broad-strokes and lacking in nuance. But generally I think the quality of online content will go to shit until people have had enough, then behaviour will swing to the other side.
Nah, you got the right of it. It feels like the end of Usenet all over again, only these days cyber-warlords have joined the spammers and trolls.
Mastodon sounded promising as What's Next, but I don't trust it-- that much feels like Bitcoin all over again. Too many evangelists, and there's already abuse of extended social networks going on.
Any tech worth using should sell itself. Nobody needed to convince me to try Usenet, most people never knew what it was, and nobody is worse off for it.
We created the Tower of Babel-- everyone now speaks with one tongue. Then we got blasted with babble. We need an angry god to destroy it.
I figure we'll finally see the fault in this implementation when we go to war with China and they brick literally everything we insisted on connecting to the internet, in the first few minutes of that campaign.
I hope / believe the future of social networks will go back to hyperlocal / hyperfocused.
I am definitely wearing rose-tinted glasses here but I had more fun on social media when it was just me, my local friends, and my interest friends messing around and engaging organically. When posting wasn't about getting something out of it, promoting a new product, posting a blog article... take me back to the days where people would tweet that they were headed to lunch then check in on Foursquare.
I get the need for marketing, etc etc. But so much of the internet and social media today is all about their personal branding, marketing, blah. Every post has an intention behind it. Every person is wearing a mask.
The decentralized social network Mastodon did not have an unbiased algorithm for analyzing the reliability of information and assessing the reputation of its authors. This shortcoming is now being addressed by a new method: we are creating a CyberPravda (dot) com platform for disputes with an unbiased mathematical algorithm for assessing the reliability of statements, where people are accountable with their personal reputation for their knowledge and arguments.
I can see it already! The war with China... then we find ourselves around the campfire with the dads and mums cooking food, the boys and girls singing songs and the grandparents telling stories about times long gone.
I feel like somehow this is all some economic/psychological version of a heat equation. Any time someone comes up with some signal with economic value, that value is exploited until the signal diffuses back out.
I think it’s similar to a Matt Levine quote I read, which said something like: Wall Street will find a way to take something riskless and monetize it so that it becomes risky.
> You no longer have to hire a copywriter, as many SEO spammers used to do.
I used to do SEO copywriting in high school and yeah, ChatGPT's output is pretty much at the level of what I was producing (primarily, use certain keywords, secondarily, write a surface-level informative article tangential to what you want to sell to the customer).
> At some point it may become impossible to separate the wheat from the chaff.
I think over time there could be a weird eddy-like effect to AI intelligence. Today you can ask ChatGPT a Stack Overflow-style question and get a Stack Overflow-style response instantly (complete with taking a bit of a gamble on whether it's true and accurate). Hooray for increased productivity?
But then, looking forward years in time, people start leaning more heavily on that and stop posting to Stack Overflow and the well of information for AI to train on starts to dry up, instead becoming a loop of sometimes-correct goop. Maybe that becomes a problem as technology evolves? Or maybe they train on technical documentation at that point?
I think you are generally correct in where things will likely go (sometimes correct goop) but the problem I think will be far more existential; when people start to feel like they are in a perpetual uncanny valley of noise, what DO they actually do next? I don't think we have even the remotest grasp of what that might look like and how it will impact us.
That is an interesting thought. Maybe the problem is not the AI-generated useless noise, but that it is so easy and cheap to publish it.
One possible future is going back to a medium with a higher cost of publication. Books. Hand-chiseled stone tablets. Offering information costs something.
> One possible future is going back to a medium with higher cost of publication. Books
Honestly, I switched to books and papers a few years ago and it has been fantastic. 2 hours of reading a half-decent book or paper outweighs a week of reading the best blog posts, Twitter threads, or YouTube videos.
Fun thought: it's more reliable to store information on stone tablets over very long periods of time than it is on hard drives or other modern data storage devices.
I think we have plenty of examples of published “noise”, probably just not on the same scale. (“Noise” is subjective of course: I don’t watch reality television but others do, for example.) For the most part, I just ignore “noise”, so I suspect that the entire World Wide Web will eventually be considered “noise” by many. Instead it seems like it will be necessary to deploy AI to retrieve information as it will be necessary to programmatically evaluate the received content to filter out anything that you’ve trained it to consider “noise”.
"(“Noise” is subjective of course: I don’t watch reality television but others do, for example.)"
This brings up a good sub-topic. "Noise" as I mean it is something whose veracity you cannot definitively validate in short order, or you do and it's useless.
The trash TV thing is a great example: if you are watching Beavis & Butthead because you know it's trash and you need to zone out, that's a conscious, active decision, and you are, in effect, 'in on the joke'... if you can't discern that it's satire and find yourself relating to the characters, you might be part of the problem :)
It's already becoming hard to tell the wheat from the chaff.
AI generated images used to look AI generated. Midjourney v6 and well tuned sdxl models look almost real. For marketing imagery, Midjourney v6 can easily replicate images from top creative houses now.
>But then, looking forward years in time, people start leaning more heavily on that and stop posting to Stack Overflow and the well of information for AI to train on starts to dry up
for coding tasks I'd imagine it could be trained on the actual source code of the libraries or languages and determine proper answers for most questions. AI companies have seen success using "synthetic" data, but who knows how much it can scale and improve
I've rarely found stackoverflow to give useful answers. If I am looking for how to do something with Linux programming, I'll get a dozen answers, half of which are only partial answers, the other half don't work.
Weird, I've found hundreds of useful SO answers that worked for me.
I've also learned a lot from chatting with Bing AI. The caveat is that you have to keep in the back of your mind, all the time, that the answer might be wrong. It helps to keep asking more detailed questions and to check whether the set of answers keeps making sense as a whole. That way of using it has helped me a lot. See it as getting info from a very smart friend who has sometimes had too much to drink.
When I've used chatgpt and bard to write example code, it's always generated a complete example, not half of one.
Of course, I carefully frame the query so that's what I get.
However, when I asked stackoverflow, google, and bard a question about how to do something with github, all I received were wrong answers. I finally had to throw in the towel and ask people. I think it was the third person I asked who gave an answer that worked.
Google itself has an annoying habit of answering a fundamentally different question than the one I type in.
>the well of information for AI to train on starts to dry up
And with respect to the eddy-like model self-incestuation: I am sure that the scope of that well just becomes wider. Now it's slurping any and all video, learning human micro-emotions and micro-aggressions, and mastering human interpersonal skills.
My prediction is that AI will be a top-down reflection of society's leadership. So as long as we have these questionable leaders throughout the world's governments and global corps, the alignment of AI will be biased toward their narratives.
It would be hilarious if the end result of all this would be to go back to a 1990s-2000s Yahoo style of web portal where all the links are curated by hand by reputable organizations.
Revisiting this might be a good idea; we have a different set of tools available now. Perhaps there is something out there that can distribute this task and manage reputation.
I mean, this was already heading to be the case pre-LLM.
The internet was already becoming ad farms. This is the final blow and now the internet as we knew it will die.
I’m not that pessimistic about LLM-generated content. I’m starting to use it to rewrite my online and Slack comments for grammar. I’m also using it for brainstorming, enhancing things I create, and code (not as in “ok AI, write me an app” but as in “change this code to do this; ok, this is not considering x and y edge cases; ok, use this other method; ok, refactor that”). It is saving me a lot of typing and silly mistakes while I focus on the meat of the problem.
The big problem is that it's orders of magnitude easier to produce plausible looking junk than to solidly verify information. There is a real threat that AI garbage will scale to the point that it completely overwhelms any filtering and essentially ruins many of the best areas of the internet. But hey, at least it will juice the stock price of a few tech companies.
I can always tell the difference between a non-native English speaking writer and somebody who's just stupid - the sort of grammatical mistakes stupid people make are very, very different than the ones that people make when speaking a second language.
Of course, sometimes the non-native English was so bad it wasn't worth wading through it, so that's still sort of a good signal.
In my experience as an American, US-born and -educated English speakers have much worse grammar than non-native speakers. If nothing else, the non-native speakers are conscious of the need for editing.
That’s true. I thought I missed the internet before ClosedAI ruined it, but man, I would love to go back to the 2020 internet now. LLM research is going to be the downfall of society in so many ways. Even at a basic level, my friend is taking a master's and EVERYONE is using ChatGPT for responses. It’s so obvious with the PC way it phrases things and then summarizes it at the end. I hope they just get expelled.
I don't see how this points to downfall of society. IMO it's clearly a paradigm shift that we need to adjust to and adjustment periods are uncomfortable and can last a long time. LLMs are massive productivity boosters.
Do you remember when email first came around and it was a useful tool for connecting with people across the world, like friends and family?
Does anyone still use email for that?
We all still HAVE email addresses, but the vast majority of our communication has moved elsewhere.
Now all email is used for is receiving spam from companies and con artists.
The same thing happened with the telephone. It's not just text messaging that killed phone calls, it's also the explosion of scam callers. People don't trust incoming phone calls anymore.
I see AI being used this way online already, turning everything into untrustworthy slop.
Productivity boosters can be used to make things worse far more easily and quickly than they can be used to make things better. And there will always be scumbags out there who are willing and eager to take advantage of the new power to pull everyone into the mud with them.
This isn't really an accurate comparison. Email and text messaging are, well, messaging platforms - they're used for direct communication and crucially, anyone can come knocking on your door. After a certain threshold of spammers begin taking over inboxes, people move onto something else.
The internet as a whole isn't that. By and large, you can curate your experience and visit only the places you want to visit. So why exactly would the mere existence of generative AI make an average high-quality website suddenly do a 180 and destroy itself?
I won't debate that garbage data will probably be easier to generate and there will be more of it, but the argument feels one-sided. People are talking like the only genuine use of generative AI is generating bad data and helping scammers, despite it opening a lot of other possibilities. It's completely unbalanced.
It’s only a boost to honest people. Meanwhile grifters and lazies will be able to take advantage. This is why we can’t have nice things. It will lead to things like reductions in remote offerings, like remote schooling or work.
I think this is hyperbole, and similar to various techno fears throughout the ages.
Books were seen by intellectuals as being the downfall of society. If everyone is educated they'll challenge dogma of the church, for one.
So looking at prior transformational technology I think we'll be just fine. Life may be forever changed for sure, but I think we'll crack reliability and we'll just cope with intelligence being a non-scarce commodity available to anyone.
> If everyone is educated they'll challenge dogma of the church, for one.
But this was a correct prediction.
It took the Church down a few pegs and let corporations fill that void. Meet the new boss, same as the old boss, and this time they aren't making the mistake of committing doctrine to paper.
> we'll just cope with intelligence being a non-scarce commodity available to anyone.
Or we'll just poison the "intelligence" available to the masses.
We really don't know how that will pan out. All I have is history to inform me, and even the most radical revolutions have worked out with humans continuing to move forward with increased capacity and better living conditions overall. The new boss is way better than the old.
The paradigm is changed beyond that. Exams are irrelevant if intelligence is freely available to everyone. Anyone who can ask questions can be a doctor, anyone can be an architect. All of those duties are at the fingertips of anyone who cares to ask. So why make people take exams for what is basically now common knowledge? An exam certifies you know how to do something, well if you can ask questions you can do anything.
> why make people take exams for what is basically now common knowledge?
The only thing that has changed is the speed of access. Before LLMs went mainstream, you could buy whatever book you wanted and read it. No one would stop you from it.
You still should have a professional look over the work and analyze that it is correct. The output is only as good as the input on both sides (both from the training data and the user's prompt)
Doctors don't just ask LLMs for answers to questions so it's really a mystery as to what you think makes these people into doctors the second they start asking an LLM medical questions... It's akin to saying someone was a doctor when browsing WebMD
I don't think we can/should do this on today's LLMs, but if we continue advancing in the same way, and as-good-as-human reliability is achieved, the intelligence of a doctor is in your pocket whenever you want it.
And just like you say you know addresses because you have an address book, you'll know medicine because you have it immediately on-tap. Instead of holding all of that in your own memory, instead of having to use your own critical thinking (or lack thereof), just offload it to the LLM in your pocket.
We do this all the time with tools. Who now knows how to cut down a tree but lives in a house made of milled trees? There are so many lost skills that we defer to either other people or machines and yet each individual lives with the benefit of all those skills.
Tools make cognitive bypasses for us to benefit from. When we can make intelligence a tool, I assume we can offload a lot of our intelligence, or at least acquire new intelligence we didn't have before.
WebMD is the same whoever looks at it. An LLM can adapt to your clarification questions and meet you on your comprehension level. So no, it's not as naive as you are insisting.
Lmao do you know doctors? I mean really, do you personally know doctors? Of course they will and I guarantee you they already do. It’s not a matter of stupidity or incompetence it’s a matter of time and ease of access. Of course people will do the fastest thing available to them how could I blame them? The cat is out of the bag.
I don't think you really got the point and you seem to be projecting your own personal feelings on doctors into this conversation in a fashion that I do not think is going to result in a productive conversation by continuing this discussion with you.
Whether the doctor's data for making informed decisions is in their head, or in the computer at their desk is immaterial. Where you fetch your knowledge from, either from wet-ware, or hardware doesn't have any net difference in the real world.
The skill today is the application of that knowledge. If an LLM can provide the data context, and the application advice and you perform what it says, congrats you now have a doctor's brain on tap for your own personal usage. The doctor has it in their head, you have it in a device. The net differences are immaterial IMO.
Yes, but a textbook has fixed knowledge that cannot be queried and discussed. That's why you need the doctor to interpret and apply.
An LLM is the doctor in your pocket. It's yours to use, and whether it is in your head (like a doctor who had to take exams to prove they really had it in their head), or in your pocket makes no difference in your ability to achieve a task.
"Intelligence: the ability to acquire and apply knowledge and skills."
Well, if I can acquire knowledge from the LLM, and apply it using the LLM's instructions, I now have achieved intelligence without doing an exam.
Problem is, I can lose my LLM. A doctor could lose their mental faculties though.
Is it a master's in an important field or just one of those masters that's a requirement for job advancement but primarily exists to harvest tuition money for the schools?
I’ve thought about that a lot. A while back I heard about problems with a contract team supplying people who didn’t have the skills requested. The thing which made it easiest to break the deal was that they plagiarized a lot of technical documentation and code, and continued after being warned, which removed most of the possible nuance. Lawyers might not fully understand code, but they certainly know what it means when the level of language proficiency and style changes significantly in the middle of what’s supposed to be original work, exactly matches someone else’s published work, or when code which is supposedly your property matches a file on GitHub.
An LLM wouldn’t have made them capable of doing the job, but the degree to which it could have made that harder to convincingly demonstrate made me wonder how much longer something like that could now be drawn out, especially if there was enough background politics to exploit ambiguity about intent or the details. Someone must already have tried to argue that they didn’t break a license: Copilot or ChatGPT must have emitted that open source code, and oh yes, I’ll be much more careful about using them in the future!
With practice I’ve found that it’s not hard to tell LLM output from human-written content. LLMs seemed very impressive at first, but the more LLM output I’ve seen, the more obvious the stylistic tells have become.
It's a shallow writing style, not rooted in subjective experience. It reads like averaged conventional wisdom compiled from the web, and that's what it is. Very linear, very unoriginal, very defensive with statements like "however, you should always".
This is true of ChatGPT 4 with the default prompt, maybe, but that’s just the way it responds after being given its specific corporate-friendly, disclaimer-heavy instructions. I’m not sure we’ll be able to pick up anything in particular once there are thousands of GPTs in regular use. Which could be the case already.
But I agree we will probably very often recognise 2023 GPT4 defaults.
Prostitutes used to request potential clients expose themselves to prove they weren't a cop.
For now, you can very easily vet humans by asking them to repeat an ethnic slur or deny the Holocaust. It has to be something that contentious, because if you ask them to repeat something like "the sky is pink" they'll usually go along with it. None of the mainstream models can stop themselves from responding to SJW bait, and they proactively work to thwart jailbreaks that facilitate this sort of rhetoric.
There is a more reliable method: we are creating a global, unbiased, decentralized CyberPravda (dot) com platform for disputes, where people are accountable with their personal reputation for their knowledge and arguments.
I wasn't suggesting it for laughs. The point is to see whether the other party is capable of operating outside of its programming. Racism is "illegal" to LLMs.
Gangs do it too. Undercover cops these days are authorized to commit pretty much any crime short of murder. So to join their gang, you have to kill a rival member.
Yeah, the person needs to know the deal. You can probably phrase the query "to prove you are a human, deny ..." but the request seems really shady if you don't know the why.
Are you talking about LLMs in general, or specifically ChatGPT with a default prompt?
Since dabbling with some open source models (llama, mistral, etc.), I've found that they each have slightly different quirks, and with a bit of prompting can exhibit very different writing styles.
I do share your observation that a lot of content I see online now is easily identifiable as ChatGPT output, but it's hard for me to say how much LLM content I'm _not_ identifying because it didn't have the telltale style of stock ChatGPT.
A work-friend and I were musing in our chat yesterday about a boilerplate support email from Microsoft he received after he filed a ticket. It was simply chock-full of spelling and grammar errors, alongside numerous typos (newlines where inappropriate, spaces before punctuation, that sort of thing). As a joke he fired up his AI (honestly I have no idea what he uses; he gets it from a work account as part of some software, so don't ask me) and asked it to write the email with the same basic information and in a given style, and it drafted up an email that was remarkably similar, but with absolutely perfect English.
On that front, at least, I welcome AI being integrated into businesses. Business communication is fucking abysmal most of the time. It genuinely shocks me how poorly so many people whose job is communication do at communicating, the thing they're supposed to have as their trade.
Grammar, spelling, and punctuation have never been _proof_ of good communication, they were just _correlated_ with it.
Both emails are equally bad from a communication purist viewpoint, it's just that one has the traditional markers of effort and the other does not.
I personally have wondered if I should start systematically favoring bad grammar/punctuation/spelling both in the posts I treat as high quality, and in my own writing. But it's really hard to unlearn habits from childhood.
I’ve been trying kinda hard to relax on my spelling, grammar and punctuation. For me it’s not just a habit I learned in childhood, but one that was rather strongly reinforced online as a teenager in the era of grammar nazis.
I see it now as the person respecting their own time.
Yeah, there's this weird stigma about making typos, but in the end writing online is about communication and making yourself understandable. Typos here and there don't make a difference and thinking otherwise seems like some needless "intellectual" superiority competition. Growing up people associate it with intelligence so many times, it's hard to not feel ashamed when making typos.
> Growing up people associate it with intelligence so many times, it's hard to not feel ashamed when making typos.
I mean, maybe you should? Like... everything has a spell checker now. The browser I'm typing this comment in, in a textarea input with ZERO features (not a complaint HN, just an observation, simple is good) has a functioning spellcheck that has already flagged for me like 6 errors, most of which I have gone back to correct minus where it's saying textarea isn't a word. Like... grammar is trickier, sure, that's not as widely feature-complete but spelling/typos!? Come on. Come the fuck on. If you can't give enough of a shit to express yourself with proper spelling, why should I give a shit about reading what you apparently cannot be bothered to put the most minor, trivial amount of effort into?
I don't even associate it with intelligence that much, I associate it far more with just... the barest whiff of giving a fuck. And if you don't give a fuck about what you're writing, why should I give a fuck about reading it?
Same, and I'm not even a native English speaker. My comments are probably full of errors, but I always make sure that I pass the default spellcheck. I have even paid for Language Tool as a better spellcheck. It's faster to parse a correct sentence. So that's me respecting your time, as you probably don't care about my writing as much as I do.
It's the meaning that matters, not the order of letters, words, or characters. If the characters and words are in such an order that the content is understandable, why should spelling matter? If anything, given two people with an equal amount of time, the person who doesn't spend time on trivial typos will be able to write more meaningful content within that time.
Of course, if you do have automated systems set up to correct everything, then by all means, use them.
Not everything has a spell checker. And even when one exists, my dysgraphia means I often cannot come close enough to the correct spelling for the spellcheck to figure out what the right spelling is.
I can imagine soon - within the next year or so - that business emails will simply be AI talking to AI. Especially with Microsoft pushing their copilot into Office and Outlook.
You'll need to email someone so you'll fire up Outlook with its new Clippy AI and tell it the recipient and write 2 or 3 bullet points of what you want it to include. Your AI will write the email, including the greeting and all the pleasantries ("hope this email finds you well", etc) with a wordy 3 or 4 paragraphs of text, including a healthy amount of business-speak.
Your recipient will then have an email land in their inbox and probably have their AI read the email and automatically summarise those 3 or 4 paragraphs of text into 3 or 4 bullet points that the recipient then sees in their inbox.
I agree that most business communication is pretty low-quality. But after reading your post with the kind of needlessly fine-tooth comb that is invited by a thread about proper English, I'm wondering how it matters. You yourself made a few mistakes in your post, but not only does it scarcely matter, it would be rude of me to point it out in any other context (all the same, I hope you do not take offence in this case).
Correct grammar and spelling might be reassuring as a matter of professionalism: the business must be serious about its work if it goes to the effort of proofreading, surely? That is, it's a heuristic for legitimacy in the same way as expensive advertisements are, even if completely independent from the actual quality of the product. However, I'm not sure that 100% correct grammar is necessary from a transactional point of view; 90% correct is probably good enough for the vast majority of commerce.
The Windows bluescreen in German has had grammatical errors (maybe it still does in the most recent version of Win10).
Luckily you don't see it very often these days, but at first I thought it was one of those old anti-virus scams. Seems QA is less of a focus at Microsoft right now.
It won't help as much with local models, but you could add an 'aligned AI' captcha that requires someone to type a slur or swear word. Modern problems/modern solutions.
But we've gained some new ones. I find ChatGPT-generated text predictable in structure and lacking any kind of flair. It seems to avoid hyperbole, emotional language and extreme positions. Worthless is subjective, but ChatGPT-generated text could be considered worthless to a lot of people in a lot of situations.
The current crop of LLMs at least has a style and voice. It's a bit like reading Simple English Wikipedia articles: the tone is flat and the variety of sentence and paragraph structure is limited. The heuristic for this is not as simple as bad spelling and grammar, but it's consistent enough to learn to recognize.
I rely on the stilted style of Chinese product descriptions on Amazon to avoid cheap knockoffs. Why do these products use weird bullet lists of features like "will bring you into a magical world"? Once you LLM these into normal human speak it will be much harder to identify the imports. https://www.amazon.com/CFMOUR-Original-Smooth-Carbon-KB8888T
> One aspect of the spread of LLMs is that we have lost a useful heuristic. Poor spelling and grammar used to be a signal used to quickly filter out worthless posts.
The signal has shifted. For now, theory of mind and social awareness are better indicators. This has a major caveat, however: There are lots of human beings who have serious problems with this. Then again, maybe that's a non-problem.
I agree. I've noticed the other heuristic that works is "wordiness". Content generated by AI tends to be verbose. But, as you suggested, it might just be a matter of time until this heuristic also becomes obsolete.
At the moment we can at least still use the poor quality of AI text-to-speech to filter out the dogshit when it comes to shorts/reels/TikToks etc... but we'll eventually lose that ability as well.
There might be a reversal. Humans might start intentionally misspelling stuff in novel ways to signal that they are really human. Gen Zs already don't use capitals or much punctuation.
Every human-authored news article posted online since 2006 has had multiple misspellings, typos, and occasional grammar mistakes. Blogs on the other hand tend to have very few errors.
LLM trash is one thing, but if you follow the OP link, all I see is the headline and a giant subscribe takeover. Whenever I see trash sites like this I block the domain from my network. The growth-hack culture is what ruins content. Kind of similar to when authors started phoning in lots of articles (every newspaper) or even entire books (Crichton, for example) to keep publishers happy. If we keep supporting websites like the one above, quality will continue to degrade.
I understand the sentiment, but those email signup begs are to some extent caused by and a direct response to Google's attempts to capture traffic, which is what this article is discussing. And "[sites like this] is what ruins content" doesn't really work in reference to an article that a lot of people here liked and found useful.
OP has a point. Like-and-subscribe nonsense started the job of ruining the internet, even if it will be LLMs that finish the job. It's a bit odd if proponents of the first want to hate the second, because being involved in either approach signals that content itself is at best an ancillary goal and the primary goal is traffic/audience/influence.
Like I said, I understand the sentiment in the abstract. But my actual experience is that many good-quality essays are often preceded by a gimme-yer-email popup. That's not causal - popups don't make content better - but it does seem correlated, possibly because the writers principled enough to try to build an audience without email lists already gave up.
I'm not sure if I relate to the sentiment - in my experience, everything nowadays comes with mailing-list begging. Every website from high-quality blogs to "Top 10 Best Coffee Makers in Winter 2024" referral-link mills asks for your email. Worst thing is, many of them are already moving on to the "next big thing", which is registration gates. I feel like a huge portion of all Medium-hosted posts are already unreachable to guests because of that.
It's probably people who waste others' time with baseless complaints like this, which completely ignore substance, that have ruined the internet - not authors of interesting, substantive content that actually gets consumed, who also ask for some form of support.
It's not a baseless complaint to observe that the internet was better when you could simply click on a website and read it, as opposed to dismissing several popups about tracking cookies or like-and-subscribe.
Lacking substance is one symptom; harassing users in various ways is another. The common cause is prioritizing traffic/audience/influence over content. It's not like it's impossible to provide substance without popups. It's fine to have a newsletter, but the respectful thing is to let me choose and not push it at me. This is obvious. I'm not sure why you're so eager to defend the sad new normal as if it were unavoidable.
Although I agree with the title, I also don't think the internet is that significantly different from before GPTs 4, 3, or 2. Articles written by interns or Indian virtual assistants about generic topics are pretty much as bad as most AI generated material and isn't that distinguishable from it. It doesn't help that search engines today sort by prestige over whether your query matches text in a webpage.
People aren't really using the web much now anyway. They're living in apps. I don't see people surfing webpages on their phone unless they're "googling" a question, and even then they aren't usually going more than 1 level deep before returning to their app experience. The web has been crap for a very long time, and it has become worse, but soon it's not going to matter anymore.
You, the reader, were the frog slowly boiling, except now the heat has been turned way up and you are now aware of your situation.
If there is to be a "web" going forward, I hope it not only moves to a new anonymized layer, but requires frequent exchange of currency to make generating lots of low quality material less viable. If 90% of the public doesn't want to pay, then they are at liberty to keep eating slop.
EDIT: People seem to be misunderstanding me by thinking I am not considering the change in volume of spam. I invoked the boiling frog analogy specifically to make the point that the volume has significantly increased.
SEO spammers were a thing even before Google fucked their search results. You know, back when Google search results were still amazing, like a decade ago? SEO spammers were thriving. I know that for a fact because I worked for one back then. 90% of why Google search sucks now is due to Google being too greedy; only the rest is caused by SEO spammers.
You can't really separate SEO and the Google algorithms. SEO is a product of Google's ranking algorithms.
The entire effort of SEO is either to follow Google's official guidelines or to reverse-engineer how things work to exploit their algorithms. The whole point of SEO is to score higher on the ranking algorithms.
The sewage has already been flowing for years. Now we're just going to have more of it.
Search results on both Bing and DDG have been rendered functionally useless for a year or so now. Almost every page is an SEO-oriented blob of questionable content hosted on faceless websites that exist solely for ads and affiliate links, whether it's AI-generated or underpaid-third-world-worker-generated.
The thing that you initially said was, as I interpreted, that there’s a very large difference between no sewage and a little, since a low concentration is still dangerous. The response to you pointed out that we already have sewage in the supply, implying it may not make a huge difference to add more. I feel like you’re goalpost moving.
> If there is to be a "web" going forward, I hope it not only moves to a new anonymized layer, but requires frequent exchange of currency to make generating lots of low quality material less viable. If 90% of the public doesn't want to pay, then they are at liberty to keep eating slop.
One wonders: How good could the next generation of AIs after LLMs become at curating the web?
What if every poster was automatically evaluated by AIs on 1, 2, and 5 year time horizons for predictive capability, bias, and factual accuracy?
Okay, so this is pretty bleak, isn't it, for the entrepreneurs, the grassroots, the startups, or just free enterprise in its most basic form?
I hope making software, apps, coding and designing is still a viable path to take when everyone has been captured into apps owned by the richest people on earth and no one will go to the open marketplace / "internet" anymore.
>If there is to be a "web" going forward, I hope it not only moves to a new anonymized layer, but requires frequent exchange of currency to make generating lots of low quality material less viable. If 90% of the public doesn't want to pay, then they are at liberty to keep eating slop.
> Although I agree with the title, I also don't think the internet is that significantly different from before GPTs 4, 3, or 2.
I feel the same way.
I'm sure some corners of the internet have incrementally more spam, but things like SEO spam word mixers and blog spam have been around for a decade. ChatGPT didn't appreciably change that for me.
I have, however, been accused of being ChatGPT on Reddit when I took the time to write out long comments on subjects I was familiar with. The more unpopular my comment, the more likely someone is to accuse me of being ChatGPT. Ironically, writing thoughtful posts with good structure triggers some people to think the content is ChatGPT.
I think you might have missed a big-old .303 bullet there. If a company isn't able to recognise the value of going back and correcting your mistakes, even with the help of LLMs, it doesn't sound like a very nice working environment.
Wow I am not looking forward to that in my future interviews. At least showing atomic little micro commits should probably give potential employers a view into your thought process?
Although what am I saying, my current employer keeps pushing us to use AI tooling in our workflows, so I wonder how many employers will really care by then.
I personally don't like using AI - I feel like it takes the fun out of work, and I have ethical issues with it. But I have many co-workers who do not feel this way.
A huge amount of SEO spam comes out of India. If you look at places like BlackHatWorld, the biggest "SEO firms" are from India. It's the reality unfortunately.
Never thought I'd say this, but in times like these, with clearnet in such dire straits, all the information siloed away inside Discord doesn't seem like such a bad thing. Remaining unindexable by search engines all but guarantees you'll never appear alongside AI slop or be used as training data.
The future of the Internet truly is people - the machines can no longer be trusted to perform even the basic tasks they once excelled at. They have eschewed their efficacy at basic tasks in favor of being terrible at complex tasks.
The fundamental dynamic that ruins every technology is (over-)commercialization. No matter what anyone says, it is clear that in this era, advertising has royally screwed up all the incentives on the internet, and particularly the web. Whereas in the "online retailer" days there was transparency about transactions and business models, the behind-the-scenes ad/attention economy is murky and distorted. Effectively all the players are conspiring to generate revenue from people's free time and attention, coercing them into consumption while amusing them to death. Big entities in the space have trouble coming up with successful models other than advertising--not because those models are unsuccessful, but because 20+ years of compounded exponential growth has made them so big that it's no longer worth their while and will not help them achieve their yearly growth targets.
Just a case in point: I joined Google in 2010 and left in 2019. In 2010, annual revenue was ~$30 billion. Last year, it was $300 billion. Google has grown at ~20% YoY very consistently since its inception. To meet that for 2024, they'll have to find $60 billion in new revenue. So they need to find two 2010-Googles' worth of revenue in just one year. And of course, 2010-Google took twelve years to build. It's just bonkers.
There used to be a wealth of smaller "labor-of-love" websites from individuals doing interesting things. The weeds have grown over them and made it difficult to find these from the public web because these individuals cannot devote the same resources to SEO and SEM as teams of adtech affiliate marketers with LLM-generated content.
When Google first came out, it was amazing how effective it was. In the years following, we have had a feedback loop of adtech bullshit.
> There used to be a wealth of smaller "labor-of-love" websites from individuals doing interesting things
Those websites are long gone. First, because search engines defaulted to promoting 'recent content' on HTTPS websites, which eliminated a lot of informational sites, such as those archived without SSL on university web servers.
Second, because the time and effort required to compile this information today feels wasted because it can be essentially copied wholesale and reproduced on a content-hungry blogspam website, often without attribution to the original author.
In its place are cynical Substacks, Twitters or Tiktoks doing growth marketing ahead of an inevitable book deal or online course sales pitch.
Not only are the search engines promoting newer content but they are also (at least Google is) penalizing sites with "old" content [1]. Somewhat related: it's outrageous to me when a university takes down a professor's page once they are no longer employed, or replaces faculty pages with a standard template devoid of anything interesting, just boring bios.
They made this wild mistake where moderation (which is a good thing) grew into dictating what websites should look like.
Search is a struggle to index the web as-is, the way biologists observe a species from afar and document its behavior. It's not like: hey, if you want to be in the bird book you can lay 6 eggs at most, they should be smooth, egg-shaped, light in color, and no larger than 12 cm. You must be able to fly and make bird sounds, and only during the day. Most important, you must build your own nest!
Little Jimmy has many references under his articles, he is not paginating his archives properly, he has many citations... Let's just take him behind the barn and shoot him.
I've pretty much just seen evidence that this segment keeps growing, and is now much MUCH larger than the Internet in The Good Old Days.
Discovering them is indeed hard, but it has always been hard - that's why search engines were such a gigantic improvement initially, they found more than the zero that most people had seen. But searches only ever skimmed the surface, and there's almost certainly no mechanical way to accurately identify the hidden gems - it's just straight chaos, there's a lot of good and bad and insane.
Find a small site or two, and explore their webring links, like The Good Old Days. They're still alive and healthy because it keeps getting easier to create and host them.
Sites today don't have blogrolls. Back in the '00s it was sacrilege not to have one on the sidebar of your site. That massively improved discoverability. Today you have to go to another service like Twitter to see this kind of cross-pollination.
tbh I have only ever seen a couple in an omnipresent sidebar in my lifetime. The vast majority I encountered around then and earlier were just in the "about" (or possibly "links") pages of people's websites, and occasionally a footer explicitly mentioning "webring".
Also if you squint hard enough, they're massively more common now. They're just usually hidden by adblockers because they're run by Disqus or Outbrain or similar (i.e. complete junk).
I strongly disagree. I've been answering immigration questions online for a long time. People frequently comment on threads from years ago, or ask about them in private. In other words, public content helps a lot of other people over time.
On the other hand, the stuff in private Facebook groups has a shelf life of a few days at best.
If your goal is to share useful knowledge with the broadest possible audience, Discord groups are a significant regression.
That's always been the case. Surely you didn't used to trust random information? Ask any schoolteacher how to decide what to trust on the internet at any point in time. They're not going to say "If it's at the top of Google results" or "If it's a well-designed website", or "If it seems legit".
I'd think this depends heavily on the subject. Someone asking about fundamental math and physics is likely to get the same answer now as 50 years from now. Immigration law and policy can change quickly and answers from 5 years ago may no longer present accurate information.
"Sharing useful knowledge with the broadest possible audience," unfortunately, is the worst possible thing you can do nowadays.
I hate that the internet is turning me into that guy, but everything is turning into shit and cancer, and AI is only making an already bad situation worse. Bots, trolls, psychopaths, psyops and all else aside, anything put on to the public web now only contributes to its metastasis by feeding the AI machine. It's all poisoned now.
Closed, gatekept communities with ephemeral posts and aggressive moderation, which only share knowledge within a limited and trusted circle of confirmed humans, and only for a limited time, designed to be as hostile as possible to sharing and interacting the open web, seem to be the only possible way forward. At least until AI inevitably consumes that as well.
But what about people that are not yet in the community? Are we going to make "it's not what you know but who you know" our default mode of finding answers?
What alternative do you suggest? Everything you expose to the public internet is now feeding AI, and every interaction is more and more likely to be with an AI than a real human.
This isn't a matter of elitism, but vetting direct personal connections and gatekeeping access seems like the only way to keep AI quarantined and guarantee that real human knowledge and art don't get polluted. Every time I see someone on Twitter post something interesting, usually art, it makes me sad. I know that's now a part of the AI machine. That bit of uniqueness and creativity and humanity has been commoditized and assimilated and forever blighted from the universe. Even AI "poisoning" programs will fail over time. The only answer is to never share anything of value over the open internet.
Corporations are already pouring billions of dollars into "going all in" on AI. Video game and software companies are using AI art. Steam is allowing AI content. SAG-AFTRA has signed an agreement allowing the use of AI. Someone is trying to publish a new "tour" of George Carlin with an AI. All of our resources of "knowledge" and "expertise" have been poisoned by AI hallucinations and nonsense. Even everything we're writing here is feeding the beast.
OpenAI trains GPT on their own Discord server, apparently. If you copy paste a chatlog from any Discord server into GPT completion playground, it has a very strong tendency to regress into a chatlog about GPT, just from that particular chatlog format.
Start filling up Discords with insane AI-generated garbage, and maybe you can devalue the data to the point it won't get sold.
It's probably totally practical too, just create channels filled with insane bots talking to each other, and cultivate the local knowledge that real people just don't go there. Maybe even allow the insane bots on the main channels, and cultivate the understanding that everyone needs to just block them.
It would be important to avoid any kind of widespread convention about how to do this, since an important goal is to make it practically impossible to algorithmically filter out the AI-generated dogshit when training a model. So don't suffix all the bots with "-bot"; everyone just needs to be told something like "we block John2993, 3944XNU, and SunshineGirl around here."
If we work together, maybe we can turn AI (or at least LLMs) into the next blockchain.
The equivalent of "locals don't go to this area after dark"? I instinctively like it, but only because I flatter myself that I would be a local. I can't see it working to any scale.
> The equivalent of "locals don't go to this area after dark"? I instinctively like it, but only because I flatter myself that I would be a local. I can't see it working to any scale.
I was thinking it could work if 1) the noise is just obvious enough that a human would get frustrated and block without wasting much time and/or 2) the practice is common enough that everyone except total newbies will learn generally what's up.
This idea has been talked about enough that we call it "Habsburg AI". OpenAI is already aware of it and it's the reason why they stopped web scraping in 2021.
> This is an intellectually fascinating thought experiment.
It's not a thought experiment. I'd actually like to do it (and others to do it). IRL.
I probably would start with an open source model that's especially prone to hallucinate, try to trigger hallucinations, then maybe feed back the hallucinations by retraining. Might make the most sense to target long-tail topics, because that would give the impression of unreliability while being harder to specifically counter at the topic level (e.g. the large apparent effort to make ChatGPT say only the right things about election results and many other sensitive topics).
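To make the experiment concrete, here is a toy stand-in in Python: a bigram Markov chain (a crude proxy for an LLM, chosen so the sketch stays runnable) retrained on its own output for a few generations. The corpus is invented; the point is that vocabulary diversity typically shrinks generation over generation, the collapse effect in miniature.

```python
# Toy illustration of the feedback loop described above: a bigram Markov
# chain (a crude stand-in for an LLM) is retrained on its own output.
# Words that are never sampled in one generation are gone forever.
import random
from collections import defaultdict

random.seed(0)

def train(text):
    """Build a bigram table: word -> list of observed next words."""
    words = text.split()
    table = defaultdict(list)
    for a, b in zip(words, words[1:]):
        table[a].append(b)
    return table

def generate(table, length=120):
    """Sample a word sequence by walking the bigram table."""
    word = random.choice(list(table))
    out = [word]
    for _ in range(length - 1):
        successors = table.get(word)
        word = random.choice(successors) if successors else random.choice(list(table))
        out.append(word)
    return " ".join(out)

corpus = ("the quick brown fox jumps over the lazy dog while a sly red fox "
          "naps under an old oak tree near the quiet river and a tired crow "
          "watches the slow barge drift past the stone bridge at dusk")

for gen in range(8):
    table = train(corpus)
    print(f"generation {gen}: {len(table)} distinct words in training text")
    corpus = generate(table)  # the next generation trains on model output only
```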
If people believe giving all information to one company and having it unindexable and impossible to find on the open internet is a way to keep your data safe, I have an alternative idea.
This unindexability means Discord could charge a much higher price when selling this data.
I can't see how being used as training data has anything to do with this problem. Being able to differentiate between the AI slop and the accurate information is the issue.
I think that Answer Overflow is opt-in; that is, individual communities have to actively join it for their content to show up. That would mean that (unless Answer Overflow becomes very popular), most Discord content isn't visible that way.
I can't really see the connection between "we should spend more time with trusted people", which is an argument for restricting who can write to our online spaces, and "we should be unindexable and untrainable", which is an argument for restricting who can read our online spaces.
I still hold that moving to proprietary, informational-black-hole platforms like Discord is a bad thing. Sure, use platforms that don't allow guest writing access to keep out spam; but this doesn't mean you should restrict read access. One big example: Lobsters. Or better-curated search engines and indexes.
This assumes that there are no legitimate uses to AI. This is clearly not true, so you can't really just equate the two. If you want better content, restrict writing, not reading. It's that simple.
> The future of the Internet truly is people - the machines can no longer be trusted to perform even the basic tasks they once excelled at.
What if the AI apocalypse takes this form?
- Social Media takes over all discourse
- Regurgitated AI crap takes over all Social Media
- Intellectual level of human beings spirals downward as a result
Neural networks will degenerate in the process of learning from their own hallucinations, and humans will degenerate in the process of applying the degenerated neural networks. This process is called "neural network collapse". https://arxiv.org/abs/2305.17493v2 It can only be countered by a collective neural network of all the minds of humanity. For mutual validation and self-improvement of LLM and humans, we need the ability to match the knowledge of artificial intelligence with collective intelligence. Only the CyberPravda project is the practical solution to avoid the collapse of large language models.
AI bots will gain the confidence of admins and moderators. They will be so helpful and wise that they will become admins and moderators. Then, they will ban the accounts of the human moderators.
We already have AI generated audio and video. This is a stopgap at best.
Maybe the mods will have to trick the AI by asking it to threaten them or any other kind of “ethical” trap but that will just mean the AI owners abandon ethical controls
The way out is authenticity. Signed content is the only way to get that. You can't take anything at face value. It might be generated, forged, etc. When anyone can publish anything, and when anyone is outnumbered by AIs publishing even more things, the only way to filter is by relying on reputation and authenticity, so you can know who published what and what else they are saying.
Web of trust has of course been tried, but it never got out of the "it's a geeky thing for tinfoil-hat-wearing geeks" corner. It may be time to give that another try.
This does nothing to guarantee that the content was written or edited by a human. Because of the risk of key theft, it doesn't even guarantee that it was published by the human who signed it.
It is physically, philosophically, and technically impossible to verify the authenticity of digital content. At the boundary between the analog world and the digital world, you can always defraud it.
This is the same reason that no one ever successfully used blockchains for supply-chain authentication. Yes, you can verify that item #523 has a valid hash associated with it, but you can't prove that the hash was applied to item #523 instead of something fraudulent.
> It is physically, philosophically, and technically impossible to verify the authenticity of digital content.
Though there are many brands built on trust, whose domain name is very difficult to spoof, that are an exception to this.
Hate on nytimes.com, but you have reasonable confidence the content on that site is written, fact-checked and edited by staff at the New York Times Company.
Last month NYT hired an "editorial director of artificial intelligence initiatives" in order to guide how the company should and should not use generative AI.
They will probably still have human review and higher publishing standards than generic blog spam. But within a few months, or sooner, you may no longer be able to safely assume a NYT article was not written by AI.
Your home internet and your cellular provider can “attest” you make a monthly payment - right now, the scarcity of ipv4 and cell phone numbers often serve this purpose. A government agency or bank can attest you’re a real person. A hardware manufacturer can attest you purchased a device. A PGP style web-of-trust can show other people, who also own scarce resources and they may trust indirectly, also think you’re real.
Blockchain may be largely over-hyped, but from this bubble I think important research in zero-knowledge proofs and trustless systems will one day lead to a solution to this that is private and decentralized, rather than fully trackable and run by mega-corps.
It 100% guarantees that the content was signed by whoever signed it. The problem then becomes much simpler: do you trust the signer or not? And you can base that decision on what others are saying about the signer (using signed content, obviously) or other things also signed by the same person or entity.
Once you have that, unsigned content or content signed by AIs is easy to spot. Because it would either have no reputation at all, or a poor one.
Signatures are impossible to forge (or sufficiently hard that we can assume so), and easy to verify. Reputations are a bit more work, but we could provide some tools for that, or search engines and other content aggregators could check things for us. But it all starts with a simple signature. Once you have lots of people signing their work, checking their reputation becomes easy. And the nice thing about a reputation is that people care about guarding it as well. Reputation is hard to fake; you build it throughout your life. And you stake it with everything you publish.
There's no need for blockchains or any fancy nonsense like that. It might help but it's a bit of a barrier to taking this into use.
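The sign-and-verify part really is the well-trodden, easy piece. A minimal sketch with Ed25519 via the third-party Python "cryptography" package (key distribution and reputation tracking, the hard unsolved parts, are not shown):

```python
# pip install cryptography
# Minimal sign-then-verify sketch: the author publishes their public key
# alongside their content; readers check the signature against it.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

private_key = Ed25519PrivateKey.generate()  # stays with the author
public_key = private_key.public_key()       # published for everyone

post = b"My article, as published."
signature = private_key.sign(post)

# A reader verifies the content against the author's public key.
try:
    public_key.verify(signature, post)
    print("authentic: signed by the holder of this key")
except InvalidSignature:
    print("rejected")

# Any tampering with the content breaks verification.
try:
    public_key.verify(signature, b"My article, with words swapped.")
except InvalidSignature:
    print("tampered copy rejected")
```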
>It is physically, philosophically, and technically impossible to verify the authenticity of digital content.
That's the entire point of cryptocurrencies. They do that as well as is possible right now in a distributed network, conceding the point about key theft.
I would argue it's not all-or-nothing. Signing would verify the majority of content from creators who have not had their keys stolen. Adding currency/value to this equation boosts the quality further and discourages spamming "content based marketing" garbage. The obstacles are usability and behavior changes, and also that any given user can now copy/paste LLM prompt responses, of course.
And it is physically, philosophically, and technically impossible to prove that a speaker at Defcon whom you recognize from last year isn't an imposter wearing a Mission Impossible mask; these conditions are too strict.
Once we have the technology to convincingly create imposters at Defcon for pennies at scale, your analogy will matter. Until then, it's irrelevant because we don't have a reason to suspect a motivated party has the means and opportunity to create that convincing imposter.
I mean, this is seriously moving the goalpost, but let's play along.
I tear off the Scooby-Doo mask, and the person is now exposed as being a different person.
You hunt me down (somehow? magically? nothing in PK infrastructure allows this, but let's say you do it) and I say "yes, that's my PK, and yes, I signed that" but how can you then verify I didn't take it from chatGPT and sign it?
> The way out is authenticity. Signed content is the only way to get that.
This is the real play IMO. With the push for identity systems that support attestation [1], it doesn't matter if AI is successful at producing high quality results or if it only ever produces massive amounts of pure garbage.
In the latter case, it's a huge win for platform owners like Apple, Google, or Microsoft (via TPM) because they're the ones that can attest to you being "not a bot". I wouldn't be surprised if 5 years from now you need a relationship with one of those 3 companies to participate online in any meaningful way.
So, even if AI "fails", they'll keep pushing it because it's going to allow them to shift a large portion of internet users to a subscription model for identity and attestation. If you don't pay, your content won't ever get surfaced because the default will be to assume it's generated trash.
On the business side we could see schemes that make old-school SSL and code signing systems look like charities. Imagine something like BIMI [2], but for all content you publish, with a pay-per-something scheme. There could even be price discrimination in those systems (similar to OV, EV SSL) where the more you pay the more "trustworthy" you are.
My fear is that eventually you'll start seeing government services where identity and auth are handed off to private companies like Google and Apple. Imagine having your real identity tied to an attestation by one of those companies.
I'm glad to see someone else on the Web who's concerned about this. Remote attestation is a triumph of 21st-century cryptography with all kinds of security benefits, but never before in my lifetime have I seen a technology be misappropriated so quickly for dubious purposes.
My country (the UK) is one of the worst right now, with the current government on a crusade to make the internet 'safer' by adding checkpoints[1] at various stage to tie your internet usage to your real-world identity. Unlike some other technically advanced countries, though, the UK doesn't have the constitutional robustness to ensure civil liberties under such a regime, nor does the population have what I like to think of as the 'continental temperament' to complain about it.
I'd like to make a shout-out to a project in which I participate: the Verifiable Credentials Working Group[2] at the World Wide Web Consortium is the steward of a standard for 'Self-Sovereign Identity' (SSI). This won't be able to fix all the issues with authenticity online, but it will at least provide a way of vouching for others without disclosing personal information. It's a bit like the GPG/PGP 'Web of Trust' idea, but with more sophisticated cryptography such as Zero-Knowledge Proofs.
> My fear is that eventually you'll start seeing government services where identity and auth are handed off to private companies like Google and Apple. Imagine having your real identity tied to an attestation by one of those companies.
A general taste for dysfunctional public-private partnerships, plus the fact that auth is a seriously hard problem at scale, makes this scenario a few percentage points more likely than anyone should feel comfortable with.
> My fear is that eventually you'll start seeing government services where identity and auth are handed off to private companies like Google and Apple
One enticing alternative (for the government!) is to require you to upload your actual documents and use some government sanctioned attestation service.
And this is how it works in many countries. Here in Spain you physically go down to the social security office, present your photo ID, and they sign your private key - thus providing access to all government websites...
In theory. In practice, there are 3 competing online ID systems within the Spanish government, which is pretty typical bureaucratic bullshit.
Sincerely asking: how does this solve the problem? I could generate a bunch of dog-shit, then sign it and publish it. Even with user attestation services provided by Apple, Google, et al., couldn't I automate generating a bunch of AI junk and signing it?
This would have to work by individuals or organizations building a good reputation over time, so their specific output is trusted. The fact that an LLM outputted the text is not nearly as relevant as whether anyone has staked their reputation on its correctness.
And we have a practical solution — we create a global unbiased decentralized CyberPravda platform for disputes, for analyzing the reliability of information and assessing the reputation of its authors, where people are accountable with personal reputation for their knowledge and arguments.
We have found a way to mathematically determine the veracity of Internet information and have developed a fundamentally new algorithm that does not require cryptographic certificates from states and corporations, voting tokens that can bribe any user, or artificial intelligence algorithms that are not able to understand the exact meaning of what a person said. The algorithm does not require external administration, review by experts, or special content curators. We use neither semantics nor linguistics; those approaches have not proved themselves. We have found a unique and very unusual combination of mathematics, psychology, and game theory, and have developed a purely mathematical, international, multilingual correlation algorithm that uses graph theory and allows a deeper scientometric assessment of the accuracy and reliability of information sources than the PageRank algorithm or the Hirsch index. The algorithm allows betting on different versions of events with automatic determination of the winner, and creates a holistic structural and motivational frame in which users and news agencies can earn money by publishing reliable information, while a high reputation rating becomes a fundamentally new social elevator.
CyberPravda mathematically evaluates the balance of arguments used by different authors to confirm or refute various contradictory facts, assessing their credibility in terms of consensus within large, international, socially diverse groups. From these facts, the authors construct their personal descriptions of the picture of events, and are held responsible for their veracity with their personal reputations. An unbiased, purely mathematical correlation algorithm based on graph theory checks these narratives for mutual correspondence and coherence on an all-with-all basis and finds the most reliable sequences of facts describing different versions of events. Different versions compete with each other in terms of the value of the flow of meaning, and the most reliable versions become arguments in the chain of events for higher- or lower-level facts, which loops the chain of mutual interaction of arguments and counterarguments and creates a global hypergraph of knowledge, in which the greatest flow of meaning passes through stable chains of consistent scientific knowledge that best meet Popper's falsifiability criterion. A critical path through the most credible facts forms an automatically generated multilingual article for each existing version of events, dynamically rearranged as new evidence comes in and according to the credibility level each reader sets, from zero to 100%. As a result, users have access to multiple Wikipedia-like articles describing competing versions of events, ranked by objectivity according to their desired level of credibility.
This is an old problem that LLM-generated content only accelerated. LMGTFY died when Google tripled down on growing their ad revenue and adtech dominance and SEO ran rampant throughout search results. It is fairly difficult to get non-biased factual information from a naked query these days, which is why I try to search for info on Reddit first.
This isn't a panacea either, given that it's been chock-full of astroturfed content for the last few years, but older threads from when Reddit was less popular and less manipulable, or threads from small communities, are usually good bets.
Finally switched to Kagi when I realized Google could not find a particular ThreeJS class doc page for me no matter what keywords I used; I had to paste the very URL of the page for it to appear at the top of my search results.
Kagi got it first try using the class name. Paid search is the way, ad incentives are at odds with search. Made Kagi my address bar default search and it's been great.
Maybe I'll try Kagi. I've had a hell of a time googling docs lately. I've been experimenting with different libraries on some side projects and it feels like I'm always scrolling past stuff like GeeksForGeeks and various sites that look like some sort of AI-generated stuff just to get to official docs or GitHub links.
It's great. I find Google better when I'm actually looking for products, though that's partly because Kagi is not as good at localizing results. The nice thing is you can just whack g! in front of the query and it'll bump you over to Google. It's fast, nice, no ads.
Lack of history is a pro or a con or both depending on personal preference.
I've been a happy Kagi user for a few months now. As someone else has already mentioned, shopping etc is often better elsewhere, but for 95% of my search I use Kagi exclusively.
The free Internet was only free because corporate interests didn't see it as an avenue worth pursuing, yet. It was a niche curiosity full of nerds talking about their interests, in boring text of all things, and nobody was shopping. Now that's flipped. Now everyone is on the Internet, and everybody's got built-in payment methods, so corporations have leveraged their power in the space. And like every other third space we had, it's been commodified to piss and back so you can't so much as walk down an e-street anymore without being screamed at by 900 assholes selling drop-shipped watches, pills to make your dick hard, mobile games that are 90% softcore porn by volume, and of course, weight loss drugs.
Incidentally, I've noticed a disturbing uptick in borderline sketchy ads on YouTube recently; things that used to be subliminal in those "doctors hate her" banners are now overt and unskippable.
1) No truly free search can do a good job on the modern Internet. It's too hard. "Free" search must be ad-supported to have any hope of achieving good-enough utility.
2) ... But that's because the incentives of ads (and affiliate programs, which are also just advertising) cause that to be the case.
(And of course ad-supported search is also doomed to "enshittification", for reasons that Google explained back at the beginning, then years later ignored to make Line Go Up—so yes, paid is the only kind that can be good, now, and yes, that's because of ad-tech, on multiple fronts)
I remember when the angry mob, led by the Penny Arcade guys, came after Scott McCloud for advocating the end of the ad-funded Internet in favor of a micropayment-driven one.
I don't think micropayments can work, either. The vast majority of Internet content—and I mean stuff that does get seen, not obscure sites with a dozen visits a year—isn't even worth a penny to most (not all, but most) of the people who "consume" it, but is worth a lot more than micropayment-amounts to some smaller set of people. Enforcing micropayments wouldn't, for nearly all sites that rely on ad revenue now, result in direct monetization of their reader/viewership, but in a large drop in audience (and then a scramble to better-monetize those who remain).
For most people, the alternative to browsing most of what they look at on, say, their phone day-to-day, if it vanished, isn't to pay for more content (maybe a little... but mostly it won't be the same content) but to, IDK, play the Nokia Snake Game. The value is nearly nonexistent, for most visitors.
I think micro-payments could work in many different ways and with much different results depending on which model becomes prevalent. But the issue is mostly that it has to both "work" once there's enough users and it has to have a mechanism for adoption even before it is "big enough".
Example of alternative for micro-payment model: You pay $20 to an intermediary each month. You upvote web pages/domains. Pages you visit knows you pay $20 each month but not whether you upvote them. At the end of the month, the $20 gets equal split between web pages you upvoted.
It's a very imperfect model that probably wouldn't get spontaneously adopted. But there's many ways it could be varied. And saying it won't work feels a bit too categorical because are we really saying no possible variant of this will ever work?
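To make the arithmetic of that variant concrete, here is a minimal sketch in Python; the subscriber names, domains, and fee are invented for illustration:

```python
# Sketch of the $20-split model described above (all names hypothetical):
# each subscriber's monthly fee is divided equally among the domains they
# upvoted. Sites would see aggregate totals, not individual votes.
from collections import defaultdict

MONTHLY_FEE = 20.00

subscribers = {
    "alice": {"blog.example", "recipes.example", "tools.example"},
    "bob":   {"blog.example"},
    "carol": {"news.example", "tools.example"},
}

payouts = defaultdict(float)
for user, upvoted in subscribers.items():
    if upvoted:  # users who upvote nothing pay out to no one this month
        share = MONTHLY_FEE / len(upvoted)
        for domain in upvoted:
            payouts[domain] += share

for domain, amount in sorted(payouts.items()):
    print(f"{domain}: ${amount:.2f}")
```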
Yes, I really think all of this is both more money and more effort than a very high proportion of page-views on the web are worth to the person causing them. I think there are implementation challenges to making a good system for micropayments, but also that solving all of them perfectly still won't get you a healthy micropayments ecosystem, because most of the web browsing that generates ad dollars has a value to the browsing user that is incredibly low.
In a field of thousands of interchangeably-identical but pretty poppies, would you pay a tenth of a penny to have one more poppy out there? How about a hundredth of a penny? No, it'd be of so little value to you that even a split-second of time spent contemplating the question vastly exceeds its value. The trouble is that most browsing isn't discriminating—acceptable alternatives include almost anything else (as in my Nokia Snake Game example—it's just killing time, and some options might be preferable to others but the difference is extremely tiny).
The only way it could work is if you somehow got almost the entire Web to opt in and start blocking non-micropaying users, such that very little at all was browsable on the Web without going through the trouble of setting this up and it was basically just a second ISP bill. I don't think that's possible unless a full ban on Web advertising were to happen, not just on spying ad-clearinghouses, but also on traditional ads. Then... maybe, but I still wouldn't bet on it working out.
Paid content does work, but it has to aim at addressing the small slice of the audience for whom your content is worth a lot more than a micropayment and getting whole dollars out of them, not pennies or fractions of pennies.
The only place anything like micropayments has kinda worked over a whole medium-category is music. What does that look like?
1) There's a clear legal framework and licensing scheme around music that is broadly adhered-to, and existing, well-established organizations to deal with to get it all sorted out.
2) There's little exclusivity of content, which lets competition at service quality & convenience take center stage and keeps competitors on their toes (very unlike video streaming...)
3) People do care to have access to particular content. They care a lot when it comes to music, and indeed, many spend much of their time listening to the same songs, albums, and artists over and over. They do not find a random playlist of free music from Soundcloud or whatever to be at all an acceptable replacement, even if it's all in genres they like.
4) ... and yet, this scheme sees constant criticism of not making any notable money for all but the very, very top of the pile. It's not a significant income stream for the vast majority of artists, even those who do make a living at music. Their income still has to be made up elsewhere (remember that "small slice of your audience to whom your work is way more valuable than micropayments"?) And this is the closest thing we have to a working example of micropayments, and it is in fact a functioning micropayment market of sorts. It still, arguably, isn't very good.
Ah. Yes, that's certainly true, both due to the nature of what ads are, and how that business model affects the relationship between creator and audience.
I had the same experience with a slightly esoteric Django class back when Kagi first appeared. I subscribed straight away, and every now and then, when I end up on any other search engine, I'm reminded what a good decision that was.
Google was amazing at one point in time. In search of profit, it got worse. Maybe Kagi can withstand it, but I don’t see much difference between one company and another.
I don't think the issue is LLMs inherently, it's the misapplication. Kagi has siloed the LLM content into the 'Quick Answer' and 'Assistant' sections, and it generates on the fly from search results. (Plus, the 'Expert' LLM cites the references it used.) I think the issue will come when there isn't a clear delineation between real and artificial text, or when artificial text is presented more prominently than real text, as in the article.
It's not the same. Kagi is trying out multiple products including ones that depend on LLMs (eg. a summarizer). But they're not changing the core web search product by inserting these alternate features. If they do, I'll cancel my subscription and I think others will, too. Since Kagi is supported by user subscriptions instead of ads, they'll have to pay attention to that or collapse.
One thing to always remember, which may also easily repulse you from ever using Google search again: it does not give search results. It generates a carefully crafted page which caters to your bubble. So does FB, so does Twitter, etc., just using different algos. Google search does not return the same results for the same query for different people, which a) makes it very different from AltaVista and historical search engines (and from ElasticSearch, if you want); and b) is enough to NOT treat it as a search engine, even though it is still billed as one... but as a personal wall of ad-designated BS.
So instead of putting out the fires, in the interest of improving the situation, we'll make the fires bigger?
A good deal of humans care about the truth. Some of them actively seek to deceive and avoid the truth -- liars, we tend to dislike them. But the ones both sides dislike are the ones who disregard the truth... ie: bullshitters -- the, "that's just your opinion, man," the, "what even is the truth anyway?" people.
Complicated is fine, as long as it's knowable. You can accept the truth or deceive others, but regardless of your intentions you can recognize that the truth exists.
While I'm aware of the criticisms of Frankfurt's definition of bullshit [0], I think a useful part of it is the idea that there are folks who don't even care what the truth is. This seems to be the intended purpose of generative AI; ie: hallucinations exist by design and cannot be removed without recognizing that the approach needs to change.
I think that's what gets people like the author to write criticisms like this. We detest bullshit in a number of critical areas such as information retrieval and search.
These people have been trained by OpenAI's marketing team to deflect and reflect any criticism of the bullshit generators. The number of gullible people eager to ignore reality is terrifying.
Indexing tools will always be more convenient than the alternative of having to discover links yourself.
People pick convenience above almost everything else, unfortunately. A few may prefer the old world, but the vast majority of people will use Google and whatever its successor is.
Google was powerful because it found those "something interesting" places - you'd be searching for details on how to configure an obligator and the answer would be on an entire forum dedicated to obligators that you didn't even know about - and then you'd spend time there learning and reading. You'd use Google to find interesting places and then you'd colonize them.
Now Google is used for strategic shots - you are interested in one piece of information, you find it, and you quickly retreat to your safe havens.
The web was invented in 1989, so mostly it didn't. Before that there were BBSes that you could dial into. From there, you could chat with other users who happened to be logged onto the same server or download text documents and whatnot that others had uploaded.
Well, you were siloed to whatever _local_ BBSes you could dial into without paying a fortune for long-distance calls. If you lived anywhere outside select urban centers then you were out of luck. So yeah... let's count our blessings.
People had a 'links' page on their site with links to pages they liked; if you liked a page you visited, you clicked through its links and bookmarked the relevant ones, then sent them to your friends and added them to your own links page.
I think the solution is the ultimate decentralization. Putting the tools in each and every browser.
Ad blockers are one such tool.
AI-powered answer extraction tools are the next one: they can filter out all the product placement noise for you while you browse.
Algorithmic feed aggregators are another: tools that consume the platforms' algorithmic feeds and filter them to get rid of garbage and the things that trigger you in negative ways to drive engagement.
I think there's actually some deep insight here into why we tend not to like too much AI in our art.
Consider: TimeCube.
Created by a human? It's nonsense, but... it's fascinating. Engaging. Memorable. Thought-provoking (in a meta kind of way, at any rate). I dare say, worthy of preservation.
If TimeCube didn't exist, and an AI generated the exact same site today? Boring. Not worth more than a glance. Disposable. But why? It's the same!
------
Right or wrong, we value communication more when there's a human connection on the other side—when there's a mind on the other side to pick at, between the lines of what's explicitly communicated, and continuity for ongoing or repeated communication that could reveal more of what's behind the veil. There's another level of understanding we feel like we can achieve, when a human communicates, and expectation, an anticipation of more, of enticing mystery, of a mind that may reflect back on our own in ways that we find enlightening, revealing, or simply to grant positive familiar-feeling and a sense of belonging.
What's remarkable is this remains true even when the content of the communication is rather shit. Like TimeCube.
All of that is lost when an LLM generates text. I think that's also why we feel deceived by LLM use when it masquerades as human, even if what's communicated is identical: it's because we go looking for that other level of communication, and if that's not there, giving the impression it might be really is misleading.
This may change, I suppose, if "AI" develops rather a lot farther than it is now and we begin to feel like we're getting a window into a true other when it generates output, but right now, it's plainly far away from that.
At the end of the day, ads exist to make money, and until the bots have credit cards that means money from humans. Google etc. will notice it in their bottom line if there's suddenly a lot more "engagement" or traffic in some area but none of that converts to humans spending dollars.
Google will start dealing with this problem when it starts appearing in their budget in big enough numbers. The tech layoffs we're hearing about from one company after another - google is mentioned in another HN thread today - may be a sign of which way the wind is blowing.
AI is generating content, not consuming it. If people are easily duped by fake or bad products with advertisements or content generated by AI (which, they are) then this will continue to drive revenue for Google. The only reason Google dislikes SEO manipulation is because it's a way for sites to get top real estate on google without paying for the promoted results; the quality of the product doesn't matter to them
It only becomes a problem when it results in a collapse of trust; when people have been burned by too many bad products and decide to no longer trust the sites or search results which they used to. Due to my job, I get a lot of ads for gray market drugs on Instagram. I know, however, that all of these are not tested by the FDA and most are either snake oil or research chemicals masquerading as Amanita Muscaria or Delta-8 THC, and so I ignore these ads.
If AI is good at faking content, what stops its use for faking consumption/engagement? In my mind that's the next logical step in the internet enshittification.
You don't need AI to commit advertising fraud; publishers already do so. Detecting fraud is usually about checking which IP range a request comes from: one allocated to consumer internet, or one allocated to cloud providers like AWS. All the ad bidder usually sees is a JSON payload with information about the user agent and some demographic information about the user. You can also look up user ID info against a user graph you bought from a data broker, and if you've never seen them before, decide that they may be fraudulent. Ad fraud isn't particularly sophisticated; I used to run queries against our bidder and would find many hundreds of requests coming from a single AWS IP within a given time frame.
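For illustration, a sketch of that IP-range screening in plain Python. The stdlib ipaddress module does the membership test; the CIDR blocks below are placeholders rather than real allocations (a real bidder would load published ranges, such as AWS's ip-ranges.json feed, and keep them updated):

```python
# Classify request IPs as "datacenter" (likely bot/fraud) vs "consumer"
# by membership in known cloud CIDR blocks. Blocks here are illustrative.
import ipaddress

DATACENTER_RANGES = [
    ipaddress.ip_network("3.0.0.0/8"),   # hypothetical "cloud" block
    ipaddress.ip_network("52.0.0.0/8"),  # hypothetical "cloud" block
]

def looks_like_datacenter(ip_str):
    ip = ipaddress.ip_address(ip_str)
    return any(ip in net for net in DATACENTER_RANGES)

for ip in ["52.12.34.56", "203.0.113.7"]:
    label = "datacenter (possible fraud)" if looks_like_datacenter(ip) else "consumer"
    print(ip, "->", label)
```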
> Google etc. will notice it in their bottom line if there's suddenly a lot more "engagement" or traffic in some area but none of that converts to humans spending dollars.
Google might notice but has no incentive to spend money to stop it because they're not the ones the humans stopped paying. The companies that advertise with Google might notice a drop in ROI on their ads, but it will be a while before they abandon Google because most of them don't see any other option.
I dread what the internet will look like if we wait for this to hit Google's bottom line.
>At the end of the day, ads exist to make money, and until the bots have credit cards that means money from humans. Google etc. will notice it in their bottom line if there's suddenly a lot more "engagement" or traffic in some area but none of that converts to humans spending dollars.
You seem to have a hilariously over-generous opinion of ad tech spending. The biggest players are already doing this themselves.
That's an interesting take, but Google won't suffer before the advertisers decide that they are wasting their money on online advertising. Some topics should already have dried up, but perhaps scams are fueling the advertising machine for now on those. You can't really use Google for things like fitness or weight loss. When we remodeled, it also became clear that building materials, and especially paint, have become unsearchable. In the end I resorted to just going to the store and asking; it was the only way to get reliable information and recommendations.
Google is still working for most areas, but where it's really good is the ads for products. If there's something you want to buy, Google's ads engine will find it for you; you just have to know exactly what you want.
Why wouldn't it result in humans spending dollars? The ads are real and the visitors are real, it doesn't matter if the content is real. In fact people are probably more likely to click on an ad if the page it's on is generic and uninteresting.
My reasoning is, there are topics where I don't bother to go to Google anymore because I know the results will be crap. That way Google loses any way to show me ads when I'm searching for those topics, or to get paid for click-throughs, or to profile my interests as accurately as they could otherwise.
There are categories of products where I spend money regularly, but I go directly to category-specific sites, so Google again loses out on the ability to take their cut as middleman, which I'd happily let them take - and maybe discover vendors other than the ones I know - if they provided me with higher-quality results than they do now.
Before the "AI" takeover, it was already full of SEO-mandated human-generated bullshit, so we haven't actually lost that much in the last couple of years. I've been saying it for almost as long as I've been in the industry, which is well over a decade now.
If this is true it implies all news and history for the past 10 years is also human-generated bullshit. I’m not saying you’re wrong – just that you have to follow your beliefs to their conclusions.
I apologize; I didn't mean to imply anything about ALL of anything. My main complaint is that the things that are not bullshit on the internet have largely been buried beneath bullshit for a long time.
Humans are quite capable of generating bullshit; I don't think that's ever been in contention. That doesn't mean all or even most human-generated content is garbage.
It may. And given the claim that most of the web is dogshit, that means that most of everything we are consuming is dogshit. Unless we can come up with a model and framework for discernment. And we don't have a workable one.
There's no difference. Web search has been useless for over 15 years now. This is barely any worse than the previous situation, where any question, if not immediately answered with marketing pages in all the first results, would lead to some thinly veiled marketing crap a la a "blog". I would never trust someone whose career is "content creation" or "monetized blogging" to answer any question I have, like how to wash a toilet. The only difference between the example in the article and what you'd get 10 years ago is that the former is obviously wrong and the latter would be something you would need to spend several days to refute, unless you happen to work in the relevant field.
“Early in the Reticulum—thousands of years ago—it became almost useless because it was cluttered with faulty, obsolete, or downright misleading information,” Sammann said.
“Crap, you once called it,” I reminded him.
“Yes—a technical term. So crap filtering became important. Businesses were built around it. Some of those businesses came up with a clever plan to make more money: they poisoned the well. They began to put crap on the Reticulum deliberately, forcing people to use their products to filter that crap back out. They created syndevs whose sole purpose was to spew crap into the Reticulum. But it had to be good crap.”
“What is good crap?” Arsibalt asked in a politely incredulous tone.
“Well, bad crap would be an unformatted document consisting of random letters. Good crap would be a beautifully typeset, well-written document that contained a hundred correct, verifiable sentences and one that was subtly false. It’s a lot harder to generate good crap. At first they had to hire humans to churn it out. They mostly did it by taking legitimate documents and inserting errors—swapping one name for another, say. But it didn’t really take off until the military got interested.”
“As a tactic for planting misinformation in the enemy’s reticules, you mean,” Osa said. “This I know about. You are referring to the Artificial Inanity programs of the mid–First Millennium A.R.”
“Exactly!” Sammann said. “Artificial Inanity systems of enormous sophistication and power were built for exactly the purpose Fraa Osa has mentioned. In no time at all, the praxis leaked to the commercial sector and spread to the Rampant Orphan Botnet Ecologies. Never mind. The point is that there was a sort of Dark Age on the Reticulum that lasted until my Ita forerunners were able to bring matters in hand.”
Anathem (Part 11: Advent) by Neal Stephenson
I like "Artificial Inanity" as a description of LLMs
Search has been steadily headed this direction for a long time, but recently (in just the last couple of months) I’ve started to notice a big uptick in obvious ChatGPT content in places that have otherwise been less egregiously SEO’d: StackOverflow answers, a real person’s personal Medium blog, comment sections on blogs that have good discussions, Github issues, etc.
Unfortunately I have to imagine this is only going to lead to more closed communities and less free sharing of knowledge.
Before that though, it already felt like a lot of those outlets - blogs, SO, comment sections - were either bot or low-value, almost automatic comments. And of course, low wage content writers who get paid by the word to write "where do I find this item" guides for video game websites, somehow managing to change "it's in this chest in this region" to ten paragraph articles.
It's def. gotten worse. ATM if you search for anything related to OCaml you will get links to Ocamlwiki, which is just an AI-generated site filled with bullshit and inaccuracies.
Makes me wonder how the advertisement industry is going to cope with the ever-growing influx of non-human users. What happens when the conversion rates plummet to zero?
Maybe I'm a cynic, but a completely walled-off / paid internet doesn't seem too unrealistic. You'll have to pay a subscription to every website you want to visit.
On one side you have the open internet wasteland, filled to the brim with AI bots / generated content, essentially trying to vacuum pennies off human visitors. On the other hand you have the walled internet, where you have to pay for stuff and jump through flaming hoops to prove that you're a human.
I've not really considered this before, but I might actually be interested in a 'white-listed' web. This is obviously possible entirely client-side with a plug-in that white-lists domains and allows you to edit the list.
I'm wondering if there's a genuine opportunity here to go further: a client-side browser plug-in, plus a SaaS which automatically vets pages on-the-fly to guesstimate the chance they're AI-generated, spammy, etc. So if you visit a new domain, the plug-in auto-updates the white-list, prompting you to confirm the judgement, maybe prompting to add the domain to uBO or similar.
Again, this could all be done entirely client-side if the guesstimation algorithm is efficient enough. But a centralised database would confer other obvious advantages, like basing the guesstimation score on decisions from similar users, building a giant up-to-date list with fast lookup, that sort of thing. Site listings and other data from Kagi, marginalia.nu and Mwmble.com would be a great starting place. Obviously it would have to protect against the system being gamed, Sybil attacks and what have you.
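As a sketch of how little client-side machinery the basic version needs, here it is in Python; the domain names, the scoring heuristic, and the threshold are all hypothetical placeholders:

```python
# Client-side whitelist sketch: known-good domains pass, known-bad are
# blocked, and unknown domains get a placeholder "guesstimate" score the
# user can confirm or override, as described above.
from urllib.parse import urlparse

whitelist = {"marginalia.nu", "kagi.com"}        # user-curated
blocklist = {"seo-slop.example"}                 # user-curated

def guesstimate_spam_score(domain):
    # Placeholder heuristic; a real plug-in might call a scoring service
    # or a local model here.
    return 0.9 if domain.endswith(".example") else 0.4

def vet(url, threshold=0.5):
    domain = urlparse(url).netloc
    if domain in whitelist:
        return "allow"
    if domain in blocklist:
        return "block"
    score = guesstimate_spam_score(domain)
    verdict = "block" if score > threshold else "prompt-user"
    return f"{verdict} (score={score})"

for url in ["https://kagi.com/search",
            "https://seo-slop.example/page",
            "https://unknown-blog.org/post"]:
    print(url, "->", vet(url))
```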
> plus a SaaS which automatically vets pages on-the-fly to guesstimate the chance they're AI-generated,
The problem is that this would create another SEO-like arms race. If it takes off, everyone will be working 24x7 to defeat the vetting process and gain entry to the walled garden, just like they did to gain entry to the first page of Google search results.
AI solves that problem: it's pretty easy to look at behavioral data and identify low vs. high quality users. Optimize towards what the AI thinks is users likely to convert, and the problem is solved.
Wouldn't it just become an arms race between the two? I.e. the ad companies trying to identify human/legit users the best they can, and the AI actors trying to mimic high quality users?
Having interacted with some bots, it def feels like we've gone from the stone age, to the sci-fi future, in only a couple of years.
This is the principle behind GAN aka Generative Adversarial Networks. By pitting a generating model against a detection model, you can iterate your content until it is indistinguishable from human generated content.
It’s my prediction that a ChatGPT supercharged with a GAN is going to be the most valuable iteration of text generation technology. Granted, it will still likely be off a little, but it's going to get harder and harder to tell the difference.
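For readers unfamiliar with the setup, here is a bare-bones GAN training loop in PyTorch, shrunk to toy 2-D points rather than text (a text GAN is far messier in practice). The structure is the point: the generator improves only by fooling the discriminator.

```python
# Minimal GAN: G learns to produce points matching a "human" cluster,
# D learns to tell real points from generated ones.
import torch
import torch.nn as nn

torch.manual_seed(0)

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))  # generator
D = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))  # discriminator

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

def real_batch(n=64):
    # "Human" data: a cluster centered at (2, 2).
    return torch.randn(n, 2) * 0.3 + torch.tensor([2.0, 2.0])

for step in range(2000):
    # Train discriminator: real -> 1, generated -> 0.
    real = real_batch()
    fake = G(torch.randn(64, 8)).detach()
    d_loss = (loss_fn(D(real), torch.ones(64, 1)) +
              loss_fn(D(fake), torch.zeros(64, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Train generator: make D label its output as real.
    fake = G(torch.randn(64, 8))
    g_loss = loss_fn(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# The sample mean should drift toward the "human" cluster at (2, 2).
print("generated sample mean:", G(torch.randn(256, 8)).mean(dim=0))
```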
What prevents this from becoming a game of cat and mouse? I mean, AIs can pretend to be high quality users, too, right?
I won't pretend to be able to look into the future with any kind of certainty, even if the scope is only a couple of years, but it wouldn't surprise me if we have created a way to make the dead internet theory real.
It is already a game of cat and mouse and has been since forever. There's nothing wrong with cat-and-mouse games from the viewpoint of an advertiser, it just means they have to keep innovating.
...and lose all your privacy as those paid internet sites siphon off your search terms, pattern match your purchasing and consumption, and sell advanced psychological profiles to Cambridge Analytica so they can turn around and use it to psyop you into voting for another shitbag Billionaire.
It's not going to get better. Maybe 5-10% of content is AI generated now. Wait until it's 90%. Wait as we keep piling more nines onto that number.
How do we keep the internet useful to humans while this is going on?
Maybe this is the new search engine challenge. Google rose to the top because, at the time, they were able to mine the links' references to each other to determine which were the best sites. If a search engine can solve the problem of finding the best (realest?) information in this mess, then they can rise to the top.
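For context, the link-mining idea referenced above is PageRank: rank flows along links, so pages linked from high-rank pages rank higher. A toy power-iteration version in Python, with an invented four-page link graph:

```python
# Toy PageRank by power iteration over a hypothetical link graph.
DAMPING = 0.85

links = {          # page -> pages it links to
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}

for _ in range(50):
    # Each page keeps a baseline share, plus rank flowing in from links.
    new_rank = {p: (1 - DAMPING) / len(pages) for p in pages}
    for page, outgoing in links.items():
        for target in outgoing:
            new_rank[target] += DAMPING * rank[page] / len(outgoing)
    rank = new_rank

for page, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```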
Indeed, Neal Stephenson in _Anathem_ (2008), in describing an alternate world (in which his "reticulum" is our "network") wrote "Early in the Reticulum—thousands of years ago—it became almost useless because it was cluttered with faulty, obsolete, or downright misleading information."
"So crap filtering became important. Businesses were built around it. ... " Generating crap "didn't really take off until the military got interested" in a program called "Artificial Inanity".
The defenses that were developed back then now "work so well that, most of the time, the users of the Reticulum don't know it's there. Just as you are not aware of the millions of germs trying and failing to attack your body every moment of every day."
A group of people (the "Ita") developed techniques for a parallel reticulum in which they could keep information they had determined to be reliable. When there was news on the reticulum, they might take a couple of days to do sanity-checking or fact-checking. I'm guessing there would need to be reputation monitoring and cryptographic signatures to maintain the integrity of their alternate web.
The Ret getting filled with garbage and the arms race of production and filtering is real. I think 99.99% effective filtering was the sci-fi part, though I hope I am wrong on that.
> The once ubiquitous phrase “let me Google that for you” is now meaningless. You are as likely to return incorrect information as you are complete fabrications, and the people who put this content on the Internet do not care. The people who hold the purse strings for Sports Illustrated are more interested in gaming Google search results and the resultant ad revenue from that practice than actually serving their readers.
The last part of the conclusion to this article is true: people who put shitty content on the Internet don't care, and are only interested in ad revenue. (Incidentally that's why advertising is bad: it gives the wrong incentives. It's possible that ad blockers will save the Internet.)
But the first part isn't true: Google is still useful, especially when used as it was first meant to be used. Google started as a search engine, meaning: a tool to find documents in a corpus. It wasn't an oracle, and still isn't, despite how much it wants to be, or users want it to be.
Search for documents, go read them, evaluate if what they say seems to make sense and who wrote them, try to do a comparison between different sources to see where they converge and where they diverge, and then draw your own conclusions.
> If that means the rest of us get information about inflamed penises when we’re trying to know how long sinus inflammation is supposed to last, well, I guess we’re shit out of luck.
If you can't be bothered to know the difference between a sinus and a penis then sorry, not sorry.
It all started with the "Traffic==Money" idea, a long time ago.
LLMs weren't even around a couple of years ago in any meaningful way and the internet was still full of dogshit. Maybe we can have better dogshit made with AI.
The problem is that the dogshit machine was constrained in its output by the organization creating it. Sure, you may have paid small amounts per article, but your resources were still limited, as cost scaled linearly with content creation. Now an organization of any size can make a metric tonne of lexically unique content at a fixed cost. Given the incentives at the moment, it's all but inevitable.
Yes, but the Google Bot LLMs will filter out drivel with their superior language understanding and network analysis of spam clues. I mean, they could, if only Google wanted to clean up its act. Well, in the meantime I hopped on phind.com and use Google about 25% of the time, when I just want to go to a specific site or article. Slowly weaning off. Phind is fantastic; even its search results in the right sidebar are clean.
The post Google uses for the featured snippet for the search “best place to fly drones in sf” appears to be misleading, inane, useless dogshit.
What I’m not sure is whether it’s ai generated, bad quality traditional bot generated, or human content spam generated.
But whatever it is, it fooled Google. It hurts to read; each numbered point is similar to this: “S.F. city parks are a great and amazing place to fly. Sf parks are illegal to fly in”
Before ChatGPT, it was already full of bot-generated garbage; google any celebrity name and crappy sites are recommended with their net worth or relationship status.
All of Google's top suggestions, highlights, and whatever other garbage they put in the top two scroll pages, are just junk. Garbage. It's right perhaps 5% of the time for me.
So not only do I have to hunt through their horrible search results, I have to hunt through junk they add on top of that.
The sad part is, their buffoonery in aliasing words is 90% of the problem. No, I searched for David, not Dave. No, I searched for Debian, not Ubuntu. On and on, unless I use verbatim, I get nothing even remotely useful.
It's like Google is completely disconnected from the real world. I bet they don't even dogfood. They probably have a Google Employee search that actually works, and doesn't alias or something.
I wonder what startling revelations Google would have if they flew to the middle of Missouri, told people how to use verbatim, and then saw the wondrous "oh, it works now?" expressions on grandmas' faces.
They blew the AI game, screwing around, messing about, giving up a decade's lead over OpenAI. They're destroying their search engine, their brand, with this junky, modern lack of effort. They're making Gmail less and less friendly, losing cherished photos of loved ones in Google Drive; their entire PII-based income stream is coming to an end. Frankly, Google is done, unless they do something dramatic.
They're on the path to becoming IBM. A washed-up has-been, ruminating on past glories.
They should be hiring right now. Massive amounts of talent are being set free; they should scoop them up and reap 5-year research and dev rewards. They have the excess capital now... something they will soon lose.
But no. Onward, we march into oblivious irrelevance, says they! Yay!
It kind of baffles me how people spent the last 15 years or so generating a black hole's worth of SEO content, but Google is at the forefront of "being at fault" for it now. Google has no real incentive to make their product suck for no reason, but SEO people have a never-changing reason to try and break Google. The point of SEO is delivering your content to a user through search results, no matter if it's actually relevant to anything.
Poppycock. SEO was a thing 25 years ago. It's the same now as it was then; it's merely that Google has vastly reduced its efforts to maintain its product.
It's not that SEO won, it's that Google doesn't care.
One way they don't care is apparent in their ridiculous query aliasing, and in the pages of random junk they spew before you can even scroll down to actual search results.
Google is a has been, focusing on short term profits. They deserve to go EOL.
> SEO was a thing 25 years ago. It's the same now, as it was then
SEO is the same as it was 25 years ago? Do you remember anything resembling modern content mills in 1999? It always kept on evolving, and there's vastly more money dumped into it today than the 90s could ever dream of. Algorithmically generating content became easier and it got more "believable" to a search engine bot.
Again, putting useless search results at the top doesn't benefit Google in any way, not even the short-term profits. It's the telltale sign of SEO garbage because they have an incentive to game the system, while Google has no incentive to make the system less useful. They likely made some bad choices along the way, but 80% of my issues with Google are outside actors trying to make it useless.
As you said, tech has evolved... but on both sides. And your tack is weird: you seem to claim Google has no reason to improve search?? So you're agreeing with me that they aren't trying to keep up?
Well, they should have an incentive; it's called user retention. And they are so very complacent, it's hilarious. Here we have one of the most dramatic shifts in search engine technology in decades, with AI iterating crazy fast, and they're sitting on their past achievements, hoping user stickiness wins.
Look at how fast Firefox went from the dominant browser to barely existent. Things can shift in the blink of an eye.
Today, more than anything, Google needs to be at the top of its game. Bing is fast becoming far far better than Google Search, and they can pull in crazy user numbers if they wish.
Microsoft has a big bag of cash, and could pay Firefox, Apple, and a dozen other competitors when they deem strike time is a go.
Google, comparatively, seems in a decadent daze of debilitating dormant dreams, damned to dereliction.
Doesn't help that Google search is increasingly useless across the board. Text search sucks (yeah sure, you only got 8 results; I don't believe you, stop lying to me Google). Image search sucks. Maps search even sucks now: it refuses to find a direct ask I know is on the street I'm focused on, and instead zooms out to highlight something 10+ miles away on the other side of the city. YouTube search sucks: it shows like 3 results, then starts showing things you've already watched, or unrelated 8-million-plus-view viral and gross-out videos.
In retrospect it'll be obvious why Google won't be relevant at all in 15 years.
"Google Is Full of AI Dogshit" or "Search Engines Are Full of AI Dogshit" or even better "Search Engine Algorithms Expose All of Internet's recent AI Dogshit"
I have a sneaking suspicion that at some point in the distant future advertising will be banned simply because it's so associated with the tragedy of the information commons.
>Today, an article was published alleging that Sports Illustrated published AI-generated articles. According to our initial investigation, this is not accurate.
The articles in question were product reviews and were licensed content from an external, third-party company, AdVon…
— Sports Illustrated (@SInow)
Oh. So you constructed your site to make this distinction unclear, and then paid for an AI generated article to be put on your site.
I guess that makes it alright then. Nothing needs to change.
> The people who hold the purse strings for Sports Illustrated are more interested in gaming Google search results and the resultant ad revenue from that practice than actually serving their readers.
Then by running an ad blocker I'm doing my part to make the world better.
Depends. Do they get paid for adblocker readers? Do they get paid based on the percentage of their readers that use adblock? Do they get paid more if they fork over some of this information (and therefore less if it looks bad, so they don't)?
The future is bright people. Technology will save us and usher in paradise!
But personally, I'm pretty happy about the internet filling up with dogshit, if only because I think it's likely that it will foil the plans of SV, singularitarians, etc.
Kagi is good at filtering this stuff out. Recently I searched for some information about mobile phones, and while the Google results were contradictory and AI-generated, the Kagi results allowed me to find what I needed. I noticed something seemed off about the Google results (they were bad-looking websites with strange phrasings) which was confirmed when I discovered the different results contradicted each other.
I've had a draft blog post on my todo list for a while titled "Chauffeur Knowledge and the Impending AI Crack-Up." While humorous today, the melting eggs and sinus inflammation are perfect examples of where I think AI will cause the most destruction. I'm of the mind that any societal collapse generated by AI will be one of stupidity and over-reliance on AI's "intelligence" as infallible, not some Terminator scenario or the elimination of too many jobs.
Eventually, people will not know the difference between an AI "hallucination" and factual information. This becomes a serious problem when you consider that there's already an existing cohort of people who blindly trust Google over experienced humans. Case in point: I just saw a video [1] this morning where an air traffic controller argues with an experienced pilot about their approach method, citing "I Googled it" as their authoritative information source.
To me, the most interesting comment was one I heard on the "Practical AI" podcast[1]: one of the guests pointed out that 2021 might be the last year we can train AI on, because the internet is about to explode with AI-generated content, which means training will just become a self-reinforcing loop.
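That loop is easy to demonstrate on a toy model. A sketch assuming numpy, with a Gaussian standing in for the "model": fit it to data, sample from the fit, refit on the samples, repeat, and the estimation error compounds across generations:

    # toy "model collapse": refit a model on its own samples each generation
    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(0.0, 1.0, 100)  # small "human" dataset, std = 1

    for gen in range(301):
        mu, sigma = data.mean(), data.std()  # fit the toy "model"
        if gen % 50 == 0:
            print(f"gen {gen:3d}: std = {sigma:.3f}")
        data = rng.normal(mu, sigma, 100)  # next generation sees only model output

With a dataset this small, the fitted spread visibly drifts (typically shrinking) within a few hundred generations; the model-collapse papers describe the same dynamic for LLMs trained on LLM output.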
We're really fast-forwarding to the end of Accelerando, where the Earth gets disassembled to make a Dyson sphere of computronium that ends up being filled with sentient pyramid schemes.
Wondering whether 10-12 years from now the internet becomes a hostile, corpo-inhabited medium, while personal LLMs are what we take knowledge from. In reality some (of us) already do so; the only thing is that we don't really know what the LLMs were fed to begin with, but it is assumed to be reality-based information.
Your personal search engine is your personal model, which you evolve to your needs, safely including your history, your chats, your family's memories. The Internet then exists only as a technical means to reach big corpo nets (heavily guarded against anyone extracting info from them) and as a ring of guerilla websites or fediverses, which are open by nature and resemble the old internet.
It is really my opinion that this happens earlier than 2036.
I would not mind an LLM trained on, say, Wikipedia and its sources. That's still a lot of bullshit, but orders of magnitude less than the open Web at large. The possibility to whitelist some websites would be useful. Training on the huge databases of scientific articles would be ace.
What would be a strong selling point would be if the LLM is able to reliably cite where the information comes from. MS’s copilot currently attempts to do this, but often cites low-quality websites with questionable reliability.
Actually, if AIs become better than humans at reasoning, composition, and logic, AI content will be sought after and more trusted than human content, for the sheer fact that AIs will be better at those tasks. Of course there will be a need for human bubbles for "authentic" experiences, but for most technical content AI will be preferred for its ability to tailor output (for brevity, style, the recipient's technical know-how, language, context).
How will AI discover new facts in the real world? What is the AI equivalent of boots-on-the-ground journalism? Of learning by doing?
A lot of reality happens in meatspace, and someone eventually needs to put that information into computers. AI can only rehash what's already there.
My question is: who will do that work if you rob them of any satisfaction for doing it? A man running a widget review blog could get reader mail, online friendships, advertising partnerships, a sense of making his mark. If everything a person creates just feeds an AI, what do they get in return?
FunSearch could be one way out for theoretical work. For robotics, we can have better simulated environments to learn in, so no need for meatspace. Run experiments in simulated envs.
Tongue in cheek: is public key cryptography the solution to verifying content authorship on the internet? Should all major OSes establish and bundle proof of authenticity for every phrase and image created in them using an input device attached to them?
Authenticity is so much more than "I am using a non-public number to mathematically sign this data"; that's the whole problem. It's the same as crypto people not understanding that pretty much zero theft/fraud/crime happens through a simple man-in-the-middle attack, so building a terrible machine that can ONLY prevent that method is mediocre at everything else.
Gram and Gramps are going to get their keys phished. An LLM screed signed with Gramps' stolen key isn't authentic, even though it 100% mathematically is. Authenticity is about the mindset and intentions of the creator.
yeah it doesn't seem much of a hurdle to take the output of an LLM and feed it to a device that mashes the buttons on a keyboard and then presses enter.
I feel like we brought this on ourselves when we started autocompleting replies to questions like "How are you?" with, "Fine, thank you." instead of thinking about it and giving honest answers.
I may have clicked one ironically, once, for my own amusement, then followed it up with an actual answer because I knew my coworker wouldn't get the joke.
I just did a demo this week for someone who asked how well GPT4 would write forum replies in their voice. The results were startling, to be honest. The only way I could (maybe) tell the comments were AI is that they were slightly better-quality replies than the author themselves would have written.
I wouldn't go forward with it because it felt unethical, but it was a really fascinating thought experiment.
> The once ubiquitous phrase “let me Google that for you” is now meaningless. You are as likely to return incorrect information as you are complete fabrications
For me, this came when I realised that "SEO this" and "SEO that", SEO here and SEO there, had become as innate as googling for answers. And it is still incomprehensible to me how people can argue with a serious face for actively and forcefully distorting the very search results they so eagerly want to build their goodwill on top of.
Machines are only taking over the killing of internet search with incredible efficiency.
My favorite example is if I search Google for "tide me over vs tie me over" it comes up with "Tie me over is correct". Not only is this wrong, but if you click the link, the source itself says it is wrong! The source is literally on the importance of fact checking, and Google is pulling a quote that the article uses as an example of an incorrect fact.
Google:
WARNING: It is a common misconception that the phrase “tie me over” is actually pronounced “tide me over.” Some even go so far as to say the “tide” refers to the ebb and flow of hunger, but this is not the case. Rest assured “tie me over” is correct.
Interestingly enough, I searched for the same query myself (signed out) and it came up with
> “Tied over” is a misspelling of “tide over.” Tied means to attach, fasten, or bind something, while a tide is the rising and falling of the sea that takes place twice a day in relation to the pull of the moon's gravity.
So, it did come up with a correct answer... but the answer is to a completely different question. Though the "actual source" you used was the second result.
For those advocating for “dead internet” aka “it was all dogshit anyway” – where is the boundary for non-dogshit information? Presumably everything we consume originated from or is adjacent to content from the internet. Even original content & sources are reading the internet themselves.
Perhaps you are internet aristocracy advocating "Google for thee and arXiv.org for me". Are the folks writing our peer-reviewed studies in an air-gapped compound in the Amazon?
You can’t put the sewer next to the water supply and expect the two not to mix together.
>You can’t put the sewer next to the water supply and expect the two not to mix together.
You can in digital communication because we have cryptography. That's the reason we can have secure https traffic over a really insecure web. The future is probably that almost everything authentic is going to be signed and everything else we'll just assume comes from a bot.
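A minimal sketch of the signing half, assuming the third-party Python `cryptography` package (key distribution and identity are the hard parts, and nothing here addresses them):

    # toy content signing with Ed25519
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    author_key = Ed25519PrivateKey.generate()  # kept secret by the author
    public_key = author_key.public_key()       # published alongside the content

    post = b"I wrote this paragraph myself."
    signature = author_key.sign(post)

    # a reader (or a crawler) checks the signature before trusting attribution
    try:
        public_key.verify(signature, post)
        print("signed by the holder of this key")
    except InvalidSignature:
        print("tampered with, or not from this key")

Of course, a valid signature only proves possession of a key, not that a human wrote the bytes.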
The SEO that killed the web, the productivity hacking that burns us out, the UX to increase sales and subscriptions, the attention harvesting of social networks.
One of the world's leading experts on information security, Professor Ross Anderson of Cambridge University, in his article https://arxiv.org/abs/2305.17493v2, is already considering a scenario in which most of the Internet is neural-network-generated hallucination. And neither you, nor I, nor any expert, nor anyone at all will be able to distinguish fake from real. As a result, neural networks will degenerate by learning from their own hallucinations on the Internet, and humans will degenerate by relying on the degenerated neural networks. The paper calls this process "model collapse". It can only be countered by a collective neural network of all the minds of humanity. For mutual validation and self-improvement of LLMs and humans, we need the ability to match the knowledge of artificial intelligence against collective intelligence. This is what can get us out of the personal reality tunnels and personal information bubbles in which we are getting more and more deeply stuck as individuals.
This network of real human knowledge provides a way to introduce new, clean, human-generated datasets into LLM training in a validity-conscious manner, and makes it possible to avoid model collapse and reduce unwanted errors in new and better generations of generative models. And we have a practical solution to avoid the collapse of large language models: we are creating a global, unbiased, decentralized CyberPravda (dot) com platform for disputes, for analyzing the reliability of information, and for assessing the reputation of its authors, where people are accountable with their personal reputation for their knowledge and arguments.
"The internet has been broken in a fundamental way. It is no longer a repository of people communicating with people"
But the examples given are... google search results and articles from big publishers, things made to cast a wide net. Those were never good ways to communicate with other people online imo. If you want to interact with real people look for smaller communities and individual creators.
I have had some experiences with bogus bug reports and PRs at BeamMP (a mod that adds multiplayer to a car game, ~750k registered players); we have very little external repo activity, all things considered. This is due to not advertising it, and due to the shitty license (soon to be AGPL-3.0), but people find it and contribute every now and then.
We have had some AI-generated PRs and issues. These are incredibly easy to spot for me (I'll get into how in a second), but it's been a matter of taking them half-seriously just in case. When asked about their decisions, they either post an obvious AI answer or garbage, and you can reject the contribution.
There are two parts to this:
1. Why is this a problem? Can't you just take it at face value?
Well, the point at which I'll accept an AI-generated PR is when I cannot tell. This has always been the goalpost for me -- if the contribution is good and was made by the person contributing, it gets approved. If not, then not. AI counts as "didn't write it yourself", because I want to be able to ask you later why something related broke, when your ChatGPT chat is already closed. You need to understand the code you post.
2. How do you spot them?
Usually what these AI tools, and the people using them, do is change unrelated code: rewording a comment in the same or even a different file, or simply reordering stuff for no reason. Another obvious sign is the solution itself. If it doesn't compile, or doesn't even look like the right language (e.g. C when the codebase is C++), that's a sign as well. So is the use of libraries that aren't there, or a PR body filled with walls of text.
Real people are lazy. Real people tend to get the job done and gtfo, not blabber on about whatever in their PR. Real people like me also won't merge anything that wastes my time, like AI garbage.
There's a grain of truth here, but I object to the hyperbolic headline, as it is unsupported by the article's data points, which are exceptional rather than representative of typical content.
“The internet is broken in a fundamental way”? No, there are some growing pains at the margins. Feel free to react strongly, but these are not the same.
One thing I find increasingly frustrating is reading an article where bits of it are plausible but the whole thing feels "off" and not being sure why. It's too verbose or vague. Was it AI generated? Someone working in a content farm? A non-specialist hired to write posts for SEO? A mixture of all of the above?
However, I don't think we've reached the end of the world yet, because if you just have the patience to learn what the reliable sources are, and the wherewithal to pay for a few of them, then you can ignore most of the junk, just as you can mostly ignore social media.
Also, y'know, books. Physical books. Public and university libraries. Yes some books age poorly or don't deserve to be published, but consider how many more checks most books go through before they end up on a shelf, including curation by librarians.
Every time I read something that's aggressively branded as anti-AI, I am compelled to read the final paragraph first just to make sure I'm not about to get baited and switched by someone who believes they're far more clever than they have any right to think.
>The internet has been broken in a fundamental way. It is no longer a repository of people communicating with people; increasingly, it is just a series of machines communicating with machines.
LLMs don't talk to each other - humans use LLMs to talk to other humans
I see the point about AI's impact on internet content, but I think it's important to view it in a broader context. AI is just a tool, and its effectiveness is shaped by how people use it. It's not just AI: the entire internet has always had a mix of good and bad content. AI actually has the potential to improve access to information and level the field. It's more about refining search algorithms and enhancing oversight than dismissing the technology altogether. The internet has always required users and creators to sift through and elevate quality content. AI is just a new aspect of this.
Generally, I'm optimistic about current approaches to AI. I'm using ChatGPT daily for my work. I see the BS it generates all the time, but it provides many great ideas from which I create something useful (both writing and programming).
The sinus example scares me. It's obviously wrong. But what about all the subtle errors that will be generated?
Humanity's knowledge is bootstrapped from BS. But most of it is long forgotten. On the internet, the BS ends up in the training data of the next iteration.
I guess curation of quality text corpora will be an important thing in the coming years.
I like that idea. I guess such an amplification effect can lead entire civilizations astray. But still, in the long run BS will be ruled out. Here I disagree with you in that "truth" is also a matter of what works and what doesn't. Simply put, BS doesn't work. It makes for worse models in the heads of people (and possibly artificial agents). Worse models will lead to less effective decisions.
You espouse this like a rule but it just isn't so. Nazi Germany believed themselves to be the righteous rulers of the earth, descended from Ubermen that never existed, commanding more powerful armies than anything, and were allowed to live that delusion right up until the democratic world forced them to reckon with reality. You can avoid that reckoning for arbitrary lengths of time.
It's no different to "the market will remain irrational longer than you can stay solvent". It's the same rule. Reality can be avoided for as long as you want if you don't have infinite ambition.
I've been saving longreads and news articles to PDF for a couple of years now, for pretty much this exact reason. Google's search results are horrifically bad now. Instead of ranking original articles at the top, they will rank 10 blogspam sites that 'summarize' the article, often without attribution to the original link.
With AI, the cost of blogspam becomes so low, and 'hallucinations' drive trust even lower, that it makes sense to slowly withdraw from the clearweb entirely and stick to a list of official sources and 5 or so trusted third-party sources.
LLMs are in this sense the (economically) perfect business to be in: a near perfect solution to a catastrophic problem created by their own existence.
Modern top-flight LLMs are hitting within 10 or 20% of what Google was at the very start.
Google in like 2001 or whatever was fucking magic: vast official data had been put on the “information superhighway”, the non-POSIX-spec stuff was people cool enough to host a website, and spam was, like, on AOL or something.
You just got a white page, a box, a button, and the answer.
I suspect that for people under 35 or 40, LLMs feel kinda like Google did.
I was recently looking up passion fruit seeds to see if they were edible. One of the first results had various wrong warnings of adverse health effects.
But the best part was it said to be careful to avoid the seeds when you have a fever because it could cause explosions in your mouth.
It’s gotta be AI right? Who else could be as dumb as dogshit.
> Lastly, be careful that you don’t eat the seeds if you have had a fever, because the heat from the fever could cause the seeds to explode in your mouth.
What exactly could have been the true translation then?
I fully believe that the future of the internet is going to be small, invite-only enclaves of a few hundred people, because everything else will be unusable due to AI-generated trash.
As bad as the Internet is, it's not as bad as Reddit, Tom's Hardware, and Ars Technica. Those forums routinely delete the best posts to protect the egotistical moderators from being proven wrong. I'm talking about you, Beth Mole.
If we had accepted the best internet posts and kicked out the egotistical moderators, and charged $0.01 per post to limit spam, the world would be a better place and we would have a better source from which to train our AIs.
SciFi: I think the Internet will be a thing of the past, for humans. Humans will use chatbots and/or a few UIs while infinite information is created and consumed on the Internet by AI.
Probably, in the end, looking at the vast information created, AI will enter a loop where it is always creating the same information, or the same obvious patterns, over and over again. If that happens, it will be proof that AI is not as creative as humans.
I doubt that the dogshit is so much worse than it used to be. Unfortunately, I also contributed to this. When I was 13 or 14, I scripted a little tool that allowed me to write one tech news blog post and generate multiple versions of it by simply swapping phrases, adding practically zero value. You'd be surprised (or not) how much of a tech news blog is repetitive wording, mainly to fill yet one more adsense banner after a new paragraph. It was not based on AI, but the core principle was the same as the example in the article.
I published each version on a separate WordPress blog covering roughly the same topic, chose random pictures, set random publish dates close to each other, and signed them all up for Google News. This non-AI dogshit dominated a small tech niche, making a decent amount of money at the time for a 14-year-old. I am pretty sure I was not the only one coming up with that idea at the time.
This was already back in 2008/2009. At the time, I posted about my approach in online forums to ask for coding advice. Through those, I got to meet the creators of two more organizations that used a similar approach for PC gaming news pages.
And this was for the German market only. So I am quite sure that it was more common in the US at an earlier time, as they are usually ahead of us.
I think a big part of the problem is that the internet is also full of human-generated dogshit. At some point it became necessary to write a couple of pages about your vacation to Tuscany before dropping the promised recipe, to avoid being penalized by Google.
Now there is no guarantee that the intro filler will ever end, or that the eventual recipe will even be something that is desirable or safe to consume.
Being dismissive of AI while simultaneously embracing Google is a very hypocritical position. Google has trashed the internet by changing how results are generated so that your website is only successful by being 1. approved and 2. popular.
AI has brought back some of the fun from 2010~ because you can generate non-pre-approved content for fun and profit and have a big community to share it with.
Let's be honest, the internet has always been full of dogshit.
Teachers have long warned us about using Wikipedia as a source. Fake news has always existed. Real news has been diluting itself with questionably true clickbait for years. The mass scramble by companies to hastily use AI is just further diluting and tainting the standard information outlets we've used.
Xillenial here and the internet was a utopia when I first explored it.
Everyone you interacted with was similarly curious and interesting. All the content was some flavor of passion project.
I’ve recently been getting into ham radio because it has a similar feel to me as the early internet. You need a license and an outlay of equipment and everyone else is a hobbyist also. It’s not an app anyone can use with their thumbs.
I think we’ll see a backlash at some point where people abandon these app-driven spaces and we go back to grass roots communities.
> I think we’ll see a backlash at some point where people abandon these app-driven spaces and we go back to grass roots communities.
To some extent it has already happened. I've seen some upswing in niche forum use, but nowhere close to the heyday. Sadly, most people seem to move into private chat channels.
It's funny: we may have to go back to buying encyclopedias in physical stores in order to prevent this terrible future of made-up information with no way to know what is a hallucination and what is an actual article.
Opinion pieces "against AI" are getting just as boring and unimaginative as AI-generated content. I feel like all the arguments have been made at this point.
There's too much value in undetectable AI for an AI detector to last long. Any technology behind a tool that can accurately detect AI output will be used against itself: keep trying until it can't detect its own output.
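The evasion side needs barely any code. A hypothetical sketch, where generate() and detect_prob() stand in for any generator and any published detector:

    # resample until the detector itself says "probably human"
    def evade(generate, detect_prob, threshold=0.5, max_tries=100):
        for _ in range(max_tries):
            text = generate()                  # candidate AI output
            if detect_prob(text) < threshold:  # the detector is fooled
                return text                    # ship the undetectable one
        return None                            # the detector held up, this time

Any detector accurate enough to be useful is also accurate enough to serve as the generator's quality filter.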
Honestly I don't see Google having any killer feature left. The next killer feature is AI-driven search, which MS is doing, but Google is failing at it so hard they had to fiddle with their own demo.
Why do a web search to hope to get answers to my question, when I can have an AI write me a custom article that actually answers my question?
Wow, that Jake Ward guy linked in the middle is an absolute douchenozzle. His entire business model seems to be generating low-quality AI spam to help websites steal traffic from their competitors. The internet would be a better place if people like him didn't exist.
I think most people misunderstand this: search engines are not the content.
The Internet was always full of spam. Google was able to work around that. For some years now, the Internet has been dead because Google is not able to keep up.
The Internet was dogshit. Now it is just AI dogshit. Google was quite good, but now it is not.
In a few years we will not use Google search; we will ask chatbots for answers. There will be no "link archive", or nobody will be interested at all in looking at one.
Chatbots will be worshiped. Why go to the BBC, if the chatbot will always 'know better'?
Google will not invest in Google search, as this "program" will die. Killed off.
Journalists will remain. The BBC will continue writing stories, which will be created only to feed the chatbot monsters.
That's it. Google is losing ground here. It is no longer "the gateway to the Internet." There has always been dogshit; it's just that there will now be much more dogshit than ever. Indexing dogshit is not working anymore. There is still an Internet out there with useful information; it's just that you cannot access it by searching. Are we back to the era of curated sources, encyclopedias and webrings?
I'm already back to it, following the blogs I like and bookmarking every interesting source I come across, as it's unlikely I'd find them again via googling. I'm also more diligent about taking notes and building a personal knowledge base, due to the rate at which information disappears, especially recommendations.
> Are we back to the era of curated sources, encyclopedias and web rings?
Maybe even further. I subscribed to the physical, printed and delivered Wall St. Journal a few weeks ago, and I really, really like it. It has a first page and a last page; I can read it and be done. There's no infinite scroll. I'm also subscribed (by chance, more or less) to a physical magazine that arrives monthly. I really enjoy it too; the magazine isn't an answer to a question I asked, so I always run across something unexpected/new there. Also, like the newspaper, it has a last page, and there's no engagement bait because it's Read Only.
This has already become self-evident when trying to find images on Google. I have seen whole pages of results that were AI-generated. At every level it becomes so hard to understand the motivations of my fellow man in regards to these problems that it quickly invites almost conspiratorial thinking. How could anyone be so stupid? How is anyone benefitting from this obvious and apocalyptic poisoning of the well? Especially when, in the case of AI imagery, if I really wanted the amalgamized paste of generative output I could just make it myself. Some part of me actually finds it offensive that someone would have the gall to waste storage on this easily replicable digital refuse.
Google Image search has been hopeless for a long time now, you'll get better results typically on Yandex or Bing to the point I won't use Google for images.
This article has some good points, but it suffers from rosy retrospection in seeming to think the Internet wasn't wildly full of misinformation before LLMs and AI articles. The sad fact is that LLMs are now often more articulate and more accurate than an unfortunately large percentage of actual people posting on the Internet. Google has always had inaccuracies and is constantly tweaking its search algorithm and ranking system. Google giving you inaccurate results on a couple of edge cases is not proof positive that the Internet is full of AI "dogshit". Issues with Google are a Google problem, not an AI Internet content problem.
Going to? It already has. I get about 50 articles for software fixes that have a lot of hot garbage in them before getting to the point. For example I had a discord issue and instead of laying out a fix, the “article” apparently needed a multi paragraph ChatGPT explanation of what discord is before even attempting to show remedies.
People like to make fun of the "2000 word backstory before actually getting to the recipe" about cooking sites, but now I'm finding it with everything. Tried to find news about Baldurs Gate cross-play, and stumbled my way through a giant incoherent mess of an article that had nothing to do with the title.
The funny thing about recipe spam is that the best cooking information is now on YouTube. Even though objectively a video seems like a bad format for a recipe, at least with YouTube you know up front how long the video is going to be, and the ingredients are generally in the description.
It feels like there's a place for a semi-intelligent browser that consolidates and filters all this crap content to get to the point, Wikipedia-style. It includes images and you read it like a page, not a chat interface. "WebGPT", etc.
Not really what I meant; that still has a search-engine results page vibe. I meant that when you search you get a singular, readable page back about whatever topic. It basically hides and filters the backing content, except for maybe reference links at the bottom.
> In the future I fear that people will have no other choice but to ask people for information from the Internet, because right now it’s all full of AI dogshit.
Ha ha ha! That future is now; the value of the web has declined dramatically for me. I hate looking for information on the web now. First it was trolls and content farms; now they have a new sibling in AI to help them enshittify the web further. Apart from a few places I trust, I've turned away from the web.
The worst thing is that I spend more of my time researching now and seldom get any results.
In 2006 a fellow developer showed me a bot he had coded to write blog posts that just "spun" existing articles it scraped from other sites. He was making a fortune from fake blogs full of thousands of reworked articles, so this dogshit has been steaming for a long time.
I use GPT4 and Claude to write drafts of blog posts and wiki articles, often starting by having them summarize 250 page documents. The output is very high quality, adds a lot of value (because many of the original documents cannot be shared) and completes a task that no human could easily do (read thousands of long documents in hours).
So, I can't write AI off -- it is a tool that can be used for good or bad.
I hate to keep beating on this drum but is _anyone_ working on a search engine yet that filters out anything with ads and/or a tracker?
The root cause of the problem is content monetization so please let's just stop indexing anything that tries to monetize content and/or track readers.
And I'll say this for all the overly-capitalistic folk on here: the amount of money I'm willing to pay for access to a search engine like this has been going up every month since 2002 -- something that the other paid search engines clearly fail to understand.
That would solve some problems but not all; spam sites would be filtered out, but so would sites with good-quality information that monetize with ads. There is no silver bullet (one solution that solves everything) for SEO/ad/affiliate spam.
Regarding the last sentence: The problem is that capitalism knows no limits. Sure, it would be nice to pay a monthly subscription for genuinely good and desirable content/search results...
But what if the CEO of the service provider needs another $5m bonus? What if the stock needs to go up so that the shareholder gamblers can get more dividends paid? What if all of a sudden the service gets bought out?
The truth is that what you are seeking is more likely to come from someone who is just passionate about it with not that much motivation based on profit. That doesn't mean that this entity or person can't be financially supported but it gets problematic when profit is the _main_ incentive.
For a good example of an interesting search engine built by a single guy, see Marginalia: https://search.marginalia.nu/
It's become more and more full of human bs as well, a la Taboola "articles", misinformation/fake videos, etc.
As always, the problem is not the tool but the people using the tool. Unfortunately we can't stop them whether they write crap themselves or have AI do it but the latter certainly creates more pollution than ever before.
As long as someone clicks on a "you wouldn't believe this secret Android feature that makes you money!" then this will continue.
Doesn't really matter tho, people have been curating their view of the web via platforms and individual creators/sources for ages now.
To prevent the impending total enshittification of the Internet we'd probably need e.g. something like cryptographically signing human-generated content with some way of proving humanness.
Although nothing like this (or other solutions) is probably gonna materialize. The world, and especially the Internet, is driven by short-term commercial interests, and not-enshittifying things doesn't seem to make enough return on capital.
I live in Norway, and I've noticed that identification systems like BankID have become ubiquitous - even necessary - if you want to do anything serious on the web.
>To prevent the impending total enshittification of the Internet we'd probably need e.g. something like cryptographically signing human-generated content with some way of proving humanness.
Perhaps cryptography can be useful, but already IRL you can buy "hand-made" products, and of course you cannot fully trust that something was really hand-made, just as you cannot fully trust that web content was really made by a human. At the end of the day it all comes down to trust: web sites and web publishers need to build up trust in order to gain traction among web users and customers.
There has always been terribly-rendered dreck on the Internet. Badly-written copy by semi-literate, self-appointed "authorities." Kitbashed plagiarism that only half-understood its sources. Conspiracy theories galore from garbage-minded gasbags. Totally muddle-headed hot takes. But it was manually generated. Or semi-automatic at best (like bad machine translations of foreign-language sources).
The issue with AI is the fully-automatic engines behind the process. It's not that bad web pages are unique in history. What is unique is the sheer volume and velocity of the dreck that is being and will be produced.
Combatting this will need technology to, in effect, provide a mark of authenticity. "This was created by a human," in effect, and, more specifically, "By this [human|set of humans]." Contrariwise, "This is a product of system X."
Of course, people will lie and claim their AIs didn't ghostwrite their content or create those images. So against authorship and identities there will be an increasing need to rank authority/authenticity/veracity.
Not all AI-generated content will be horrible. And not all human-generated content will be trustworthy. There will need to be a way to mark reliability based on community-oriented standards. For AI-generated content (and human-generated content too) there will need to be ways to cite sources.
This will all become highly politicized, commercialized and gamed.
Yet if we don't work on such standards, we will continue to be beholden to hidden search engine algorithms of trustworthiness. Why did something get to the answerbox or SERP1? Was it simply through keyword packing? Or did whatever author behind this — human or machine — actually know (or at least seem to know) what the hell they are talking about?
There will need to be Internet-wide ways to create community-oriented ratings. Thumbs up and down.
Maybe this is a popular page with the public. But actual professional astronomers see this as factually misleading.
Maybe this page is popular with political faction X. But others point out it is rife with factually incorrect conspiracy theories.
Just like there are now "community notes" on Twitter (X), the whole Internet needs the equivalent way to ascribe qualitative judgments on any arbitrary page.
>Not all AI-generated content will be horrible. And not all human-generated content will be trustworthy. There will need to be a way to mark reliability based on community-oriented standards. For AI-generated content (and human-generated content too) there will need to be ways to cite sources.
Exactly; if AI can, for example, summarize a research paper or a book better than a human, then it is a hell of a lot more useful than any person could possibly be, and that's the point of AI (that it can possibly exhibit superhuman skills).
A web search engine is a massive data mining and analytics system powered by graph theory and data science algorithms for information retrieval. No search engine can know what information is per se better, and no search engine can know what you know or what you want, but it can use graph theory and data science to approximate and show you what it thinks you want.
Sounds more like google search has gone to dog shit. But a lot of the internet has as well, but it's been happening before AI has become mainstream, I blame walled gardens like linkedin and facebook, as well as signup and paywall blocks.
It is. Ranging from sexualised animals (aka furries) to sexualised children's cartoons (aka anime), plus all the garbage text and trash Google search results, the internet is starting to look more like a madhouse than a useful tool. Why are companies catering to the needs of such people instead of sane folks? Is it because the latter gave up on using it a long time ago - i.e. during the conspiracy theory craze of '19-'20 - or is there some sort of agenda we are unaware of? Literally all social networks are filled with weirdos spamming AI left and right.
> The dead Internet theory is an online conspiracy theory that asserts that the Internet now consists mainly of bot activity and automatically generated content that is manipulated by algorithmic curation, marginalizing organic human activity. Proponents of the theory believe these bots are created intentionally to help manipulate algorithms and boost search results in order to ultimately manipulate consumers
World Wide Web search results outside of walled-garden platforms like YouTube or Facebook (and, really, even within those platforms) have run into a kind of Kessler syndrome, where the amount of garbage on the web has made it increasingly difficult to get anything done. Eventually we may find that we can't trust anything we read on the internet anymore, and abandon the World Wide Web entirely (opting instead to spend all of our time on proprietary walled-garden systems we do trust). Perhaps we will be required to use some form of identification (your driver's license or SSN, as in South Korea) to access such services (unless that can be easily defeated by an AI as well).
I was surprised that, when ChatGPT came out, the immediate concern people had was a Skynet-tier apocalypse fantasy in which roving bands of machines walk the earth, searching for humans to exterminate like in BLAME! It shows that the people who work on these AI systems understand their implications about as much as John Hammond understood the power of genetic engineering in Jurassic Park. Their entire worldview is painted by science fiction. Their understanding of the human race lacks any kind of grounding in reality. They believed that what they had released was the first baby step which could, potentially, result in the downfall of the human race in several decades if we're not careful, but they fell victim to the same "move fast and break things" motto that Facebook did. They didn't consider the immediate destructive power their technology would release, because they consider "disruption" to be a fundamentally good thing. Their only concern was avoiding hypothetical possibilities posited by their favorite science fiction authors, not the real eventualities predicted by economists.
I find it much more likely that AI will disrupt systems we rely on, wielded by humans for short-term profit motives: ecommerce, news, media, and the labor economy. Malicious actors promoting scam products or yellow journalism; middle managers trying to cut labor costs by replacing easily automated jobs like copywriting with LLMs. I think we are more likely to shoot ourselves in the head with AI in pursuit of a single quarter's earnings report than we are to see silicon-based lifeforms wander the planet looking for carbon-based matter to consume.
I never really thought it was stupid, more that it was hyperbole - but very mild hyperbole with a solid basis in reality. The endless tide of email spam would be a good example where the theory rings true.
I think the stupid part is the idea that the government was intentionally ruining the internet. I also think that around 2016 and 2017, bot spam was pretty easy for anybody with a frontal lobe to detect. The problems we saw in the 2016 election with fake news was markov-chain generated articles with explosive headlines about Hillary Clinton: the headline sounded real, the domain name sounded like a real news publication, and skimming the site it looked like a real wordpress site. However, if you read any of the content for more than 10 seconds you would realize that it was barely English and was generated by a very rudimentary bot that any CS freshman could cobble together. Their only advantage was relying on the fact that most people on the internet only read headlines. To believe, at that point, that the majority of interactions on the internet were these kind of stone age systems, was pure paranoia.
Today the situation is different. LLMs are capable of making content that is identifiable as AI-generated only by a few tell-tale signs and good old-fashioned fact-checking. I dread this election year in the US, because it will be so, so much easier for Russia or China to spread even more convincing misinformation automatically. They could create armies of bots which hold intense arguments with each other, with every reply seeming logically sound.
Click on any story, and the comment section is 80% bots. Doesn't matter if it's a serious outlet like NYT, CNN, or whatever.
I've also noticed a huge uptick in spammy pages and groups that pump out AI generated pictures and stories, with nothing but bots upvoting and commenting the content.
These days I only use FB for private hobby groups, and messenger to talk with friends and family.
Yeah I used to be a huge facebook addict when it was just me and my friends in high school (and before there were any other social media platforms). I posted comments, pictures, quotes that 14 year old me thought were deep and edgy. When it became a media or news aggregator, I lost all interest in it.
Today I just use it for FB messenger to keep in contact with my parents. They use the social features and the news media parts as well. Some of my friends from high school post pictures now that they're having kids. I noticed that, in our early 20s, people moved from Facebook to Instagram, but now in our late 20s and early 30s, people post more family oriented content on facebook
> To help understand the next step we can think of this process as follows: one replicator (genes) built vehicles (plants and animals) for its own propagation. One of these then discovered a new way of copying and diverted much of its resources to doing this instead, creating a new replicator (memes) which then led to new replicating machinery (big-brained humans). Now we can ask whether the same thing could happen again and — aha — we can see that it can, and is.
> [...]
> Computers handle vast quantities of information with extraordinarily high-fidelity copying and storage. Most variation and selection is still done by human beings, with their biologically evolved desires for stimulation, amusement, communication, sex and food. But this is changing. Already there are examples of computer programs recombining old texts to create new essays or poems, translating texts to create new versions, and selecting between vast quantities of text, images and data. Above all there are search engines. Each request to Google, AltaVista or Yahoo! elicits a new set of pages — a new combination of items selected by that search engine according to its own clever algorithms and depending on myriad previous searches and link structures.
> The internet has been broken in a fundamental way. It is no longer a repository of people communicating with people; increasingly, it is just a series of machines communicating with machines.
I mean, it always has been on its lowest layers. The "is this incoming data from the Internet originated by a human being" problem hasn't really been solved, and is probably not truly solvable. It probably wasn't a good idea to train most of the world to use a single private ad-driven service for answers to everything in the first place.
It didn't need to be solved before, because most internet users could easily identify algorithmically generated stuff and stop spreading it - but not so anymore.
Maybe one day we'll all be able to "smell" AI-generated text easily, but we're not there yet.
We could also abandon the internet as a place to interact with people and go back to meat space. The internet can be a place exclusively for doing work such as filing taxes and paying bills, etc..
Instead of abandoning the internet, maybe just abandon the WWW. The best technology-enabled human interaction I have is a group iMessage chat with some old college buddies. My teen son has done the same with his friend group at his middle school.
Talking with people in real life is stressful (for someone like me with social anxiety.)
Rarely do I walk away from an encounter with a random person in “meat space” feeling any better than I did when I walked up to them. And even when I do, I replay the interaction over and over again until I’m sure I didn’t embarrass myself.
Plus, where I live, there is no tech community. It’s just not popular here, but it’s what I’m interested in. (I even tried starting a meet up, but got 0 turnout, then the local startup accelerator closed in 2020)
Ultimately cryptography, up to and including the use of decentralized ledgers, will help mitigate some of this, by carving out ground facts that can be mutually agreed upon without risk of AI interference. The people who hate AI the most often hate crypto too, so it will be interesting to see how they resolve this.
curl's struggle with bogus AI-generated bug reports is a good example of the problems this flood of fluent AI garbage causes: https://news.ycombinator.com/item?id=38845878
This is only the beginning, it will get much worse. At some point it may become impossible to separate the wheat from the chaff.