
This has been a problem for years. It's completely overrun the search results for a lot of terms.

Honestly, given that ChatGPT produces higher quality output, the next generation of spam blogs will (sadly) probably be an improvement over the sea of crap we have now.




> Honestly, given that ChatGPT produces higher quality output, the next generation of spam blogs will (sadly) probably be an improvement over the sea of crap we have now.

I think that's questionable. Gibberish is easily skipped over, and bargain-basement ESL writing with no familiarity with the product sticks out a mile, but with GPT you get a kind of reverse "uncanny valley" effect: it's written in polished English and sounds just informed enough in the opening paragraphs to lead you to read half the piece before you realise it's machine-generated bullshit. So you waste a lot more time on something that has no more insight (and cost even less to produce).


The "demand" for bottom-of-the-barrel "10 things you didn't know about X" or "a lazy tutorial on how to patch a roof" spam is probably close to being met already.

So if demand stays the same, but the cost to produce it gets even lower, the ecosystem can support more players each pumping out their own version of all the spam sites. Stupid fake number example: Instead of spending $1000 a month to push out 100 articles, you spend $200 a month to push out 100 articles, and now only need to bring in $300 instead of $1100. So now there's another $800 worth of views that others could capture without putting you in the red, while letting them be in the green. So we may just end up with even more blogspam pushing the unique sources down the results list in Google. It's not likely to be meaningfully better based on my experience with "naive" prompting (the people creating these articles aren't going to be tuning their prompts to get good stuff per topic, they'll just get the quick and dirty stuff).

But I don't think that's where it's gonna get fun.

You know where there could be a lot of demand for higher-quality, harder-to-spot spam? Corporate product marketing and politics. Astroturfing in far more writing styles for far less money. Individuals running more fake accounts on Reddit, HN, etc, pushing more "unique"-but-repetitive copies of their views, while sprinkling in a fair bit of on topic info on a wider variety of other topics.


There is no demand. It's just SEO spam, and users like me just ctrl+click most of the first page of search results hoping that one of them will be usable. Over time, though, I'm less and less able to find a single relevant result. The SEO spam is incredible now.


It's why I basically only search Reddit for anything, which has its own problems of inauthentic posting/advertising/spam.


They said gpt would increase productivity, but I don't think they meant the productivity of spammers!


I don't think they cared, but they had to have known. If you drop the cost of producing mediocre-to-bad text to ~zero, I think it's pretty obvious that the people with the lowest standards will be the most eager adopters.


> This has been a problem for years.

Hah, yep. I used to build tools to detect it at an adtech startup. A common approach at the time was to take someone else's text and run naive thesaurus replacement over it, so the result would be just barely comprehensible but statistically look like English. So “You can catch the mouse” might become “you jar trap the rodent”. Glad to see how far technology has progressed! /s
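(For the curious, a minimal sketch of that kind of naive thesaurus spinning; the tiny synonym table and word regex here are made up for illustration, not anything a real spinner used.)

    import random
    import re

    # Toy synonym table. A real spinner pulled from a full thesaurus with
    # no regard for word sense, which is why "can" turns into "jar".
    SYNONYMS = {
        "can": ["jar", "tin"],
        "catch": ["trap", "snare"],
        "mouse": ["rodent"],
    }

    def spin(text, replace_prob=0.7):
        """Swap words for random 'synonyms', ignoring context entirely."""
        def swap(match):
            word = match.group(0)
            choices = SYNONYMS.get(word.lower())
            if choices and random.random() < replace_prob:
                return random.choice(choices)
            return word
        return re.sub(r"[A-Za-z]+", swap, text)

    print(spin("You can catch the mouse"))
    # e.g. "You jar trap the rodent"

The output stays statistically English-like (word frequencies, sentence length) while the meaning falls apart, which is exactly what made it cheap to produce and annoying to filter.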


And, unfortunately, it's still an ongoing problem. Some of it even ends up getting published by IEEE, e.g. [1].

There was some research a few years ago [2] into just how widespread this issue was in scientific publishing. The situation has likely only gotten worse with the introduction of higher-quality text generation LLMs.

[1]: https://ieeexplore.ieee.org/document/8597261

[2]: https://arxiv.org/abs/2107.06751


I've noticed this sometimes where it replaces proper nouns with a synonym, e.g. "Bill Gates" becomes "Invoice Gates." Unfortunately that pattern only applies to the most bottom-of-the-barrel SEO spam. I expect ChatGPT output will be more subtle, but if a lazy spammer doesn't obfuscate it enough, there will still be some tells - e.g. the five paragraph essay format with a conclusion beginning with "overall..."


Language models have seen enough of this genre of gibberish already that they can imitate it flawlessly.

https://twitter.com/janellecshane/status/1280915484351754240


Amazon here in Europe is overrun with that kind of wrong translation, input by product sellers who just run everything through Google Translate. It's still amazing that a player as big as Amazon accepts all this content, which is just one step above gibberish.


Yay, the arms race between the ROI and computational power required to generate spam vs. the ROI and computational power required to tell whether something is indeed spam; the inescapable cat-and-mouse game between users just trying to find something that isn't garbage and a million hucksters attempting to dupe them into buying their garbage products.


We've had spam for a long time. Much of it used to be farmed out to low-cost countries, where people would summarize or aggregate data and present it in the most aggressively monetized way possible. AI spam will be better. I personally don't mind reading AI-generated content; I prefer GPT over Stack Overflow written by humans for 95% of what I need help with. I also don't think it'll lead to more spam, as it probably wasn't supply-constrained anyway. The demand for spam is pretty inelastic, meaning that as the supply shifts, it'll just lower the price and not impact quantity demanded much. It'll be better spam and the advertisers will have to pay less for it. Spammer margins will also likely go down, as there is a lot more competition from the tooling.


One of the questions I have is whether models are being trained on the SEO {spam|blogspam|adsense optimized|spun} websites.


Almost certainly. The web crawl data that GPT (and similar) LLMs are trained on is far too large to be entirely curated.


I believe low to medium quality algorithmically-generated content is easily identifiable with large language models. Consequently, spammers may find themselves needing to leverage LLMs to produce text that appears high in quality. This implies that the content generated could either be of substantial quality or well-disguised gibberish.

Therefore, some form of reputation system remains necessary, like those used for scientific publications. I predict websites that provide a trusted reputation system will have a lot to gain in the future. Github stars, upvotes, retweets, citation count, or just good moderation - they will be essential in solving the spam/hallucination problem.


Respectfully, expecting to use AI to solve a problem created by AI is wishful thinking.

We haven't even solved the email spam problem, and that started 30 years ago! We have simply accepted that the largest players in the business (MS, Gmail, Yahoo) will decide on our behalf which emails we will actually see. If you want to start your own email service, fine, just keep in mind that the big players might think you're a spammer.


But you can check your Spam folder if you disagree with the classification. Don't you agree the current anti-spam systems are rather good? They very rarely make a mistake; I haven't seen spam in ages.

> to expect to use AI to solve a problem created by using AI, is wishful thinking

Fight fire with fire. Manual verification can't keep up with bots. We need "protective AI" to counter "aggressive AI".


Some spam is just thrown away, not put into your Spam folder, so there's some lost opportunity to disagree (and it's hard to judge whether you would have without seeing the spam).

Also, Gmail is no longer good at detecting spam. It consistently puts my phone bill into spam, as well as other repeat emails I've marked as not spam. More importantly, I get bursts of actual, obvious spam in my inbox at least twice a month, possibly more often. Maybe Google has just given up and some other anti-spam system is better.


https://xkcd.com/810/

Mission accomplished?


I thought I'd seen all of the xkcd comics, and yet from time to time I stumble on something like this again and marvel at how prophetic his work is.

Totally off topic, but I just think that the producers of shows like Black Mirror should hire the guy; he'll come up with plots for really disturbing things, not the mellow, obvious crap they try to push now…


Let's not forget we also have Tom Gauld's comics: https://zephr.newscientist.com/article/2377304-tom-gauld-sci...


The real fun will start when ChatGPT starts training on its own gibberish. Eventually everything on the internet will be garbage recycled through ChatGPT many times over. It will rewrite Wikipedia, news, history.


Will search engines turn into searchable GPT directories then?

It does sound like an improvement in the short term, but in the long term I wish there were another option.


At least the "123" keyword still works on Google.


> the next generation of spam blogs will (sadly) probably be an improvement over the sea of crap we have now.

Maybe even to the extent that they will actually be useful.


The problem is that they will be filled with hallucinations, unlike "human spam", which is based on facts.


Yeah right, because the spammers cared for facts.



