Junk websites filled w AI-generated text pulling in money from programmatic ads (technologyreview.com)
239 points by bookofjoe on June 28, 2023 | 129 comments



Is this new? Near-gibberish "blogs" that are part of link rings have been around for at least a decade or so. Places used to sell "SEO" services where they'd run hundreds of automated WordPress blogs that would link to each other (to increase their inbound link count, which affected Google SERP placement and PageRank) and to the target site / product. It worked quite well.

The modern equivalent I see a ton of is when I search for something like "best _____ 2023" looking for reviews, I get a bunch of sites that just list a dozen terrible Amazon products with affiliate links to each. They're not actual reviews; they just re-state the product description but frame it as a review. It's obvious that the person / machine writing these "posts" has zero experience with the product, as they tend to point out irrelevant features and basically just say that every single one is a great choice, when none of the products mentioned are mainstream known-good brands. Of course, when I just tried a few examples ("hammocks", "work boots", "steam irons", "backpacks") I couldn't find any of these sites, so maybe they've cleaned up lately, but I swear for the past few years it seemed like every search result for products led inevitably to an Amazon affiliate link farm blog.


Don't forget the asinine translation sites with page descriptions like "Have you ever wondered what the Sinhalese word for 'monitor lizard' is[1]? Millions of users have searched for this. In this page you will find..." at which point the summary cuts off. Somebody generated those in an afternoon with a set of concordances and a Perl script, and they've been sucking traffic ever since.

[1] For the curious, "Thalagoya" for the terrestrial species, "Kabaragoya" for the swimming species


Ha! Somewhat unrelated, but that reminded me of a homeless guy in Portland whose intro line was "do you know how you say dragonfly in Spanish?", to which I responded "libélula" (I knew it because Spanish is my mother tongue). Interestingly, he seemed kinda pissed off, as I'd spoiled his soliciting routine. Maybe he's the human version of those sites!


And the sites that list every possible phone number in a given area code, just so that they'll be a search hit for someone searching for a phone number. Zero value provided there.


those are usually for "is this spam?" votes, so I would argue that there is some value there


I'm not talking about the ones that provide a user-contributable database of "what business owns this number, is it spam"; those are great.

I'm talking about sites that just list all possible phone numbers in a range, for SEO purposes, with no useful information about each number, and then try to sell you access to their invasive databases of personal information.


“You won’t believe how many words the Sinhalese have for monitor lizards!”


"Slow lorises hate this one simple trick!"


"Find hot lonely monitor lizards in your area today!"


I see automatic translation of normal websites a lot too. I'm French and doing some home renovations, and according to google our houses are built in a way that is vastly different from my experience on the ground, and it's really frustrating, because our regulations and customs are different, and I really need to do things the local way.


And if you click through, you'll find that they don't have the Sinhalese word for monitor lizard, but you can contribute it for them to monetize.


> I get a bunch of sites that just list a dozen terrible Amazon products with affiliate links to each. [...] maybe they've cleaned up lately

The faucet turned off in 2020, when Amazon reduced affiliate payouts for many products [1] and cut off third-party affiliate networks [2]. There are still a lot of those sites around, but there's no longer the same level of incentives for them to be created.

[1]: https://www.cnbc.com/2020/04/14/amazon-slashes-commission-ra...

[2]: https://www.adexchanger.com/ecommerce-2/amazon-removes-third...


YouTube removing the dislike count destroyed that site's usefulness for reviews. Now there's no good way to tell the algorithmically created videos from genuine reviews without watching them, which in turn inflates the generated crap videos higher. Big channels can still do product reviews, but reviews from average people are now unfindable under a sea of generated sewage content. Since big channels are often sponsored (and not always disclosed, regardless of YouTube policy) and review a very narrow range of products, a huge chunk of functionality died the day YouTube decided to protect the feelings of political campaign staffers while tossing an enormous amount of content completely unrelated to US political ideological struggles into the void.


Is this specific to a genre of videos?

Upvotes/Downvotes never told me anything useful. The thumbnail image, subject line, and number of views are usually enough for me to know whether I want to give it a shot or not.


There's an addon called Return YouTube Dislike. I use it on FF and it seems to have put the thumbs-down count back... not sure how they get the numbers but it looks right to me


I've seen actual gibberish blogs consisting of random Unicode characters (often within a specific range). What are those?


This has been a problem for years. It's completely overrun a lot of terms' search results.

Honestly, given that ChatGPT produces higher quality output, the next generation of spam blogs will (sadly) probably be an improvement over the sea of crap we have now.


> Honestly, given that ChatGPT produces higher quality output, the next generation of spam blogs will (sadly) probably be an improvement over the sea of crap we have now.

I think that's questionable. Gibberish is easily skipped over, and bargain-basement ESL writing with no familiarity with the product sticks out a mile, but with GPT you get a kind of reverse "uncanny valley" effect: it's written in polished English and says just enough in the opening paragraphs that sounds informed to lead you to read half the piece before you realise it's machine-generated bullshit. So you waste a lot more time on something that has no more insight (and cost even less to produce).


The "demand" for bottom-of-the-barrel "10 things you didn't know about X" or "a lazy tutorial on how to patch a roof" spam is probably close to being met already.

So if demand stays the same, but the cost to produce it gets even lower, the ecosystem can support more players each pumping out their own version of all the spam sites. Stupid fake number example: Instead of spending $1000 a month to push out 100 articles, you spend $200 a month to push out 100 articles, and now only need to bring in $300 instead of $1100. So now there's another $800 worth of views that others could capture without putting you in the red, while letting them be in the green. So we may just end up with even more blogspam pushing the unique sources down the results list in Google. It's not likely to be meaningfully better based on my experience with "naive" prompting (the people creating these articles aren't going to be tuning their prompts to get good stuff per topic, they'll just get the quick and dirty stuff).
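The stupid fake numbers above can be sketched out; all the figures here are the made-up ones from the comment, not real economics:

```python
# Toy model of the comment's made-up numbers: a fixed profit target plus
# monthly production cost gives the revenue a spam site must bring in.
def revenue_needed(monthly_cost, profit_target=100):
    """Revenue required to clear a fixed (hypothetical) profit target."""
    return monthly_cost + profit_target

hand_written = revenue_needed(1000)          # $1100 with human-written articles
llm_generated = revenue_needed(200)          # $300 with LLM-generated articles
headroom = hand_written - llm_generated      # $800 of views competitors can now
                                             # capture while everyone stays green
```

The point of the sketch is just that lowering the cost floor doesn't reduce spam; it widens the margin that additional spammers can move into.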

But I don't think that's where it's gonna get fun.

You know where there could be a lot of demand for higher-quality, harder-to-spot spam? Corporate product marketing and politics. Astroturfing in far more writing styles for far less money. Individuals running more fake accounts on Reddit, HN, etc, pushing more "unique"-but-repetitive copies of their views, while sprinkling in a fair bit of on topic info on a wider variety of other topics.


there is no demand. It's just SEO spam, and users like me just ctrl+click most of the first page of search results hoping that one of them will be usable. Over time, though, I'm less and less able to find a single relevant result. The SEO spam is incredible now.


it's why i basically only search reddit for anything, which has its own problems of inauthentic posting/advertising/spam


They said gpt would increase productivity, but I don't think they meant the productivity of spammers!


I don't think they cared, but they had to have known. If you drop the cost of producing mediocre-to-bad text to ~zero, I think it's pretty obvious that the people with the lowest standards will be the most eager adopters.


> This has been a problem for years.

Hah, yep. I used to build tools to detect it at an adtech startup. A common approach at the time was to take someone else's text and use naive thesaurus replacement so that it would be just barely comprehensible but statistically look like English. So “You can catch the mouse” might become “you jar trap the rodent”. Glad to see how far technology has progressed! /s
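For the curious, that kind of "article spinning" really is about as dumb as it sounds; a minimal sketch (the toy thesaurus entries here are made up for illustration, real spinners used full thesauri):

```python
# Hypothetical sketch of naive thesaurus-replacement spinning: swap each
# word for an arbitrary "synonym" with no regard for word sense or grammar.
THESAURUS = {
    "can": "jar",      # wrong sense: the container, not the ability
    "catch": "trap",
    "mouse": "rodent",
}

def spin(text):
    """Replace every word that has a thesaurus entry; keep the rest."""
    return " ".join(THESAURUS.get(word, word) for word in text.split())

spin("you can catch the mouse")  # -> "you jar trap the rodent"
```

Because the output preserves word frequencies and sentence shape, it fooled the crude statistical duplicate-content checks of the era while being nonsense to any human reader.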


And, unfortunately, it's still an ongoing problem. Some of it even ends up getting published by IEEE, e.g. [1].

There was some research a few years ago [2] into just how widespread this issue was in scientific publishing. The situation has likely only gotten worse with the introduction of higher-quality text generation LLMs.

[1]: https://ieeexplore.ieee.org/document/8597261

[2]: https://arxiv.org/abs/2107.06751


I've noticed this sometimes where it replaces proper nouns with a synonym, e.g. "Bill Gates" becomes "Invoice Gates." Unfortunately that pattern only applies to the most bottom-of-the-barrel SEO spam. I expect ChatGPT output will be more subtle, but if a lazy spammer doesn't obfuscate it enough, there will still be some tells - e.g. the five paragraph essay format with a conclusion beginning with "overall..."


Language models have seen enough of this genre of gibberish already that they can imitate it flawlessly.

https://twitter.com/janellecshane/status/1280915484351754240


Amazon here in Europe is overrun with those kinds of wrong translations, input by product sellers who just run everything through Google Translate. It's still amazing that a player as big as Amazon accepts all this content, which is just one step above gibberish.


yay, the arms race between the ROI and computational power required to generate spam vs the ROI and computational power required to tell if something is indeed spam: the inescapable cat-and-mouse game between users just trying to find something that isn't garbage and a million hucksters attempting to dupe them into buying their garbage products


We've had spam for a long time. Much of it used to be farmed out to low cost countries where people would summarize or aggregate data and present it in the most aggressively monetized way possible. AI spam will be better. I personally don't mind reading AI generated content. I prefer GPT over stack overflow written by humans for 95% of what I need help with. I also don't think it'll lead to more spam, as it probably wasn't supply constrained anyway. The demand for spam is pretty inelastic, meaning that as the supply shifts, it'll just lower the price and not impact quantity demanded much. It'll be better spam and the advertisers will have to pay less for it. Spammer margins will also likely go down as there is a lot more competition from the tooling.


One of the questions I have is whether models are being trained on the SEO {spam|blogspam|adsense optimized|spun} websites.


Almost certainly. The web crawl data that GPT (and similar) LLMs are trained on is far too large to be entirely curated.


I believe low to medium quality algorithmically-generated content is easily identifiable with large language models. Consequently, spammers may find themselves needing to leverage LLMs to produce text that appears high in quality. This implies that the content generated could either be of substantial quality or well-disguised gibberish.

Therefore, some form of reputation system remains necessary, like those used for scientific publications. I predict websites that provide a trusted reputation system will have a lot to gain in the future. Github stars, upvotes, retweets, citation count, or just good moderation - they will be essential in solving the spam/hallucination problem.


Respectfully, expecting to use AI to solve a problem created by using AI is wishful thinking.

We haven't even solved the email spam problem, and that started 30 years ago! We have simply accepted that the largest players in the business (MS, Gmail, Yahoo) will decide on our behalf which emails we will actually see. If you want to start your own email service, fine, just keep in mind that the big players might think you're a spammer.


But you can check your Spam folder if you disagree with the classification. Don't you agree the current anti-spam systems are rather good? Very rarely do they make a mistake. I haven't seen spam in ages.

> to expect to use AI to solve a problem created by using AI, is wishful thinking

Fight fire with fire. Manual verification can't keep up with bots. We need "protective AI" to counter "aggressive AI".


Some spam is just thrown away, not put into your Spam folder, so there's some lost opportunity to disagree (and it's hard to judge whether you would have without seeing the spam).

Also, GMail is no longer good at detecting spam. It consistently puts my phone bill into spam, as well as other repeat emails I've marked as not spam. More importantly, I get bursts of actual, obvious spam in my inbox at least twice a month, possibly more often. Maybe Google has just given up and some other anti-spam system is better.


https://xkcd.com/810/

Mission accomplished?


I thought I’d seen all of the xkcd comics, yet from time to time I stumble on something like this again and marvel at how prophetic his work is.

Totally off topic, but I just think that the producers of shows like Black Mirror should hire the guy; he’ll come up with plots for really disturbing things, not the mellow, obvious crap they try to push now…


Let's not forget we also have Tom Gauld's comics: https://zephr.newscientist.com/article/2377304-tom-gauld-sci...


The real fun will start when ChatGPT starts using its own gibberish for learning. Eventually everything on the internet will be garbage recycled through ChatGPT many times over. It will rewrite Wikipedia, news, history.


Will search engines turn into searchable gpt directories then?

It does sound like an improvement short term, but long term I wish there was another option.


At least the "123" keyword still works on Google.


> the next generation of spam blogs will (sadly) probably be an improvement over the sea of crap we have now.

Maybe even to an extent such that they will be actually useful


The problem is that they will be filled with hallucinations, unlike "human spam", which is based on facts.


Yeah right, because the spammers cared for facts.


There’s a bright side to all of this: as the bottleneck to essentially all internet ads, Google can prevent the slow decay of the internet into AI hell by simply de-listing such garbage. And in this case there is a huge winner takes all scenario where if Bing was able to do this, they would quickly own search. It’s a problem that’s been festering for a while, but Bing could end it.


Google provides search that is always just slightly better than its competitors'. The better it is, the lower the revenue, so they are incentivised to produce the worst (most profitable) results possible.


The fact that geeksforgeeks is more highly ranked than SO for any coding question I ask proves this to me.

Do programmers in Google not use Google to search questions? Do they not have self respect seeing the bad quality product they’re throwing up on billions of people every day? I can’t imagine working like that.


> Do programmers in Google not use Google to search questions?

Not really. Google has an excellent intranet search engine and mostly relies on in-house tools instead of commercial services (see https://github.com/jhuangtw/xg2xg). As a result its employees don't really need to rely on Stack Overflow, etc.


I have no idea how search rankings work, but GeeksForGeeks is massively popular in Indian tech colleges. This might be a reason for its high rank.


> Do they not have self respect seeing the bad quality product they’re throwing up on billions of people every day?

In the society I live in, self-respect doesn't pay the bills. I don't know where you're from.


If you would like to contend that the people at Google couldn't pay their bills if they had kept search more effective, I would like to see some serious numbers there.

Absent that, it sounds like the sort of excuse greedy people have always used to justify profitable harm.


Can't pay your bills without a job... They are constantly A/B testing things and running competitive teams. If profitability is the goal and you aren't the most profitable you can be on the table for the next round of layoffs.


People without a job can't pay their bills. I think it's pretty straightforward.


A Google engineer doesn't need to worry about paying bills unless their family has an incredibly expensive illness, they can switch jobs easily.


It's the paradox of their business: a perfect search engine would show you exactly what you need, exactly when you ask it. You would never be available for advertising, because if it's a product, the search engine would already have found exactly what you were looking for.


> Google can prevent the slow decay of the internet into AI hell by simply de-listing such garbage

Google has caused this problem by rewarding blogspam with high search rankings. Maybe they're trying to fix that, I don't know -- but if they are, they've already failed for at least a decade now.


For whatever reason, Google search is better than Bing, at least for "common-ish" searches. By "common-ish" I mean the search has been done a few times before. While that remains true, Bing isn't going to end anything.

I'd lay long odds Google stopped trying to rank pages themselves for those common searches long ago. Instead they look at what results users clicked on for a given search, and list those near the top, with a bit of randomness thrown in so it can learn when new results become preferred. Saying it's an AI learning what to return for a search might be over the top, but I doubt it's too far from the truth either. You need a lot of people searching to get the data to train it, though. Maybe Bing suffers from not having the users.

I'd also lay long odds that the search results are weighted by the ad revenue they generate for Google. I.e. a site gets a boost if the user going there is more likely to generate ad revenue for them. Competitors limit how far they can push that: they still have to be the best search engine. But as their primary goal is to earn money via ads, they just need to be slightly better and can use the remaining headroom to maximise ad revenue.

This is pure speculation on my part of course, and mostly evidence-free. But I'm pretty sure Google engineers are smart enough to identify quite a bit of the SEO spam I see at the top of their search results, yet it remains there. This would be a quandary, if it weren't in Google's interest for it to remain there: the spammy pages inevitably contain huge numbers of ads, and Google is at the end of the day an ad company that makes money from people clicking on ads.


They kinda do, at least in the sense that they no longer appear to index the entire web, instead cutting off the long tail for just the things they'll actually return results for.


Short of running every single page thru ChatGPT, how do you imagine they'll identify AI driven text vs human text?


This isn't 100% different than the situation that Google addressed initially.

There is information expressed/stored in the Web that is of relevance to the user's current information need. This is in pages, some of which are higher value than others. There is also a constant battle with bad actors trying to promote information or pages beyond its value or possibly relevance.

If interactive Web search is still going to exist at all (not just BS all our work with AI), then this is an opportunity for Google or someone else to step up.

This could mean aggressive new ideas that weren't viable before. For example, consider going full anti-BS. Basically, any whiff of attempting to manipulate search hits, or stealing/republishing content, or using generative AI is considered a strong signal for low quality, punishable by search engine death sentence. Suddenly, legacy genuine original content percolates to the top, and there's once again a market for high-value content creation. This one could be hard to do sufficiently well, and there might be conflict of interests in disrupting the status quo (e.g., some of a search/adtech company's revenue streams and departments/careers might currently depend on delivering low-value search hits), which could make it painful or face internal resistance.

I don't know who has the will, resources, and agility to do this. One existing player that could've done it with startup thinking a couple decades ago seems to have become more a machine and careers-driven for a long time. Another player has spent decades often operating in bad faith towards everyone, and though they've shown many times that they continue to be capable of long-cons, I'd guess they're probably culturally incapable of reasoning holistically about an approach fundamentally about integrity. Most potential upstarts don't have the know-how nor resources, and startup thinking would need to get back out of investment scam mindsets (whether VC-growth, crypto scams, or founders just trying to get their exit) and go back to trying to do something genuinely great for its own sake.


> For example, consider going full anti-BS. Basically, any whiff of attempting to manipulate search hits, or stealing/republishing content, or using generative AI is considered a strong signal for low quality, punishable by search engine death sentence.

We are doing something like this at Kagi search.

While it is difficult to detect each of these on their own, what is easy to detect is the purpose of the web page if it has advertising on it (and most of these pages exist for the purpose of monetizing low quality content). So we downrank ad-filled pages in our results. It is a simple but effective method for filtering out low quality content.
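Kagi hasn't published its actual formula, so purely as an illustration of the general idea (penalize a result in proportion to how ad-laden the page is; the function name and the 0.1 penalty weight are made up):

```python
# Hypothetical sketch of downranking ad-heavy pages. Not Kagi's real
# ranking function; just the shape of "more ads -> lower rank".
def adjusted_score(base_relevance, ad_and_tracker_count, penalty=0.1):
    """Demote a result's relevance score by its count of ads/trackers."""
    return base_relevance / (1.0 + penalty * ad_and_tracker_count)

adjusted_score(1.0, 0)   # a clean page keeps its full relevance score
adjusted_score(1.0, 30)  # an ad farm drops to roughly a quarter of its score
```

The appeal of a signal like this is that it targets the *motive* (monetizing pageviews) rather than trying to classify the text itself, which gets harder as generators improve.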


I can attest to that. I’m using Google for work and Kagi for private stuff. There’s quite an overlap in search categories (e.g. tech stuff). With Google, I learnt to scroll past the first three-ish links. On Kagi I had to relearn that the first link is actually the most relevant.


Interesting. Google has a conflict of interest in doing the same.


> some of a search/adtech company's revenue streams and departments/careers might currently depend on delivering low-value search hits

What I've come to realize is that most sites that actually get significant traffic these days and run ads (so, all of them) don't really care too much whether companies paying for ad space get a good ROI. There are so many stories of companies launching a campaign on Facebook with something like a $10k budget, not configuring the targeting exactly correctly, and burning through their entire ad budget in a few hours with only a few conversions to show for it. Looking at Twitter recently, the only advertisers they seem to be able to attract are overseas companies selling junk products for cheap, usually with some kind of "made you look" aspect like bikinis. On that last point, YouTube Shorts also seem to very heavily weight things that appear as if they're going to show some kind of nudity, which of course they don't: just very clickbait-y thumbnails in general that don't match the actual content of the short. Does YouTube require that a Short's thumbnail be a frame from the actual video? That always seemed like the best policy to me, but of course people game it by inserting one "hero" thumbnail frame that doesn't actually come from the rest of the video, just to be able to select that as the thumb.


This exact problem winds me up when it comes to sailing channels on YouTube, I get that most people just want to look at attractive people sailing about in exotic locations but I wish there was a way to tell YouTube that if I'm searching about boats I'd actually much rather watch someone's long-term restoration job on a 1970s project boat or a very experienced person talking about sailing in the places I actually live near. There's more of that sort of content than I could watch in a lifetime but it only takes one accidental click and your weeks of carefully training it for 'this, not that' falls apart like wet cake. Surely from the perspective of someone buying ads (admittedly I block them so it's a moot point in practice) they'd rather hawk high-value sailing equipment at me than low-value generic youtube tat I've never shown any interest in?

I know it's very much not in their interest to develop this since your eyeballs are the product but it would be so great if you could explicitly give the YouTube algorithm heuristics like: 'if the context is sailing exclude videos that heavily feature couples/relationships' or 'if the video sounds like that particular breed of shrill, hyperactive vlogger I'm definitely not interested'.


This is a losing proposition for Google because they already dominate all of internet advertising. The more Google is dependent on a smaller number of high-quality content creators the more those creators can demand a bigger slice of the ad revenue pie. Big publishers even run their own ad platforms to cut out Google.


The problem with getting ban-happy is that generated / black hat content doesn't care. It expects to be banned. They just make more. But higher-value, non-automated content gets killed by Google (AdSense ban, negative SEO, or just outranked by crap) and then gives up and stays dead.

These are hard problems to solve. But Google's at-scale efforts have failed.


The fundamental problem here isn't even AI. It's just the new lowest bidder for barely coherent SEO.

I still have my popcorn ready for when the advertising bubble pops. There is no way web ads are worth what companies are paying for them now.


A bubble? Have ad prices gone up recently? I haven't been in the industry for a decade, but at that point there was a loooonggggg trend of declining ad rates since the heyday of the dotcom bubble. Wasn't that what was behind the "pivot to video" too? Nobody thinks simple display ads are worth that much.

Ad spend, impressions, clickthroughs, and conversions are all heavily instrumented and tracked, the idea that there's a two-decades-long blind spot bubble strikes me as extremely farfetched.

Everyone knows there's a lot of fraud; it's generally baked into the price at this point, and is why some sorts of ads are far cheaper than others, but (almost)[0] everyone also tracks their bottom line for returns.

[0] brand campaigns are different than direct response here, but that's an even older thing that predates the internet entirely so is unlikely to implode.


> Ad spend, impressions, clickthroughs, and conversions are all heavily instrumented and tracked

Definitely but I’d argue that they’re even more heavily gamed. The information asymmetry between the platform and the ad buyers is just too large. The incentive to misrepresent is basically limitless. Truly tracking effectiveness requires a degree of privacy invasion that even the platforms can no longer get away with.


> there was a loooonggggg trend of declining ad rates <...> Nobody thinks simple display ads are worth that much?

> Everyone knows there's a lot of fraud, its generally baked into the price at this point

Spamming more cheap ads for the same price does not count as ads getting better in my book. If anything, to me it indicates the opposite: if ads were getting better, campaigns would be getting smaller, shorter, cheaper.


I remember hearing on a podcast several years ago that there is an underground network for buying traffic, where legitimate websites can somehow buy traffic through click-farm sites (e.g. "top 10 X" listicles where you have to click next every time) and somehow transfer it to their metrics. The basic underlying idea was that pretty much all online advertising and media companies are sitting on fraudulent stats, and that one day all advertisers will wake up and the bubble will pop.

I can't find any information about this in particular because I don't know the term to search for. Does anyone know if this is true, or how to find out more about this underground network?


AI is problematic because the cost of producing convincing bullshit is essentially zero now.


I don't think it's problematic for anyone other than the spammers.

No longer do they need to hire an army of cheap copywriters in some third world country to produce their spam. Anyone can run a spam farm nearly for free, and crowding the market is inevitably going to be driving down the profitability for everyone in the business as clicks and impressions are diluted. As is the trend with AI in general, it's really hard to build an effective moat.


We have optimised such that the "Semantic Web" is no longer necessary.

We have now improved the A.I. to seem almost convincing, and sufficiently degraded the N.I.[1] such that convincing people takes much less effort.

[1] Natural Intelligence


Ah yes, the old, tired "if only the public would critically think about this stuff"


....it's true though? Most people don't care about this kind of stuff


Indeed. It is a very low bidder and does an increasingly decent job.


"There is no way web ads are worth what companies are paying for them now."

Sadly, it seems that they are worth slightly more than what people are paying for them. I don't know which of the 5 billion people online are motivated by these ads, but there are enough of them, and they have enough money, to make the system viable.

I have a friend who runs a local business entirely via Google ads. When he has spare capacity he increases his ad spend and his capacity gets used; when he has tight capacity he cuts back on his spend. He has a consumer-facing business.


Corporate Sales & Marketing budgets would need to get slashed in order for web ads to drop in value.

I don’t see an obvious catalyst for that in the short-term.


It wouldn't take that, just a leak/study of the ROI of online advertising. If the results show that the average ad impression is significantly less than currently assumed, then everything goes to hell in a handbasket.

The status quo is that nobody knows how effective their ads are. The only people with the data necessary to even begin to analyze the effectiveness of ads are the ad-serving companies, who have a strongly vested interest in making them appear as miraculously effective as possible.


The marketing boys are surely hiding some causality mix-ups to drive their department budgets.

I mean, if I Google a brand's store, the first result is usually an ad for the store and the next is the store itself. If they measure the revenue from ad clickers, they hugely overestimate the value of the ad, since the clickers were going there in the first place.

It is probably a complicit scheme between Google and the marketing folks. They are pimping their PowerPoints.


I want to add one more thing here that I feel most comments are missing: typo squatting.

For example, the Microsoft IDE for dotnet is Visual Studio. You can get it by googling visual studio or you can go to visualstudio.com which redirects to visualstudio.microsoft.com

Now the typo squatted visual stuio dot com is interesting to me. It shows me a different scam depending on where I come from. Sometimes it is just a page full of ads. Sometimes, it is one of those “your registry has corrupted” kind of full screen scams, and sometimes it is just straight up pron. I’m saying all of this to dissuade you from visiting the website if you are on a potentially vulnerable machine.

The point is, someone could typosquat the New York Times or the Washington Post, or something like that and generate very confident sounding fake news now, right?

Maybe that’s what the tech giants want? They don’t want us typing the url but rather throw our hands up and say “{search engine} take the wheel”, and just type visual studio or nytimes and click the top result?


I think this isn't their goal simply because they already achieved it.

Almost no one in my social circle (except some IT people) types in urls anymore.


While I also want to believe that everyone gets to where they're going via Google, I'm squatting a domain for a really popular site in an esports niche I follow as a little experiment to test the hypothesis... apparently according to Cloudflare it's served 562 Unique Visitors / 2.58k Total Requests in the last 30 days.


That's an incredibly low number. The best typosquat I knew of was for Twitch; the "t" seems almost silent, and the website squatting there was incredibly nasty.


Interesting this article mentions "spammy pop-up ads" while spamming me with pop-up ads.


It's called gonzo journalism: they involve themselves with the topics of their reporting.


They claim to be connected to MIT. Maybe loosely!


> NewsGuard has a clever way to identify these junk AI-written websites...For example, one site called CountyLocalNews.com had messages like “Sorry, I cannot fulfill this prompt as it goes against ethical and moral principles … As an AI language model, it is my responsibility to provide factual and trustworthy information.”

The red flag here wasn't the AI admitting it was "an AI language model". The red flag that this site was bullshit was the domain name CountyLocalNews.com

Everyone should be using an ad blocker, especially if going to random questionable websites, but the important thing that should have been hammered into people's heads a long time ago was that you can't trust everything you read online and that giving a few seconds consideration to your source is important.

We wouldn't have to worry so much about things like AI spitting out fake news or deepfakes if people didn't treat @randomdude420 as a reliable source of information to be taken seriously without question. @randomdude420 (and I'm not picking on them if that's a real person) is just a random person online, like me, and should be treated as such. If something is important (and if it makes you angry or afraid, it must be), verify it as best you can.

Because Google has become trash we can't just click on the first few links that come back in search results and assume that we're getting accurate or quality answers. Since Google doesn't want to fix this, and nobody else is stepping up, people just have to be smarter.

It just sucks that it's getting harder and harder to find legitimate results at all since they're being drowned out by garbage.


I chuckled at:

> NewsGuard has a clever way to identify these junk AI-written websites...For example, one site called CountyLocalNews.com had messages like “Sorry, I cannot fulfill this prompt as it goes against ethical and moral principles …

`if str_contains(body, "Sorry, I cannot")` thanks NewsGuard, you've solved the problem *pats NewsGuard on head while giving them an encouraging smile*
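The detection heuristic being mocked really does amount to a phrase scan. A minimal sketch in Python; the phrase list here is illustrative and is not NewsGuard's actual detection logic:

```python
# Hypothetical refusal-boilerplate phrases; a real list would be longer
# and maintained against current model output.
REFUSAL_PHRASES = [
    "as an ai language model",
    "sorry, i cannot fulfill this prompt",
    "i cannot fulfill this request",
]

def looks_like_unedited_ai_output(body: str) -> bool:
    """Flag pages whose body contains boilerplate AI refusal text,
    suggesting machine-generated content was published unreviewed."""
    text = body.lower()
    return any(phrase in text for phrase in REFUSAL_PHRASES)

print(looks_like_unedited_ai_output(
    "Sorry, I cannot fulfill this prompt as it goes against "
    "ethical and moral principles."
))  # True
```

Trivially evaded by any spammer who greps their own output first, which is presumably the commenter's point.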


The majority of humans can't defend themselves against their own confirmation bias, even when they are aware of the phenomenon. I suspect this isn't going to improve anytime soon.


It is not confirmation bias. It is a different tendency.

It is a tendency to absorb whatever we read or hear - regardless of its information content.

IMO: Human languages evolved more for bonding than for conveying information. So, words exchanged were more to soothe the emotions of individuals involved rather than convey useful information about the state of their environment.

Hence we are fundamentally wired not to process the information content critically but to be soothed emotionally by whatever we read or hear.


That's what I've always thought too.


I'd like to think they can learn. Even with all the heavy metal poisoning, I don't think it's too late to teach new generations how to think critically.

I can't speak to how bad things are elsewhere, but the average American only reads at a 6th grade level. That's also about where their understanding of science is. Their math skills are even worse. Americans today are not a good indicator of what humanity is capable of. We could always try to fix our educational system as opposed to sabotaging it. I have to believe we can do better.


> The group estimated that around $13 billion is wasted globally on these sites each year

So if Google fixes this problem, their revenue goes down $13B/year. Seems like they'll never solve it.


Yeah, when I read this I'm not sure I shed a tear for anyone.


Right out of college I worked at a low-morals lead-generation company. They paid one of the salesmen to write content, and two of my fellow tech guys made websites off the content to collect real estate leads. The CSS generation might need a little human touch for aesthetics, but yeah, you don't need to pay people for the majority of this low-effort trash anymore.

Which is bad... because I think the soul crushing work was what kept us from producing content at the rate a GPT assisted process will.


A decade ago, Google would have made a real effort to stop this, as it would have ruined the user experience for Google Search. The proof of this is in their algorithm update in 2011 ("Panda")[0] that among other things, severely downranked the deluge of free blog account sites that posted nothing but copied material (to rank for search terms) and surrounded them with spam links and ads. If you were online in the late 2000s, you'll remember how many of these trash sites there were back then.

Today, search is horrible to use because Google has for years let these spam sites proliferate. Generative text content is the 1.5C event for text search engines -- it's too late to reverse it, we just have to find higher ground somehow.

[0] https://searchengineland.com/google-panda-update-guide-38110...


Is it time to go short on GOOGL? Alphabet is a house of cards, with YouTube as the exception. There is a chain of products that exist only to support search, which creates a counter-momentum preventing the company from doing what's necessary. And if they don't do it, the market eventually will.


> the market eventually will

To be more precise, it is doing that right now. I often hear (in my bubble) of people replacing Google with ChatGPT.


It’s so interesting: even a few years ago I wouldn’t have guessed the main AI problem would be “it generates convincing low-quality blogspam.”

There was all the stuff about the singularity and how AI reinforces existing biases, but the most pressing problem seems to be “it seems almost right but is also very wrong, way too cheaply.”


It always comes down to what your investors want.

Remember when Uber was the harbinger of a driverless taxi future? Now they're hoping to be profitable through food delivery, like an impoverished agrarian economy praying for rain.


> I wouldn’t guess the main AI problem was “it generates convincing low-quality blogspam”.

Why not? This is the lowest-hanging fruit there is, and the open web's sole method of monetization has been ads.


We are worried about an AI apocalypse, but the most immediate outcome is AI spam and clickbait.

AI voices are just getting better. Expect tons of spammy calls. Or you'll think you're calling human tech support and it's just some machine blabbing about snake oil.

Sam Altman talks about having 100 humans in the cloud working for you. It easily turns into 100 humans in the cloud trying to juice every little penny out of you that they can.

Google's algorithms are quite dumb, and LLMs are getting real good at parroting.


Something I find astounding are all these pseudo-medical ads in Google Adsense and YouTube.

How can they even let them through?


>How can they even let them through?

The ad buy checks/credit card charges don't bounce?

Don't forget who is the customer in those transactions. Alphabet is an advertising company, not a search engine or video library.

Their goal is to maximize revenue, so if someone is willing to pay, why would they refuse to take the money?


Because false advertising, especially advertising fake medical devices, false drugs, and made up treatments, is illegal in the countries I was in.

Putting up a billboard with any of this crap would get someone in prison. And justly so.


>Because false advertising, especially advertising fake medical devices, false drugs, and made up treatments, is illegal in the countries I was in.

>Putting up a billboard with any of this crap would get someone in prison. And justly so.

Sorry for the late reply.

By all means report the advertisers and Google to the authorities wherever you are. I certainly wouldn't be upset if some (or all) of those folks went to jail. In fact, I think it would be quite nice.

I use an ad blocker so I don't see such things. As such, if you do please, by all means, involve the authorities.


Why is anyone on HN seeing Adsense ads? Why aren’t you running an ad blocker?


Because I’ve spent the last 18 months implementing marketing integrations and continuously playing around with settings just to be able to read some documentation got old after a week.

But perhaps that’s why Google lets this harmful crap fester on their ad platforms. Perhaps they simply aren’t aware of it because everyone there runs an ad blocker. This would somehow be comforting, compared to the alternative.


News websites filled with "AI" headlines pulling in money from ads, ha-ha.

I mean, there is no sensation here. Before AI-generated text, you could pay $5 per 1,000 characters of human-generated text, and it was used to do the same thing: playing Google's game.


Since CAPTCHAs are mostly over (or need a frequently changing, complex mousetrap), ads should only be billed on completed sales and CPI. That would eliminate CPC and CPM fraud, because those billing models can no longer survive the onslaught of generative fraud.


I once worked at a company that did something very similar to what you’re suggesting. It doesn’t work as well as it sounds.

The problem is that the feedback mechanism is far too easily compromised (both by intentional means and unintentional ones).

In order for such a scheme to function properly, the ad network has to be able to tell how many sales were made as a direct result of the click. The feedback loop needs to be nearly perfect. As things stand, it is terribly easy for ad purchasers to game, and not everybody is honest.

I agree that the concept is ideal, but I have yet to see an implementation that actually works without a great deal of additional effort (and a lot of human attention, and many many arguments…).
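The attribution problem described above can be shown with a toy model. Everything here is hypothetical (the class, the flat 10% revenue share): the point is only that in a pay-per-sale scheme the network can verify that a click ID it issued came back, but not whether the advertiser's self-reported sale figure is honest.

```python
import secrets
from dataclasses import dataclass, field

@dataclass
class AdNetwork:
    """Toy pay-per-sale network: bills only when the advertiser
    reports a conversion tied to a click ID the network issued."""
    issued: set = field(default_factory=set)
    billed: float = 0.0

    def record_click(self) -> str:
        # The network issues and remembers an opaque click ID.
        click_id = secrets.token_hex(8)
        self.issued.add(click_id)
        return click_id

    def report_conversion(self, click_id: str, sale_value: float,
                          rate: float = 0.1) -> None:
        # The click ID is the ONLY thing the network can verify;
        # sale_value is whatever the advertiser chooses to report.
        if click_id in self.issued:
            self.billed += sale_value * rate

network = AdNetwork()
cid = network.record_click()
network.report_conversion(cid, sale_value=100.0)   # honest report
# A dishonest advertiser can simply report sale_value=0.0 for every
# real sale, and the network has no independent way to detect it.
```

Real networks bolt on conversion pixels, server-to-server postbacks, and audits precisely because this naive feedback loop is so easy to game, which is the "great deal of additional effort" the comment mentions.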


Good. Let them get the digital equivalent of Creutzfeldt-Jakob. It's happening and has been happening. Let them go nuts. AOL.com exists; how many of you had to check to see if that is actually true? I had to check. I mean, there was a merger. They gotta do what they gotta do. It grinds out pennies on nonsense no one checks on. We taught a rock to interconnect our minds ephemerally; yes, the ad cents are making dollars even if it doesn't make sense, but aren't the dollars or euros or NFTs or whatever just a Pac-Man score at this point? Like one you can exchange for food or a Porsche or whatever... but what is the THING? AI doomerism is here to put horse blinders on us to keep us away from the fact that cross-browser compute shaders are a thing now.

Protocols not platforms y'all. The game changed. Let them make money making nonsense. Cut through bullshit. Kill your masters. Host it yourself.


My personal experience is that Google is basically mostly garbage. The open web is dead.


The only threat this poses is to the humans who currently write content for these junk websites. Often assisted by AI anyway.


It poses a threat for people who are searching for information on the internet that isn't shit. You run into so much of it looking for stuff on cooking.


Four pop-up modals that block content on MIT Technology Review. Shameful.


“If policymakers banned the use of these types of ad services, consumers would face a radically different internet: more ads that are less relevant, lower-quality online content and services, and more paywalls,”

The most popular websites already run the maximum number of ads their users will tolerate. There is no scenario where websites “run more ads” to compensate for LLM bots.


I'll probably be filtering out books copyrighted 2023 or later.


What will happen when content is 99% AI generated?


Jump to marginalia?

https://search.marginalia.nu/

Until it suffers the same fate…


Buzzfeed has been doing this for years without AI; I really don't see much difference here.


Tell me about ads; I can't read the article without closing a thousand popups. >_<


Maybe advertisers shouldn't be trusting Google with their money so easily.


I posted about a 60,000+ article ecosystem here - https://news.ycombinator.com/item?id=36304763

Not to pick on the commentator, but at least one HN comment used it as a reference, incorrectly. I found it because I was going to use it as a reference myself, but I fact-checked it and it was wrong. I also found a second ecosystem that fact-checks it as correct... I haven't looked into that ecosystem yet.

Incorrect information is good for SEO because it's unique content, and GPT can produce it well with prodding. It's a very different beast from Markov-chain or translation spam.
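For contrast with GPT output, the older generation of spam text typically came from word-level Markov chains. A toy bigram sketch (the corpus and function names are illustrative):

```python
import random
from collections import defaultdict

def build_bigram_model(text: str) -> dict:
    """Map each word to the list of words observed to follow it."""
    words = text.split()
    model = defaultdict(list)
    for a, b in zip(words, words[1:]):
        model[a].append(b)
    return model

def generate(model: dict, start: str, length: int = 12, seed: int = 0) -> str:
    """Walk the chain from `start`, picking a random successor each step."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        choices = model.get(out[-1])
        if not choices:  # dead end: word never seen with a successor
            break
        out.append(rng.choice(choices))
    return " ".join(out)

corpus = "the best hammocks of the year are the best choice for the garden"
model = build_bigram_model(corpus)
print(generate(model, "the"))
```

Output like this is locally plausible but globally incoherent, so search engines learned to downrank it; LLM spam reads coherently end to end, which is what makes it harder to filter.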


your link leads to an empty page, is the comment deleted?



