Pulumi AI is poisoning Google search results with AI answers

AbrahamParangi · 2024-04-29T14:38:49 1714401529

This is arguably more Google's fault than the content, obnoxious though it is. Google has been sacrificing search quality at the altar of search ads KPIs for a decade or more now and it shows.

VHRanger · 2024-04-29T14:46:38 1714401998

It's totally Google's fault. The original PageRank algorithm was more resistant to low quality content - it was fundamentally about a reputation web of trust.

As Google shifted from valuing high quality sources referring to you to other signals, it became more gameable.

Note that the "high quality" part matters - it's harder to game getting, say, a link from real humans at a high quality place than it is to generate yet another SEO blogspam website with 10,000 backlinks from other SEO blogspam.

nradov · 2024-04-29T15:05:43 1714403143

Links between sites stopped being as relevant to content quality ranking when the web became commercialized. Much of the high quality content (along with low effort junk) moved to commercial sites which mostly don't link to other sites. Or, if they do have external links it's because they're getting paid.

pcthrowaway · 2024-04-29T17:35:14 1714412114

That's mainly Google's fault also, for running Adsense

hn_throwaway_99 · 2024-04-29T15:07:27 1714403247

I agree with the other comments that Google definitely has sacrificed search quality "at the altar of ad revenue KPIs".

However, I think your comments are fundamentally incorrect re "The original PageRank algorithm was more resistant to low quality content - it was fundamentally about a reputation web of trust. As Google shifted from valuing high quality sources referring to you to other signals, it became more gameable."

PageRank became unusable as a primary source of quality long ago due to stuff like link farms and the like. This is a classic example of Goodhart's Law, https://en.wikipedia.org/wiki/Goodhart%27s_law.

Again, I'm not saying Google couldn't have done better by focusing more on search quality and less on ad revenue, but the idea that the "reputation web of trust" can't be totally gamed by extremely motivated spammers is just flat out false in my opinion.

VHRanger · 2024-04-29T15:35:52 1714404952

Right, but this gameability could have been improved leaning more into reputation scores across the pagerank-style calculation. Yes, this means some manual curation, which Google is allergic to, but it's also much cleaner long run.

I agree Goodhart's law is inevitable for Google search, but it's very hard to game "high reputation human cares about it". The high reputation human has to be bought out, but that should also eventually trigger a downrank of their reputation as a response.

Repulsion9513 · 2024-04-29T15:11:19 1714403479

How does Google determine what is high quality? That is in essence what they've changed.

rchaud · 2024-04-29T15:13:50 1714403630

The original PageRank algorithm wasn't set in stone. If it was, it would have become completely obsolete because it reflected the design of the 1990s internet. Back then, there weren't a million me-too sites on the same exact topic, all trying to capture the traffic of the #1 site in the segment. If you're looking for the Ultimate Beginner's Guide to Whatever In 2024, you have an embarrassment of riches to choose from. /s

PageRank also relied on backlinks as a quality signifier, assuming that more backlinks = more reputable. This has probably been the longest-lasting piece of the algorithm, as gaming this via "link-building" through ugly infographic embeds and blog post syndication on Medium.com, LinkedIn and Substack continues to be popular.

kevincox · 2024-04-30T00:50:39 1714438239

Yeah, I was going to say an alternate title: "Google returning low quality results for many searches."

bearjaws · 2024-04-29T14:43:22 1714401802

Google is becoming unusable for programming related questions.

The telltale sign was when I have to actually put in "MDN" in the search to get MDN to show up.

lelanthran · 2024-04-29T15:00:44 1714402844

> Google is becoming unusable for programming related questions.

> The telltale sign was when I have to actually put in "MDN" in the search to get MDN to show up.

That makes sense. It's not a sign that the search is poor, it's a sign that the search has been intentionally crippled.

It's clear evidence that google prioritises sites with adverts over sites with content. This could not have happened by accident.

kibwen · 2024-04-29T15:01:22 1714402882

For years I just used Google as the "I'm too lazy to search directly in Wikipedia" search box. Now I just properly use Wikipedia's search box. So thanks, AI?

bangaroo · 2024-04-29T14:49:53 1714402193

you would think a company full of engineers working on web technology building a tool for finding info on the internet would understand and adjust for the fact that literally nobody on earth has ever wanted to go to w3schools, ever, at all, in the history of the entire universe.

rchaud · 2024-04-29T15:21:44 1714404104

Speak for yourself. The W3Schools site has a clean information architecture, short chapters on topics and working example code with a sandbox.

The MDN site on the other hand is a pure technical reference site, like PHP.net.

Considering that search works by way of matching search phrases to site text, it's not at all surprising that Stack Overflow and W3Schools rank higher than the reference manual sites.

skywhopper · 2024-04-29T15:12:05 1714403525

No one who actually cares about the quality of the results is in charge. And for search results like the one given in the link (for "aws lightsail xray"), there are no ads, and thus the folks in charge of Google search these days probably prefer the results to be broken.

a_vanderbilt · 2024-04-29T14:55:41 1714402541

Like many companies that stop innovating, it's not the engineers who are in control. "Tonerheads" are. Raghaven has Search under his thumb. Ads is putting on pressure for revenue, and their incentives are not entirely aligned with Search.

aworks · 2024-04-29T19:26:21 1714418781

I didn't know what "tonerheads" were. First Google search for the term gives a reddit link and enough of a summary to understand the meaning. Presumably people aren't trying to make money from that search...

a_vanderbilt · 2024-04-29T20:41:16 1714423276

Here's a YouTube video of a Jobs interview where the idea originated:

https://www.youtube.com/watch?v=Qo8zdPNMRdY

the_duke · 2024-04-29T15:08:59 1714403339

A while ago I started checking each highly ranked but low quality site in Google search for Google ads.

I'm never disappointed.

You can guess the answer for w3schools...

To be fair, a site like w3schools will worry much more about SEO than MDN, but I'm willing to bet that "shows our ads" has a positive impact on site ranking.

Low quality results that display their ads are the ideal scenario for Google:

1 user searches Google, sees some adds

2 user visits low quality site, sees more Google ads

3 user doesn't find useful information, goes back to search

4 repeat

Repulsion9513 · 2024-04-29T15:13:22 1714403602

Oh but of course! If the site shows Google ads then it must be a high-quality, trustworthy site, so you see they'd be irresponsible to not use it! (/s)

pc86 · 2024-04-29T15:02:26 1714402946

I think there are two unrelated causes to this:

1. Marginally cleaning up low-volume technical search results is not going to get anyone promoted or a stock refresher.

2. Most of the programmers at Google are probably good enough not to have to Google programming questions.

And agreed about w3schools, expert sexchange, etc.

krainboltgreene · 2024-04-29T15:07:50 1714403270

> good enough not to have to Google programming questions.

This is the most hilarious concept I've ever heard.

rob74 · 2024-04-29T15:11:23 1714403483

...or they're good enough to know not to use Google for programming questions (and go directly to StackOverflow)?

abyssin · 2024-04-29T14:35:41 1714401341

Generative AI could have a paradoxical effect of making slow, human generated content more valuable because the only alternative is free meaninglessness generated from older meaningless free content. The free, machine generated content would converge to something hypnotic and more and more addictive. The meaningful would be seen as valuable enough for people to pay. There would be no in-between.

datadrivenangel · 2024-04-29T14:51:46 1714402306

This has already started to happen for manufactured goods.

Mass-manufactured and cheap but low quality goods are extremely available. People also will pay a premium to some local craftsperson who makes artisanal stuff at an extremely noncompetitive price, often with less technical quality in some regard.

The real need is to have trusted brands and harder to game information about those companies, so you can more quickly tell when a company starts submitting to the MBAs and eroding their quality.

gtirloni · 2024-04-29T15:10:10 1714403410

People also will pay a premium to some local craftsperson who makes artisanal stuff

I won't, unless it's for the art aspect of it or to help a specific developer/artist that I want to see succeed.

But by default, I'll pay a premium for high-quality very precise and reliable consumer goods that undergo strict quality assurance processes, mass produced or not. If artisanal stuff has less quality, I'm not inclided to pay a premium for it in the general case.

dan_quixote · 2024-04-29T15:26:49 1714404409

>I'll pay a premium for high-quality very precise and reliable consumer goods that undergo strict quality assurance processes

How? Every time I find something like this, Private Equity scoops up this company and strips it for parts by the time I want to buy more.

add-sub-mul-div · 2024-04-29T14:57:59 1714402679

Another side effect is a further loss of privacy if only poisoned content can be searched without being logged into a service that has your full info/credit card.

GPT-(n) pollutes the freely searchable internet to shit while a subscription to GPT-(n+1) is sold as the solution.

kibwen · 2024-04-29T15:03:52 1714403032

The end-game of Google-style surveillance capitalism has always been that privacy becomes the exclusive purview of the wealthy. AI is accelerating the trend, but it didn't start us on this path.

add-sub-mul-div · 2024-04-29T15:23:30 1714404210

It's not helpful to use the fact that the world already sucks to handwave a major force multiplier for the elements that will make it suck much worse much more suddenly.

kibwen · 2024-04-29T15:32:55 1714404775

Then allow me to be clear that I'm not trying to handwave anything: Google needs to be eliminated, along with the rest of the ad industry.

add-sub-mul-div · 2024-04-29T15:43:37 1714405417

Advertising is not likely to go anywhere but we do have an opportunity to nurture significant skepticism about AI before it's too late and becomes as entrenched in the world as Google and social media.

collingreen · 2024-04-29T16:17:36 1714407456

You just used "the world already sucks" to wave away their point just like you criticized in the comment they were responding to. :(

add-sub-mul-div · 2024-04-29T16:51:10 1714409470

Good catch but there is a consistent point I'm making. If the world sucks because of Google etc. then it's counterproductive to use that fact to downplay a new danger, and productive (in my opinion) to write it off so as to focus on a new danger that's still possible to avoid.

thinkindie · 2024-04-29T14:53:57 1714402437

I've noticed the same too. I was experimenting with Pulumi therefore I had to lookup several times on Google. At first I appreciated that their documentation is based a lot on their LLM, however I started to figure out soon that it was plagued with the same issues with other LLMs, plus it was almost everywhere and no really way to escape it.

I ended up switching to Terraform because the quality of documentation was far superior.

RedShift1 · 2024-04-29T14:44:24 1714401864

Add "before:2023" to your search query

zeven7 · 2024-04-29T14:51:16 1714402276

I want this idea fleshed out into a short story

benfortuna · 2024-04-29T14:59:31 1714402771

I'm sure an LLM can accommodate your request.

bilalq · 2024-04-29T16:12:47 1714407167

This is the kind of pro tip I love seeing on HN. It's a shame that the tech field moves so fast and info gets outdated constantly, but at least for now, this helps.

23B1 · 2024-04-29T14:41:46 1714401706

The miasma cometh, followed swiftly by the death of the internet as it stands.

Authentication, verification, and curatorial products will be the next gold rush – and smart investors should be skating to where that puck is gonna be.

brodouevencode · 2024-04-29T14:20:07 1714400407

I wonder how many other companies are (quietly) guilty of the same thing.

bamboozled · 2024-04-29T14:41:46 1714401706

Keep reading about stuff like this, I don't notice it in my day to day usage of Google at the moment honestly, are other people having the same experience, does this actually matter at all? Is it affecting Google in any meaningful way?

astrodust · 2024-04-29T14:33:03 1714401183

w3schools getting bumped down to number two source of outdated, inaccurate results. Impressive.

1123581321 · 2024-04-29T15:31:35 1714404695

I think w3schools should still be ranked highly. MDN seems better organized for people who understand how web development works and just need good reference, but there are a lot of people, possibly many more than there are skilled developers, who don't understand web development but need to make something anyway.

Anecdotally, I watched someone learn front-end web development from scratch last year with a ton of googling. This person didn't understand the concepts at all (still doesn't) but w3schools seemed to help them the most to keep going and get their app made.

I suggested MDN but they found it to be confusing, not even in the "I'm not ready for this yet" sense, but in the "yeah this doesn't answer the question" sense. It did answer the question and they would've been greatly helped by taking the content seriously, but they were still far from knowing what they didn't know. I have to respect that kind of user needs to use google, too.

Larrikin · 2024-04-29T14:42:37 1714401757

What are the issues with them in 2024? Like actual examples.

I don't do webdev but the answers never seemed completely wrong whenever I had to look something up and I know I've read praise in recent years about how they completely revamped and the old "wrong?" Information was fixed. I know some people like to recommend other docs over them but that's not saying they're wrong or bad, just not as good.

astrodust · 2024-04-29T15:01:07 1714402867

That's just the thing. You don't know if it was fixed or not, so you really can't trust the information given.

Their old PHP database material used to be utterly rife with SQL injection. Perhaps they've fixed some of that, but a single injection bug can ruin your day, so it's not something you can take as casually as they seem to.

wccrawford · 2024-04-29T14:54:12 1714402452

It's really hard to fix a damaged reputation. And for a long time, the large majority of their information was horrible. It's going to take a rather long time for people to stop completely blacklisting them.

pc86 · 2024-04-29T15:09:51 1714403391

Why should we?

If you poison the well long enough, nobody's going to drink from it regardless of whether or not you can test the water and prove that it's clean.

UncleEntity · 2024-04-29T16:08:19 1714406899

Yeah, like sourceforge.

I just googled them and the top sub-result is for "AutoClicker download" which, I assume, is intended for click fraud. Luckily "Forge Auto Clicker comes with no ads or malware making an amazing user experience!"

VHRanger · 2024-04-29T14:49:05 1714402145

It's not "wrong", it just has an awful signal/noise ratio.

Repulsion9513 · 2024-04-29T15:15:50 1714403750

Usually the answers were "right" in that they worked but also wrong in that they had issues or were otherwise less-than-ideal.

But I haven't used them in over a decade so lolidk

ThinkBeat · 2024-04-29T23:44:21 1714434261

What is the benefit to Google of making their product continuously worse?

Is it so people will spend more time on Google and it allows Google to show more ads?

and/or

Does it prioritize site that Google has ads at instead of sites that matches searches.

klysm · 2024-04-29T15:49:31 1714405771

This is 100% on Pulumi no? They had to make the intentional choice to make the AI results be indexable by search engines.

cnunciato · 2024-04-30T03:07:11 1714446431

Pulumi engineer here. We did, as an experiment (and labeled it thusly), but we believed we’d have much more control over it than we have. After we began hearing these reports, we began taking them down. The vast majority have since been converted to return HTTP 410 with noindex directives, yet Google still hasn’t responded. It’s been frustrating for us also.

lxe · 2024-04-29T15:23:25 1714404205

The experts-exchange of the new generation

serhack_ · 2024-04-29T14:46:59 1714402019

There're no words for them. Manipulating SEO is equal to other scammers I've seen.

drcongo · 2024-04-29T14:54:18 1714402458

You can't poison poison.