Google edits Super Bowl ad for AI that featured false information (theguardian.com)
99 points by nemoniac 4 days ago | 80 comments





Apparently, the rabbit hole goes deeper. The Wayback Machine shows the "AI Generated" description on the cheese website as far back as August 7th, 2020. The AI didn't hallucinate anything because it didn't generate anything; the entire premise is simply fake.

https://web.archive.org/web/20200807133049/https://www.wisco...

The (edited) cheese ad: https://www.youtube.com/watch?v=I18TD4GON8g

What probably should be the target link: https://www.theverge.com/news/608188/google-fake-gemini-ai-o...


> The AI didn't hallucinate anything because it didn't generate anything, the entire premise is simply fake.

The article literally says it's not a hallucination and that the detail came from real websites.

"Google executive Jerry Dischler said this was not a “hallucination” – where AI systems invent untrue information – but rather a reflection of the fact the untrue information is contained in the websites that Gemini scrapes..."


> The article literally says it's not a hallucination and that the detail came from real websites.

The "hallucination" term generally refers to any made-up facts. Harsh as it may be to put this weight of responsibility on LLMs, users of LLMs generally use them in the expectation that what is says is true, and has been (in some magic hand-wavy way) cross-checked or confirmed as factual. Instead they will print out what is most likely to follow the user's input, based on the training data.

Unfortunately, a vast amount of that training corpus is social media posts, which can't be relied upon to be true. But if something gets repeated a lot, it's treated as true, in the sense that "what does 'salary' mean" is generally followed by a billion social media posts saying "it referred to the time that Roman soldiers were paid in salt, because salt was a currency at the time".


Ok. None of what you said points to the commenter having discovered something novel or the other article being better. It's already stated in the OP article that the problem is caused by the internet containing false information.

> Apparently, the rabbit hole goes deeper.

But it doesn't go deeper than what's already in the article. The article already talks about how the problem is that the internet contains a bunch of misinformation and the LLM is as credulous as the average human, which is to say extremely so.


> The article literally says it's not a hallucination and that the detail came from real websites

The article says that's what a Google exec claims, not that that's the actual case. They haven't pointed to any of those websites and we don't have to take them at their word.

Someone further down pointed to a source on cheese.com, where it says gouda makes up 50% to 60% of all global consumption of Dutch cheese. If the source is accurate, the AI hallucinated an incorrect response.


“the Google executive Jerry Dischler said this was not a “hallucination” – where AI systems invent untrue information – but rather a reflection of the fact the untrue information is contained in the websites that Gemini scrapes.”

So what you are saying is that Gemini is basically useless?

Good to know. Thanks for clarifying that Google!


Not just the "50 to 60 percent" statistic, it looks like the entire paragraph in the ad is word for word identical to a preexisting cheese shop site. So either Gemini's "generation" is just searching the web and doing verbatim copy paste, or Google copied it themselves which would be a new low for Google.

A lot of the answers given by Gemini are just a verbatim copy of parts of web sites. I have seen this time after time when comparing the AI results and text that I find in the referenced web sites.

> “the Google executive Jerry Dischler said this was not a “hallucination” – where AI systems invent untrue information – but rather a reflection of the fact the untrue information is contained in the websites that Gemini scrapes.”

This is a huge fail for Gemini. Google, of all companies, should know the incentives to distort information aggregators for monetary gain (just look at how bad search results are these days). It's 100% expected that people will try to game LLMs the same way, and it's entirely up to the LLMs to counteract this.


Maybe the goal is not accuracy, but market capture?

No, they're saying that Google searching in general is often useless.

If I do a google search I can inspect the source of a response for trustworthiness and I can inspect several other sources that also appear in the results for agreement.

With an AI query I get one response back and although people are trying to teach their AI tools to cite sources, this is not universal.


Checking your web search sources and credibility is a lot easier

That's quite a leap there from "sometimes wrong" to "basically useless"...

> quite a leap there

A healthy hop I would say, keeping in mind that this was showing off the best side of it in a commercial. So it just did a Google search, got a wrong factual result, the LLM couldn’t verify it was bogus, regurgitated it as is, and the executive piped in with “acshually, it’s just a plain old search result” somehow not realizing that just makes them look even worse.


If a model often produces information that is blatantly wrong, then you need to check ALL of its outputs. If you're going to have to double-check all information that it provides, you might as well skip using it entirely and search for the information directly.

You're missing the part where searching for the information directly might take hours or even days.

You're looking for black-and-white truth, while the real world is actually more interested in efficiency.

I can spend 2 days scouring documentation and forum posts and experimenting to get ffmpeg or Matplotlib to produce exactly the results I want. Or I can just ask ChatGPT and check if its code works, and if it doesn't maybe spend 10 minutes correcting it so it does, or refining the prompt so it does.

And so you're also missing the part where verifying the correctness of output is very often many orders of magnitude faster than coming up with the output in the first place.


LLMs produce wrong results. News at 11.

> but rather a reflection of the fact the untrue information is contained in the websites that Gemini scrapes

It was a verbatim copy of the text, which raises the question about copyright. I guess "Wisconsin Cheese Mart" could probably sue Google, but more importantly, this would also mean that it is capable of making verbatim copies of the books contained in Anna's Archive.

That is, assuming that Google also did torrent the books like Meta did, which is very likely.

Funny how they call it "reflection", and not "copy". Maybe that's the way to win in courts: I didn't copy that movie, that was just a reflection of it.

Wouldn't this Gemini user, who copies Gemini's response to use this text in their ads, potentially get into copyright trouble with "Wisconsin Cheese Mart"?


Don't worry, just ask it to cite its sources so you can verify

AI search as yet lacks the necessary incredulity.

“I did not invent some bullshit, I just repeated the bullshit someone else invented” doesn’t really sound like a great argument.

87% of the Internet is bullshit.

how could you possibly know that?


That was the point…

Bingo. They admitted what everyone suspected: they are just scraping BS from BS websites and spitting it back as something created by an "intelligent agent".

Amazing, they got the "actually" guy for a PR person. That's still shit Jerry!

For better or worse Jerry isn't a PR person

To be fair, do you believe a human would do better? If you hired someone to create content for your cheese website they would probably do the same thing. They'd search google and copy statistics without triple-checking all of them.

I don't think it's fair to frame this as an AI issue. The internet is simply full of misinformation. Should AI outperform the average human and detect misinformation? Maybe. But I don't think that's part of the current value prop, at least not for "non-reasoning" models. If you use an LLM, you should be aware that what you're getting is whatever is most commonly found on the web. If that's not what you want, don't use an LLM.


Google saying “multiple websites have the fact so Gemini ran with it” is extraordinarily rich.

If ever there was a company which should know that “multiple websites” is not a good benchmark for accuracy, it’s Google.

This feels like a good parable for Google’s search these days. I’ve seen more wrong information from them than anyone lately. I wonder if they can course correct before it’s too late.


Does this matter?

If a human researched the matter, they would arrive at the same conclusion by visiting those websites. Essentially, an average person would have made the same mistake when researching, unless they had access to specific dairy industry data. We might want to hold LLMs to higher standards than humans, but AFAIK every LLM comes with a disclaimer to "fact check yourself."

In this regard, Grok is the best, as it gives you the source list for cross-referencing yourself.


Actually I would have argued that any human being with an ounce of common sense would find it obvious that gouda does not account for 50% to 60% of global cheese consumption. Like, you don't need an industry report to know for sure that is patently untrue.

But apparently this did not smell off to any of the many, many people who worked on the ad. That is the most baffling part to me. Are these people so bought into the hype of the product they're promoting that they just switch their brains off?


I think it does because a researcher can pick up context about the quality of their sources through the course of web research. The BigCo chatbot AIs are marketed to represent that BigCo, and people generally trust Google, in this case. It's good when they cite sources, but a major point of the chatbot is to abstract that legwork for most people.

Then why did a human blogger find the error so quickly? Why didn't his fact check determine that the claim was true? Does Gouda being responsible for 50-60% of all global cheese consumption even sound remotely believable?

The source sentence on cheese.com was: "It is the most popular Dutch cheese in the world, accounting for 50 to 60% of the world's cheese consumption."

The glorified autocomplete failed to detect the nuance in this (admittedly poorly written) sentence: that the 50-60% statistic was of the world's Dutch cheese consumption.


Tbh I find it hard to read that sentence as meaning "50-60% of the world's Dutch cheese consumption".

The sentence isn't poorly written or nuanced, it's simply false.


That’s because GP also interpreted it wrong.

Out of all Dutch cheese consumed internationally, 50-60% is Gouda. Or, in other words, 50-60% of Dutch cheese exports are Gouda.

At least that’s how I read it.


Yes, I understand the purported true version of the fact. But it’s impossible to interpret the original quote like that without shoehorning it.

What does that mean for LLMs? The Internet has lots of poorly written text. If LLMs can't distinguish nuance, ambiguity, or lack of clarity then what exactly are they generating and why would their output be useful?

Taking a poorly written sentence, interpreting it as meaning something incorrect, and then presenting it with authoritative, confident language is very close to gaslighting.


I guess this is going to be a fun game for AI. Not only does it have to contend with false information vs true information, it also has to figure out correct information that might be written in an ambiguous way.

The sentence is wrong, but the AI should not be regurgitating the first factoid it encounters regardless

I know very little about cheese and such a stat seems just incredibly obviously untrue. I wonder how it made it past the dozens or more people who worked on the ad.

People? It was just all AI slop all the way down.

Seems like it was people

> It turns out Gemini didn't hallucinate a fake stat, Google just copied a website's existing text instead.

https://gizmodo.com/googles-ai-super-bowl-ad-fiasco-somehow-...


This is a better link - it makes it clear that the full text shown in the ad came from existing web page, not just that incorrect statistic: https://www.theverge.com/news/608188/google-fake-gemini-ai-o...

Don't blame some website for the fact that no one fact-checked it.

The point is, people didn’t check and AI didn’t hallucinate.

Isn't AI supposed to do the fact checking?

It depends. If you're selling the AI, then yes, it will do better fact-checking than any human could. However, if you're explaining to a client why your AI made up stuff that cost someone thousands of dollars, then no, it was always your employee's job to double-check everything.

And AI or not, I'm willing to bet AI was used to fact check.

I can imagine right now, all over the place, people are being tasked to write some article or provide some stats. They use an AI to do the work for them, lazy people that they are.

Then, their manager plugs the stats into an AI to fact check...

I remember some movie from the 80s, pre-internet. Anyhow, everyone had an implant, and it was decades in the future, no one could even read or write.

Instead, they'd just go through their day and if they were unsure of the answer to something, the implant would search a database and just fill in the info. From their perspective, they couldn't even tell if "what they knew" was in their head, or provided from the implant.

Anyhow, one guy couldn't get an implant and was considered disabled, for he had to learn to read, and acquire knowledge the old fashioned way. He slowly discovered that if he even tried to show people how to read, they were blocked from learning. And if info was provided contrary to "the public good", people simply couldn't understand the concept.

Turns out, some central computer decided what was right and wrong, and those that created it and perhaps once controlled society died... leaving it in charge.

This is what we saw. We saw someone query their implant, and then fact-checkers query their implant, and OK! All good!

Also, I think there were onions in the movie.


You got me curious and it sounds like this might be an episode of The Outer Limits. Perhaps it is not what you are remembering, though.

https://movies.stackexchange.com/questions/83389/movie-where...


Yes, that's it! I recall that every time he tried to explain the problem, the implant would prevent them from understanding that the problem was, the implant.

Google’s org chart is packed full of Product Managers and other similar titles that get paid millions to do ~nothing.

The number of L8+ “leaders” and “drivers” is really jaw dropping.


I refuse to believe that anybody is managing Google's products.

I have no clue how these people drive business value and actual software developers implementing those features don’t.

> The local commercial, which advertises how people can use “AI for every business”, showcases Gemini’s abilities by depicting the tool helping a cheesemonger in Wisconsin to write a product description

By copy/pasting another Cheese monger's incorrect product description...


Yet it’s something the cheese person would know how to do. Becoming a cheese headmaster in Wisconsin is not an easy task.

Top relevant search result is a Reddit thread from 11 years ago quoting a cheese.com article: https://www.reddit.com/r/todayilearned/comments/2h2euc/til_g...

Internet Archive confirms that page has had the blatantly wrong stat up since at least April 2013: https://web.archive.org/web/20130423054113/https://www.chees...

> If truth be told, it is one of the most popular cheeses in the world, accounting for 50 to 60 percent of the world's cheese consumption.

Clearly predates generative AI, so I think this is junk human-written SEO misinformation instead.

Here's that page today: https://www.cheese.com/smoked-gouda/ - still has that junk number but is now a whole lot longer and smells a bit generative-AI to me in the rest of the content.


Which the greatest invention since controlled fire (if we are to believe the hype) was unable to discern as SEO misinformation.

Right, especially bad since this is Google. Their brand should mean more than this.

I've been writing about how the greatest weakness of LLMs is their gullibility for ages. This right here is a great example - see also the Encanto 2 thing from a few weeks ago: https://simonwillison.net/2024/Dec/29/


AI results should be better than this, but there will be hallucinations from time to time. That is entirely foreseeable.

The true failure here is by the humans who couldn't be bothered to do the bare minimum to protect the brand. This would be a major failure if it were any sort of ad, but it's utterly unfathomable for a highly expensive and absurdly visible Super Bowl ad. Why does anyone involved, from intern to CMO, at either the agency or Alphabet, have a job that they seem ruthlessly indifferent to performing with any sort of attention or care?


I had the same thing happen to me, except for Cars 4! Google AI picked up a fan fiction trailer summary for Cars 4 from somewhere akin to Fandom and reported it as true.

Ka-chow, Google!


I’m very far from an evangelist of the tech but I was under the impression they had gotten better at this sort of thing.

I question why a model which “knows” about cheddar / mozzarella cheese would make this blunder.

Was this supposed to be generated by one of those “show your work” reasoning models, or is this just regurgitation from one of those single-short-response chatbots that parrot old Quora answers or Reddit posts?


Maybe it's because of the way the search integration is set up? All the search results are put in the context window and the model is just told to summarize it, so that's what it dutifully does, without considering whether or not the statements are plausible. Even without one of the reasoning models, if the prompt were adjusted to tell it that some of the statements may be incorrect and that information should be evaluated for plausibility first, then the result would probably be better.

I asked 4o and Sonnet to "Evaluate the plausibility of this statement: Gouda is the most popular Dutch cheese in the world, accounting for 50 to 60% of the world's cheese consumption".

4o correctly said

>While Gouda is an iconic and highly popular Dutch cheese, the claim that it accounts for 50–60% of the world's cheese consumption is implausible and likely a misrepresentation of its market share. Global cheese consumption is far too diverse for any single variety, including Gouda, to dominate to such an extent.

Sonnet said:

>A more accurate statement would be that Gouda is one of the world's most popular cheeses and represents a significant portion of Dutch cheese production and exports. It's estimated that Gouda accounts for over 60% of Dutch cheese production, which might be where the confusion stems from.

Both correctly pointed out mozzarella, cheddar, and parmesan as the actual likely candidates for most popular cheese. So the models are clearly quite capable of this; any bad result is likely just a prompting error, e.g. blindly asking for a summary.
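
For what it's worth, here's roughly what I mean by adjusting the prompt, as a minimal sketch (assuming the OpenAI Python client; the model name, prompt wording, and snippets are purely illustrative, not Google's actual pipeline):

    # Sketch of a RAG-style summarization prompt that asks the model to
    # sanity-check scraped text before summarizing it.
    from openai import OpenAI

    client = OpenAI()

    scraped_snippets = [
        "Gouda is the most popular Dutch cheese in the world, accounting "
        "for 50 to 60% of the world's cheese consumption.",
        # ...other search results dumped into the context window...
    ]

    prompt = (
        "The following text was scraped from the web and may contain errors. "
        "Flag any claim that seems implausible and leave it out of the summary.\n\n"
        + "\n\n".join(scraped_snippets)
    )

    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)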


My understanding of LLMs is not great, but I'm pretty sure that when the glorified predictive text comes up with a sentence, it doesn't cross-reference that with any other knowledge to verify it. It just spits it out.

Gouda being 50-60% of global cheese consumption is obviously complete rubbish. I'm shocked no-one working on the ad noticed.

My company contracted one of these LLMs to give us a bespoke chatbot. Works great for translation, and that's what I've been using it for. I popped in the line "Table 1 (etc, etc)"

It perfectly translated the line, but it also gave me a completely made-up 2-column, 10-row data table! I asked it why, and the response was along the lines of "I am designed to make your life easier, and I thought providing this table would reduce your workload".


Google: "Gemini didn't make a mistake. It simply plagiarized a factually-incorrect article. Word for word."

Gee, thank you Google for convincing me you have a product that I find useful and that I can trust. I look forward to you trying to cram this down my throat, against my will, at every opportunity you see fit. :-/


This is at least the second embarrassment with a high-profile Google AI ad/demo.

(I'm also thinking of when, in the LLM boat-missed frenzy, they faked an interactive AI demo, to make it look much more responsive than their actual tech was.)

I'm unclear on how either incident was allowed to happen.


> Jerry Dischler said this was not a “hallucination” – where AI systems invent untrue information – but rather a reflection of the fact the untrue information is contained in the websites that Gemini scrapes.

So LLMs like any other computer system suffer from "Garbage In, Garbage Out".


None of the marketing geniuses thought: hey, that's odd, Gouda is definitely not 50% of my own cheese consumption, and that probably holds for every single one of them, Dutch people included.

It's bizarre to me how the media (and mainly tech media) has been bending over backwards to make this story a big deal.

A cheese fact on a cheese website was wrong therefore Gemini is bad? What?


Just because it’s on the Internet doesn’t make it true, Google!

Sounds like Google's AI produced some stats that were quite obviously mistaken, but nobody cared to fact-check in the slightest before shipping.

> the Google executive Jerry Dischler said this was not a “hallucination” [...] but rather a reflection of the fact the untrue information is contained in the websites that Gemini scrapes.

What this guy is describing is pretty much the root cause of what we colloquially refer to as "hallucination".


> What this guy is describing is pretty much the root cause of what we colloquially refer to as "hallucination".

I thought a hallucination is when it provides fabricated facts because it is more interested in generating another token than being factually accurate? Not trying to argue, just trying to pin down a definition for this term.


Yeah. An LLM basically just computes the most likely thing based on its training, and says that. If it has no knowledge of a thing, it still calculates the most likely thing, which is a hallucination since it can't possibly calculate that effectively.

Regurgitating false statements is not a hallucination, it's bad training data.

It's the difference between being wrong and just making things up from nothing.
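
To make the "computes the most likely thing and says that" part concrete, here's a toy sketch of greedy next-token selection. The vocabulary and logits are made up for illustration; a real model works over a huge vocabulary with fancier decoding, but the point stands: nothing in this loop checks whether the output is true.

    # Toy greedy decoding step with invented logits.
    import numpy as np

    vocab = ["gouda", "cheddar", "mozzarella", "parmesan"]
    logits = np.array([3.1, 1.2, 0.7, 0.4])  # whatever the training data happened to produce

    probs = np.exp(logits) / np.exp(logits).sum()  # softmax over the vocabulary
    next_token = vocab[int(np.argmax(probs))]      # pick the most likely token
    print(next_token)  # "gouda" -- emitted regardless of whether the claim is true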


“You don’t have any gouda?!” “‘Fraid not sir.” “But it accounts for 50-60% of total cheese consumption in the world!”

This is just another point showcasing how useless LLMs are. They either hallucinate OR are polluted by "wrong" information on websites. If I need to double or triple check everything an LLM does, I'd rather just do it myself immediately.

The correct answer is libertarians, btw.


