
Google explains AI Overviews’ viral mistakes and updates, defends accuracy

Google this afternoon published a long response about AI Overviews and their accuracy. The Search feature launched at I/O 2024 in the US and has been criticized for some high-profile mistakes.

Google starts by explaining how AI Overviews operate, including how they “work very differently than chatbots and other LLM products.” 

They’re not simply generating an output based on training data. While AI Overviews are powered by a customized language model, the model is integrated with our core web ranking systems and designed to carry out traditional “search” tasks, like identifying relevant, high-quality results from our index. That’s why AI Overviews don’t just provide text output, but include relevant links so people can explore further.

AI Overviews are “backed up by top web results,” with Google trying to distinguish them from the broader LLM hallucination problem that some have argued makes LLMs a bad fit for Search.

This means that AI Overviews generally don’t “hallucinate” or make things up in the ways that other LLM products might.

Instead, when AI Overviews get it wrong, Google says common issues are “misinterpreting queries, misinterpreting a nuance of language on the web, or not having a lot of great information available.”

Google highlighted some of the viral instances. In the case of “How many rocks should I eat,” Google acknowledges that it doesn’t handle satirical content well. It also points out that The Onion’s article was “republished on a geological software provider’s website.”

So when someone put that question into Search, an AI Overview appeared that faithfully linked to one of the only websites that tackled the question.

The other case Google highlighted was “using glue to get cheese to stick to pizza,” which it attributes to over-indexing on forums (Reddit in that case) as a source of reliable first-hand knowledge.

Finally:

In a small number of cases, we have seen AI Overviews misinterpret language on webpages and present inaccurate information. We worked quickly to address these issues, either through improvements to our algorithms or through established processes to remove responses that don’t comply with our policies.

In terms of next steps, Google has “limited the inclusion of satire and humor content” as part of “better detection mechanisms for nonsensical queries.” Additionally:

  • “We updated our systems to limit the use of user-generated content in responses that could offer misleading advice.”
  • “We added triggering restrictions for queries where AI Overviews were not proving to be as helpful.”
  • “For topics like news and health, we already have strong guardrails in place. For example, we aim to not show AI Overviews for hard news topics, where freshness and factuality are important. In the case of health, we launched additional triggering refinements to enhance our quality protections.”

The company previously said that the “vast majority of AI Overviews provide high quality information.” Citing its own tests, Google says today that the accuracy rate of AI Overviews is “on par” with that of quote-based Featured Snippets. However, it unfortunately did not share any data to back that up.


Author

Abner Li

Editor-in-chief. Interested in the minutiae of Google and Alphabet. Tips/talk: abner@9to5g.com