
Tens of Millions of Errors Per Hour: Investigation Reveals Google AI Search’s “Accuracy Illusion”
TechFlow Selected
Even when the answer is correct, more than half of the cited links fail to support its conclusion.
Author: Claude, TechFlow
TechFlow Intro: A recent joint test by The New York Times and AI startup Oumi reveals that Google’s AI Overview feature achieves an accuracy rate of approximately 91%. Yet given Google’s annual volume of 5 trillion searches, this translates to tens of millions of incorrect answers generated every hour. More troubling still, even when answers are correct, over half of the cited links fail to substantiate their conclusions.
Google is now disseminating misinformation to users at an unprecedented scale—most of whom remain completely unaware.
According to The New York Times, AI startup Oumi was commissioned to assess the accuracy of Google’s AI Overviews using SimpleQA—the industry-standard benchmark developed by OpenAI. The evaluation covered 4,326 search queries, conducted in two rounds: one last October (powered by Gemini 2) and another this February (after upgrading to Gemini 3). Results show accuracy improved from roughly 85% under Gemini 2 to 91% under Gemini 3.
Ninety-one percent sounds impressive—until placed against Google’s scale. With about 5 trillion searches per year, a 9% error rate means AI Overviews generate over 57 million inaccurate answers per hour—or nearly one million per minute.
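The scale arithmetic is easy to reproduce. Below is a minimal back-of-envelope sketch, assuming the cited 5 trillion searches per year, that every search triggers an AI Overview, and an error rate rounded to roughly 10% (accuracy ≈ 90%), which is the rate the 57-million figure implies:

```python
# Back-of-envelope check of the article's scale claim.
# Assumptions (not from Oumi's raw data): 5 trillion searches per year,
# every search triggers an AI Overview, error rate ~10% (accuracy ~90%).
searches_per_year = 5_000_000_000_000
error_rate = 0.10

hours_per_year = 365 * 24  # 8,760 hours
errors_per_hour = searches_per_year * error_rate / hours_per_year
errors_per_minute = errors_per_hour / 60

print(f"{errors_per_hour:,.0f} errors per hour")      # ≈ 57 million
print(f"{errors_per_minute:,.0f} errors per minute")  # ≈ 950 thousand
```

Using the reported 9% error rate instead yields closer to 51 million per hour; either way, the headline's "tens of millions per hour" holds under these assumptions.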
Correct Answers, Wrong Sources
Even more alarming than accuracy rates is the phenomenon of “source detachment.”
Oumi’s data shows that during the Gemini 2 era, 37% of correct answers suffered from “ungrounded citations”—i.e., the links attached to AI summaries did not support the information provided. After upgrading to Gemini 3, this figure rose—not fell—to 56%. In other words, while delivering correct answers, the model has become increasingly incapable of “showing its work.”
Oumi CEO Manos Koukoumidis cuts straight to the heart of the issue: “Even if the answer is right, how do you know it’s right? How do you verify it?”
This problem is further exacerbated by AI Overviews’ heavy reliance on low-quality sources. Oumi found Facebook and Reddit ranked as the second- and fourth-most-cited sources for AI Overviews, respectively. Among inaccurate answers, Facebook was cited 7% of the time—higher than its 5% citation rate among accurate answers.
A BBC Reporter’s Fake Article “Poisoned” Search Results Within 24 Hours
Another serious flaw of AI Overviews is their susceptibility to manipulation.
A BBC reporter tested the system with a deliberately fabricated false article—and within 24 hours, Google’s AI Overview was presenting its falsehoods to users as factual information.
This implies that anyone familiar with how the system works could “poison” AI search results by publishing misleading content and artificially boosting its traffic. Google spokesperson Ned Adriance responded that the search AI relies on the same ranking and safety mechanisms used to filter spam, adding that “most examples in the test involve unrealistic queries that people wouldn’t actually search for.”
Google Counters: The Test Itself Is Flawed
Google raised several objections to Oumi’s study. Its spokesperson labeled the research “seriously flawed,” citing three main reasons: the SimpleQA benchmark itself contains inaccuracies; Oumi used its proprietary AI model HallOumi to evaluate another AI’s performance—a process potentially introducing additional errors; and the test queries do not reflect real-world user behavior.
Google’s internal testing also revealed that, when run independently outside Google Search’s framework, Gemini 3 produced hallucinated outputs at a rate as high as 28%. Nevertheless, Google emphasizes that AI Overviews leverage Google’s search ranking system to improve accuracy—thus outperforming the base model.
Yet, as PCMag pointed out in its commentary, Google’s defense contains a logical paradox: if your rebuttal is “the report accusing our AI of inaccuracy itself relied on an AI that may be inaccurate,” that hardly bolsters user confidence in your product’s reliability.
Join the official TechFlow community to stay up to date:
Telegram: https://t.me/TechFlowDaily
X (Twitter): https://x.com/TechFlowPost
X (Twitter) EN: https://x.com/BlockFlow_News