
Google’s AI Paper That Crashed Storage Stocks—Worth $90 Billion—Accused of Experimental Fraud

For investors in AI infrastructure, when a paper claims performance improvements of “several orders of magnitude,” the first question to ask is whether the benchmarking conditions are fair.
Author: TechFlow
A paper from Google claiming to “compress AI memory usage to one-sixth” triggered a $90 billion market-cap wipeout across global memory-chip stocks—including Micron and SanDisk—last week.
Yet just two days after publication, Gao Jianyang, a postdoctoral researcher at ETH Zurich and first author of the algorithm Google's method allegedly "crushed," released a 10,000-word open letter. In it, he accused Google's team of benchmarking his method on a single-core CPU running Python scripts while testing their own on an A100 GPU, and of refusing to correct the comparison even after being notified of the issue prior to submission. The post quickly garnered over 4 million views on Zhihu, was shared by Stanford NLP's official X account, and sent shockwaves through both academia and financial markets.
(For reference: One Paper That Dropped Memory Stocks)
The core issue at stake is not complicated: Did this top-tier AI conference paper—aggressively promoted by Google and directly triggering panic-driven sell-offs across the global semiconductor sector—systematically misrepresent a previously published work and fabricate a false narrative of performance superiority via deliberately unfair experiments?
What TurboQuant Does: Shrinking AI’s “Scratchpad” to One-Sixth Its Original Size
When large language models generate responses, they must repeatedly refer back to intermediate computations already performed. These intermediate results are temporarily stored in GPU memory—referred to in the industry as the “KV Cache” (Key-Value Cache). The longer the conversation, the larger this “scratchpad” grows, increasing GPU memory consumption and cost.
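The memory pressure is easy to make concrete: the KV Cache grows linearly with both model depth and conversation length. A minimal back-of-the-envelope sketch (the model dimensions below are illustrative assumptions, not figures from the paper):

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Estimate KV Cache size: one key and one value vector per token,
    per attention head, per layer (default FP16, i.e. 2 bytes per value)."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value

# Illustrative 70B-class configuration with a 128k-token context (assumed):
full = kv_cache_bytes(num_layers=80, num_kv_heads=8, head_dim=128, seq_len=128_000)
print(f"FP16 KV Cache:        {full / 2**30:.1f} GiB")
print(f"Compressed sixfold:   {full / 6 / 2**30:.1f} GiB")
```

At these assumed dimensions the cache runs to tens of gigabytes per long conversation, which is why a sixfold compression claim moves markets.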
Google’s research team developed TurboQuant, whose central claim is compressing this scratchpad to one-sixth its original size while maintaining zero accuracy loss and achieving up to an 8× speedup in inference. The paper first appeared on the arXiv preprint server in April 2025, was accepted by ICLR 2026—the premier conference in AI—in January 2026, and was repackaged and promoted by Google’s official blog on March 24, 2026.
Technically, TurboQuant’s approach can be simplified as follows: First, apply a mathematical transformation to “whiten” irregular data into a uniform distribution; second, compress each transformed value using a precomputed optimal compression table; third, employ a 1-bit error-correction mechanism to fix computational deviations introduced by compression. Independent community implementations have verified TurboQuant’s compression efficacy, confirming its genuine mathematical contribution at the algorithmic level.
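The three steps above can be sketched end to end in a few lines of numpy. This is a schematic illustration, not Google's implementation: the uniform 4-bit codebook and the sign-plus-scale correction below are simplified stand-ins for the paper's optimized compression table and 1-bit error-correction mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64

# Step 0: an input vector with irregular per-coordinate scales.
v = rng.standard_normal(d) * np.linspace(0.1, 5.0, d)

# Step 1: "whiten" with a random orthogonal rotation (QR of a Gaussian matrix),
# so all coordinates end up on a statistically similar scale.
R, _ = np.linalg.qr(rng.normal(size=(d, d)))
rotated = R @ v

# Step 2: compress every coordinate against one precomputed codebook
# (a uniform 4-bit table here; the paper uses an optimized table).
levels = np.linspace(rotated.min(), rotated.max(), 16)
idx = np.abs(rotated[:, None] - levels[None, :]).argmin(axis=1)
approx = levels[idx]

# Step 3: a coarse correction for the residual error (a stand-in for the
# paper's 1-bit correction: one sign bit per coordinate plus a shared scale).
residual = rotated - approx
corrected = approx + np.sign(residual) * np.abs(residual).mean()

# Decompression: undo the orthogonal rotation.
recovered = R.T @ corrected
rel_err = np.linalg.norm(recovered - v) / np.linalg.norm(v)
print(f"relative reconstruction error: {rel_err:.3f}")
```

Even this toy version reconstructs the vector to within a few percent, which conveys why rotation-then-quantization is effective; the paper's contribution lies in making each of the three steps provably tight and GPU-friendly.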
The controversy lies not in whether TurboQuant works—but in what Google did to claim it “vastly outperforms competitors.”
Gao Jianyang’s Open Letter: Three Accusations, Each Piercing and Well-Documented
At 10 p.m. on March 27, Gao Jianyang published a lengthy article on Zhihu, simultaneously submitting a formal review on ICLR’s official peer-review platform, OpenReview. Gao is the first author of RaBitQ—an algorithm published in 2024 at SIGMOD, the top-tier database conference—addressing the same problem: efficient high-dimensional vector compression.

His three accusations are each backed by email records and precise timelines.
Accusation One: Using Another’s Core Method Without Attribution
TurboQuant and RaBitQ share a critical technical step: applying a “random rotation” to data before compression. This operation transforms irregularly distributed data into a predictable, uniform distribution—dramatically lowering compression difficulty. It is the most fundamental and closely aligned component between the two algorithms.
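The effect of this shared step is easy to demonstrate: a random orthogonal rotation spreads a vector's energy roughly evenly across coordinates, which is what lets a single shared quantizer handle all of them. A self-contained illustration (code from neither paper; dimensions chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(42)
d = 1024

# A "spiky" vector: nearly all of its energy in one coordinate.
v = np.zeros(d)
v[0] = 100.0

# Random orthogonal rotation via QR decomposition of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
rotated = Q @ v

# Before: one huge coordinate dwarfs the rest. After: coordinates look like
# i.i.d. Gaussians of similar magnitude, so one compression table fits all.
print("max/mean |coord| before:", np.abs(v).max() / np.abs(v).mean())
print("max/mean |coord| after: ", np.abs(rotated).max() / np.abs(rotated).mean())
```

The rotation preserves the vector's length exactly (it is orthogonal), so nothing is lost; only the shape of the distribution changes.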
TurboQuant’s authors acknowledged this point in their response to reviewers—but never explicitly disclosed the connection to RaBitQ anywhere in the final paper. Crucially, TurboQuant’s second author, Majid Daliri, proactively contacted Gao’s team in January 2025 requesting help debugging his Python implementation derived from RaBitQ’s source code. His email included detailed reproduction steps and error logs—meaning TurboQuant’s team had deep, firsthand familiarity with RaBitQ’s technical details.
An anonymous ICLR reviewer also independently noted the shared technique and requested thorough discussion. Yet in the final paper, TurboQuant’s team not only omitted such discussion but moved the already incomplete mention of RaBitQ from the main text into the appendix.
Accusation Two: Labeling the Competitor’s Theory “Suboptimal” Without Evidence
TurboQuant's paper labels RaBitQ "suboptimal," citing only that its mathematical analysis is "relatively coarse." Gao counters that RaBitQ's extended paper rigorously proves its compression error attains the theoretically optimal bound, a result published at a top-tier theoretical computer science conference.
In May 2025, Gao’s team explained RaBitQ’s theoretical optimality in detail across multiple email exchanges. Daliri confirmed all TurboQuant authors were informed. Yet the final paper retained the “suboptimal” label without offering any counterargument or evidence.
Accusation Three: Rigged Benchmarking—“Tying One Hand While Wielding a Sword in the Other”
This is the most damaging accusation. Gao points out that TurboQuant’s speed comparison experiments employed two layers of unfair conditions:
First, RaBitQ’s official release includes an optimized C++ implementation (with multithreading enabled by default), yet TurboQuant’s team opted not to use it—instead testing RaBitQ via their own translated Python version. Second, RaBitQ was benchmarked on a single-core CPU with multithreading disabled, whereas TurboQuant ran on an NVIDIA A100 GPU.
The combined effect? Readers saw the conclusion that “RaBitQ is orders of magnitude slower than TurboQuant”—unaware this outcome resulted from Google binding their opponent’s hands before the race. The paper failed to adequately disclose these experimental disparities.
Google’s Response: “Random Rotation Is a Standard Technique—We Can’t Cite Every Paper That Uses It”
According to Gao’s disclosure, TurboQuant’s team stated in an email reply dated March 2026: “The use of random rotation and the Johnson-Lindenstrauss transform has become standard practice in this field—we cannot cite every paper employing these methods.”
Gao’s team argues this is a category error: The issue is not whether *all* papers using random rotation merit citation, but rather that RaBitQ was the first work to combine this method with vector compression *under the exact same problem setting*, and to prove its optimality. TurboQuant’s paper therefore bears responsibility for accurately describing its relationship to RaBitQ.
Stanford NLP Group’s official X account shared Gao’s statement. Gao’s team has posted a public review on ICLR’s OpenReview platform and filed a formal complaint with ICLR’s Program Chairs and Ethics Committee. They plan to publish a detailed technical report on arXiv shortly.

Independent tech blogger Dario Salvati offered a relatively neutral assessment: TurboQuant does contain genuine mathematical contributions—but its relationship to RaBitQ is far closer than the paper’s presentation suggests.
$90 Billion Market-Cap Evaporation: Academic Controversy Meets Market Panic
The timing of this academic dispute could hardly be more delicate. Following Google’s official blog announcement of TurboQuant on March 24, global memory-chip stocks suffered severe sell-offs. Per reports from CNBC and others, Micron declined for six consecutive trading days, falling over 20% cumulatively; SanDisk dropped 11% in a single day; SK Hynix fell ~6%; Samsung Electronics declined nearly 5%; and Kioxia dropped ~6%. The market’s logic was blunt: If software compression reduces AI inference memory demand sixfold, long-term demand for memory chips faces structural downward pressure.
On March 26, Morgan Stanley analyst Joseph Moore challenged this logic in a research note, maintaining "Overweight" ratings on Micron and SanDisk. Moore emphasized that TurboQuant compresses only the KV Cache, one specific use of memory, rather than overall memory consumption, and characterized it as a "normal productivity improvement." Similarly, Wells Fargo analyst Andrew Rocha invoked the Jevons paradox, arguing that efficiency gains that lower costs may instead stimulate broader AI deployment and ultimately increase memory demand.
Old Paper, New Packaging: Systemic Risks in the Research-to-Market Narrative Chain
As tech blogger Ben Pouladian analyzed, the TurboQuant paper had been publicly available since April 2025; it was not new research. Google's March 24 blog post repackaged and re-promoted it, yet markets priced it as a fresh breakthrough. This "old paper, new release" strategy, combined with the alleged experimental bias in the paper itself, highlights a systemic risk in how AI research propagates from academic papers into market narratives.
For investors in AI infrastructure, when a paper claims “orders-of-magnitude” performance gains, the first question should be: Are the benchmarking conditions fair?
Gao’s team has clearly stated their intent to pursue formal resolution. Google has not yet issued an official response addressing the specific allegations raised in the open letter.