
The Next AI Earthquake: Why the Real Danger Isn’t the SaaS Killer—but the Compute Revolution

This revolution could end the grand party hosted by AI’s “shovel sellers” much earlier than anyone anticipated.
By Bruce
Recently, the entire tech and investment communities have been fixated on one thing: how AI applications are “killing” traditional SaaS. Ever since @AnthropicAI’s Claude Cowork demonstrated how effortlessly it could draft emails, build PowerPoint presentations, and analyze Excel spreadsheets, a wave of “software is dead” panic has been spreading. It’s genuinely alarming, but if your attention stops there, you’re likely missing the real tectonic shift.
It’s like everyone looking up at drone dogfights in the sky while no one notices the entire continental plate beneath our feet quietly shifting. The true storm lies beneath the surface—in a corner most people can’t see: the computational foundation underpinning the entire AI world is undergoing a “silent revolution.”
And this revolution may bring to an end—far sooner than anyone imagined—the grand party NVIDIA @nvidia so carefully orchestrated for itself as AI’s indispensable “shovel-seller.”
Two Converging Revolutions
This revolution isn’t a single event but rather the convergence of two seemingly independent technological trajectories—like two armies executing a pincer movement against NVIDIA’s GPU dominance.
The first path is the algorithmic “slimming” revolution.
Have you ever wondered whether a super-intelligent brain truly needs to activate all its neurons to solve a problem? Clearly not. DeepSeek figured this out and bet heavily on the Mixture-of-Experts (MoE) architecture, an idea with roots going back decades in machine learning that DeepSeek pushed to new levels of efficiency.
Think of it like a company employing hundreds of specialists across different domains. But each time it holds a meeting to tackle a specific problem, only the two or three most relevant experts are invited—not the entire roster brainstorming together. That’s MoE’s brilliance: it activates only a small subset of “experts” within a massive model during each computation, dramatically reducing compute demand.
What’s the result? DeepSeek-V2 carries 236 billion total parameters spread across its experts, yet activates only 21 billion per token, less than 9% of the total. Even so, its performance rivals that of GPT-4, which the author treats as running at full capacity. What does this mean? AI capability is decoupling from compute consumption!
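To make the sparse-activation idea concrete, here is a minimal top-k routing sketch in plain NumPy. It is purely illustrative: the toy dimensions, random weights, and softmax gate are my own simplifications, not DeepSeek's actual routing code. The point is only that each token exercises k of n expert matrices, so active compute scales with k rather than with the total parameter count.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, k = 16, 8, 2

# Each "expert" is an independent weight matrix.
expert_weights = [rng.standard_normal((d, d)) for _ in range(n_experts)]
gate_w = rng.standard_normal((d, n_experts))

def moe_forward(x):
    """Route x to the top-k experts; only k of n_experts matrices run."""
    scores = x @ gate_w                      # one gating score per expert
    top_k = np.argsort(scores)[-k:]          # indices of the k best experts
    w = np.exp(scores[top_k] - scores[top_k].max())
    w /= w.sum()                             # softmax over the chosen experts
    # Only the selected experts compute; the other n_experts - k stay idle.
    return sum(wi * (x @ expert_weights[i]) for wi, i in zip(w, top_k))

y = moe_forward(rng.standard_normal(d))
print(y.shape, f"active fraction: {k / n_experts:.0%}")  # → (16,) active fraction: 25%
```

In a real MoE model the gate is trained jointly with the experts and routing happens per token per layer; the same top-k mechanism is what lets DeepSeek-V2 touch under 9% of its parameters per step.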
Historically, we assumed stronger AI meant more GPUs burned. Now DeepSeek shows us that with smarter algorithms, the same results can be achieved at one-tenth the cost. This directly casts serious doubt on the necessity of NVIDIA GPUs.
The second path is the hardware “lane-switching” revolution.
AI workloads fall into two phases: training and inference. Training resembles schooling—it demands reading vast volumes of data—and here GPUs, with their brute-force parallel computing power, excel. Inference, however, resembles everyday AI usage—where responsiveness matters most.
GPUs suffer from an inherent drawback during inference: their memory (HBM) is external, causing latency as data shuttles back and forth. Imagine a chef whose ingredients are stored in a fridge in the next room—no matter how fast the chef is, each dish requires running back and forth. Companies like Cerebras and Groq took a different approach, designing dedicated inference chips with memory (SRAM) integrated directly onto the die—so ingredients sit right at hand, enabling “zero-latency” access.
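The chef analogy can be put into rough numbers. During token-by-token generation, throughput is capped by how fast the active weights can be streamed from memory, so memory bandwidth sets a hard ceiling. The sketch below reuses the 21-billion-active-parameter figure from earlier; both bandwidth values are illustrative assumptions, not vendor specifications.

```python
# Decode-time inference is memory-bound: each generated token streams the
# model's active weights past the compute units once. Rough throughput
# ceiling for a single stream, ignoring batching and KV-cache traffic.
def max_tokens_per_sec(active_params_bn, bytes_per_param, mem_bw_tbs):
    bytes_per_token = active_params_bn * 1e9 * bytes_per_param
    return mem_bw_tbs * 1e12 / bytes_per_token

ACTIVE_BN, FP16 = 21, 2  # 21B active parameters at 2 bytes each
hbm = max_tokens_per_sec(ACTIVE_BN, FP16, 3.35)   # off-die HBM-class, ~3.35 TB/s
sram = max_tokens_per_sec(ACTIVE_BN, FP16, 25.0)  # on-die SRAM, assumed 25 TB/s

print(f"HBM ceiling: ~{hbm:.0f} tok/s, SRAM ceiling: ~{sram:.0f} tok/s")
```

Whatever the exact figures, the ratio is what matters: on-die memory multiplies the single-stream throughput ceiling by the same factor it multiplies bandwidth, which is the entire pitch behind chips like Cerebras's and Groq's.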
The market has already voted with real money. OpenAI publicly criticized NVIDIA’s GPUs for poor inference performance, then reportedly signed a $10 billion deal with Cerebras to lease its inference services. NVIDIA itself panicked, reportedly paying $20 billion to acquire Groq to avoid falling behind in this new race.
When the Two Paths Converge: A Cost Avalanche
Now consider combining both innovations: run a “slimmed-down” DeepSeek model on a “zero-latency” Cerebras chip.
What happens?
A cost avalanche.
First, the slimmed model is small enough to fit entirely into the chip’s on-die memory. Second, without external memory bottlenecks, AI response speeds become astonishingly fast. The net result? Training costs drop by 90% thanks to MoE architecture; inference costs plummet another order of magnitude due to specialized hardware and sparse computation. Overall, the total cost to train and deploy a world-class AI may shrink to just 10–15% of that required by traditional GPU-based solutions.
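The arithmetic behind that 10–15% figure can be sketched directly from the article's own numbers. The active-parameter ratio is the DeepSeek-V2 figure cited above; the tenfold inference reduction is the article's claim, treated here as an assumption rather than a measured result.

```python
# Back-of-envelope version of the paragraph above. Index costs of 100
# are arbitrary baselines; only the ratios matter.
baseline_train = 100.0    # dense training on GPUs
baseline_infer = 100.0    # GPU-based inference

moe_train = baseline_train * (21 / 236)   # ~9% of parameters active per step
asic_infer = baseline_infer / 10          # "another order of magnitude" (claimed)

total = (moe_train + asic_infer) / (baseline_train + baseline_infer)
print(f"combined cost: ~{total:.0%} of the GPU baseline")  # → ~9%
```

Even granting generous slack on both assumptions, the combined figure lands in the article's 10–15% range, which is why the author calls it an avalanche rather than an optimization.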
This isn’t incremental improvement—it’s a paradigm shift.
NVIDIA’s Throne Is Having Its Carpet Quietly Pulled Out
Now you should understand why this is far more dangerous than the “Cowork panic.”
NVIDIA’s current multi-trillion-dollar market cap rests on a simple narrative: AI is the future—and the future of AI depends entirely on its GPUs. Today, that story’s foundations are cracking.
In the training market—even if NVIDIA retains its monopoly—if customers achieve the same output using one-tenth the number of GPUs, the overall market size could shrink dramatically.
In the inference market—a segment ten times larger than training—NVIDIA holds no decisive advantage and instead faces fierce competition from Google, Cerebras, and others. Even its largest customer, OpenAI, is defecting.
Once Wall Street realizes NVIDIA’s “shovels” are no longer the sole—or even best—option, what happens to valuations built on expectations of “permanent monopoly”? I think we all know the answer.
So the biggest black swan over the next six months may not be some AI application killing off another company—but rather a seemingly minor technical announcement: perhaps a new paper revealing breakthrough MoE algorithmic efficiency, or a report showing surging market share for dedicated inference chips—quietly signaling that the compute war has entered a new phase.
When the “shovel-seller’s” shovel is no longer the only choice, its golden era may well be over.
Join the TechFlow official community to stay tuned:
Telegram: https://t.me/TechFlowDaily
X (Twitter): https://x.com/TechFlowPost
X (Twitter) EN: https://x.com/BlockFlow_News













