
China’s AI Computing Power Counteroffensive
The cost itself is progress.
By Sleepy.txt
Eight years ago, ZTE suffered cardiac arrest.
On April 16, 2018, a single export ban issued by the U.S. Department of Commerce’s Bureau of Industry and Security (BIS) brought ZTE—a global top-four telecom equipment vendor with 80,000 employees and annual revenue exceeding RMB 100 billion—to an immediate standstill. The ban was straightforward: for seven years, no U.S. company could sell components, goods, software, or technology to ZTE.
Without Qualcomm chips, base stations halted production. Without Google’s Android licensing, smartphones had no usable operating system. Twenty-three days later, ZTE announced in a public statement that its core business operations had become impossible.
ZTE ultimately survived—but at a cost of $1.4 billion.
A $1 billion fine, paid in full upfront, plus a $400 million deposit placed into an escrow account held by a U.S. bank. In addition, all senior executives were replaced, and a U.S.-appointed compliance oversight team was installed on-site. For the entire year of 2018, ZTE posted a net loss of RMB 7 billion, with revenue plunging 21.4% year-on-year.
Yin Yimin, then Chairman of ZTE, wrote in an internal memo: “We operate in a complex industry highly dependent on global supply chains.” At the time, those words sounded like both reflection and resignation.
Eight years later, on February 26, 2026, Chinese AI unicorn DeepSeek announced that its upcoming V4 multimodal large model would prioritize deep collaboration with domestic chipmakers—achieving, for the first time, a fully non-NVIDIA solution across the entire workflow from pretraining to fine-tuning.
In plain terms: “We’re no longer using NVIDIA.”
The market’s immediate reaction was skepticism. NVIDIA commands over 90% of the global AI training chip market. Is abandoning it commercially rational?
Yet behind DeepSeek’s decision lies a question far larger than commercial logic: What kind of computational independence does China’s AI industry truly need?
What Exactly Is Being Choked?
Many assume chip bans target hardware alone. But what truly suffocates Chinese AI companies is something called CUDA.
CUDA—short for Compute Unified Device Architecture—is a parallel computing platform and programming model launched by NVIDIA in 2006. It enables developers to directly harness the raw computational power of NVIDIA GPUs to accelerate complex computing tasks.
Before the AI era, CUDA was merely a niche tool for a handful of elite engineers. But as deep learning surged, CUDA became the foundational bedrock of the entire AI industry.
Training large AI models essentially involves massive matrix operations—the very workload GPUs excel at.
Leveraging a head start of more than a decade, NVIDIA used CUDA to build a complete end-to-end toolchain for AI developers worldwide, spanning everything from the underlying hardware to upper-layer applications. Today, every major AI framework, including Google’s TensorFlow and Meta’s PyTorch, is deeply integrated with CUDA at the lowest levels.
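A minimal sketch of what that coupling looks like from a developer’s seat (PyTorch shown here; the matrix sizes are arbitrary): a single line moves the computation onto an NVIDIA GPU, and everything underneath it (kernel launches, cuBLAS, memory management) is CUDA.

```python
import torch

# Two large matrices, the core workload of model training.
a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

# One line moves the work onto an NVIDIA GPU. Underneath, PyTorch
# dispatches to CUDA kernels and NVIDIA libraries such as cuBLAS.
if torch.cuda.is_available():
    a, b = a.cuda(), b.cuda()

c = a @ b  # runs as a CUDA kernel when the tensors live on the GPU
```

Porting that one line to a non-NVIDIA accelerator means a different backend, a different compiler, and a different set of debugged libraries; multiply that by a decade of accumulated tooling and the moat comes into view.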
An AI PhD student begins learning, coding, and running experiments in a CUDA environment on day one of graduate school. Every line of code they write reinforces NVIDIA’s moat.
As of 2025, the CUDA ecosystem had attracted over 4.5 million developers, supported more than 3,000 GPU-accelerated applications, and served over 40,000 enterprises globally. In practice, more than 90% of AI developers worldwide are locked into NVIDIA’s ecosystem.
CUDA’s real danger lies in its flywheel effect: the more developers adopt it, the more tools, libraries, and code get created—and the richer the ecosystem grows; the richer the ecosystem grows, the more new developers it attracts. Once set in motion, this flywheel becomes nearly impossible to stop.
The result? NVIDIA sells you the most expensive shovel—and defines the only permissible way to dig. Want to switch shovels? Sure. But first, you must rewrite all the accumulated experience, tools, and code built over the past decade-plus by hundreds of thousands of the world’s brightest minds—all written for that one digging posture.
Who bears that cost?
So when BIS enacted its first round of controls on October 7, 2022—restricting exports of NVIDIA’s A100 and H100 chips to China—Chinese AI firms collectively felt the same suffocation ZTE once experienced. NVIDIA responded by launching “China-specific” versions—A800 and H800—with reduced inter-chip bandwidth, barely sustaining supply.
But just one year later, on October 17, 2023, a second round of restrictions tightened further: A800 and H800 were also banned, and 13 Chinese companies were added to the Entity List. NVIDIA was forced to launch the further-downgraded H20. By December 2024—during the final months of the Biden administration—a third round of controls took effect, imposing strict limits even on H20 exports.
Three rounds of controls—each layer tightening the noose.
Yet this time, the story diverges sharply from ZTE’s.
An Asymmetric Breakthrough
Under the ban, everyone assumed China’s large-model ambitions would collapse.
They were wrong. Faced with restrictions, Chinese companies did not confront the blockade head-on. Instead, they launched a breakthrough campaign—one whose first battlefield was not chips, but algorithms.
From late 2024 through 2025, Chinese AI firms collectively pivoted toward one technical direction: Mixture-of-Experts (MoE) models.
Simply put: split a massive model into many smaller “experts,” activating only the most relevant subset for each task—not the entire model.
DeepSeek’s V3 epitomizes this approach. With 671 billion parameters, it activates only 37 billion—just 5.5%—per inference. Its training required 2,048 NVIDIA H800 GPUs over 58 days, costing $5.576 million total. By comparison, estimates for GPT-4’s training cost hover around $78 million—a difference of one order of magnitude.
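A minimal sketch of the routing idea (an illustration of top-k expert gating in general, not DeepSeek’s actual architecture; all sizes here are toy values): a small gating network scores every expert for each token, and only the k highest-scoring experts do any work.

```python
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    """Toy Mixture-of-Experts layer: a gate routes each token to its
    top-k experts, so only a fraction of parameters is active per token."""

    def __init__(self, dim=64, n_experts=16, k=2):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_experts)])
        self.k = k

    def forward(self, x):  # x: (tokens, dim)
        scores = self.gate(x).softmax(dim=-1)        # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)   # real systems renormalize these
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e             # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(1) * expert(x[mask])
        return out

# With 16 experts and k=2, only 2/16 = 12.5% of expert parameters run per
# token; V3 pushes the same idea to 37B of 671B parameters (~5.5%).
layer = TinyMoELayer()
print(layer(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```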
This algorithmic optimization translated directly into pricing. DeepSeek’s API costs just $0.028–$0.28 per million input tokens (cached versus uncached) and $0.42 per million output tokens. GPT-4o charges $5 for input and $15 for output; Claude Opus is pricier still, at $15 for input and $75 for output. Against Claude’s list prices, that works out to roughly 50 times cheaper on input and well over 100 times cheaper on output.
This price gap triggered a massive global developer response. In February 2026, on OpenRouter, the world’s largest AI model API aggregation platform, weekly call volume for Chinese AI models surged 127% in three weeks, surpassing U.S. models for the first time. Just one year earlier, Chinese models had accounted for less than 2% of OpenRouter traffic; over the year that followed, their call volume grew 421% and their share of traffic approached 60%.
Behind these figures lies an easily overlooked structural shift. Since the second half of 2025, the dominant AI application scenario has shifted from chat to Agents. In Agent use cases, token consumption per task is 10–100× higher than simple chat. As token usage explodes exponentially, price becomes decisive—and Chinese models’ extreme cost-efficiency perfectly hit this inflection point.
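A back-of-envelope illustration of that inflection point, using the prices quoted above (the token counts below are hypothetical, chosen only to contrast a short chat with an agent task consuming roughly 100 times more):

```python
# Per-million-token prices quoted above, in USD: (input, output).
PRICES = {
    "DeepSeek":    (0.28, 0.42),
    "GPT-4o":      (5.00, 15.00),
    "Claude Opus": (15.00, 75.00),
}

def task_cost(model, tokens_in, tokens_out):
    p_in, p_out = PRICES[model]
    return (tokens_in * p_in + tokens_out * p_out) / 1_000_000

chat = (2_000, 1_000)          # hypothetical short chat exchange
agent = (200_000, 100_000)     # hypothetical long-running agent task

for model in PRICES:
    print(f"{model:12s} chat ${task_cost(model, *chat):.4f}"
          f"   agent ${task_cost(model, *agent):.2f}")
# DeepSeek     chat $0.0010   agent $0.10
# GPT-4o       chat $0.0250   agent $2.50
# Claude Opus  chat $0.1050   agent $10.50
```

The price ratio between models stays constant, but at agent scale the absolute dollar gap per task stops being a rounding error.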
But here’s the problem: reducing inference costs doesn’t resolve the fundamental challenge of training. If a large model cannot continuously train and iterate on fresh data, its capabilities rapidly degrade. And training remains the unavoidable computational black hole.
So where do we source the “shovels” for training?
From Backup to Primary
Xinghua, a small city in central Jiangsu province, was known for stainless steel and health foods, with zero prior connection to AI. Yet in 2025, a 148-meter-long domestic AI compute server production line opened there, going from contract signing to full operation in just 180 days.
At the heart of this line are two fully domestic chips: the LoongArch-based Loongson 3C6000 CPU and the TaiChu YuanQi T100 AI accelerator card. The Loongson 3C6000 features entirely self-developed instruction set architecture and microarchitecture. TaiChu YuanQi emerged from the National Supercomputing Center in Wuxi and Tsinghua University, adopting a heterogeneous many-core architecture.
At full capacity, the line produces one server every five minutes. Total investment stood at RMB 1.1 billion, with an expected annual output of 100,000 units.
More importantly, mega-clusters composed of these domestic chips have begun handling genuine large-model training workloads.
In January 2026, Zhipu AI and Huawei jointly released GLM-Image—the first state-of-the-art image-generation model trained end-to-end exclusively on domestic chips. In February, China Telecom’s trillion-parameter “Xingchen” (Stellar) large model completed full-cycle training on a domestically built 10,000-GPU cluster in Shanghai Lingang.
These milestones prove one thing: domestic chips have crossed the threshold—from “capable of inference” to “capable of training.” That’s a qualitative leap. Inference only runs already-trained models, placing relatively modest demands on chips. Training, however, requires processing massive datasets, performing complex gradient computations and parameter updates—demanding orders-of-magnitude higher chip performance, interconnect bandwidth, and software ecosystem maturity.
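A minimal sketch of why the two workloads differ so sharply (generic PyTorch with a toy model and a dummy loss, not any specific vendor’s stack): inference is a single forward pass, while every training step adds a backward pass, gradient storage, and an optimizer update.

```python
import torch
import torch.nn as nn

model = nn.Linear(1024, 1024)
x = torch.randn(32, 1024)

# Inference: forward pass only, no gradient bookkeeping.
with torch.no_grad():
    y = model(x)

# One training step: forward + backward + optimizer update. Gradients
# add a tensor per parameter, and Adam keeps two more per weight; in a
# multi-GPU cluster, every step also synchronizes gradients over the
# interconnect, which is why inter-chip bandwidth matters so much.
opt = torch.optim.Adam(model.parameters())
loss = (model(x) - x).pow(2).mean()  # dummy reconstruction loss
loss.backward()
opt.step()
opt.zero_grad()
```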
The core enabler behind these efforts is Huawei’s Ascend series. As of end-2025, the Ascend ecosystem had surpassed 4 million developers, over 3,000 partners, and supported pretraining of 43 mainstream industry large models and adaptation of more than 200 open-source models. At MWC on March 2, 2026, Huawei unveiled its next-generation computing infrastructure, SuperPoD, to international markets.
Ascend 910B’s FP16 performance now matches NVIDIA’s A100. Gaps remain, but the chips have evolved from “non-functional” to “functional,” and from “functional” toward “user-friendly.”
Ecosystem development cannot wait for perfect chips; it must scale massively during the “good-enough” phase, with real-world business needs driving iterative improvements in both hardware and software. ByteDance, Tencent, and Baidu have all doubled their 2026 procurement targets for domestic AI servers versus 2025. According to MIIT data, China’s intelligent computing capacity has reached 1,590 EFLOPS. 2026 is becoming the inaugural year of large-scale domestic AI compute deployment.
U.S. Power Shortages and China’s Global Expansion
In early 2026, Virginia—the U.S. state hosting massive volumes of global data center traffic—paused approvals for new data center projects. Georgia followed suit, extending its pause through 2027. Illinois and Michigan soon imposed similar restrictions.
According to the International Energy Agency, U.S. data centers consumed 183 terawatt-hours (TWh) of electricity in 2024, roughly 4% of national electricity use. By 2030, that figure is projected to more than double to 426 TWh, over 12% of total demand. Arm’s CEO has even forecast that AI data centers will consume 20–25% of U.S. electricity by 2030.
America’s grid is buckling. PJM Interconnection—the grid serving 13 eastern states—faces a 6 GW capacity shortfall. By 2033, the U.S. faces a nationwide 175 GW power capacity deficit—equivalent to electricity demand from 130 million households. Wholesale power prices in data-center-dense regions have surged 267% over the past five years.
The endpoint of computing power is energy. On this dimension, the Sino-U.S. gap dwarfs the chip gap—though reversed in direction.
China’s annual electricity generation stands at 10.4 trillion kWh; the U.S. generates 4.2 trillion kWh, about 40% of China’s total. Crucially, residential consumption accounts for only 15% of China’s electricity use, versus 36% in the U.S., leaving China a far larger pool of industrial power available for compute infrastructure.
On electricity pricing: AI-company clusters in the U.S. pay $0.12–$0.15 per kWh, while industrial rates in western China sit near $0.03—just one-quarter to one-fifth of U.S. levels.
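A rough illustration of what that rate gap means for a single training run (the cluster size, per-server power draw, and duration below are assumptions for the arithmetic, not figures from this article):

```python
# Hypothetical run: 2,000 AI servers drawing ~10 kW each (accelerators
# plus cooling overhead -- assumed values), running flat out for 60 days.
servers, kw_per_server, days = 2_000, 10, 60
energy_kwh = servers * kw_per_server * 24 * days  # 28.8 million kWh

for region, usd_per_kwh in [("U.S. data-center rate", 0.15),
                            ("Western China rate   ", 0.03)]:
    print(f"{region}: ${energy_kwh * usd_per_kwh:,.0f}")
# U.S. data-center rate: $4,320,000
# Western China rate   : $864,000
```

At these assumed numbers, the electricity bill alone differs by several million dollars per run, before even asking whether the grid can supply the load.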
And China now adds roughly seven times as much new power generation each year as the United States.
While America wrestles with power shortages, China’s AI sector is quietly expanding overseas—not with products or factories, but with tokens.
Tokens, the smallest units of information processed by AI models, are emerging as a new digital commodity. Produced in China’s compute factories, they flow globally via undersea fiber-optic cables.
DeepSeek’s user distribution tells the story: 30.7% domestic (China), 13.6% India, 6.9% Indonesia, 4.3% U.S., 3.2% France. It supports 37 languages and enjoys broad adoption across emerging markets like Brazil. Globally, 26,000 enterprises have activated accounts, and 3,200 institutions have deployed its enterprise edition.
In 2025, 58% of new AI startups incorporated DeepSeek into their tech stacks. Within China, DeepSeek commands 89% market share. In other countries subject to U.S. sanctions, its share ranges between 40% and 60%.
This landscape eerily echoes another battle over industrial autonomy—four decades earlier.
In 1986, Tokyo signed the U.S.–Japan Semiconductor Agreement under intense American pressure. Its core provisions: opening Japan’s semiconductor market until foreign (chiefly U.S.) chipmakers held at least a 20% share, and barring Japanese firms from exporting semiconductors below cost. When Washington judged Tokyo non-compliant, it followed up with a 100% punitive tariff on $300 million worth of Japanese goods, and it blocked Fujitsu’s acquisition of Fairchild Semiconductor.
That year, Japan’s semiconductor industry stood at its zenith. By 1988, Japan commanded 51% of the global semiconductor market, versus 36.8% for the U.S. Six of the world’s top ten semiconductor firms were Japanese: NEC (#2), Toshiba (#3), Hitachi (#5), Fujitsu (#7), Mitsubishi (#8), and Panasonic (#9). In 1986, Intel lost $173 million in the U.S.–Japan semiconductor war and teetered on bankruptcy.
But everything changed after the agreement.
Using Section 301 investigations and other tools, the U.S. launched comprehensive suppression against Japanese semiconductor firms. Simultaneously, it backed South Korea’s Samsung and SK Hynix, enabling them to undercut Japanese pricing. Japan’s DRAM market share plummeted from 80% to 10%. By 2017, Japan’s IC market share had shrunk to just 7%. Once-dominant giants were either broken up, acquired, or faded into obscurity amid endless losses.
Japan’s semiconductor tragedy lay in its contentment with being the world’s finest producer within a globally fragmented system dominated by a single external power—never aspiring to build its own independent ecosystem. When the tide receded, it realized it possessed nothing beyond manufacturing itself.
Today, China’s AI industry stands at a similarly pivotal, yet fundamentally distinct, crossroads.
Similar: We face immense external pressure—three rounds of chip controls, escalating in severity; and the CUDA ecosystem’s walls remain towering.
Different: This time, we’ve chosen the harder path—algorithmic optimization to the extreme; domestic chips advancing from inference to training; the Ascend ecosystem amassing 4 million developers; and token-based global expansion penetrating worldwide markets. Each step builds an independent industrial ecosystem Japan never possessed.
Epilogue
On February 27, 2026, three earnings bulletins from domestic AI chip firms were released simultaneously.
Cambricon reported revenue growth of 453% and achieved its first-ever annual profitability. Moore Threads saw revenue rise 243%, but posted a net loss of RMB 1 billion. MXChip recorded 121% revenue growth, with a net loss nearing RMB 800 million.
Half fire, half sea.
Fire represents extreme market hunger. The 95% void left by Jensen Huang is being filled, inch by inch, by these domestic firms’ revenue numbers. Regardless of performance or ecosystem maturity, the market demands a second choice beyond NVIDIA—a once-in-a-lifetime structural opportunity torn open by geopolitics.
Sea represents the enormous cost of ecosystem building. Every dollar of loss is real money spent to catch up with the CUDA ecosystem: R&D investment, software subsidies, and the engineering manpower dispatched onsite to debug compilation issues one customer at a time. These losses aren’t signs of mismanagement—they’re the “war tax” paid to build an independent ecosystem.
These three financial reports document the true nature of this compute war more honestly than any industry white paper. It is not a triumphant march forward—but a brutal, bloody, advance-while-bleeding trench warfare.
Yet the nature of the war has indeed changed. Eight years ago, the question was: “Can we survive?” Today, it is: “How much must we pay to survive?”
The cost itself is progress.