
NVIDIA Begins Selling the “Pickaxe-Making” Method
The person you’re trying to beat is renting you all the tools you need to beat them—annual payments required, with the contract price increasing every year.
Author: Ada, TechFlow
Live from GTC at the San Jose Convention Center.
Bill Dally, NVIDIA’s Chief Scientist, sat onstage facing Jeff Dean of Google. Midway through their conversation, Dally dropped a number: “Previously, porting a standard-cell library containing roughly 2,500 to 3,000 cells required a team of eight engineers working for about ten months.”
He paused.
“Now it takes just one GPU card—overnight.”
The audience didn’t gasp—because those who understood instantly grasped what this meant. Ten months of work by eight engineers had been compressed into a single night on an in-house GPU. And Dally added: the results matched or even surpassed human-designed layouts across three key metrics—area, power consumption, and latency.
The next day, headlines declared, “NVIDIA Uses AI to Design GPUs.”
But the reality is far more intriguing than that headline suggests.
What Is NVIDIA Running Internally?
What NVIDIA runs internally isn’t a black box—it’s a suite of toolchains refined over several years.
NVCell is a reinforcement-learning-based program built specifically for the most labor-intensive task: standard-cell library migration. Prefix RL tackles the long-standing challenge of designing parallel prefix circuits, such as carry-lookahead adders. According to Dally, layouts generated by this system are "ones humans could never conceive," improving key metrics by roughly 20% to 30% over human designs.
Then there are two internal LLMs: ChipNeMo and Bug Nemo. NVIDIA fed both models the RTL code, architecture documentation, and design specifications from every GPU it has ever built, from the G80 to Blackwell. As Dally described it, this effectively distilled two decades of NVIDIA's "muscle memory" into an internal model, giving new hires instant access to the collective expertise of engineers with twenty years' experience.
So—can AI design GPUs yet?
Quite the opposite. Dally’s exact words were: “I’d love to someday just say, ‘Design me a new GPU,’ but we’re still far from that point.”
NVIDIA hasn't used AI to design a GPU. But it has done something else, something that will make the entire industry dependent on it going forward.
$2 Billion Move Into the EDA Heartland
On December 1, 2025, NVIDIA invested $2 billion in Synopsys—one of the “Big Three” EDA companies. The two signed a joint development agreement to embed NVIDIA’s accelerated computing stack across Synopsys’ full EDA workflow. Blackwell and the next-generation Rubin GPU will be deeply integrated with Synopsys.ai.
Synopsys’ position warrants explanation. Virtually every advanced-node chip—including Apple’s M-series, AMD’s MI-series, and Google’s TPUs—is designed using either Synopsys’ or Cadence’s toolchains. Together with Siemens EDA, these three firms dominate the foundational tools for chip design. You can avoid Qualcomm chips; you can skip TSMC’s fabs—but you cannot decouple yourself from these three software suites.
Three months after investing in Synopsys, NVIDIA brought Cadence, Siemens, and Dassault onboard, announcing they were all developing AI-driven chip-design tools powered by NVIDIA GPUs.
The benchmark data NVIDIA released is startling: Synopsys PrimeSim runs 30× faster on Blackwell; Proteus, 20× faster; Sentaurus achieves 12× acceleration on B200 versus CPU-only execution. MediaTek sped up Cadence Spectre 6× using H100s. Astera Labs achieved 3.5× faster chip verification using Synopsys + NVIDIA.
One detail deserves special attention: Cadence’s Millennium M2000 platform is explicitly labeled “built exclusively for the EDA market—and exclusively based on NVIDIA Blackwell.”
That word “exclusively” is telling. Historically, EDA tools ran on CPUs—Intel and AMD processors were equally viable. Going forward, if you want the fastest EDA performance, you’ll need NVIDIA GPUs.
The Real Shape of the Flywheel
Most people envision NVIDIA’s flywheel like this: sell GPUs to AI companies → AI companies train large models → large models prove GPUs indispensable → more customers buy GPUs.
This flywheel is already formidable. But beneath it lies another, deeper layer.
NVIDIA uses its own AI-powered tools to design next-gen GPUs—achieving generational leaps in design efficiency—while simultaneously binding the entire industry’s EDA toolchain to its hardware. Competitors wanting to catch up must rent the very tools needed to chase NVIDIA—from NVIDIA’s ecosystem.
This underlying anxiety lurks behind AMD’s earnings report—the one that sent its stock plummeting. Even though NVIDIA and Synopsys publicly state their investment “carries no obligation to purchase NVIDIA hardware,” the market knows better: accelerated EDA features launch first—and often exclusively—on NVIDIA hardware. AMD and Intel are left relying on a path “optimized for their biggest rival’s platform.”
Imagine an AMD engineer trying to design a chip to compete with Blackwell. He opens Synopsys’ tools—tools that run fastest on NVIDIA GPUs. So he faces a stark choice: endure a design cycle twice as slow—or buy stacks of NVIDIA GPUs to design the chip meant to beat NVIDIA.
The pickaxe is still being sold. But the sales model has changed.
The Real Position of Domestic GPU Startups
At this point, a sobering set of numbers is essential.
In the same fiscal year that NVIDIA's net profit exceeded $70 billion (FY2025), China's "Four Little Dragons" of domestic GPUs—Moore Threads, MetaX, Biren Technology, and Enflame—lined up at the IPO gate.
Moore Threads' prospectus shows cumulative net losses of ¥5 billion from 2022 to 2024, followed by another ¥271 million loss in H1 2025. As of June 30, its accumulated unremedied losses totaled ¥1.478 billion. Management estimates the earliest possible date for consolidated profitability is 2027. MetaX fares slightly better, with cumulative losses exceeding ¥3 billion over three years. Biren is worst off: over ¥6.3 billion lost in three and a half years, with only ¥58.9 million in revenue in H1 2025, less than one-tenth of Moore Threads' ¥702 million over the same period.
R&D intensity tells an even starker story. Moore Threads' R&D expenses came to 2,422.51% of its revenue in 2022, and still stood at 309.88% in 2024. Spending more than three times your revenue on R&D every year is not a sustainable business; it is life support, maintained by continuous infusions from private markets and the recently reopened STAR Market.
The tooling bottleneck is even tighter. Huada Jiutian’s 2022 IPO prospectus stated its tools only partially support 5nm advanced nodes. Primarius Electronics covers 7nm/5nm/3nm nodes—but only offers point tools, not full-flow solutions.
Huada Jiutian founder Liu Weiping was candid: “Domestic EDA still falls significantly short in supporting advanced processes—especially today’s 7nm, 5nm, and 3nm nodes. Currently, domestic EDA reaches 14nm capability. Though 7nm process technology has been mastered, deep integration of 7nm tools into real-world applications still requires coordinated efforts across the entire industrial chain.”
In other words: full-flow EDA for advanced nodes remains functionally unavailable domestically. Chinese GPU startups still rely on Synopsys and Cadence to design chips. In 2025, Trump briefly announced export controls on all critical software—though never formally enacted, EDA tools for sub-7nm nodes remain under strict U.S. export control. Licensing approvals hang on a foreign government’s decision.
The capital markets’ reaction borders on surreal. On its listing day, MetaX closed at ¥829.90—a 692.95% single-day surge. After listing, Moore Threads’ share price briefly ranked third on China’s A-share market—behind only Kweichow Moutai and Cambricon—with media estimating its market cap at ~¥359.5 billion at the time.
Beneath those numbers lies the real business: a group of companies still burning cash, still dependent on externally controlled toolchains to keep designing chips—yet priced in public markets as heirs to “China’s NVIDIA.”
And the very toolchain those companies use to design chips is becoming part of NVIDIA’s ecosystem. That $2 billion tie-up with Synopsys—and Cadence’s Millennium M2000 label declaring it “exclusively based on NVIDIA Blackwell”—turns the act of catching up into a paradox.
A Complete Chain: From Design to Manufacturing
Back to that GTC dialogue.
Dally remained characteristically humble throughout. "AI is still far from designing chips autonomously" is a line NVIDIA has repeated for four or five years. But each year, the phrasing evolves. Four years ago: "AI assists design." Three years ago: "AI automates certain steps." This year: "One night on one GPU replaces ten months' work by eight engineers." Each year advances one step, and each year ends with the same refrain: "We're still far from the ultimate goal." Look back a few years later and last year's "still far" has already been achieved, while the new "still far" sits beyond every competitor's reach.
What NVIDIA has done over the past twelve months boils down to one thing: applying AI to the most valuable, highest-moat segments of the chip value chain—and then selling those tools, layer by layer, to the entire industry.
Front-end chip design is taken over by internal LLMs like ChipNeMo; mid-stage tasks—standard-cell library migration, layout optimization—are handled by NVCell and Prefix RL; the entire EDA toolchain is locked to NVIDIA GPUs via the $2 billion Synopsys deal and Cadence's "exclusively based on Blackwell" branding; and manufacturing-side lithography computation is handled by cuLitho, already deployed by TSMC.
From design to manufacturing, NVIDIA has rebuilt every stage with AI—and every stage ultimately leads to the same endpoint: if you want the fastest tools, you must buy NVIDIA GPUs.
For any competitor aiming to build a chip capable of beating Blackwell, the most awkward reality has already arrived. The EDA tools required to design that chip run fastest on NVIDIA GPUs; the lithography computations required to manufacture it rely on NVIDIA’s fastest algorithm libraries; and the compute needed to train the AI that designs it? Still NVIDIA GPUs.
The person you’re trying to beat is renting you every tool you need to beat them. Rent is paid annually—and the contract renews at higher rates each year.