
Token Factory Economics Is Restructuring the Entire AI Industry
In just two years, the token industry has undergone a stunning reversal—from burning cash and internal competition to oversupply, then to supply shortage and simultaneous increases in both volume and price.
Author: Haishan
From the “cabbage-price” token price war of 2024 to the collective price hikes by Alibaba Cloud, Tencent Cloud, and Baidu Intelligent Cloud in 2026, the token industry completed a stunning reversal in just two years: from cash-burning price competition and overcapacity to supply shortages and simultaneous growth in volume and price.
Since the start of 2026, the A-share AI computing-power sector has gained more than 55% cumulatively. Leading large-model enterprises such as Moonshot AI and Zhipu AI have each exceeded RMB 1 billion in monthly revenue, with some firms generating more revenue in 20 days than they did in all of 2025.
This industrial revolution, which Jensen Huang has termed the “token factory economy,” has long since moved beyond technological hype. It is now a firm trend driven by explosive real-world demand, structural supply-demand imbalances, and global competition for energy and computing power. Its underlying logic is being fundamentally reconfigured, reshaping the rules of engagement for the entire AI industry and even changing how the wider world operates.
01 The “Oil” of the New Era
The essence of this industrial inflection point is AI’s full transition from a “model arms race” to a “token production race.”
Prior to 2024, the industry’s core narrative centered on “whose model has more parameters, whose model is smarter.” Major vendors burned vast sums training large models and captured market share with free or deeply discounted tokens, to the point where “selling tokens was less profitable than selling bottled water.”
But the breakout success of the intelligent agent OpenClaw (colloquially dubbed “Lobster”) in February 2026 shattered that logic entirely.
Traditional large models operate on a “human-to-AI” single-turn interaction mode, consuming only 1,000–3,000 tokens per dialogue turn. In contrast, agents adopt a cyclical “plan–act–observe–reflect” architecture. Processing a complex task requires dozens or even hundreds of model calls—consuming 100,000 tokens for a mid-complexity task and up to millions for highly complex ones—earning them the industry moniker “token crushers.”
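The gap between a chat turn and an agent task can be sketched with simple arithmetic. The snippet below is only an illustration: the function name, the four calls per loop iteration (one per phase), the flat 2,000 tokens per call, and the iteration counts are all assumptions chosen so that the totals land near the ranges quoted above. Real agents typically resend a growing context on every call, so actual consumption may be higher.

```python
# Rough token accounting for a chat turn versus an agent loop.
# All per-call figures are illustrative assumptions, not measured values.

CHAT_TOKENS_PER_TURN = 2_000  # midpoint of the 1,000-3,000 range quoted above


def agent_task_tokens(iterations: int, calls_per_iteration: int = 4,
                      tokens_per_call: int = 2_000) -> int:
    """Estimate tokens for a plan-act-observe-reflect loop.

    Assumes one model call per phase (plan, act, observe, reflect),
    hence four calls per iteration, each using a flat token budget.
    """
    return iterations * calls_per_iteration * tokens_per_call


mid_task = agent_task_tokens(iterations=12)     # 48 calls
heavy_task = agent_task_tokens(iterations=150)  # 600 calls

print(mid_task)                          # 96000
print(heavy_task)                        # 1200000
print(mid_task / CHAT_TOKENS_PER_TURN)   # 48.0
```

Under these assumptions a mid-complexity task costs about as much as 48 ordinary chat turns, and a heavy task passes the million-token mark.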
Data from China’s National Data Administration confirms this explosion: China’s daily token call volume soared from 100 billion at the start of 2024 to 140 trillion by March 2026—a more than 1,000-fold increase in two years—and rose another 40% in Q1 2026 alone versus end-2025.
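The growth figures above can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the paragraph; the one added assumption is that “start of 2024” to March 2026 is treated as 26 months, and the constant monthly rate is a simplification, since actual growth was presumably uneven.

```python
# Figures quoted in the paragraph above.
START_DAILY_TOKENS = 100e9         # 100 billion, start of 2024
MARCH_2026_DAILY_TOKENS = 140e12   # 140 trillion, March 2026
Q1_2026_GROWTH = 0.40              # rise in Q1 2026 versus end-2025

# Assumption: start of 2024 to March 2026 is treated as 26 months.
MONTHS_ELAPSED = 26

fold_increase = MARCH_2026_DAILY_TOKENS / START_DAILY_TOKENS

# Constant monthly growth rate that would produce the same fold increase.
implied_monthly_growth = fold_increase ** (1 / MONTHS_ELAPSED) - 1

# Daily volume at end-2025 implied by the 40% rise during Q1 2026.
implied_end_2025 = MARCH_2026_DAILY_TOKENS / (1 + Q1_2026_GROWTH)

print(f"fold increase: {fold_increase:,.0f}x")
print(f"implied monthly growth: {implied_monthly_growth:.1%}")
print(f"implied end-2025 daily volume: {implied_end_2025:.3g} tokens")
```

On these inputs the increase is 1,400-fold, which matches the “more than 1,000-fold” claim. It corresponds to compounded growth of roughly 32% per month and an implied end-2025 volume of about 100 trillion tokens per day.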
Industry narratives have since pivoted completely: no longer competing on model “IQ ceilings,” but rather on who can produce massive volumes of tokens more cheaply and reliably—and who controls the levers of intelligent supply.
Against surging demand, supply is rigid, and that mismatch is the main reason token prices have stayed firm. The imbalance is not short-term volatility; it is a structural contradiction imposed by long-cycle constraints across the entire value chain.
Supply-side bottlenecks are threefold and difficult to overcome:
First, core hardware capacity is monopolized, and expansion cycles are extremely long.
High Bandwidth Memory (HBM) is the “heart” of AI servers. Samsung, SK Hynix, and Micron collectively control over 95% of global HBM capacity, with expansion lead times stretching 24–36 months—resulting in an HBM shortfall exceeding 40% in 2026.
Squeezed by this shortage, standard DDR5 memory prices jumped 300% within six months; 256GB server memory modules now cost over RMB 40,000 per unit; and AI server delivery timelines stretched from three months to twelve.
Second, electricity and energy have become the largest hidden bottleneck. Power density per rack in intelligent computing centers is 10–20 times higher than in traditional data centers. Electricity costs account for over 60% of token production costs—and constructing power infrastructure for large-scale data centers takes 3–5 years. In eastern China, computing-power quotas have become nearly impossible to secure.
Third, infrastructure and operations & maintenance (O&M) capacity cannot keep pace with demand. Liquid-cooled data center penetration rose from 15% in 2024 to 45% in 2026—but there is a severe shortage of technical talent and construction capacity, leaving many already-built computing clusters unable to operate at full capacity.
On the demand side, growth has taken off in a “three-stage rocket” pattern and shows strong staying power.
Stage One: Mass adoption of consumer-facing agents. Individual users have shifted from casual chat and entertainment to using AI assistants for email management, coding, and planning—pushing average daily token consumption from dozens to thousands, and eventually toward tens of thousands.
Stage Two: Full-scale deployment of enterprise-grade production applications. Companies no longer view AI as an optional enhancement but instead treat tokens as a core production input. Kunlun Tech and 58.com, for example, consume over one trillion tokens per month. AI transformation across manufacturing, finance, and healthcare is unleashing trillions of tokens in new demand.
Stage Three: Explosive global expansion. Domestic large-model tokens cost just one-fifth to one-third of overseas alternatives like Claude or GPT—allowing them to rapidly capture markets across Southeast Asia, the Middle East, and Latin America. In Q1 2026, Chinese cloud providers’ overseas token revenue surged 320% year-on-year—emerging as a new growth pole.
At a deeper level, tokens are evolving into the foundational commodity of the AI era and reconstructing the value system of the entire digital economy. Just as electricity was the core energy of the industrial age and traffic the core asset of the internet era, tokens are the core production material of the intelligent age. They have three defining attributes (measurability, priceability, and tradability) and serve as the universal value anchor linking computing-power supply and intelligent demand.
This shift has triggered a complete business-model revolution: the industry has moved past the internet-era “burn cash for scale” playbook and entered a new stage of “pay-per-use, profit-driven” economics.
Major players widely adopt a strategy of “subsidizing C-end users to cultivate habits while scaling monetization with B-end clients”: offering time-limited free tokens to individual users while charging enterprise customers precisely based on consumption volume. In Q1 2026, leading cloud providers’ AI business gross margins climbed above 35%—achieving scalable profitability for the first time.
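The “free allowance for consumers, metered billing for enterprises” model can be illustrated with a minimal sketch. Every number here is a hypothetical placeholder (the RMB 8 price and RMB 5 unit cost per million tokens, the 500,000-token free allowance), and the function names are invented for illustration; none of them come from any provider’s actual price list.

```python
# Metered billing sketch: free allowance for consumers, pay-per-use for enterprises.
# Prices, allowance, and unit cost are hypothetical placeholders.


def monthly_bill(tokens_used: int, price_per_million: float,
                 free_allowance: int = 0) -> float:
    """Charge only for tokens consumed beyond the free allowance."""
    billable = max(tokens_used - free_allowance, 0)
    return billable / 1_000_000 * price_per_million


def gross_margin(revenue: float, cost: float) -> float:
    """Gross margin as a fraction of revenue."""
    return (revenue - cost) / revenue


# Consumer on a free tier: usage under the allowance costs nothing.
consumer = monthly_bill(300_000, price_per_million=8.0, free_allowance=500_000)

# Enterprise: 2 billion tokens at a hypothetical RMB 8 per million tokens.
enterprise = monthly_bill(2_000_000_000, price_per_million=8.0)

# Production cost at a hypothetical RMB 5 per million tokens.
cost = 2_000_000_000 / 1_000_000 * 5.0
margin = gross_margin(enterprise, cost)

print(consumer)    # 0.0
print(enterprise)  # 16000.0
print(margin)      # 0.375
```

The 37.5% margin is purely a product of the assumed price and cost. The point is only that, under pay-per-use, margin depends on the spread between price and unit cost rather than on volume.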
For China, this token-industry revolution presents a historic opportunity to leapfrog global competitors. China possesses the world’s lowest green-energy costs, the most comprehensive computing infrastructure (accounting for over 60% of global server production capacity), the broadest application scenarios, and the most cost-effective large models—fulfilling all prerequisites to become the “World Token Factory.”
Just as China leveraged its cost advantage to become the “World Factory” decades ago, it is now leveraging integrated advantages in energy, computing power, and application scenarios to dominate global token production and supply.
In the near term, supply-demand mismatches will persist through end-2027, keeping token prices elevated and accelerating industry consolidation.
In the long term, as chip capacity ramps up and model efficiency improves, tokens will enter a “cabbage-price” era—penetrating every corner of the national economy and becoming the core engine of digital economic growth.
02 How Are Subsectors Performing?
As the token industry pivots from “low-price internal competition” to “supply-demand tightness,” its subsectors have undergone structural divergence.
A differentiated dynamic has emerged: upstream players controlling pricing, midstream players improving margins, and downstream players realizing monetization. The upstream (computing-power hardware production), midstream (token hub orchestration), and downstream (scenario-specific application deployment) segments each exhibit distinct barriers to entry, levels of industry health, and value-allocation logics.
First, upstream computing-power hardware, the core production capacity of token factories, is an indispensable input supplied by an oligopoly.
It comprises four key subsegments, each dominated by a few major players: AI chips, computing-power servers, liquid cooling systems, and intelligent computing-center operations.
AI chips are the core engines of token production. NVIDIA holds over 90% of the global high-end GPU market.
Meanwhile, domestic A-share leaders are accelerating breakthroughs: Cambricon’s MLU590 chip has achieved mass production and supports both large-model inference and training, with AI-chip revenue up 320% year-on-year in Q1 2026.
Hygon’s DCU products command over 30% penetration in domestic intelligent computing centers and are deeply integrated with industry leaders including Sugon and Inspur. Jingjia Micro’s JM9-series GPUs have been deployed across government affairs and financial sectors under China’s IT innovation initiatives, establishing the company as a core supplier of domestically produced general-purpose GPUs.
Computing-power servers serve as the physical carriers of token production capacity. A-share leaders hold half the global market.
Inspur Information remains the world’s top AI-server vendor by market share, with Q1 2026 shipments up 180% year-on-year. Sugon’s liquid-cooled servers lead domestically in market share, providing hardware support for over 80% of China’s national-level intelligent computing centers.
Liquid cooling is a must-have solution for high-power intelligent computing centers, with penetration rising rapidly from 15% in 2024 to 45% in 2026.
Envicool is the undisputed leader in liquid cooling, serving core clients including NVIDIA, Inspur, and Huawei, with liquid-cooling orders up 210% year-on-year in 2026.
Shenling Environment’s liquid-cooling data-center solutions have been deployed across multiple national-level intelligent computing centers, with order growth exceeding 150%.
In intelligent computing-center operations, Baosight Software, Guanghuan New Network, and Runze Intelligent Computing—leveraging prime geographic locations and green-energy resources—have become China’s largest third-party intelligent computing-center operators, with computing-power leasing revenue up over 100% year-on-year in Q1 2026.
Second, midstream token hubs have shifted from price wars to value wars.
Midstream token industry players shoulder the critical functions of computing-power orchestration, model service provision, and standardized token output. Key participants fall into two categories: large-model vendors and cloud-service providers.
Leading A-share large-model vendors have now established clear token commercialization pathways.
For instance, Kunlun Tech’s TianGong large model now processes over 1.2 trillion tokens daily, serving over 120,000 paying B-end customers. Its enterprise-tier token service is priced at just one-quarter of overseas models, with AI-related revenue up 450% year-on-year in Q1 2026.
iFLYTEK’s Spark large model focuses on vertical domains—education, healthcare, and office productivity—with 70% of its token consumption originating from B-end production applications.
Among cloud-service providers, Alibaba Cloud, Tencent Cloud, and ByteDance’s VolcEngine are not A-share listed, but they lift listed ecosystem partners: Yonyou Network and Kingdee International (listed in Hong Kong) build enterprise-grade AI applications atop Alibaba Cloud and serve as major channels for token consumption.
Finally, downstream application scenarios—the ultimate outlet for token value—are penetrating both the C-end (consumer) and B-end (enterprise) markets.
Downstream segments fall into three categories: C-end personal applications, B-end enterprise services, and vertical-industry digitization—each exhibiting markedly different token-consumption scales and commercialization timetables.
C-end scenarios emphasize inclusivity, focusing on personal AI assistants, content generation, and creative design.
A-share examples include Wondershare’s AI creative software suite (MiaoYing Studio and Wondershare AI Painting), boasting over 5.5 million global paying users. Token consumption rose 320% year-on-year in Q1 2026, with model optimization cutting per-user token costs by 40%.
Colorful Information’s AI-powered email and smart-office assistant services have accumulated over 300 million users, with daily token calls surpassing 50 billion.
B-end enterprise services constitute the primary driver of token consumption—accounting for over 65% of total usage.
For example, Tonghuashun’s AI investment advisory service serves over 100 million retail investors, with daily token calls exceeding 800 billion and AI-related revenue up 190% year-on-year in Q1 2026.
Supcon Technologies’ industrial AI platform delivers intelligent O&M services for chemical and power industries, consuming over 5 million tokens annually per factory.
Runda Medical’s AI-powered medical diagnostic assistance system is deployed across over 3,000 hospitals nationwide, processing over 200 billion medical-text tokens daily.
Overall, vertical-industry B-end scenarios represent the long-term growth pole for the token industry—AI transformation in autonomous driving, intelligent manufacturing, and fintech is unlocking trillions of tokens in demand.
03 Which Stocks Are Riding the Wave?
From an industrial perspective, the token industry has fully shifted from “model contests” to “capacity-and-monetization contests.” Supply-demand mismatches, combined with accelerating commercial-value realization, have made leading A-share companies across upstream hardware, midstream models, and downstream applications the core investment candidates in this trillion-token economy. Five are profiled below.
First is Inspur Information—the undisputed leader in AI servers and the cornerstone of token-production capacity. As the world’s top AI-server vendor by market share, Inspur serves as the essential hardware backbone powering global token factories. Deeply aligned with NVIDIA, the company secures priority access to high-end GPU allocations—its supply-chain and scale advantages are unmatched.
In Q1 2026, its AI-server shipments surged over 150% year-on-year, pushing its global market share beyond 25%. With nearly RMB 40 billion in undelivered orders—scheduled through end-2027—it stands as the most certain performer across the entire value chain.
Second is Envicool—the liquid-cooling leader and the “cooling heart” of token factories. As intelligent computing-center power density surges, liquid cooling has become a non-negotiable requirement for scaling token production—penetration soaring from 15% in 2024 to 45% in 2026. Envicool’s liquid-cooling revenue rose over 210% year-on-year in Q1 2026, with order visibility extending to 2027—making it the most elastic upstream performer.
Third is Kunlun Tech—the pioneer in large-model commercialization and the benchmark for token monetization. Kunlun Tech is the earliest A-share large-model vendor to achieve scalable token profitability. Its enterprise-tier token service is priced at just one-third to one-quarter of overseas models—rapidly capturing SME markets.
In Q1 2026, its daily token call volume surpassed 1.2 trillion, with over 120,000 paying B-end customers and AI-related revenue up over 450% year-on-year. Gross margin remained above 42%—making it the purest A-share token-monetization play.
Fourth is iFLYTEK—the leader in vertical-domain large models and the core vehicle for industry-specific tokens. iFLYTEK has deep expertise in education, healthcare, and industrial sectors; over 70% of Spark’s token consumption comes from B-end production applications—demand is highly resilient.
Backed by years of domain-specific experience and strong barriers in scenarios and data, the company’s government-and-enterprise customized token-service orders are growing rapidly. AI-related revenue is projected to exceed 60% of total revenue in 2026. As AI penetration deepens across vertical industries, iFLYTEK stands to fully benefit from the long-term token-demand tailwind of digital transformation.
Fifth is Wondershare, the global leader in C-end AI creative tools and the core driver of personal token consumption. It has over 5.5 million paying users across its video editing and AI painting products. After the full rollout of AI features, users’ willingness to pay and session duration increased significantly, and token consumption rose over 320% year-on-year in Q1 2026.
In summary, the current token-driven opportunity is a long-term, demand-led phenomenon. In the short term, investors may prioritize upstream hardware leaders like Inspur Information and Envicool. In the medium term, focus should shift to commercialization benchmarks like Kunlun Tech. In the long term, vertical-scenario leaders like iFLYTEK offer compelling upside. High-quality enterprises stand to gain twice during this high-growth cycle, from both earnings growth and valuation uplift.