
Why Does Baidu Start with an "Operating System" to Build an AI That's "Omnipotent and Ubiquitous"?
Smart and capable super productivity.
Author: Lafeng de Geek

A large model can map out China's five thousand years of history, yet fail to tell you what time it is now; it can clearly explain quantum mechanics, but struggle to create a professional-grade, illustrated PowerPoint presentation.
Why do large models seem omnipotent in theory, yet fall short in practical use?
The reason is simple: intelligence and knowledge don’t automatically translate into capability.
Intelligence comes from training large models on massive datasets, which produces highly capable "brains" that excel at answering questions. Capability is a different matter: it means actually completing a task and handing over a usable result.
To meet both criteria, we need to equip this smart brain with agile limbs, achieving “deep thinking + deep delivery.”
Thus, moving large models from merely thinking well to being both “smart and capable” will decide whether this wave of large models is a fleeting trend or a transformative force in history.
Baidu has set an example.
On April 25, at the Create 2025 Baidu AI Developer Conference, Baidu founder Robin Li unveiled Cangzhou OS—the world’s first operating system for the content domain—jointly launched by Baidu Wenku (Baidu Docs) and Baidu Wangpan (Baidu Netdisk).
Drawing on the underlying technologies, capabilities, and data that Baidu Wenku and Baidu Wangpan have accumulated and deeply integrated over the years, Cangzhou OS moves seamlessly across scenarios and delivers high-quality, end-to-end outputs through intuitive, low-barrier interfaces.
Powered by Cangzhou OS, Baidu Wenku and Baidu Wangpan envision a future where AI achieves truly one-stop, end-to-end delivery anytime, anywhere, and on any device—making AI “omnipotent and ubiquitous.”
01
Cangzhou OS: Advancing AI Toward Operating System-Level Evolution
In the tech industry, there's a shared understanding: any technology transitioning from lab research to widespread adoption must traverse a long Gartner Hype Cycle.

During the first phase of this cycle, growth is driven by market hype fueled by technological breakthroughs. However, as real-world implementation fails to meet expectations, a rapid decline follows and lasts until the conditions for practical deployment mature. Only then does the second wave arrive, an ecosystem explosion in which the technology becomes nearly frictionless, omnipotent, and ubiquitous infrastructure.
One hallmark of entering this second stage in the software industry is typically the emergence of a mature operating system, such as Windows on PCs or iOS on mobile phones.
So how do we define a mature operating system? About 15 years ago, the global tech community debated: why were Apple smartphones considered entirely different species from earlier feature phones despite sharing similar functions like touchscreens, large displays, calling, photography, music playback, and messaging?
A key reason was that iOS inherited kernel-level stability and multitasking capabilities from Mac OS and opened them up as an ecosystem. Developers could freely build on Apple’s foundational capabilities to create innovative applications. This changed the definition of a phone from something dictated by giants such as Motorola and Nokia into a vast industry of open-ended possibility, co-created by an entire ecosystem, and it opened the door to more than a decade of mobile internet evolution.
Technology advances relentlessly, but business narratives often repeat themselves with similar rhythms. The core logic proven in mobile OS development remains equally valid today in building operating systems for the era of large models.
In summary, three elements are essential: comprehensive foundational capabilities, flexible central orchestration, and a thriving application service ecosystem. These map onto the three-layer architecture of Cangzhou OS: infrastructure base, central system, and application services. The one difference is that, instead of traditional APIs bridging applications to the core layers, Cangzhou OS uses MCP (Model Context Protocol), a more standardized, lower-barrier interface.

The foundation layer, MCP Server, centers around Chatfile Plus, which uses a knowledge framework to perform element-level disassembly and parsing of multimodal, varied-form, and multi-format content. It includes toolkits for multimodal understanding, retrieval, file transcoding, and analysis.
Meanwhile, Baidu Wenku and Baidu Wangpan have built “three major libraries”: public knowledge base (accumulated public data from Baidu Wenku), private knowledge base (user-authorized data stored in Baidu Wangpan), and memory library (users’ historical instructions, usage habits, and generation records within Wenku or Wangpan).
These data come in various modalities, forms, and formats. The public knowledge base offers general information, while the private knowledge base and memory library store personalized user data.
Within the knowledge framework, Cangzhou OS uses specialized models to vectorize and tag multimodal content from these “three libraries,” converting unstructured data (images, text, video, audio, documents) into multidimensional vector representations (embeddings) that computers can work with.
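To make the idea of vectorizing and tagging content from the three libraries more concrete, here is a minimal sketch in Python. It is not Baidu's implementation: the `embed` function is a toy hashed bag-of-words stand-in for the specialized models the article mentions, and the `Item` fields, the sample texts, and the `search` helper are all hypothetical names introduced for illustration.

```python
import hashlib
import math
from dataclasses import dataclass, field

DIM = 256  # toy vector size, chosen arbitrarily for this sketch


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class Item:
    library: str   # "public", "private", or "memory"
    modality: str  # tag such as "document", "video", "audio"
    text: str      # extracted text, transcript, or caption
    vector: list[float] = field(default_factory=list)


def index(items: list[Item]) -> list[Item]:
    """Attach a vector to every item so all three libraries share one search space."""
    for item in items:
        item.vector = embed(item.text)
    return items


def search(query: str, items: list[Item], top_k: int = 1) -> list[Item]:
    """Rank items by cosine similarity (a dot product, since vectors are unit length)."""
    q = embed(query)
    ranked = sorted(
        items,
        key=lambda it: sum(a * b for a, b in zip(q, it.vector)),
        reverse=True,
    )
    return ranked[:top_k]


# Made-up sample content, one item per library.
corpus = index([
    Item("public", "document", "quantum mechanics lecture notes"),
    Item("private", "video", "hainan outdoor wedding venue tour"),
    Item("memory", "document", "user prefers minimalist poster style"),
])
best = search("outdoor wedding in hainan", corpus)[0]
print(best.library, best.modality)
```

The point of the sketch is only that, once every piece of content carries a vector and a few tags, public, private, and memory data can be searched through one interface regardless of the original format.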
At the central system level, Baidu Wenku and Wangpan developed their own “three engines”: Fusion Editor (for editing documents and PPTs), Reader (for viewing documents and presentations), and Player (for audio-video playback).
In addition, Cangzhou OS features a “Scheduling Hub.” It combines interaction components, intent modeling, and transmission infrastructure with user memory and profile data, uses models to interpret what the user wants, and then allocates and dispatches the appropriate Agents.
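The dispatching role of such a hub can be sketched roughly as follows. This is an assumption-laden illustration, not the actual Scheduling Hub: simple keyword rules stand in for the intent model, a plain dictionary stands in for user memory and profile data, and the agent functions and their names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Request:
    text: str
    memory: dict[str, str]  # stand-in for user memory and profile data


# Hypothetical agents: each turns a request into a short deliverable description.
def ppt_agent(req: Request) -> str:
    style = req.memory.get("style", "default")
    return f"PPT draft ({style} style) for: {req.text}"


def poster_agent(req: Request) -> str:
    style = req.memory.get("style", "default")
    return f"Poster draft ({style} style) for: {req.text}"


def mindmap_agent(req: Request) -> str:
    return f"Mind map for: {req.text}"


AGENTS: dict[str, Callable[[Request], str]] = {
    "ppt": ppt_agent,
    "poster": poster_agent,
    "mindmap": mindmap_agent,
}

# Keyword rules stand in for a learned intent model.
INTENT_KEYWORDS: dict[str, tuple[str, ...]] = {
    "ppt": ("plan", "proposal", "slides", "presentation"),
    "poster": ("poster", "invitation"),
    "mindmap": ("mind map", "outline"),
}


def detect_intents(text: str) -> list[str]:
    """Return every intent whose keywords appear in the request text."""
    lowered = text.lower()
    return [
        intent
        for intent, keywords in INTENT_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]


def dispatch(req: Request) -> dict[str, str]:
    """Route one request to all matching agents and collect their outputs."""
    return {intent: AGENTS[intent](req) for intent in detect_intents(req.text)}


req = Request(
    "Help me plan an outdoor wedding and design invitations",
    {"style": "minimalist"},
)
results = dispatch(req)
for intent, output in results.items():
    print(intent, "->", output)
```

A single sentence here triggers two agents, and both read the same memory entry, which is the behavior the article attributes to the hub: interpret intent first, then allocate Agents.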
At the topmost layer lie numerous AI Agents. Cangzhou OS integrates hundreds of AI Agents from Wenku and Wangpan—including PPT creation, AI picture books, AI mind maps, AI posters, AI notes, AI scanning, and AI voice transcription—covering visual, textual, video, and audio modalities. These span learning, office work, daily life, and entertainment scenarios. Leveraging the Fusion Editor’s powerful editing, modification, and fine-tuning capabilities, search results and generated content achieve higher quality and better alignment with personalized task requirements.
02
Building More “Smart and Capable” Agents on Cangzhou OS
Focusing on top-layer application services, Baidu Wenku & Baidu Wangpan not only offer hundreds of AI Agents refined by hundreds of millions of users but also integrate numerous third-party professional Agents to expand the application ecosystem.
As a “one-stop AI content access and creation platform,” Baidu Wenku has over 40 million paying users and 97 million monthly active AI users. Baidu Wangpan has evolved into a “one-stop content service platform,” serving over 1 billion users with total storage exceeding 100 billion GB, and boasting over 80 million monthly active AI users. Together, Baidu Wenku and Baidu Wangpan have become true “super productivity tools” in the age of large models.
At the conference, Baidu Wenku and Baidu Wangpan showcased new capabilities powered by Cangzhou OS: “GenFlow Super Partner” and “AI Notes.”
GenFlow Super Partner is a multi-agent collaboration feature introduced in the Baidu Wenku app. With Cangzhou OS support, content generation supports parallel multitasking and leverages comprehensive online information along with users’ personal habits and preferences to deliver tasks effectively.
For instance, suppose a user says, “I want to hold an outdoor wedding in Hainan during Labor Day. Help me plan it and design invitations.”
The request may seem simple, just a matter of filling in a template. But to satisfy the user, the system must understand aesthetic preferences, budget expectations, preferred procedures, weather conditions, crowd levels, and venue availability in Hainan during that period. Then, it needs to combine all this textual and visual knowledge using PPT tools to generate a complete proposal. Finally, based on the plan and the user’s taste, it creates a matching wedding invitation poster.
To accomplish this, the system must retrieve the user’s chat history and browsing records, identify intent and infer preferences, conduct web searches, and dynamically combine tools such as the PPT generator, ultimately delivering a detailed plan that covers schedule, date, location, budget, theme, execution details, style, and personnel arrangements.
Moreover, the event plan and poster must be consistent, which requires the same operating system to produce both in parallel and keep them in sync.
Of course, AI won’t always get it right on the first try—both the wedding plan and the poster need to be editable. This capability is enabled by Cangzhou OS’s Fusion Editor.
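One way to picture the wedding example is two agents running in parallel from a single shared brief, with the result remaining editable afterward. The sketch below uses Python's `asyncio` under stated assumptions: `Brief`, `plan_agent`, `poster_agent`, and `revise` are hypothetical names, the short sleeps stand in for model and tool latency, and nothing here reflects how GenFlow or the Fusion Editor is actually built.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Brief:
    """Shared, read-only context that keeps parallel outputs consistent."""
    theme: str
    date: str
    location: str


async def plan_agent(brief: Brief) -> dict[str, str]:
    await asyncio.sleep(0.01)  # stands in for model and tool latency
    return {
        "kind": "plan",
        "theme": brief.theme,
        "body": f"{brief.date}, {brief.location}: schedule, budget, staffing",
    }


async def poster_agent(brief: Brief) -> dict[str, str]:
    await asyncio.sleep(0.01)  # stands in for model and tool latency
    return {
        "kind": "poster",
        "theme": brief.theme,
        "body": f"You are invited: {brief.location}, {brief.date}",
    }


async def deliver(brief: Brief) -> tuple[dict[str, str], dict[str, str]]:
    """Run both agents concurrently; both read the same brief."""
    plan, poster = await asyncio.gather(plan_agent(brief), poster_agent(brief))
    return plan, poster


def revise(artifact: dict[str, str], **changes: str) -> dict[str, str]:
    """Return an edited copy, leaving the original draft untouched."""
    return {**artifact, **changes}


brief = Brief(theme="seaside minimalist", date="Labor Day", location="Hainan")
plan, poster = asyncio.run(deliver(brief))
poster_v2 = revise(poster, body="Join us by the sea in Hainan this Labor Day")
print(plan["theme"], "|", poster_v2["body"])
```

Because both agents derive their output from the same immutable brief, the plan and the poster cannot drift apart on theme, date, or location, and a later edit touches only the field the user changes.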
Clearly, from deep thinking to deep delivery, GenFlow Super Partner is almost the only true “multi-agent collaboration” product currently available in the market. It solves common issues plaguing such products—high cost, long generation time, low efficiency, unstable delivery, and inability to optimize through multiple rounds of dialogue—while being deeply embedded in mature products and integrated with user-authorized private data, giving AI a real chance to achieve “omnipotence and ubiquity.”
Baidu Wangpan’s AI Notes is another powerful assistant for office workers, postgraduate candidates, and civil service exam takers.
AI Notes is the industry’s first multimodal AI note-taking tool, allowing users to bring postgraduate entrance exam study videos and note pages stored in Baidu Wangpan into a single interface for seamless interaction. Video content and notes are strongly interlinked, enabling a full-cycle workflow: watching videos, generating AI notes, summarizing them into AI mind maps, and testing knowledge with AI-generated practice questions.
For example, the difficulty of the English paper in the postgraduate entrance exam recently became a hot topic. Suppose a user wants a focused review of postgraduate English. AI Notes first retrieves relevant materials stored in the user’s Wangpan, cross-references publicly available online resources to identify key topics, and organizes them. It does not stop there: it then verifies the key points it has identified against past official exam papers. Only after verification does it proceed to generate mind maps and predict potential test questions, accelerating the user’s learning process.
This process involves no fewer tool calls than planning a wedding. Identifying key topics and retrieving past papers requires robust web search functionality. Since past papers often appear in PDF or even image format, and expert explanations are delivered via video, multimodal content parsing is crucial. Generating final mind maps and predicting questions demands strong reasoning, multimodal content generation, and mapping capabilities from large models—all while ensuring absolute accuracy.
All of this is empowered by Cangzhou OS.
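The AI Notes workflow described above (retrieve from private storage, cross-reference public resources, verify against past papers, and only then generate) can be sketched as a small pipeline. Everything in this example is an illustrative assumption: the function names are hypothetical, the topics and paper texts are made-up sample data, and plain string matching stands in for the multimodal parsing and reasoning the article describes.

```python
def identify_key_topics(private_topics: list[str], public_topics: list[str]) -> list[str]:
    """Keep topics from the user's own materials that public resources also stress."""
    public = {topic.lower() for topic in public_topics}
    ordered: list[str] = []
    for topic in private_topics:
        key = topic.lower()
        if key in public and key not in ordered:
            ordered.append(key)
    return ordered


def verify_against_past_papers(topics: list[str], past_papers: list[str]) -> list[str]:
    """Drop any candidate topic that never shows up in a past paper."""
    return [
        topic
        for topic in topics
        if any(topic in paper.lower() for paper in past_papers)
    ]


def build_mind_map(subject: str, topics: list[str]) -> dict[str, list[dict[str, str]]]:
    """Generate output only from verified topics."""
    return {
        subject: [
            {"topic": topic, "practice_question": f"Explain or apply: {topic}"}
            for topic in topics
        ]
    }


# Made-up sample data for illustration only.
private = ["Reading comprehension", "Cloze test", "Translation"]
public = ["reading comprehension", "translation", "writing"]
papers = [
    "paper A: reading comprehension, writing",
    "paper B: reading comprehension, cloze test",
]

candidates = identify_key_topics(private, public)
verified = verify_against_past_papers(candidates, papers)
mind_map = build_mind_map("Postgraduate English", verified)
print(candidates)
print(verified)
```

The ordering matters: generation happens last, so a topic that looks important in public resources but never appears in a past paper is filtered out before any mind map or practice question is produced.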
Naturally, Baidu encourages developers to fully embrace MCP. So Cangzhou OS isn't limited to internal Baidu ecosystems. A critical step in growing an OS is openness—unlocking developer innovation.
To maximize the value of its ecosystem and applications, Baidu Wenku and Baidu Wangpan have pioneered the integration of MCP into product and ecosystem connectivity, establishing a three-tiered MCP Server–Client–Host architecture. They expose Wenku and Wangpan capabilities through MCP Servers and provide MCP Client SDKs, making it easy for enterprise users, developers, and intelligent agent applications (as MCP Hosts) to connect.

A representative case is Samsung smartphones. Samsung is integrating multiple MCP servers from Baidu Wenku and Wangpan—including file upload, download, search, sharing, and content understanding.
On one hand, users can directly access cloud backup, sharing, document summarization, and Q&A functions via voice commands in the phone’s assistant interface.
On the other hand, these servers enhance Samsung’s native cloud storage functionality, removing its limits on backing up and sharing large files or many files at once.
For example, in the phone’s photo gallery, a user activates the voice assistant and says: “Back up yesterday’s photos taken at Olympic Forest Park to Baidu Wangpan, and send Xiao Ming’s photos to him.” The relevant images are uploaded to the user’s authorized Wangpan account, a shareable link is generated, and the assistant accesses the contact list to send the link via SMS. Upon clicking, recipients can directly view or save the files in Baidu Wangpan.
Undoubtedly, the reliability of an OS isn’t measured by piling on tools or showcasing flashy tech. The best benchmark for an OS’s capabilities lies in whether its top-tier application ecosystem is useful, mature, and rich.
03
The Story of OS Has No End
In capital markets, a type of company most valued by investors is known as a “friend of time.”
A “friend of time” is a business that, once it gets something right, can keep growing like a perpetual engine simply by continuing down that path, with ecosystem developers benefiting alongside it.
An operating system is precisely such a perpetual market. As long as PCs and smartphones remain in demand, the stories of Microsoft, Apple, and Google continue without end.
Large models follow the same principle. When “deep thinking + deep delivery + public/private data + MCP ecosystem” converge, the resulting AI will evolve into an omnipresent, omnipotent force of the new era—sparking continuous bursts of new species, much like the Cambrian explosion.
Looking downward, this means Baidu Wenku, Baidu Wangpan, and others opening up their own capabilities. By actively embracing the ecosystem, they position themselves as creators of new large-model species and architects of new rules.
Looking upward, countless new Agents will emerge and gain visibility on Cangzhou OS, forming a surging, expansive new application service ecosystem.
Right now, this story has only just begun.