
Why CZ Is Bullish on Vana for Building Better AI
TechFlow Selected

Revealing how Vana becomes the key infrastructure of the AI ecosystem.
Author: Jesse, Core Contributor at Biteye
Editor: Crush, Core Contributor at Biteye
* Approximately 6,000 words, estimated reading time: 12 minutes
One month ago, YZi Labs announced its investment in Vana, with Binance founder CZ joining as an advisor, cementing Vana’s position as a leader in the AI data sector. Four days later, during an AMA with Vana, CZ stated that data is the core fuel of AI—public data has been exhausted, while private data remains untapped—and expressed strong confidence in Vana's product-market fit (PMF) and user growth.
Why are YZi Labs, Coinbase Ventures, and Paradigm all investing in Vana? Why does CZ believe in Vana’s future?
This report systematically analyzes the challenges facing AI data, Vana’s core value proposition, real-world applications, and growth trajectory, revealing how Vana is becoming critical infrastructure within the AI ecosystem.

01 AI and the Data Dilemma: Breaking Down Closed Barriers
According to PitchBook data, the U.S. AI industry attracted nearly $20 billion in investments in Q1 2025. By 2024, AI startups accounted for one-third of global venture capital, totaling $131.5 billion, with nearly a quarter of new startups focused on AI. Statista data further confirms this explosive growth trend—the amount of VC funding in AI and machine learning surged from $670 million in 2011 to $36 billion in 2020, more than a 50-fold increase. This clearly shows that AI has become the preferred domain for smart capital and top entrepreneurs.
However, AI’s fundamental architecture—"data + model + computing power"—is now hitting structural bottlenecks. The key driver of AI model performance isn’t superior computing power or algorithmic breakthroughs, but rather the quality and scale of training datasets. Large language models have now reached a tipping point where available training data is running dry. Meta’s Llama 3 was trained on approximately 15 trillion tokens—an amount that has nearly exhausted all high-quality, publicly accessible internet data. While public internet data may seem vast, it represents only the tip of the iceberg. A crucial fact often overlooked by the market is that most high-value data is locked inside private systems requiring authorized access. Public web data accounts for less than 0.1% of all existing data. Solving this problem exceeds the AI industry’s internal capabilities—it requires blockchain technology to reshape data ownership relationships and establish new incentive mechanisms that catalyze the large-scale emergence of high-quality data.
On the other hand, the vast majority of today’s data is controlled within the closed ecosystems of Web2 tech giants. AI development is encountering a "data wall," precisely because these companies fully understand the immense value of their data. High-performing AI models yield significant economic returns—for example, OpenAI generates about $3.4 billion annually. Building top-tier AI models demands massive amounts of data, which often comes at a steep cost.
For instance, Reddit earns around $200 million per year selling data; PhotoBucket sells image data for $1–$2 per picture; Apple’s news data deals reach up to $50 million. Data ownership has evolved from a mere privacy preference into a major economic issue. In a world where AI models drive most of the economy, owning data is equivalent to holding equity in future AI models.
As data commercialization becomes increasingly common, access to data is also growing harder. Many platforms are adjusting their terms of service and API policies to restrict external developers’ access. For example, both Reddit and Stack Overflow have revised their API rules, making data acquisition significantly more difficult. This trend is expanding—platforms holding valuable data are gradually closing off.
Yet, there remains one group with unrestricted access to this data: users themselves. Most people don’t realize that legally, they retain full ownership of their own data. Just as parking your car doesn’t give the parking lot the right to use or sell it, storing your data on social media platforms doesn’t strip you of ownership.
When registering, users typically check a box saying “allow the platform to use my data.” This merely grants the platform limited rights to operate services using the data—it doesn’t mean users relinquish ownership.
In fact, users can always request to export their own data. Even if platforms tightly restrict API access for developers, individual users still have the legal right to retrieve their personal data. For example, Instagram allows users to download their account data—including posted photos, comments, and even AI-generated marketing tags. On 23andMe, users can request to export their genetic data, though the platform rarely reminds them of this option, and the process may not be intuitive.
Globally, regulations are continuously improving to ensure users can reclaim their data smoothly. In today’s data-driven economy, users must recognize their ownership rights over personal data and actively exercise them.
02 Core Concepts of VANA
Tech companies are protecting their valuable data assets by building closed systems. VANA’s core mission is to unlock data trapped within these closed ecosystems and return control to users, enabling true data sovereignty.
In other words, every user can extract their own data from various platforms and reconstruct a dataset that is higher quality and more personalized than anything any single platform could offer.
The VANA framework rests on two foundational concepts:
- Non-Custodial Data: This means users can manage access to their data just like managing personal funds. Similar to using a digital wallet for crypto assets, in the VANA ecosystem, users employ wallets to control how their data is used. By signing transactions, users authorize applications to access their data and determine its specific usage, ensuring autonomy and security.
- Proof of Contribution: While a single data point holds little value, aggregated user data increases exponentially in value. The Proof of Contribution mechanism ensures high standards for data pool quality while creating pathways for contributors to receive value in return.
When developers pay to access data, contributors receive governance token allocations proportional to their contributions. This mechanism enables data providers to earn ongoing economic rewards from data utilization and grants them meaningful governance rights—directly participating in setting data usage rules and decisions.
By incentivizing high-quality data contributions, this system is reshaping data market pricing and operational efficiency, laying the foundation for a decentralized data economy.
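The pro-rata allocation described above can be sketched in a few lines. This is an illustrative model only: the function name, the contribution scores, and the simple proportional rule are assumptions for clarity, not Vana's actual on-chain logic.

```python
# Hypothetical sketch of Proof-of-Contribution reward splitting.
# All names and the simple pro-rata rule are illustrative assumptions.

def split_rewards(contributions: dict[str, float], payment: float) -> dict[str, float]:
    """Allocate a buyer's payment to contributors pro rata by contribution score."""
    total = sum(contributions.values())
    if total == 0:
        return {addr: 0.0 for addr in contributions}
    return {addr: payment * score / total for addr, score in contributions.items()}

rewards = split_rewards({"alice": 30.0, "bob": 70.0}, payment=1000.0)
# alice receives 300.0, bob receives 700.0
```

In practice the payout would also carry governance weight, so a larger contribution score translates into both more tokens and more say over the data pool's rules.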
03 VANA Ecosystem Applications
3.1 DataDAO
DataDAO is VANA’s decentralized data marketplace, allowing users to contribute, tokenize, and put data to use. Users can choose suitable data pools based on data type (e.g., fitness data, research data). Contributed data undergoes quality and value validation via Vana’s Proof of Contribution mechanism, ensuring fair compensation for contributors.
After verification, data is tokenized into digital assets usable for trading or AI training, while contributors retain control over usage. Each time data is used, contributors receive token rewards and governance rights, enabling economic benefits and influence over the direction of the data pool. By aggregating diverse data sources, DataDAO creates a liquid data market, enabling secure and efficient data circulation within the Vana ecosystem.
The core of DataDAO is the Data Liquidity Pool (DLP)—a verified dataset linked to tokens. DLPs are managed and governed by DataDAO members. Each DLP clearly defines its data structure and contribution criteria; for example, Sleep.com, a sleep data DAO, established a clear data schema to ensure all on-chain data is structured and usable. Data value lies not just in volume, but in structure and usability.
DataDAO places strong emphasis on data authenticity and validity. Most current DataDAOs use Trusted Execution Environments (TEE) to run Python code for data validation—ensuring quality without compromising privacy. For instance, Amazon DataDAO uses browser extensions to generate proofs of data quality. All DataDAOs publicly disclose their Proof of Contribution, so users know exactly how data quality is assured.
The top 16 DLPs in the VANA ecosystem receive additional incentives, allowing users to earn rewards by contributing high-quality data. Rewards are distributed based on metrics such as data access volume, quality, and cost savings. Currently, the Reddit DataDAO is the largest, attracting around 140,000 users and successfully training a community-owned AI model. The DLPLabs DataDAO allows drivers to connect their DIMO_Network accounts and earn rewards by sharing data to advance automotive AI innovation. 23andWE aims to acquire 23andMe to prevent genetic data from being sold off.
DataDAO represents a revolutionary approach to data management—empowering individuals to own their data and monetize it through tokenization. This ecosystem is rapidly evolving, offering more open and democratic possibilities for data governance and AI training.
3.2 DataFi
Building on data liquidity pools, DeFi is gradually extending into the realm of data tokens. Data Liquidity Pools serve as the foundational layer of the entire ecosystem, upon which various DeFi applications can be built using data tokens.
Early-stage applications already exist in the DataFi ecosystem. For example, decentralized exchanges @VanaDataDex and @flur_protocol allow users to trade data tokens and track market dynamics for specific data tokens. These platforms promote free circulation of data assets and make the data market more dynamic.
Notably, most current DLP reward mechanisms deposit rewards into the DLP treasury without directly burning data tokens or affecting the supply-demand balance. However, with the rollout of the VRC-13 updates, this is changing. The new model introduces a more market-driven approach: distributing VANA rewards to encourage data tokenization, then injecting those tokens into DEX pools to boost trading activity and further activate the DeFi ecosystem.
Looking ahead, functionalities common in traditional DeFi—such as lending, staking, liquidity mining, and even insurance—could be introduced into the data token market, unlocking entirely new application scenarios.
From a traditional Web2 perspective, just as corporations hedge price volatility through oil futures, data markets might develop data futures—enabling users to lock in future prices for datasets and reduce uncertainty in acquisition costs.
Some trading firms already treat data as a new asset class, studying methods to evaluate market value—including assessing specific data token valuations, probability of usage, and lifecycle—all of which directly impact data token pricing and market liquidity, leaving ample room for innovation.
3.3 More Convenient Data Access
Currently, accessing datasets on the mainnet remains relatively cumbersome. Users must submit detailed requests specifying their needs, payment amounts, and intended code, then wait for approval before gaining access. While this ensures transparency and compliance, it adds friction to the process.
To improve efficiency, Vana is developing faster data access methods, enabling automatic API access and direct retrieval across multiple DataDAOs. For example, users could soon combine sleep data with Coinbase or Binance transaction data to analyze sleep patterns among holders of specific projects, uncovering novel market insights.
Additionally, Vana is advancing a proposal to burn data tokens and VANA in an 80-20 ratio to gain data access rights.
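The arithmetic behind the proposed 80-20 split is simple; the sketch below assumes the ratio is applied to a single access fee denominated in one unit. The function name and the idea of pricing in a single fee are illustrative assumptions, since the proposal only fixes the ratio itself.

```python
# Illustrative arithmetic for the proposed 80-20 burn (data tokens : VANA).
# Exact denomination and pricing are assumptions, not part of the proposal.

def burn_amounts(access_cost: float, data_token_share: float = 0.80) -> tuple[float, float]:
    """Split an access fee into the data-token portion and the VANA portion to burn."""
    data_burn = access_cost * data_token_share
    vana_burn = access_cost - data_burn
    return data_burn, vana_burn

print(burn_amounts(100.0))  # (80.0, 20.0)
```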
Vana is also launching a new data query interface that greatly simplifies access. Users log in via wallet, authenticate identity, and generate digital signatures to prove access rights. Since Data Liquidity Pools record data formats, users can clearly understand the data structure and use SQL queries to retrieve needed information. During this process, users may first receive synthetic data for testing, ensuring query accuracy. When dealing with real data, all computations occur within TEEs to guarantee data security. This mechanism effectively prevents the “double-spending of data” problem—stopping buyers from reselling purchased data—and protects the economic value of data, ensuring sustainable market development.
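The query flow described above can be summarized as a small state sketch: sign in with a wallet, submit SQL against the published schema, dry-run on synthetic data, then execute on real data inside a TEE. Every class and method name below is a hypothetical stand-in, not Vana's published API.

```python
# Minimal sketch of the described query flow; all names are hypothetical.
from dataclasses import dataclass

@dataclass
class QueryRequest:
    wallet_signature: str   # digital signature proving the caller holds access rights
    sql: str                # the DLP's published data schema makes SQL queries possible
    use_synthetic: bool     # dry-run against synthetic data before touching real data

def run_query(req: QueryRequest) -> str:
    if req.use_synthetic:
        # Synthetic data lets users verify query correctness without exposure risk.
        return f"synthetic result for: {req.sql}"
    # Real data never leaves the TEE; only the computed result is returned,
    # which is what blocks buyers from re-selling raw purchased data.
    return f"TEE-attested result for: {req.sql}"

result = run_query(QueryRequest("0xsignature", "SELECT avg(hours) FROM sleep_logs", True))
```

The synthetic-first step is the notable design choice: it separates debugging a query from paying for, and being trusted with, the real dataset.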
04 Value Analysis of Vana
Data is rapidly becoming the core asset of the digital age. While data collection and storage technologies are mature, the real challenge lies in effectively evaluating data quality, maximizing value, and safeguarding privacy. Vana solves this ingeniously through innovative incentives: users can stake Vana tokens to support high-value DataDAOs and earn corresponding rewards, creating a positive feedback loop.
4.1 Overcoming the “Data Wall”
AI development has hit a “data wall”—high-quality public data resources are nearly depleted. The next leap in AI will depend on how effectively we can access and utilize high-quality private data, such as personal health records, smart device usage logs, and Tesla driving videos—potential yet untapped training resources.
There’s a paradox in data value: data holds value largely because it is private and scarce; once widely available, it becomes commoditized and devalued. Just as AI models themselves are undergoing commoditization, long-term competitive advantage will come from controlling unique datasets that enable superior performance in specific domains. Once data goes public, price competition emerges immediately, and value plummets.
Vana’s DataDAO leverages TEE to transfer value from high-quality private data while preserving privacy. This breakthrough expands the scope of valuable data assets beyond limited public data into the vast realm of private data, opening new frontiers for AI advancement.
4.2 The Unique Curve of Data Value
Data value follows a distinctive curve: a single data point is nearly worthless, but when volume reaches critical mass, value grows exponentially. This poses a major challenge for data financialization—collective returns only materialize after sufficient aggregation.
Vana’s DataDAO offers an innovative solution. By pooling similar data, DataDAO gives contributors collective bargaining power. Take Tesla owners: if all share driving data via a DataDAO, they gain strong pricing leverage over any buyer. In contrast, if each owner independently sells data, price competition ensues—buyers simply acquire samples from the lowest bidders.
Structured, verified high-quality datasets (like authenticated Tesla driving logs) are extremely valuable in the market. Vana’s organizational framework enables this value to be fully realized.
4.3 Breakthrough in Cross-Platform Data Aggregation
The greatest strength of DataDAO is its ability to aggregate data across platforms—a near-impossibility in today’s closed ecosystems. Imagine a researcher needing access to the same user’s Facebook messages, iMessage history, and Google Docs content. Traditionally, this would require cooperation among Facebook, Apple, and Google. But these platforms lack incentive to integrate user data (which weakens their data moats) and face regulatory barriers preventing such sharing.
DataDAO bypasses this obstacle through user-led data integration, unlocking cross-platform data value and enabling unprecedented possibilities for AI training and research.
4.4 A New Model of Economic Participation
Vana’s vision extends far beyond technical innovation—it pioneers a new economic participation model. Here, users can join the digital economy without traditional capital—they already possess the most valuable resource: their personal data. Users don’t need money; they just need to share data. That *is* their capital. DataDAO provides Web3 users with passive income streams based on their unique personal data, lowering the barrier to entry into the digital economy.
4.5 Reshaping AI Profit Distribution
This model could fundamentally transform how the benefits of AI progress are distributed. Instead of value flowing primarily to big tech firms, Vana enables broad participation in the AI economy through data ownership and governance. Early signals show strong resonance—over 300 DataDAOs are already under development on testnet.
Looking ahead 3–5 years, we may witness the emergence of a fully user-owned AI model powered by 100 million contributors, potentially outperforming today’s leading centralized models. Owned entirely by users, such a model would foster deeper engagement and stronger connections. Data sovereignty empowers users to selectively support ethical models and reject unethical companies from using their data.
Decentralized AI offers a more democratic framework—one where society collectively decides what AI should learn and believe, rather than leaving it to a few corporations. User ownership of data means not just economic rights, but real control over AI behavior—such as addressing issues like model censorship.
05 Conclusion
Commercially, Vana aims to build a complete data value chain covering data aggregation, AI model training, and data sales. Today’s data market is monopolized by a few platforms and data brokers. Vana seeks to correct this inefficiency and create a fairer data trading ecosystem.
Vana is more than just another platform—it represents a fundamental shift in data ownership and the way AI evolves. By enabling users to participate in collective value creation while retaining sovereignty over their data, Vana is laying the groundwork for a fairer, more innovative AI future.
In today’s AI landscape filled with hype, Vana stands out with its innovative mechanisms that directly address core industry pain points—and is poised to become a pivotal force shaping the future of AI.
Join the official TechFlow community to stay up to date
Telegram: https://t.me/TechFlowDaily
X (Twitter): https://x.com/TechFlowPost
X (Twitter) EN: https://x.com/BlockFlow_News