
How can AI large models and Web3 coexist?
Blockchain's specific contributions to large AI models are reflected in "computing power, data, and collaboration."
Author: Tian Hongfei, Lead of "AI+Crypto Studio" at Wanwudao
Large language models (LLMs), the fastest-spreading high-tech innovation in human history, have captured global attention, while yesterday's darling, Web3, faces mounting legal challenges. Yet the two are fundamentally different technologies, and neither displaces the other. In this article, Tian Hongfei, lead of Wanwudao's "AI+Crypto Studio," explores the critical issues arising in LLM development and how Web3 ventures are working to resolve them.

Challenges in the Large Model Industry and How Web3 Can Help Solve Them
It is widely known that after 2015, the internet industry entered an era of oligopolistic dominance, prompting antitrust scrutiny worldwide. The emergence of large models has further strengthened this concentration. Large models consist of three core components: algorithms, computing power (compute), and data:
- In algorithms: despite some monopolistic tendencies, open-source efforts, academic research, and public distrust of big tech help maintain relative openness;
- In compute: training costs for large models are prohibitively high, limiting access to well-funded corporations and effectively placing algorithm production under corporate control;
- In data: while training currently relies on publicly available datasets, these will soon be exhausted as parameter counts grow exponentially. Future progress thus depends on private data. Although small businesses collectively possess vast amounts of data, it remains fragmented and difficult to utilize, leaving large enterprises with a continued data advantage.
Therefore, centralization in the age of large models is stronger than ever before. The future might be controlled by just a few—or even a single—machine. (Even in decentralized Web3, Vitalik Buterin’s proposed Ethereum endgame envisions block production handled by one massive machine.)
Additionally, OpenAI, the company behind ChatGPT, operates with fewer than two dozen core personnel. For various reasons, ChatGPT's algorithm remains closed-source, and the organization shifted from nonprofit to capped-profit status. As applications built on ChatGPT reshape daily life, even minor model adjustments can profoundly impact humanity. Compared to Google’s “Don’t be evil” ethos, ChatGPT exerts deeper influence over individuals.
Thus, computational trustworthiness of models becomes a crucial issue. While OpenAI may operate without profit motives, concentrated power among a few individuals still poses significant risks. (In contrast, Vitalik’s envisioned Ethereum endgame maintains transparency through easily verifiable outputs—even if block production is centralized.)
Meanwhile, the large model industry currently faces compute shortages, imminent exhaustion of usable training data, and challenges around model sharing. Prior to 2021, AI struggled with data scarcity, with deep learning firms hunting for vertical-specific datasets. Post-LLM, the bottleneck has shifted squarely to compute availability.

The development of large models involves several stages: data collection, preprocessing, model training, fine-tuning, deployment, and inference queries. At each stage, blockchain offers contributions toward mitigating excessive centralization in large models.
- In data usage: public data will likely be depleted after 2030; valuable private data must be leveraged under privacy-preserving conditions enabled by blockchain;
- In data labeling: token incentives can drive large-scale community participation in annotation and validation;
- During model training: model sharing and collaborative training enable distributed compute pooling;
- During fine-tuning: token rewards can incentivize community involvement;
- During user inference: blockchain helps protect user data privacy.

Specifically:
1) Scarce Compute Resources
Compute is an essential—and currently the most expensive—input for large models. Startups raising funds often spend up to 80% of capital immediately purchasing GPUs from NVIDIA. Companies building proprietary models must invest at least $50 million to build their own data centers, while smaller startups rely on costly cloud computing services.
However, the sudden surge in demand for large models has far outpaced NVIDIA’s supply capacity. Demand for compute doubles every few months. Between 2012 and 2018, compute needs grew 300,000-fold, with costs rising 31-fold annually.
For Chinese internet companies, U.S. export restrictions on high-end GPUs add another layer of difficulty. In short, the astronomical cost of training is the primary reason large model technology remains under the control of a privileged few.
How can blockchain help alleviate the compute bottleneck?
Large model operations fall into three phases: model training, fine-tuning, and inference queries. While training is notoriously expensive, each model version only needs to be trained once; in day-to-day use, almost all user interactions involve inference. AWS statistics confirm this: 80% of actual compute usage occurs during inference.
Although model training demands high-speed GPU interconnects—making distributed training impractical across networks unless time is traded for cost—inference can run efficiently on individual GPUs. Fine-tuning, which adapts pre-trained models using domain-specific data, also requires significantly less compute than full training.
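To make the contrast concrete, here is a minimal sketch of single-GPU inference, assuming the open-source Hugging Face transformers library and PyTorch; the model name and prompt are placeholders. The point is only that serving a query needs no high-speed interconnect, unlike full-scale training.

```python
# Minimal sketch: inference (unlike training) fits on a single consumer GPU.
# Model name and prompt are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for any open model small enough for one GPU
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

prompt = "Decentralized compute networks can"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# One forward pass per generated token: no multi-GPU interconnect needed.
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=40)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```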
Consumer GPUs often outperform enterprise-grade cards at graphics rendering, and they sit idle much of the time. Ever since SETI@home, launched by UC Berkeley in 1999, and the grid computing efforts that followed around 2000, architectures have existed for harnessing idle computing resources for massive computational tasks. Before blockchain, such collaborations were limited to scientific or altruistic causes, which restricted their scale. With blockchain, token-based incentives can now draw in a far broader pool of participants.

Decentralized cloud platforms like Akash have created general-purpose compute networks where users deploy ML models for inference and image rendering. Projects such as Bittensor, Modulus Lab, Giza, and ChainML focus specifically on decentralized inference. Meanwhile, blockchain-AI protocols like Gensyn and open-source generative AI platform Together aim to build decentralized compute networks for large model training.
Challenges: Decentralized compute networks face major hurdles: not only unreliable, low-bandwidth networks and unsynchronized computation states, but also diverse GPU environments, economic incentive design, cheating prevention, proof-of-work mechanisms, security, privacy protection, and anti-spam defenses.
2) Scarce Data and Data Validation
The core algorithm behind large models, Reinforcement Learning from Human Feedback (RLHF), relies on human input to refine models—correcting errors, reducing bias, and filtering harmful content. OpenAI used RLHF to fine-tune GPT-3 into ChatGPT, sourcing experts via Facebook groups and paying Kenyan workers $2/hour. Domain-specific optimization typically requires expert-labeled data—an effort perfectly suited for token-incentivized community participation.
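The human feedback described above ultimately takes the form of pairwise preference labels. Below is a simplified sketch, assuming PyTorch and using random placeholder embeddings in place of a real language-model encoder, of the standard reward-model loss trained from such labels; in a token-incentivized scheme, each comparison a community member submits would earn a reward, but the underlying training step looks the same.

```python
# Simplified sketch of RLHF's reward-modeling step: labelers pick the better of
# two responses, and the reward model learns to score the chosen one higher.
# All tensors are random placeholders standing in for encoded (prompt, response) pairs.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 1)  # maps an embedding to a scalar reward

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        return self.scorer(embedding).squeeze(-1)

reward_model = RewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

# Placeholder embeddings for a batch of human-labeled comparisons.
chosen = torch.randn(8, 768)    # responses the labelers preferred
rejected = torch.randn(8, 768)  # responses the labelers rejected

# Bradley-Terry style pairwise loss: -log(sigmoid(r_chosen - r_rejected)).
loss = -torch.nn.functional.logsigmoid(
    reward_model(chosen) - reward_model(rejected)
).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"preference loss: {loss.item():.4f}")
```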
Decentralized Physical Infrastructure Networks (DePINs) use tokens to incentivize individuals to install sensors and share real-time physical-world data for model training. Examples include React (energy usage), DIMO (vehicle telemetry), WeatherXM (weather data), and Hivemapper, which uses token rewards to crowdsource map data and label traffic signs—improving accuracy in RLHF-powered algorithms.
As model parameters grow, public data will be exhausted by 2030. Progress beyond this point will depend on private data—estimated to be ten times larger than public datasets but scattered across individuals and enterprises, often protected by privacy or confidentiality constraints. This creates a paradox: large models need data, yet data holders want model benefits without surrendering control. Blockchain technologies offer solutions.
For open models whose inference requires little compute, the model can be brought to the data: downloaded and run directly where the data lives. For proprietary or very large models, the data must go to the model, and sensitive information has to be anonymized before it is sent; techniques include synthetic data generation and zero-knowledge proofs.
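As a rough illustration of the "data goes to the model" path, the sketch below pseudonymizes direct identifiers before a record leaves the data holder's machine. The field names and salt are hypothetical, and a production system would rely on vetted PII detection, synthetic data generation, or zero-knowledge techniques rather than this bare-bones approach.

```python
# Minimal sketch of anonymizing records before they leave the data holder's
# environment. Field names and the salt are hypothetical.
import hashlib
import json

SALT = "local-secret-salt"          # kept by the data owner, never shared
DIRECT_IDENTIFIERS = {"name", "email", "phone"}

def pseudonymize(value: str) -> str:
    """Replace an identifier with a salted hash so records stay linkable
    across the dataset without revealing whom they belong to."""
    return hashlib.sha256((SALT + value).encode()).hexdigest()[:16]

def anonymize(record: dict) -> dict:
    cleaned = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            cleaned[key] = pseudonymize(str(value))
        else:
            cleaned[key] = value
    return cleaned

record = {"name": "Alice", "email": "alice@example.com", "visits": 7}
print(json.dumps(anonymize(record), indent=2))  # safe to send to a hosted model
```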
Whether moving model to data or data to model, integrity verification is essential to prevent tampering or fraud by either party.
Challenge: While web3 token incentives can motivate participation, preventing malicious behavior remains difficult.
3) Model Collaboration
On Civitai, the world’s largest AI art model-sharing platform, users freely share, copy, and modify models to suit personal needs.
Bittensor, an emerging open-source AI project and dual-consensus blockchain, implements a token-incentivized decentralized network of models. Using a mixture-of-experts approach, it collaboratively produces problem-solving models and supports knowledge distillation—enabling models to share insights and accelerate learning—offering startups a pathway into large model development.
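Knowledge distillation itself is easy to sketch: a small student model is trained to match the softened output distribution of a larger teacher. The PyTorch snippet below uses placeholder networks and random data purely to show the mechanism; it is not Bittensor's actual implementation.

```python
# Sketch of knowledge distillation: a small "student" learns to match the
# softened output distribution of a larger "teacher". Both networks and the
# input batch are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

temperature = 2.0                      # softens both distributions
x = torch.randn(16, 32)                # placeholder batch of inputs

with torch.no_grad():
    teacher_logits = teacher(x)        # the knowledge being shared
student_logits = student(x)

# KL divergence between softened teacher and student distributions.
distill_loss = F.kl_div(
    F.log_softmax(student_logits / temperature, dim=-1),
    F.softmax(teacher_logits / temperature, dim=-1),
    reduction="batchmean",
) * temperature ** 2

optimizer.zero_grad()
distill_loss.backward()
optimizer.step()
print(f"distillation loss: {distill_loss.item():.4f}")
```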
Autonolas, a unified network for automation, oracles, and public AI services, designs a consensus framework enabling agents to coordinate via Tendermint.
Challenge: Many training processes still require intensive communication; reliability and time efficiency in distributed training remain major obstacles.

Innovative Integration of Large Models and Web3
The above discusses how web3 can address key challenges in the large model industry. The convergence of these two powerful forces enables novel applications.
1) Using ChatGPT to Write Smart Contracts
Recently, an NFT artist with no programming background successfully deployed a smart contract and issued a token called Turboner using only ChatGPT prompts. He documented the entire week-long process on YouTube, inspiring others to explore AI-assisted smart contract creation.
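The workflow in that story can be approximated with a few lines of Python. The sketch below, which assumes the official openai client and an API key in the environment, asks a chat model to draft an ERC-20 contract and saves the result; the prompt and model name are illustrative, and any generated Solidity would of course need review and auditing before deployment.

```python
# Illustrative sketch of the workflow above: asking a chat model to draft an
# ERC-20 contract. Prompt and model name are placeholders; generated code must
# be reviewed and audited before any real deployment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "Write a minimal ERC-20 token contract in Solidity 0.8 named MyToken "
    "with a fixed supply minted to the deployer. Return only the code."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any capable chat model works
    messages=[{"role": "user", "content": prompt}],
)

contract_source = response.choices[0].message.content
with open("MyToken.sol", "w") as f:
    f.write(contract_source)
print("Draft contract written to MyToken.sol; audit it before deploying.")
```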
2) Crypto Payments Empowering Intelligent Management
Advances in large models have dramatically improved the intelligence of virtual assistants. Combined with crypto payments, these agents could manage resources and execute complex tasks in open markets. AutoGPT demonstrated this potential by autonomously purchasing cloud credits and booking flights using a user’s credit card—but its capabilities are limited by login requirements and authentication barriers. Multi-Agent Systems (MAS), including frameworks like Contract Net Protocol, envision multiple AI assistants collaborating in open markets. With token-based economies, such cooperation could transcend trust-based limitations, evolving into large-scale market-driven coordination—akin to humanity’s transition from barter to monetary systems.
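A toy version of Contract Net Protocol-style coordination with token settlement might look like the sketch below. Everything here is hypothetical: the agents, their quotes, and the in-memory ledger standing in for an on-chain token contract; the aim is only to show the announce-bid-award-pay loop.

```python
# Toy sketch of Contract Net Protocol-style coordination: a manager agent
# announces a task, worker agents bid, the cheapest bidder wins and is paid
# in tokens. Balances are an in-memory dict standing in for an on-chain ledger.
from dataclasses import dataclass

ledger = {"manager": 100, "flight-bot": 0, "cloud-bot": 0}  # token balances

@dataclass
class Bid:
    agent: str
    price: int

def announce(task: str, agents: dict) -> list[Bid]:
    """Broadcast a task and collect bids from every capable agent."""
    return [Bid(name, quote(task)) for name, quote in agents.items()]

def award_and_pay(bids: list[Bid]) -> Bid:
    """Pick the cheapest bid and settle it with a token transfer."""
    winner = min(bids, key=lambda b: b.price)
    ledger["manager"] -= winner.price
    ledger[winner.agent] += winner.price
    return winner

# Hypothetical worker agents that quote a price for the task.
agents = {
    "flight-bot": lambda task: 30,
    "cloud-bot": lambda task: 45,
}

bids = announce("book a flight to Lisbon", agents)
winner = award_and_pay(bids)
print(f"{winner.agent} wins at {winner.price} tokens; ledger = {ledger}")
```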
3) zkML (Zero-Knowledge Machine Learning)
Zero-knowledge proofs (ZKPs) serve two main purposes in blockchain: improving performance by offloading computation off-chain while verifying only the results on-chain, and protecting transaction privacy. For large models, ZKPs enable trustworthy computation, verifying the consistency and authenticity of model outputs, as well as privacy-preserving training on sensitive data. In decentralized settings, service providers must prove they are serving the promised model without cutting corners, and data contributors must be able to participate without exposing private information. While ZKPs are promising, challenges remain; related approaches such as homomorphic encryption and federated learning are also still immature.
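Full zkML tooling is beyond a short example, but the verification goal it serves can be illustrated with a much weaker stand-in: a hash commitment to the model weights that anyone can check later. This is not a zero-knowledge proof (it hides nothing and proves nothing about the inference itself); it only shows what "serving the promised model" means in verifiable terms.

```python
# Deliberately simplified stand-in for zkML's verification goal: the provider
# commits to a hash of its model weights, and auditors can later check that
# the weights actually in use match the commitment. A real zkML system would
# prove this (and the inference itself) in zero knowledge.
import hashlib

def commit_to_model(weight_bytes: bytes) -> str:
    """Publish this digest on-chain when the model is announced."""
    return hashlib.sha256(weight_bytes).hexdigest()

def verify_model(weight_bytes: bytes, published_commitment: str) -> bool:
    """Anyone can recompute the digest and compare it to the commitment."""
    return hashlib.sha256(weight_bytes).hexdigest() == published_commitment

weights_v1 = b"model-weights-bytes-v1"        # placeholder for serialized weights
commitment = commit_to_model(weights_v1)

print(verify_model(weights_v1, commitment))                  # True: promised model
print(verify_model(b"quietly-swapped-weights", commitment))  # False: swap detected
```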
Solutions Based on BEC (Blockchain Edge Client) Architecture
Beyond the above approaches, another school of thought—lacking token incentives and employing minimal blockchain—has received little attention.
The BEC architecture shares similarities with Jack Dorsey’s concept of Web5 and Tim Berners-Lee’s Solid project.


They all believe:
- Each person should have a corresponding edge node under their control;
- Most application-level computation and storage should occur at the edge;
- Collaboration between personal nodes occurs via blockchain;
- Node-to-node communication happens over P2P networks;
- Individuals fully control their nodes or may delegate management to trusted parties (sometimes called relay servers);
- This achieves maximum possible decentralization.
When such personally controlled nodes store private data and host large models, they can train fully personalized, 100% privacy-preserving personal AI agents. Dr. Gong Ting, founding partner of SIG China, poetically compares these future personal nodes to Olaf’s personal snow cloud in *Frozen*—a loyal companion following its owner everywhere.
In this vision, metaverse avatars evolve from keyboard-controlled puppets into sentient agents capable of continuous learning—scanning news, managing emails, and replying to social messages autonomously. (Note to chatty girlfriends: you may soon need tools to detect whether your boyfriend is using an agent to fake attention.) When new skills are needed, installing new modules on your agent works just like downloading apps on a smartphone.
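In code terms, the smallest version of this idea is an edge node object that keeps the owner's data and a local model on hardware the owner controls. The sketch below is purely illustrative, with a placeholder in place of real on-device inference, and it omits the blockchain coordination and P2P layers entirely.

```python
# Sketch of the BEC idea: a personal edge node keeps private data and a local
# model on hardware the owner controls, so queries are answered without raw
# data ever leaving the device. The "model" here is a placeholder; a real node
# would run an open-weight LLM locally.
class PersonalEdgeNode:
    def __init__(self, owner: str):
        self.owner = owner
        self.private_store: list[str] = []   # notes, emails, messages, etc.

    def ingest(self, document: str) -> None:
        """Store personal data locally; nothing is uploaded anywhere."""
        self.private_store.append(document)

    def _local_model(self, prompt: str, context: list[str]) -> str:
        # Placeholder for on-device inference over the owner's own data.
        return f"[local answer for {self.owner} using {len(context)} private docs]"

    def ask(self, question: str) -> str:
        """Answer using only locally stored context."""
        words = question.lower().split()
        relevant = [d for d in self.private_store
                    if any(w in d.lower() for w in words)]
        return self._local_model(question, relevant)

node = PersonalEdgeNode("alice")
node.ingest("Flight to Lisbon on 12 June, hotel booked near Alfama.")
print(node.ask("When is my flight to Lisbon?"))
```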
Conclusion
Historically, as the internet became increasingly platform-dominated, unicorn startups have emerged ever faster, yet the ecosystem has grown less favorable for independent entrepreneurship.
With Google and Facebook offering efficient content distribution, YouTube, founded in 2005, was acquired by Google for $1.65 billion just one year later;
With Apple's App Store enabling rapid app distribution, Instagram, founded in 2010 with only about a dozen employees, was bought by Facebook for $1 billion in 2012;
Riding the wave of ChatGPT-style large models, Midjourney, a team of only 11 people, earned $100 million in a year, and OpenAI itself, with a headcount tiny by big-tech standards, is valued at over $20 billion.
Internet platform giants grow ever stronger. The rise of large models hasn't disrupted the existing monopoly structure. Big tech still dominates algorithms, data, and compute. Startups lack both the capability and capital to innovate at the foundational model level, confining them to vertical applications atop existing models. While large models appear to democratize knowledge, real power rests in the hands of fewer than 100 individuals globally who can produce such models.
If large models eventually permeate every aspect of life—if you consult ChatGPT about your diet, health, work emails, or legal letters—then theoretically, a few model controllers could subtly alter parameters and massively influence billions. Job displacement caused by AI might be mitigated via UBI or Worldcoin, but the danger of malicious manipulation by a handful of gatekeepers is far graver. This contradicts OpenAI’s original mission. While transitioning to capped-profit addresses profit motives, it fails to solve the problem of concentrated power. After all, these models were trained rapidly on decades of freely shared human knowledge online—yet the resulting intelligence is held hostage by a tiny elite.
Thus, the values embodied by large models and by blockchain are fundamentally at odds. Web3 builders must engage in large model innovation and apply blockchain to correct the industry's flaws. If the vast amount of freely accessible data on the internet represents collectively owned human knowledge, then the large models derived from it should belong to all of humanity. Just as OpenAI recently began paying academic databases for training content, it should also compensate individuals for their personal blogs and shared writings.