
Sora Emerges Out of Nowhere—Will 2024 Be the Year of AI+Web3 Transformation?
What sparks can emerge from the convergence of Web3 and AI?
Author: YBB Capital Zeke

Preface
On February 16, OpenAI unveiled its latest text-to-video diffusion model, "Sora," marking another milestone for generative AI with high-quality video generation across diverse visual data. Unlike AI video generation tools such as Pika, which are still limited to producing a few seconds of video from multiple images, Sora trains in a compressed latent space of videos and images, breaking them down into spatiotemporal patches to achieve scalable video generation. The model also demonstrates the ability to simulate both the physical and digital worlds; its 60-second demo could justifiably be called a "universal simulator of the physical world."
In terms of architecture, Sora continues the technical path of the earlier GPT models, "big data - Transformer - Diffusion - emergence," meaning that computational power remains the engine driving its development and maturity. And since video training requires far more data than text training, the demand for compute will only grow. In our earlier article, "Prospective Tracks Ahead: Decentralized Compute Markets," we discussed the importance of compute in the AI era. With the recent surge of interest in AI, numerous compute projects have emerged, and other DePIN projects (storage, compute, etc.) that benefit passively have also seen significant price rises. Beyond DePIN, what other sparks can the intersection of Web3 and AI produce? What opportunities lie in this space? The main purpose of this article is to update and complete our previous work, exploring the roles Web3 could play in the age of AI.

Three Major Directions in AI Development History
Artificial Intelligence (AI) is a field of science and technology devoted to simulating, extending, and enhancing human intelligence. Since its inception in the 1950s and 1960s, AI has developed over more than half a century into a key driver of change in social life and across industries. Along the way, the intertwined evolution of three major research paradigms (symbolism, connectionism, and behaviorism) has laid the foundation for today's rapid advances in AI.
Symbolism
Also known as logicism or the rule-based approach, symbolism holds that human intelligence can be simulated through symbol manipulation. It represents the objects, concepts, and relationships of a problem domain with symbols and solves problems through logical reasoning, with notable success in expert systems and knowledge representation. The core belief of symbolism is that intelligent behavior can be realized through symbol manipulation and logical inference, where symbols are high-level abstractions of the real world.
Connectionism
Also referred to as the neural network approach, connectionism aims to achieve intelligence by mimicking the structure and function of the human brain. This method constructs networks composed of many simple processing units (analogous to neurons), learning by adjusting the strength of connections between these units (similar to synapses). Connectionism emphasizes learning and generalization from data, making it particularly suitable for pattern recognition, classification, and continuous input-output mapping tasks. Deep learning, an advancement of connectionism, has made breakthroughs in fields such as image recognition, speech recognition, and natural language processing.
Behaviorism
Behaviorism is closely tied to biorobotics and autonomous intelligent systems, emphasizing how agents learn through interaction with their environment. Unlike the first two approaches, behaviorism does not focus on internal representations or cognitive processes but instead achieves adaptive behaviors through cycles of perception and action. Behaviorism holds that intelligence emerges through dynamic interactions with the environment and learning—this method proves especially effective in mobile robotics and adaptive control systems operating in complex and unpredictable environments.
Although these three research directions differ fundamentally, they often interact and converge in practical AI research and applications, jointly advancing the field.
Overview of AIGC Principles
Generative AI (Artificial Intelligence Generated Content, or AIGC), currently growing explosively, is an evolution and application of connectionism. AIGC can mimic human creativity to generate novel content: trained on large datasets with deep learning algorithms, these models learn the underlying structures, relationships, and patterns in the data, then produce unique outputs from user prompts, including images, videos, code, music, designs, translations, answers to questions, and text. Today's AIGC rests essentially on three elements: deep learning (DL), big data, and large-scale computing power.
Deep Learning
Deep learning is a subfield of machine learning (ML), and deep learning algorithms are neural networks modeled loosely on the human brain. The brain contains billions of interconnected neurons that work together to process information; similarly, deep learning neural networks (artificial neural networks) consist of multiple layers of artificial neurons working together inside a computer. These artificial neurons, called nodes, are software modules that process data through mathematical computation. Artificial neural networks are deep learning algorithms that use these nodes to solve complex problems.

Neural networks can be divided hierarchically into input layer, hidden layers, and output layer, with parameters connecting different layers.
● Input Layer: The input layer is the first layer of the neural network, responsible for receiving external input data. Each neuron in the input layer corresponds to one feature of the input data. For instance, when processing image data, each neuron may correspond to a pixel value;
● Hidden Layers: Data from the input layer flows into the hidden layers, which pass it deeper into the network. Hidden layers process information at different levels, adjusting their behavior as they receive new information. Deep learning networks can have hundreds of hidden layers, allowing a problem to be analyzed from many angles. For example, given an image of an unknown animal to classify, you might compare it to animals you already know, judging by ear shape, number of legs, pupil size, and so on. Hidden layers in deep neural networks work in much the same way: if a deep learning algorithm is trying to classify an animal image, each hidden layer processes a different feature of the animal in pursuit of an accurate classification;
● Output Layer: The output layer is the final layer of the neural network, responsible for producing the network’s output. Each neuron in the output layer represents a possible output category or value. For example, in classification tasks, each output neuron may correspond to a class; in regression tasks, the output layer may contain only one neuron whose value represents the prediction;
● Parameters: In neural networks, connections between layers are represented by weights and biases—parameters optimized during training so the network can accurately recognize patterns and make predictions. Increasing parameters enhances the model capacity of the neural network—that is, its ability to learn and represent complex patterns in data. However, increased parameters also raise demands on computational resources.
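To make the layer structure above concrete, here is a minimal NumPy sketch of a tiny feedforward network; the layer sizes and activation functions are illustrative choices of ours, not anything prescribed in this article:

```python
# A minimal feedforward network: input layer (4 features), one hidden
# layer (8 units), and an output layer (3 classes). The weights and
# biases are the "parameters" optimized during training.
import numpy as np

rng = np.random.default_rng(0)

W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # input -> hidden
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)   # hidden -> output

def relu(x):
    return np.maximum(0, x)                      # hidden-layer activation

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)     # output probabilities

def forward(x):
    h = relu(x @ W1 + b1)        # hidden layer extracts features
    return softmax(h @ W2 + b2)  # output layer scores each class

x = rng.normal(size=(1, 4))      # one sample with 4 input features
print(forward(x))                # three class probabilities summing to 1
```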
Big Data
To train effectively, neural networks typically require large volumes of diverse, high-quality, and multi-source data. Big data forms the foundation for training and validating machine learning models. By analyzing big data, ML models can learn patterns and relationships within the data, enabling predictions or classifications.
Large-Scale Computing Power
Neural networks are deep and parameter-heavy, and training them means processing big data iteratively: every training iteration runs forward propagation and backpropagation through each layer, including activation functions, loss computation, gradient calculation, and weight updates. Add high-precision computation, parallel computing, optimization and regularization techniques, and model evaluation and validation, and the result is a massive demand for high-performance computing.
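To see why each iteration is costly, here is a minimal sketch of that loop (forward pass, loss, gradient, weight update); a single linear layer is used so the gradient can be written by hand, and the sizes, data, and learning rate are illustrative. Real models repeat this over billions of parameters:

```python
# One training loop in miniature: forward pass, loss, gradient
# (backpropagation), weight update.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(32, 4))                 # a mini-batch of 32 samples
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_w                               # targets from a known rule

W = np.zeros(4)                              # parameters to be learned
lr = 0.1                                     # learning rate

for step in range(100):
    pred = X @ W                             # forward propagation
    err = pred - y
    loss = (err ** 2).mean()                 # loss function (MSE)
    grad = 2 * X.T @ err / len(y)            # gradient of loss w.r.t. W
    W -= lr * grad                           # weight update

print(W)  # converges toward [1.0, -2.0, 0.5, 3.0]
```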

Sora
As OpenAI’s latest video-generation AI model, Sora marks a significant leap forward in AI’s ability to process and understand diverse visual data. By leveraging a video compression network and spatiotemporal patching techniques, Sora converts massive amounts of visual data captured globally from various devices into a unified representation, enabling efficient processing and understanding of complex visual content. Built upon a text-conditioned diffusion model, Sora generates videos or images highly aligned with textual prompts, demonstrating exceptional creativity and adaptability.
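OpenAI has not published Sora's implementation, but the spatiotemporal-patch idea can be illustrated conceptually. The sketch below (all tensor sizes are toy values chosen for illustration) cuts a video into patches spanning a small block of space and time, each of which can then be treated like a token fed to a Transformer:

```python
# Conceptual sketch only (not OpenAI's code): slicing a video tensor
# into spatiotemporal patches, each flattened into a vector.
import numpy as np

video = np.zeros((16, 64, 64, 3))   # 16 frames of 64x64 RGB (toy sizes)
pt, ph, pw = 4, 16, 16              # patch extent in time, height, width

T, H, W, C = video.shape
patches = (
    video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
         .transpose(0, 2, 4, 1, 3, 5, 6)    # group the three patch axes
         .reshape(-1, pt * ph * pw * C)     # one flat vector per patch
)
print(patches.shape)  # (64, 3072): 4x4x4 spacetime patches
```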
However, despite its breakthroughs in video generation and simulation of real-world interactions, Sora still faces limitations, including accuracy in physical world simulation, consistency in long video generation, comprehension of complex textual instructions, and efficiency in training and generation. Fundamentally, Sora relies on OpenAI’s monopolistic compute resources and first-mover advantage, continuing along the established technical path of “big data - Transformer - Diffusion - emergence”—a form of brute-force aesthetics. Other AI companies still have room for technological leapfrogging.
Although Sora itself has little direct connection with blockchain, I believe that over the next one to two years, its influence will drive the emergence and rapid development of other high-quality AI generation tools, impacting multiple Web3 sectors such as GameFi, social platforms, creator platforms, and DePIN. Therefore, having a basic understanding of Sora is essential. How future AI will effectively integrate with Web3 may become a key question worth pondering.
Four Pathways for AI x Web3 Integration
From the discussion above, the foundational infrastructure generative AI requires boils down to three components: algorithms, data, and compute. Blockchain's greatest strengths, meanwhile, lie in two areas: redefining production relationships and decentralization. I therefore believe the convergence of the two technologies can unfold along four primary pathways:
Decentralized Compute
Since I've written about related topics before, the main goal of this section is to update recent developments in the compute sector. When discussing AI, compute is unavoidable, and after Sora's release the scale of AI's compute demand has become almost unimaginable. At the 2024 World Economic Forum in Davos, Switzerland, OpenAI CEO Sam Altman stated outright that compute and energy are now the biggest constraints, and that their future importance may rival that of money. Then, on February 10, Sam Altman announced an astonishing plan on Twitter: raising $7 trillion (about 40% of China's 2023 GDP) to reshape the global semiconductor industry and build a chip empire. When I previously wrote about compute, my imagination stopped at national restrictions and corporate monopolies; a single company setting out to dominate the global semiconductor industry is radical indeed.
Thus, the importance of decentralized compute is self-evident: blockchain's characteristics can genuinely address the extreme concentration of compute resources today and the prohibitively high cost of dedicated GPUs. From an AI perspective, compute usage falls into two categories: inference and training. Projects focused on training remain scarce, since a decentralized network must be aligned with neural network design and the hardware demands are extremely high, making this direction high-barrier and hard to implement. Inference, by contrast, is simpler on both fronts: the network design is less complex, and the hardware and bandwidth requirements are lower, making it the more mainstream approach today.
The decentralized compute market has enormous potential, often tied to the keyword "trillion-dollar," and it is one of the most frequently hyped topics of the AI era. Yet judging from the flood of recent projects, most appear rushed and opportunistic, raising the banner of decentralization while ignoring the inefficiencies inherent in decentralized networks. There is also severe homogenization in design, with many projects looking nearly identical (one-click L2 plus mining models, for example), which could end in chaos. In such conditions, carving out a share against traditional AI players will be extremely difficult.
Algorithm and Model Collaboration Systems
Machine learning algorithms refer to those capable of identifying patterns and rules from data to make predictions or decisions. Algorithms are technology-intensive, requiring deep expertise and innovation in design and optimization. They are central to training AI models, defining how data is transformed into actionable insights or decisions. Common generative AI algorithms include Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Transformers—each designed for specific domains (e.g., art, language recognition, translation, video generation) or purposes, used to train specialized AI models.
With so many algorithms and models, each excelling in different ways, can we integrate them into a versatile, all-capable model? Bittensor, recently gaining popularity, leads in this direction—using mining incentives to encourage collaboration and mutual learning among different AI models and algorithms to create more efficient and universal AI systems. Others pursuing similar goals include Commune AI (focused on code collaboration). However, for current AI companies, algorithms and models are closely guarded trade secrets rarely shared externally.
Thus, the narrative of AI collaboration ecosystems is novel and intriguing—leveraging blockchain advantages to overcome the fragmentation of isolated AI algorithms. But whether such systems can generate tangible value remains uncertain. Top-tier AI firms maintain strong closed-source models with fast iteration and integration capabilities—for instance, OpenAI evolved from early text-generation models to multi-domain models in under two years. Projects like Bittensor may need to find niche areas where their models and algorithms offer distinct advantages.
Decentralized Big Data
From a simple standpoint, using private data to train AI, and data labeling, are directions highly compatible with blockchain, provided there are safeguards against junk and malicious data. Data storage can also benefit existing DePIN projects such as FIL and AR. From a more complex perspective, applying machine learning (ML) to blockchain data in order to improve its accessibility is another interesting direction (one of the paths Giza is exploring).
In theory, blockchain data is always accessible and reflects the entire state of the chain. However, for those outside the blockchain ecosystem, accessing this vast amount of data is not easy. Fully storing a blockchain requires specialized hardware and deep technical expertise. To overcome access challenges, several solutions have emerged. RPC providers allow API-based node access, while indexing services enable data extraction via SQL and GraphQL—both playing crucial roles. However, these methods have limitations. RPC services struggle under high-volume query scenarios and often fail to meet demand. While indexing services offer structured retrieval, the complexity of Web3 protocols makes building efficient queries extremely difficult—sometimes requiring hundreds or thousands of lines of intricate code. This complexity presents a major barrier for general data practitioners and those unfamiliar with Web3 details. These cumulative limitations highlight the need for easier, more usable methods of accessing and utilizing blockchain data—to foster broader innovation and application.
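As a concrete example of the RPC-based access described above, the sketch below queries an Ethereum node with the standard eth_blockNumber JSON-RPC method; the endpoint URL is a placeholder to be replaced with any RPC provider's URL:

```python
# Querying an Ethereum node over JSON-RPC.
import requests

RPC_URL = "https://example-rpc-provider.invalid/eth"  # placeholder

payload = {
    "jsonrpc": "2.0",
    "method": "eth_blockNumber",  # standard Ethereum JSON-RPC method
    "params": [],
    "id": 1,
}
resp = requests.post(RPC_URL, json=payload, timeout=10).json()
block_number = int(resp["result"], 16)  # the result is a hex string
print(f"Latest block: {block_number}")
```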
By combining ZKML (Zero-Knowledge Machine Learning, reducing ML’s burden on-chain) with high-quality blockchain data, it may be possible to create datasets that enhance blockchain accessibility. AI could drastically lower the barriers to accessing blockchain data, enabling developers, researchers, and ML enthusiasts to access more high-quality, relevant datasets over time—fueling effective and innovative solutions.
AI-Empowered DApps
Since ChatGPT's breakout in 2023, AI-empowered DApps have become a common direction. Highly versatile generative AI can be integrated via APIs to simplify and add intelligence to data platforms, trading bots, blockchain encyclopedias, and more. It can also serve as a chatbot (e.g., Myshell) or AI companion (Sleepless AI), or even generate NPCs for blockchain games. But because the technical barrier is low, most implementations are mere API integrations with light fine-tuning, lacking deep integration with the project itself, and so they have drawn little attention.
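The "API integration" pattern described above is straightforward in practice. As a minimal sketch, the example below wraps a general-purpose LLM API as a bare-bones Web3 chatbot using the OpenAI Python SDK; the model name and system prompt are illustrative assumptions, not taken from any project mentioned here:

```python
# Wrapping a general-purpose LLM API as a minimal Web3 chatbot.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_web3_bot(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat-capable model works here
        messages=[
            {"role": "system", "content": "You are a concise Web3 assistant."},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

print(ask_web3_bot("What does an ERC-20 approve call do?"))
```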
But with Sora’s arrival, I believe AI-enhanced GameFi (including metaverse) and creator platforms will become key focal points moving forward. Due to Web3’s bottom-up nature, it has struggled to produce products competitive with traditional gaming or creative studios. Sora’s emergence could break this deadlock—possibly within just two to three years. Judging from Sora’s demo, it already shows potential to compete with short-form video production companies. Combined with Web3’s vibrant community culture generating countless creative ideas, once the only constraint becomes imagination, the barriers between bottom-up Web3 industries and top-down traditional industries may finally dissolve.
Conclusion
As generative AI tools continue to advance, we will experience more epoch-defining “iPhone moments.” Although many dismiss the integration of AI and Web3, I believe most current directions are valid—the real issues to address are necessity, efficiency, and fit. While still in the exploratory phase, the convergence of AI and Web3 could very well become a dominant trend in the next bull market.
Remaining sufficiently curious and open-minded toward new technologies is a mindset we must cultivate. Historically, transitions like cars replacing horse-drawn carriages happened swiftly and decisively, just as with inscriptions and the NFTs before them. Holding on to too many biases only makes us miss opportunities.