
Closed-door Discussion with AI Entrepreneurs from China and the US: Changes and New Trends in AI Entrepreneurship After DeepSeek-R1
TechFlow Selected
A chatbot may not necessarily be the user's first AI product.
Article source: FounderPark

Image source: Generated by Wujie AI
DeepSeek was undoubtedly the highlight during the 2025 Spring Festival, from topping the Apple App Store's free app chart to cloud providers racing to deploy DeepSeek-R1. For many, DeepSeek became their first AI product experience. Entrepreneurs are discussing everything—from technological innovations and training/inference cost analysis to its impact on the entire AI industry.
On February 2, Founder Park and Global Ready, a global private community under GeekPark, hosted a closed-door discussion inviting over 60 founders and technical experts from AI companies in Silicon Valley, China, London, Singapore, and Japan. They engaged in an in-depth exploration of new technical directions and product trends sparked by DeepSeek, covering innovation, product implementation, and computing power shortages.

After anonymization, we have compiled the key points from this private discussion.
01 Where does DeepSeek’s innovation lie?
DeepSeek released its V3 base model at the end of December 2024, one of the most powerful open-source models currently available. With 671B total parameters and 37B activated per token, it is a large Mixture-of-Experts (MoE) model.
The "Aha moment" of the R1 model, released in January 2025, refers to the model's ability to exhibit reflective reasoning during inference. For example, while solving a problem, the model may realize that a certain method is no longer effective and switch to a more efficient approach mid-process. This reflective capability stems from reinforcement learning (RL).
R1 is DeepSeek’s flagship model, matching OpenAI's o1 in reasoning ability. Its training can be summarized as two rounds of reinforcement learning interleaved with two rounds of SFT. The initial RL and SFT stages primarily build a data-generating teacher model to guide subsequent data generation, positioning R1 to become the most powerful reasoning model available.
-
The core innovation of the DeepSeek R1-Zero model lies in bypassing traditional supervised fine-tuning (SFT) and directly optimizing reasoning through reinforcement learning (RL). Additionally, using DeepSeek R1 as a teacher model to distill open-source small-to-medium models (e.g., Qwen 1.5B/7B/14B/32B) significantly enhances the capabilities of smaller models.
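R1's distilled small models are reportedly produced by fine-tuning on teacher-generated outputs. For intuition, here is a minimal sketch of the generic soft-label distillation objective (temperature-softened KL divergence between teacher and student distributions); the function names and toy logits are illustrative assumptions, not DeepSeek's implementation.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature softens the
    # teacher's distribution so the student sees more of its
    # "dark knowledge" about near-miss tokens.
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) on the softened distributions: the
    # classic soft-label distillation objective.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [2.0, 1.0, 0.5]
aligned = distillation_loss(teacher, [2.0, 1.0, 0.5])     # matches teacher
mismatched = distillation_loss(teacher, [0.5, 1.0, 2.0])  # disagrees
```

A student whose logits match the teacher incurs zero loss, while a mismatched student incurs a positive loss that training would then push down.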
-
In coding ability, DeepSeek’s R1 is comparable to OpenAI’s newly released o3-mini, with o3-mini slightly stronger overall. However, R1 is open-source, which will stimulate broader adoption by application developers.
-
The key to DeepSeek’s success lies in its highly integrated engineering solution that drastically reduces costs. While each individual technique used can be found in papers from last year, DeepSeek aggressively implements the latest methods. These techniques inherently carry side effects—such as increased storage overhead—but greatly improve cluster utilization efficiency.
-
If not deployed on a large-scale cluster serving massive users, the MLA architecture may backfire. Many of DeepSeek’s methods only achieve optimal performance within specific scenarios and environments; applying them independently could lead to negative outcomes. Their system design is extremely sophisticated—so much so that extracting any single technology fails to replicate their results.
-
One should not rely solely on training a process reward model, as doing so may fail to meet expectations or even cause overfitting. DeepSeek instead adopted a deliberately simple reinforcement learning approach: heuristic rules score only the final outcome, and traditional RL then refines the process. This method emerged through continuous trial and error, enabled by DeepSeek’s highly efficient infrastructure.
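A minimal sketch of such heuristic outcome scoring: the format is checked first, then only the final answer, with no learned process reward model. The tag template and score values below are illustrative assumptions in the spirit of rule-based rewards, not DeepSeek's actual scoring code.

```python
import re

def outcome_reward(response: str, gold_answer: str) -> float:
    # Rule-based outcome scoring: first verify the expected
    # <think>...</think><answer>...</answer> template, then grade
    # only the final answer. No model judges the reasoning steps.
    match = re.search(r"<think>.*?</think>\s*<answer>(.*?)</answer>",
                      response, re.DOTALL)
    if match is None:
        return -1.0                      # malformed output
    predicted = match.group(1).strip()
    return 1.0 if predicted == gold_answer else 0.0

good = "<think>7*6=42</think><answer>42</answer>"
bad = "<think>7*6=42</think><answer>41</answer>"
```

Because the rule only touches the verifiable end result, the policy is free to discover its own intermediate reasoning, which is where the reflective "aha moment" behavior is said to emerge.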
-
Even without publicly releasing its inference code, other teams can roughly infer the methods used. The open-sourced model weights are sufficient for performance replication, but the challenge lies in identifying specific configurations, which takes time.
-
A reward model relying solely on data annotation struggles to achieve superhuman intelligence. A true reward model based on real-world data or environmental feedback is necessary for advanced reward optimization and ultimately achieving superhuman capabilities.
-
Technically speaking: if a base model has strong generalization ability and combines mathematical and coding skills, their synergy enhances overall generalization. For instance, a capable base model already proficient in writing could, when enhanced with math and code-focused reinforcement learning, generalize well across diverse tasks—such as composing various literary forms from parallel prose to regulated verse—where other models fall short.
02 Why is DeepSeek’s cost so low?
-
The model is highly sparse. Despite being a model with over 600B parameters, only 37B parameters are activated per token during inference, meaning its speed and resource consumption resemble those of a 37B-parameter model. Achieving this requires extensive system-level architectural modifications.
-
In DeepSeek V3, the MoE architecture includes 256 expert modules, but only a small subset is activated during each inference. Under high load, it dynamically adjusts resource usage, theoretically compressing costs down to 1/256 of the original. This design reflects DeepSeek’s foresight in software architecture. With sufficient system optimization, prices can drop significantly even at similar scales.
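The expert routing just described can be sketched as generic top-k softmax gating. Here k=8 matches V3's reported count of routed experts per token, but the scoring, normalization, and load-balancing details are simplified assumptions, not DeepSeek's actual router.

```python
import math
import random

def top_k_gating(scores, k=8):
    # Pick the k highest-scoring experts and renormalize their
    # weights. Only these experts run for this token, so per-token
    # compute is roughly k/len(scores) of an equally wide dense layer.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:k]
    weights = [math.exp(scores[i]) for i in chosen]
    total = sum(weights)
    return {i: w / total for i, w in zip(chosen, weights)}

random.seed(0)
router_scores = [random.gauss(0, 1) for _ in range(256)]  # one score per expert
gate = top_k_gating(router_scores, k=8)
active_fraction = len(gate) / len(router_scores)  # 8/256 = 1/32 of experts run
```

With 8 of 256 experts active, only 1/32 of the expert weights touch each token; the real system adds shared experts and load-balancing bias terms on top of this basic idea.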
-
Model training typically involves three parallelization strategies: partitioning the data across workers (Data Parallelism), splitting consecutive layers into sequential stages (Pipeline Parallelism), and sharding individual weight matrices across GPUs (Tensor Parallelism). To support its sparse MoE design, DeepSeek made significant adjustments to its training framework and pipeline, eliminating Tensor Parallelism entirely and relying only on Data and Pipeline Parallelism, supplemented by finer-grained Expert Parallelism. By precisely dividing the 256 experts and assigning them to different GPUs, and by abandoning Tensor Parallelism, DeepSeek circumvents hardware limitations, bringing H800 and H100 training efficiencies closer together.
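The Expert Parallelism mentioned above can be illustrated with a toy placement plan: each GPU owns a slice of the experts, so a token is dispatched only to the GPUs holding its routed experts, with no tensor-parallel sharding of individual weight matrices. The GPU count and round-robin scheme are assumptions for illustration only.

```python
def assign_experts(num_experts=256, num_gpus=32):
    # Round-robin placement of experts onto GPUs. Each GPU ends up
    # holding num_experts / num_gpus whole experts; communication
    # happens by routing tokens to expert owners, not by sharding
    # any single expert's weights across devices.
    placement = {g: [] for g in range(num_gpus)}
    for e in range(num_experts):
        placement[e % num_gpus].append(e)
    return placement

plan = assign_experts()
per_gpu = len(plan[0])  # 256 experts / 32 GPUs = 8 experts per GPU
```

Keeping each expert whole on one device is what lets this scheme sidestep the inter-GPU bandwidth demands of Tensor Parallelism, which is the constraint that separates H800s from H100s.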
-
In deployment, experiments show that computational costs remain manageable and technical barriers are relatively low—typically taking just one to two weeks to reproduce, which is highly favorable for application developers.
-
A potential model architecture: decouple reasoning RL from the large language model itself by adding an external "thinking machine" to handle reasoning, thereby reducing overall costs by several orders of magnitude.
03 Chatbots may not be users’ first AI product
-
DeepSeek R1’s success lies not only in its reasoning ability but also in its integration with search functionality. The combination of a reasoning model and search functions effectively creates a micro-agent framework. For most users, this marks their first encounter with a reasoning model. Even for those familiar with other reasoning models like OpenAI’s o1, the search-integrated DeepSeek R1 offers a completely new experience.
-
For users new to AI products, their first AI experience may not necessarily be a conversational interface like ChatGPT, but rather a model-driven application in another context.
-
The competitive advantage for AI application companies lies in product experience. Whoever delivers faster, better, and more comfortable features will gain a market edge.
-
The current visualization of the model’s thought process is a satisfying design, but it represents an early stage in using reinforcement learning (RL) to enhance model capabilities. The length of the reasoning process is not the sole indicator of correctness; future developments will shift from complex, long reasoning chains toward simpler, shorter ones.
04 Vertical AI applications are easier to implement now
-
For relatively specialized (vertical) tasks, evaluation can be completed via rule systems, eliminating reliance on complex reward models. On predefined vertical tasks, setups like TinyZero or 7B-sized models can quickly yield usable results.
-
On well-defined vertical tasks, training a 7-billion-parameter or larger model distilled from DeepSeek can rapidly achieve an "aha moment." From a cost perspective, training small models on simple arithmetic or games like blackjack—tasks with clear answers—requires only 2–4 H100 or H200 GPUs and less than half a day for the model to converge to a usable state.
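For a game like blackjack, the reward rule is simple enough to write directly. The sketch below (hit/stand dynamics, dealer policy, and bet sizes omitted) is a hypothetical example of such an unambiguous outcome signal, not any team's actual training setup.

```python
def hand_value(cards):
    # Blackjack hand total; each ace (encoded as 1) counts as 11
    # whenever that does not bust the hand.
    total = sum(cards)
    aces = cards.count(1)
    while aces and total + 10 <= 21:
        total += 10
        aces -= 1
    return total

def blackjack_reward(player_cards, dealer_cards):
    # Outcome-only reward: +1 win, 0 push, -1 loss or bust.
    # Rules like this give a small model a clean, cheap-to-compute
    # training signal with no human labeling or learned judge.
    p, d = hand_value(player_cards), hand_value(dealer_cards)
    if p > 21:
        return -1.0
    if d > 21 or p > d:
        return 1.0
    return 0.0 if p == d else -1.0
```

Every rollout gets graded instantly and deterministically, which is what makes half a day on a handful of GPUs plausible for tasks of this shape.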
-
In vertical domains, especially those involving tasks with definitive answers such as mathematical calculations or physical rule validation (object placement, motion realism), DeepSeek R1 performs better than other models with controllable costs, making it suitable for widespread use in vertical fields. However, for subjective tasks without clear answers—such as judging aesthetic appeal or emotional satisfaction—rule-based approaches struggle. Better solutions may take three to six months to emerge.
-
Supervised fine-tuning (SFT) and similar methods depend on datasets that are time-consuming to build, and the domain distributions in those datasets often fail to cover all task levels comprehensively. Now, with a superior new toolkit paired with a high-quality model, previously intractable data-collection problems and well-defined vertical tasks can be resolved.
-
While rule systems work well for tasks with clear rules, such as math and coding, they become impractical for more complex or open-ended tasks. Eventually, people may explore more suitable models to evaluate results in these complex scenarios, possibly adopting ORMs (outcome reward models) instead of PRMs (process reward models), or exploring alternative methods. Ultimately, simulators akin to "world models" might be built to provide better feedback for model decisions.
-
When training small models for reasoning, token-based solutions aren't always necessary. In one e-commerce solution, the entire reasoning function was decoupled from the Transformer-based model and handled by a separate small model, working alongside the Transformer to complete the task.
-
For companies developing models exclusively for internal use (e.g., hedge funds), the main challenge is cost. Large firms can spread development costs across clients, but small teams or startups struggle with high R&D expenses. DeepSeek’s open-source release is transformative—it enables teams previously unable to afford expensive development to now build their own models.
-
In finance, particularly quantitative funds, vast amounts of financial data—such as corporate earnings reports and Bloomberg data—must be analyzed. These firms typically build proprietary datasets and conduct supervised training, but data labeling is extremely costly. For them, applying reinforcement learning (RL) during fine-tuning can significantly boost model performance, enabling qualitative leaps.
05 Domestic chips show promise in addressing inference compute shortages
-
There are now many domestic chips targeting A100 and A800 equivalents. However, the biggest bottleneck isn’t chip design—it’s fabrication (tape-out). DeepSeek chose to partner with Huawei because it can reliably produce chips, ensuring stable training and inference even under stricter sanctions.
-
Looking ahead, Nvidia’s high-end chips may suffer from over-provisioned compute in certain applications when viewed from a single-GPU training perspective. For example, cache and memory-bandwidth constraints may prevent full utilization of single-card compute during training, making them suboptimal for training workloads.
-
In the domestic chip market, if the focus shifts entirely to AI workloads, excluding scientific computing, and high-precision floating-point capability is pared back to concentrate solely on AI tasks, then some performance metrics could catch up with Nvidia’s flagship chips.
06 Stronger agents and cross-application calling capabilities
-
Agent capabilities will see major improvements in many vertical domains. Start with a base model and encode certain rules into a rule model, a potentially pure engineering solution. Then use this engineering framework to iteratively train and refine the base model; the outputs may exhibit signs of superhuman intelligence. Further preference tuning can align responses with human readability, resulting in a more powerful reasoning agent for a specific vertical domain.
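The cycle described above, where a rule model scores outputs and the base model is pushed toward the approved ones, can be caricatured as rejection sampling over a toy categorical "policy". Every name, distribution, and number here is hypothetical; real implementations would retrain actual model weights rather than reweight a dictionary.

```python
import random

def rule_score(answer, target):
    # Stand-in rule model: exact-match scoring of final answers.
    return 1.0 if answer == target else 0.0

def iterate(policy, target, rounds=20, n=8):
    # Rejection-sampling loop: sample candidates, keep the
    # rule-approved ones, and shift probability mass toward them.
    for _ in range(rounds):
        keys = list(policy)
        picks = random.choices(keys, weights=[policy[k] for k in keys], k=n)
        for ans in picks:
            if rule_score(ans, target) > 0:
                policy[ans] += 1.0        # reinforce approved outputs
    total = sum(policy.values())
    return {a: w / total for a, w in policy.items()}

random.seed(1)
policy = {"42": 1.0, "41": 1.0, "40": 1.0}
tuned = iterate(policy, "42")  # mass concentrates on the approved answer
```

A final preference-tuning pass, not shown, would then be needed to make the reinforced outputs readable to humans rather than merely rule-approved.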
-
This raises a limitation: you may not achieve an agent with strong generalization across all verticals. An agent trained in one domain may perform well there but fail to generalize elsewhere. Yet, this remains a viable path forward, given DeepSeek’s low inference cost. One can select a model, conduct targeted reinforcement training, and deploy it exclusively for a specific vertical, ignoring others. For vertical AI companies, this is an acceptable trade-off.
-
From an academic standpoint, a key trend over the next year will be transferring established reinforcement learning methods into large model applications to address current shortcomings in generalization and evaluation accuracy. This will further enhance model performance and generalization. As RL adoption grows, structured information output will improve dramatically, better supporting diverse applications—especially in generating charts and other structured content.
-
More people will use R1 for post-training, enabling everyone to create their own agent. The model layer will evolve into various agent models, each leveraging different tools to solve domain-specific problems, eventually forming multi-agent systems.
-
2025 could become the "year of the agent," with many companies launching agents capable of planning tasks. However, sufficient data to support these tasks is currently lacking. Examples include helping users order food, book travel, or check ticket availability. These require vast datasets and reward mechanisms to assess model accuracy—for instance, how do you judge whether a trip plan to Zhangjiajie is correct or incorrect, and how does the model learn? These questions will become research focal points, with reasoning ultimately applied to solve real-world problems.
-
Cross-application calling capabilities will be a major focus in 2025. Android’s open-source nature allows developers to access low-level permissions for cross-app operations—future agents could control your browser, phone, and computer. However, Apple’s strict permission controls make full device-wide app control extremely difficult. Apple would need to develop its own agent capable of managing all apps internally. Although Android is open-source, collaboration with manufacturers like OPPO and Huawei is still required to open low-level permissions across phones, tablets, and computers, enabling data access and supporting agent development.