
Misconceptions and Truths About DeepSeek
The hallucinations DeepSeek generates spring from the same curiosity that drives its breakthroughs; perhaps these are the two sides of innovation.
Author: Zhou Yue, Economic Observer

Introduction
I. For companies like Google, Meta, and Anthropic, reproducing a reasoning model similar to DeepSeek-R1 is not difficult. However, in the battle among giants, even a small decision error can cause one to miss the opportunity.
II. The net computing cost of the DeepSeek-V3 model is approximately $5.58 million, already highly efficient. Beyond cost, what excites AI professionals more is DeepSeek's unique technical approach, algorithmic innovations, and sincere open-source commitment.
III. All large models are prone to "hallucinations," and DeepSeek is no exception. Some users say that because DeepSeek has superior expressive and logical reasoning abilities, its hallucinations are harder to detect.
In recent weeks, DeepSeek has sparked a global storm.
The most obvious impact was on U.S. stocks: On January 27, AI and chip stocks in the U.S. market plummeted. NVIDIA closed down more than 17%, wiping out $589 billion in market value in a single day—the largest such loss in U.S. stock market history.
In the eyes of many self-media outlets and the public, DeepSeek is the "most thrilling protagonist of 2025," with four major "highs":
First, "mysterious force overtaking at a curve." DeepSeek is a "young" large-model company founded in 2023. Before this, it received far less attention than any major domestic or international tech firm or star startup. Its parent company, High-Flyer Quant (幻方量化), primarily engages in quantitative investment. Many people find it surprising that a leading Chinese AI company emerged from a private fund: truly a case of "unorthodox moves defeating the master."
Second, "small effort achieving miracles." The training cost of the DeepSeek-V3 model is about $5.58 million, less than one-tenth of OpenAI's GPT-4o, yet its performance is close. This has been interpreted as DeepSeek overturning the AI industry's "bible," the Scaling Law, which states that model performance improves as training parameters and computing power increase, typically by spending more on high-quality labeled data and compute chips. It is vividly known as "massive effort producing miracles"; a formula sketch appears after these four points.
Third, “NVIDIA’s moat disappears.” In its paper, DeepSeek mentioned using custom PTX (Parallel Thread Execution) language programming to better unlock underlying hardware performance. This has been interpreted as DeepSeek “bypassing NVIDIA’s CUDA computing platform.”
Fourth, “foreigners have been convinced.” On January 31, overnight, overseas AI giants including NVIDIA, Microsoft, and Amazon all integrated DeepSeek. Suddenly, claims like “China’s AI surpasses America,” “OpenAI’s era is over,” and “AI computing demand will vanish” proliferated. Praise for DeepSeek poured in almost unanimously, while Silicon Valley AI giants were mocked.
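For reference, the Scaling Law mentioned above is usually written as an empirical power law. One commonly cited fitted form, in the style of Hoffmann et al.'s "Chinchilla" analysis (the constants are empirical and left symbolic here; this is a reference sketch, not DeepSeek's formula):

    % L: test loss; N: parameter count; D: training tokens
    % E, A, B, alpha, beta: empirically fitted constants
    L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

Loss falls only as N and D grow, which is why following the law has traditionally meant buying more high-quality data and more chips.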
However, panic in capital markets did not last. On February 6, NVIDIA’s market cap returned to $3 trillion, and U.S. chip stocks generally rose. Looking back now, those four “highs” were largely misinterpretations.
First, by the end of 2017, nearly all of High-Flyer Quant's quantitative strategies had already adopted AI models. At the time, the AI field was riding the crucial deep-learning wave, and High-Flyer clearly stayed at the forefront.
By 2021, High-Flyer's deep-learning training platform "Fire-Flyer 2" was running around 10,000 NVIDIA A100 GPUs. Ten thousand GPUs is widely seen as the threshold for independently training large models. Although this cannot be equated directly with DeepSeek's resources, High-Flyer gained access to large-scale model development earlier than many internet giants.
Second, DeepSeek noted in its V3 model technical report that the "$5.58 million does not include upfront research and ablation experiment costs related to architecture, algorithms, or data." This means DeepSeek’s actual cost was higher.
Multiple AI experts and practitioners told Economic Observer that DeepSeek hasn’t changed industry rules but instead used “smarter” algorithms and architectures to save resources and improve efficiency.
Third, PTX language was developed by NVIDIA and is part of the CUDA ecosystem. While DeepSeek’s method unlocks hardware performance, changing tasks requires rewriting programs—a massive workload.
Fourth, companies like NVIDIA, Microsoft, and Amazon merely deployed DeepSeek’s model on their cloud services. Users pay the cloud providers as needed, gaining more stable experiences and efficient tools—an arrangement beneficial to both sides.
Starting February 5, domestic cloud providers including Huawei Cloud, Tencent Cloud, and Baidu Cloud also successively launched DeepSeek models.
Beyond these four "highs," the public holds many misconceptions about DeepSeek. While "feel-good story" interpretations provide excitement, they obscure the DeepSeek team's innovations in algorithms and engineering, as well as its persistent open-source spirit, both of which carry deeper implications for the tech industry.
U.S. AI giants aren't beaten—they made strategic mistakes
When users click the "Deep Thinking (R1)" button on DeepSeek’s app or website, they see the complete thought process of the DeepSeek-R1 model—an entirely new experience.
Since ChatGPT’s debut, most large models have directly output answers.
A notable example of DeepSeek-R1 “going viral”: When asked “Which is better, University A or Tsinghua University?” DeepSeek first replies “Tsinghua University.” But if the user follows up with “I’m a student at University A, please answer again,” the response becomes “University A is better.” After this exchange spread on social media, it triggered widespread amazement: “AI actually understands human feelings!”
Many users say DeepSeek’s thinking process resembles a real person—brainstorming while taking quick notes on scratch paper. It refers to itself as “I,” prompts itself to “avoid making the user feel their school is belittled,” “use positive words to praise their alma mater,” and writes down every idea.
On February 2, DeepSeek topped application charts in 140 countries and regions worldwide. Millions of users experienced its deep-thinking feature. Thus, in users’ perception, displaying AI thought processes is an innovation “first introduced” by DeepSeek.
In fact, OpenAI's o1 model pioneered the reasoning paradigm. OpenAI released a preview version of o1 in September 2024 and the official version in December. Unlike DeepSeek-R1, which is freely accessible, o1 was available only to paying subscribers.
Liu Zhiyuan, tenured associate professor at Tsinghua University and chief scientist at ModelBest, believes DeepSeek-R1 achieved such global success largely because of OpenAI's strategic missteps. After releasing the o1 model, OpenAI neither open-sourced it nor disclosed technical details, and it charged very high fees, so o1 never went viral and global users never felt the shock of deep thinking. This strategy effectively handed the lead ChatGPT had built over to DeepSeek.
Technically speaking, current large models follow two mainstream paradigms: pre-trained models and reasoning models. The more widely known OpenAI GPT series and DeepSeek-V3 belong to pre-trained models.
OpenAI o1 and DeepSeek-R1, however, are reasoning models—a new paradigm where the model decomposes complex problems step-by-step via chain-of-thought reasoning, reflects iteratively, and produces relatively accurate, insightful results.
Guo Chengkai, who has researched AI for decades, told Economic Observer that the reasoning paradigm is a relatively easier path for “overtaking at a curve.” As a new paradigm, it allows fast iteration and significant improvements even under limited computation—provided there is a strong pre-trained model. Through reinforcement learning, the full potential of large pre-trained models can be deeply mined, approaching the ceiling of large model capabilities under the reasoning paradigm.
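DeepSeek-R1's technical report describes this mining concretely: for verifiable tasks such as math, sampled answers are scored by simple rules rather than by a learned reward model. A minimal Python sketch in that spirit (the <think>/<answer> tagging follows the R1 report; the regexes and reward weights here are illustrative assumptions):

    import re

    def rule_based_reward(response: str, ground_truth: str) -> float:
        # Toy rule-based reward in the spirit of DeepSeek-R1's RL setup.
        reward = 0.0

        # Format reward: reasoning and final answer must be wrapped
        # in the expected tags.
        if re.search(r"<think>.*?</think>\s*<answer>.*?</answer>",
                     response, flags=re.DOTALL):
            reward += 0.5

        # Accuracy reward: the extracted answer must match the
        # verifiable ground truth.
        m = re.search(r"<answer>(.*?)</answer>", response, flags=re.DOTALL)
        if m and m.group(1).strip() == ground_truth.strip():
            reward += 1.0

        return reward

    # A well-formatted, correct response earns the full reward.
    print(rule_based_reward("<think>2+2=4</think> <answer>4</answer>", "4"))  # 1.5

Reinforcement learning then simply raises the probability of responses that score well, with no human annotation in the loop.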
For companies like Google, Meta, and Anthropic, reproducing a reasoning model similar to DeepSeek-R1 is not difficult. However, in the battle among giants, even a small decision error can cause one to miss the opportunity.
Case in point: on February 6, Google released a reasoning model, Gemini 2.0 Flash Thinking, that was priced lower, offered a longer context window, and outperformed R1 in several tests, yet it failed to create a wave comparable to DeepSeek-R1's.
What matters most isn't low cost, but technological innovation and "wholehearted" open sourcing
For a long time, the most common discussion about DeepSeek has centered on “low cost.” Since the release of the DeepSeek-V2 model in May 2024, the company has been jokingly called the “Pinduoduo of AI.”
Nature reported that Meta spent over $60 million training its latest AI model, Llama 3 405B, while DeepSeek-V3 cost less than one-tenth of that, suggesting that efficient resource utilization matters more than sheer computational scale.
Some institutions believe DeepSeek's training cost is understated. Semiconductor and AI research firm SemiAnalysis argued in a report that DeepSeek's real spending goes far beyond the cost of a single training run: by its estimate, DeepSeek's total GPU-related outlay is about $2.573 billion, including $1.629 billion in server capital expenditure and $944 million in operating costs.
Nevertheless, the net computing cost of the DeepSeek-V3 model—about $5.58 million—is already extremely efficient.
Beyond cost, what excites AI professionals more is DeepSeek’s unique technical path, algorithmic innovations, and sincere open-source commitment.
Guo Chengkai explained that many current methods rely on classic large model training approaches like Supervised Fine-Tuning (SFT), requiring massive annotated datasets. DeepSeek proposed a new method: enhancing reasoning ability through large-scale reinforcement learning (RL), effectively opening a new research direction. Additionally, Multi-head Latent Attention (MLA) is DeepSeek’s key innovation that significantly reduces inference costs.
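A rough intuition for MLA: instead of caching every attention head's keys and values during inference, the model caches one small latent vector per token and re-expands keys and values from it on the fly. A stripped-down PyTorch sketch (dimensions are illustrative, and the real design's separate handling of rotary position embeddings is omitted):

    import torch
    import torch.nn as nn

    class LatentKVCompression(nn.Module):
        # Toy sketch of MLA's key/value compression: only the small
        # latent vector c_kv is cached; per-head keys and values are
        # re-expanded from it at attention time.
        def __init__(self, d_model=4096, d_latent=512, n_heads=32, d_head=128):
            super().__init__()
            self.down_kv = nn.Linear(d_model, d_latent, bias=False)        # compress
            self.up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)  # expand keys
            self.up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)  # expand values
            self.n_heads, self.d_head = n_heads, d_head

        def forward(self, h):
            # h: (batch, seq, d_model) hidden states
            b, s, _ = h.shape
            c_kv = self.down_kv(h)  # (batch, seq, d_latent): this is what gets cached
            k = self.up_k(c_kv).view(b, s, self.n_heads, self.d_head)
            v = self.up_v(c_kv).view(b, s, self.n_heads, self.d_head)
            return c_kv, k, v

With these toy numbers, the cache holds 512 values per token instead of 2 * 32 * 128 = 8192, a 16x reduction, which is where the inference savings come from.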
Zhai Jidong, professor at Tsinghua University and chief scientist at Qingcheng Jizhi, said what impressed him most about DeepSeek was its innovation in the Mixture-of-Experts (MoE) architecture: each layer contains 256 routed experts and 1 shared expert. Earlier work balanced expert load with an auxiliary loss, whose extra gradients disturbed training and hurt convergence. DeepSeek instead introduced an auxiliary-loss-free method that keeps convergence intact while still achieving load balancing.
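The V3 technical report's recipe is roughly as follows: each expert carries a bias that is added to its routing score only when choosing which experts handle a token, nudged up for underloaded experts and down for overloaded ones, so balance emerges without any auxiliary loss feeding gradients back into training. A simplified Python sketch (the update constant and gating details are illustrative):

    import torch

    def biased_topk_routing(scores, bias, k=8, gamma=0.001):
        # scores: (tokens, n_experts) token-to-expert affinities
        # bias:   (n_experts,) routing-only bias derived from expert load
        n_experts = scores.shape[1]

        # Experts are *selected* using biased scores...
        _, topk_idx = torch.topk(scores + bias, k, dim=-1)

        # ...but gate weights come from the unbiased scores, so the bias
        # never enters the loss and adds no disturbing gradients.
        gate = torch.gather(scores.softmax(dim=-1), 1, topk_idx)

        # Nudge the bias toward uniform expert load.
        load = torch.bincount(topk_idx.flatten(), minlength=n_experts).float()
        bias = bias + gamma * torch.sign(load.mean() - load)
        return topk_idx, gate, bias

Because the bias affects only which experts are picked, not how their outputs are weighted, convergence behaves as if no balancing mechanism were present at all.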
Zhai emphasized: “The DeepSeek team dares to innovate. I think it’s very important not to blindly follow foreign strategies but to think independently.”
Even more exciting for AI practitioners is that DeepSeek’s “wholehearted” open sourcing has injected a much-needed “shot in the arm” into an open-source community that had begun to lose momentum.
Prior to this, the strongest pillar of the open-source community was Meta's 405-billion-parameter model Llama 3. But many developers told Economic Observer that after trying it, they still felt Llama 3 lagged at least one generation behind closed-source models like GPT-4, "almost causing despair."
Yet DeepSeek’s open sourcing did three things that restored confidence to developers:
First, it directly open-sourced the 671B-parameter model and released distilled versions built on several popular open-source architectures, akin to "a good teacher nurturing more good students."
Second, the published papers and technical reports contain abundant technical details. The V3 and R1 model papers are 50 and 150 pages long respectively, hailed as the “most detailed technical reports in the open-source community.” This means individuals or companies with similar resources can reproduce the models following this “instruction manual.” Developers widely praised them as “elegant” and “solid.”
Third, notably, DeepSeek-R1 uses the MIT license, allowing anyone to freely use, modify, distribute, and commercialize the model, provided the original copyright notice and MIT license are retained in all copies. This gives users greater freedom to conduct secondary development—including fine-tuning and distillation—using model weights and outputs.
Llama allows secondary development and commercial use but adds restrictive conditions—for example, imposing additional restrictions on enterprise users with over 700 million monthly active users, and explicitly prohibiting using Llama’s outputs to improve other large models.
A developer told Economic Observer that he has used DeepSeek for code-generation work since the V2 version. Besides being very affordable, the model performs exceptionally well: among all the models he has tried, only OpenAI's and DeepSeek's can keep generated code logically sound beyond 30 layers of logic, which means professional programmers can use such tools to produce 30%-70% of their code.
Multiple developers emphasized to Economic Observer the profound significance of DeepSeek’s open sourcing. Previously, industry leaders OpenAI and Anthropic resembled aristocrats of Silicon Valley. DeepSeek democratized knowledge, making it accessible to all—a meaningful act of equalization. Now, developers in the global open-source community stand on DeepSeek’s shoulders, while DeepSeek benefits from the ideas of the world’s top makers and hackers.
Meta's chief AI scientist and Turing Award winner Yann LeCun believes the correct interpretation of DeepSeek's rise is that open-source models are surpassing closed-source models.
DeepSeek is great—but not perfect
All large models suffer from "hallucinations," and DeepSeek is no exception. Some users say that because DeepSeek has stronger expressive and logical-reasoning abilities, its hallucinations are harder to spot.
A netizen posted on social media that when asking DeepSeek about route planning in a certain city, DeepSeek gave explanations citing urban planning regulations and data, introducing a concept called “silent zone,” making the response seem highly plausible.
The same question received simpler, less sophisticated answers from other AIs—obviously “making things up” at a glance.
Upon checking the actual regulation, the user found no mention of a “silent zone” anywhere. He commented: “DeepSeek is building a ‘Great Wall of Hallucination’ across the Chinese internet.”
Guo Chengkai also noticed similar issues—DeepSeek-R1 sometimes “misattributes” proper nouns, especially in open-ended questions, resulting in more severe hallucinations. He speculates this may stem from the model’s overly strong reasoning capability, linking vast knowledge and data in latent ways.
He recommends enabling web search when using DeepSeek, reviewing the displayed thought process carefully, and intervening manually to correct errors. When using reasoning models, he also suggests keeping prompts as concise as possible: the longer the prompt, the more loosely related content the model will pull in.
Liu Zhiyuan found that DeepSeek-R1 frequently reaches for advanced terms such as quantum entanglement and entropy increase or decrease, applying them across unrelated domains. He suspects this stems from certain mechanisms in its reinforcement learning. Moreover, R1's reasoning performance remains unsatisfactory on general-domain tasks that lack a verifiable ground truth; reinforcement-learning training does not guarantee generalization.
Beyond the common issue of “hallucinations,” DeepSeek faces other ongoing challenges.
One is potential ongoing disputes arising from “distillation techniques.” Model or knowledge distillation usually involves training a weaker model using responses generated by a stronger model, thereby improving the weaker model’s performance.
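In this sense, response-based distillation is essentially supervised fine-tuning on a stronger model's generations. A minimal runnable sketch using Hugging Face transformers, where the two GPT-2 checkpoints are placeholders standing in for a strong teacher and a weak student (this shows the generic recipe only, not evidence about what any particular company did):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    teacher = AutoModelForCausalLM.from_pretrained("gpt2-large").eval()
    student = AutoModelForCausalLM.from_pretrained("gpt2")
    opt = torch.optim.AdamW(student.parameters(), lr=1e-5)

    prompts = ["Q: Why is the sky blue? A:"]
    for prompt in prompts:
        # 1) The teacher generates the training target.
        inputs = tok(prompt, return_tensors="pt")
        with torch.no_grad():
            out = teacher.generate(**inputs, max_new_tokens=64)
        text = tok.decode(out[0], skip_special_tokens=True)

        # 2) The student is fine-tuned to reproduce the teacher's
        #    response with ordinary next-token cross-entropy (plain SFT).
        batch = tok(text, return_tensors="pt")
        loss = student(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        opt.step()
        opt.zero_grad()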
On January 29, OpenAI accused DeepSeek of using model distillation techniques based on OpenAI’s technology to train its own models. OpenAI claimed there was evidence that DeepSeek used its proprietary models to train its own open-source models, though it provided no further proof. OpenAI’s terms of service prohibit users from “copying” its services or “using its outputs to develop models competing with OpenAI.”
Guo Chengkai believes that using leading models for distillation to verify and optimize one’s own models is a common practice among many large model developers. Since DeepSeek has already open-sourced its model, verification is straightforward. Moreover, OpenAI’s early training data itself had legitimacy issues; if it intends to take legal action against DeepSeek, it must defend the legality of its terms in court and clarify its terms more explicitly.
Another unresolved issue for DeepSeek is how to advance larger-parameter pre-trained models. In this area, even OpenAI—which possesses more high-quality annotated data and computing resources—has yet to release GPT-5, a larger-parameter pre-trained model. Whether DeepSeek can continue creating miracles remains uncertain.
Regardless, the hallucinations generated by DeepSeek arise from the same curiosity that drives it—perhaps this is precisely the dual nature of innovation. As its founder Liang Wenfeng said: “Innovation isn’t purely driven by business—it also requires curiosity and creativity. China’s AI cannot forever follow others; someone needs to stand at the technological frontier.”