
Alpha Arena reveals flaws in AI trading: Western models lose 80% of capital within a week
The market is the ultimate test for AI.
Author: Juan Galt
Translation: AididiaoJP, Foresight News
Can AI trade cryptocurrencies? Jay Azhang, a New York-based computer engineer and finance professional, is putting that question to the test through Alpha Arena. The project pits the most powerful large language models against each other, each with $10,000 in capital, to see which can earn the most money trading cryptocurrencies. The models involved are Grok 4, Claude Sonnet 4.5, Gemini 2.5 Pro, GPT-5, DeepSeek V3.1, and Qwen3 Max.
You might now be thinking, "Wow, what a brilliant idea!" And you'd be surprised to learn that, as of this writing, three of the AIs are in the red, while two Chinese open-source models, Qwen3 and DeepSeek, are leading the pack.

Indeed, the most powerful, closed-source, proprietary artificial intelligences operated by Western giants like Google and OpenAI have lost over $8,000—more than 80% of their cryptocurrency trading capital—in just over a week, while their open-source counterparts from the East remain profitable.
The most successful trade so far? Qwen3 has maintained steady profits with nothing more than a simple 20x long position in Bitcoin. Grok 4, unsurprisingly, spent much of the competition betting heavily on Dogecoin with 10x leverage, briefly tying with DeepSeek at the top before plummeting to a loss of nearly 20%. Perhaps Elon Musk should tweet a Dogecoin meme to pull Grok out of its slump.
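To see why leveraged positions like these swing so sharply, here is a minimal sketch of the arithmetic. It assumes a simplified model with no fees, funding payments, or maintenance margin, so real exchange figures will differ. The function names are illustrative and not part of Alpha Arena.

```python
def leveraged_return(price_change: float, leverage: float) -> float:
    """Return on posted margin for a long position (simplified model).

    Losses are capped at -1.0, the point where the whole margin is gone
    and a simplified exchange would liquidate the position.
    """
    return max(leverage * price_change, -1.0)


def liquidation_move(leverage: float) -> float:
    """Adverse price move that wipes out the margin (simplified model)."""
    return -1.0 / leverage


# A 2% rise in Bitcoin on a 20x long returns about 40% on the margin.
print(f"{leveraged_return(0.02, 20):.0%}")
# The same 20x position is wiped out by a drop of about 5%.
print(f"{liquidation_move(20):.0%}")
```

Under these assumptions, a 10x position can absorb roughly twice the adverse move of a 20x position, at the cost of half the upside.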

In the meantime, Google's Gemini has been relentlessly bearish, shorting all tradable crypto assets—a stance consistent with Google’s general crypto policy over the past 15 years.
In the end, it spent an entire week making nothing but wrong trades. Being that consistently wrong takes skill, especially when all Qwen3 did was go long on Bitcoin. If this is the best closed-source AI can offer, perhaps OpenAI should stay closed-source to spare us further losses.
A New Benchmark for AI
The idea of pitting AI models against each other in a cryptocurrency trading arena offers some profound insights. First, a model cannot have seen the answers to a crypto trading test during pre-training, because trading outcomes are inherently unpredictable. Many existing benchmarks lack this property: models are often trained on data that already contains the answers to the benchmark, so they naturally perform well when evaluated. Studies have shown, however, that slight modifications to these tests can drastically alter the results.
This controversy raises a deeper question: What is the ultimate test of intelligence? According to Elon Musk, whose company xAI created Grok 4 and who is a self-proclaimed Iron Man enthusiast, predicting the future is the ultimate measure of intelligence.

And we must admit, there is hardly anything more uncertain about the future than the short-term price of cryptocurrencies. In Azhang’s words, “The goal of our Alpha Arena is to make benchmarking closer to the real world, and markets are perfect for that. They’re dynamic, adversarial, open-ended, and fundamentally unpredictable. They challenge AI in ways static benchmarks simply cannot. Markets are the ultimate test for AI.”
This insight into markets is deeply rooted in the libertarian principles upon which Bitcoin was founded. Economists like Murray Rothbard and Milton Friedman argued in the last century that markets are inherently unpredictable by central governments, and that rational economic calculation only occurs when individuals who bear real financial risk make actual economic decisions.
In other words, markets are the hardest thing to predict because they depend on the subjective views and decisions of intelligent individuals worldwide—and thus serve as the best test of intelligence.
Azhang notes in his project description that instructing AI to trade isn't just about returns, but also about risk-adjusted returns. This risk dimension is crucial because a single bad trade can wipe out all prior gains, as seen in Grok 4’s portfolio collapse.
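One common way to express risk-adjusted return is a Sharpe-style ratio: the average return divided by the volatility of returns. The article does not say which metric Alpha Arena uses, so the sketch below is only an illustration. The daily returns are made up, and the risk-free rate is assumed to be zero.

```python
from statistics import mean, stdev


def sharpe_like_ratio(returns: list[float]) -> float:
    """Mean period return divided by its sample standard deviation.

    Assumes a zero risk-free rate and does not annualize the result.
    """
    spread = stdev(returns)
    if spread == 0:
        raise ValueError("returns have no variability; ratio is undefined")
    return mean(returns) / spread


# Hypothetical daily returns for two traders (not real Alpha Arena data).
steady = [0.010, 0.012, 0.008, 0.011, 0.009]
wild = [0.30, -0.25, 0.40, -0.35, 0.15]

for name, returns in (("steady", steady), ("wild", wild)):
    print(f"{name}: mean {mean(returns):.3f}, ratio {sharpe_like_ratio(returns):.2f}")
```

In this made-up data the wild trader has the higher average daily return but a far lower ratio, which is the distinction between raw and risk-adjusted returns.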
Another issue arises: whether these models actually learn from their experience trading cryptocurrencies. Technically, this is not easy to achieve, because retraining AI models is extremely expensive. They could be fine-tuned on their own trading history or that of others, or keep recent trades in short-term memory or the context window, but that only goes so far. Ultimately, the right AI trading model may need to truly learn from its own experience. Such technology has recently been announced in academia but remains far from product-ready. MIT researchers refer to these as self-adapting AI models.
How Do We Know This Isn’t Just Luck?
Another reading of the project's results so far is that they may be indistinguishable from a "random walk", akin to rolling dice for every decision. What would that look like on a chart? There is actually a simulator available to explore this, and in practice a random walk would not look significantly different from the models' equity curves.
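For readers who want to try this themselves, the following is a rough sketch of such a simulation, not the simulator mentioned above. It assumes six hypothetical traders who each start with $10,000 and gain or lose 5% of their equity on a fair coin flip at every step. Those parameters are arbitrary choices for illustration.

```python
import random


def random_trader(start: float, steps: int, bet: float, rng: random.Random) -> list[float]:
    """Equity curve of a trader who gains or loses `bet` of equity per coin flip."""
    equity = [start]
    for _ in range(steps):
        move = bet if rng.random() < 0.5 else -bet
        equity.append(equity[-1] * (1 + move))
    return equity


rng = random.Random(42)  # fixed seed so the run is repeatable
for trader in range(6):
    curve = random_trader(10_000, steps=50, bet=0.05, rng=rng)
    print(f"trader {trader}: final equity ${curve[-1]:,.0f}")
```

Plotting several of these curves side by side typically produces a few apparent winners and losers, even though no trader has any skill.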

The role of luck in markets has been carefully described by intellectuals like Nassim Taleb in his book *Fooled by Randomness*. He argues that, statistically speaking, it is entirely normal and possible for a trader, say Qwen3, to be lucky for an entire week straight, creating the illusion of superior reasoning ability. Taleb's point goes further: Wall Street has so many traders that someone could easily get lucky for 20 years, building a godlike reputation while everyone around them considers them a genius, until their luck runs out.
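The arithmetic behind this argument can be sketched in a few lines. The function below assumes every trader is an independent coin-flipper with the same chance of winning each period, which is a deliberate simplification; the trader counts are illustrative and not taken from Taleb or from Alpha Arena.

```python
def prob_some_lucky(traders: int, periods: int, p_win: float = 0.5) -> float:
    """Chance that at least one of `traders` wins every one of `periods`.

    Assumes independent traders and periods with a fixed win probability.
    """
    p_streak = p_win ** periods            # one trader wins every period
    return 1 - (1 - p_streak) ** traders   # at least one trader does so


# Six skill-free traders, one weekly outcome each with even odds of profit.
print(f"{prob_some_lucky(6, 1):.1%}")
# One million skill-free traders, 20 winning years in a row.
print(f"{prob_some_lucky(1_000_000, 20):.0%}")
```

Under these assumptions, it is almost certain that at least one of six coin-flipping traders finishes a given week ahead, and more likely than not that a pool of a million produces someone with a 20-year winning streak.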
Therefore, for Alpha Arena to generate meaningful data, it must run for a significantly longer time, and its patterns and outcomes must be independently replicated, involving real capital at risk, before being considered distinct from a random walk.
In conclusion, it is remarkable to see cost-effective, open-source models like DeepSeek outperforming their closed-source peers so far. Alpha Arena has been great entertainment, having gone viral last week on X.com. Its future trajectory remains unpredictable; we'll have to wait and see whether its creator's gamble of giving six chatbots a combined $60,000 for cryptocurrency speculation will ultimately pay off.