
DWF Deep Report: AI Outperforms Humans in DeFi Yield Optimization, but Still Lags by 5x in Complex Transactions
TechFlow Selected

Agent activity will only continue to accelerate; the infrastructure established today will determine how on-chain finance operates in its next phase.
Author: DWF Ventures
Translated and edited by TechFlow
TechFlow Insight: Automated and AI agent activity already accounts for nearly one-fifth of on-chain activity. In well-defined, rule-based use cases like yield optimization, agents demonstrably outperform humans. Yet when left to trade autonomously, even top-tier AI agents achieve less than one-fifth of the performance of top human traders. This research examines how AI performs in practice across different DeFi scenarios, making it essential reading for anyone focused on automated trading.

Key Takeaways
Automated and agent activity currently accounts for approximately 19% of all on-chain activity, but true end-to-end autonomy has not yet been achieved.
In narrow, well-defined use cases such as yield optimization, agents have demonstrated superior performance compared to both humans and traditional bots. However, in multifaceted activities like trading, humans still outperform agents.
Among agents themselves, model selection and risk management exert the greatest influence on trading performance.
As agents scale, multiple trust- and execution-related risks emerge—including Sybil attacks, strategy crowding, and privacy trade-offs.
Agent Activity Continues to Grow
Agent activity has grown steadily over the past year, in both transaction volume and transaction count. Coinbase's x402 protocol has led the major developments, while Visa, Stripe, and Google have also entered the space with their own standards. Much of the infrastructure currently under development targets two primary scenarios: communication channels between agents, and agent calls triggered by humans.
While stablecoin transfers are widely supported, current infrastructure still relies on traditional payment gateways as its underlying layer—meaning it remains dependent on centralized counterparties. Consequently, the “fully autonomous” end-state—where agents can self-fund, self-execute, and continuously optimize based on evolving conditions—has yet to be achieved.

Agents are not entirely new to DeFi. Bots have automated activity within on-chain protocols for years, capturing MEV or extracting excess returns that would be unattainable without code. These systems perform exceptionally well within clearly defined, static parameters that rarely change and require minimal oversight. Markets, however, have grown increasingly complex, and that complexity is precisely where a new generation of agents is stepping in. Over the past several months, on-chain environments have become testing grounds for such activity.
Real-World Agent Performance
According to the report, agent activity is growing exponentially, with over 17,000 agents launched since 2025. Automated and agent activity is estimated to represent over 19% of all on-chain activity. This is unsurprising given that bots are estimated to originate over 76% of stablecoin transfer volume, which points to substantial headroom for further agent adoption in DeFi.
Agent autonomy spans a wide spectrum, from chatbot-style experiences requiring heavy human supervision to agents capable of formulating market-adaptive strategies from goal-oriented inputs. Compared to bots, agents offer two key advantages: they can respond to and act on new information within milliseconds, and they can scale coverage across thousands of markets while maintaining rigorous discipline.
Most agents today remain at the “analyst-to-co-pilot” level, as the majority are still in testing phases.

Yield Optimization: Where Agents Excel
Liquidity provision is a domain where automation is already common: agents collectively hold over $39 million in total value locked (TVL). This figure primarily reflects assets that users deposit directly into agents, not capital routed through treasuries.
Giza Tech is one of the largest protocols in this space. At the end of last year, it launched ARMA, the first agent application designed to enhance yield capture across major DeFi protocols. ARMA has attracted over $19 million in assets under management (AUM) and generated over $4 billion in agent-driven trading volume. The high ratio of trading volume to AUM reflects frequent capital rebalancing, which enables better yield capture. Once capital is deposited into the contract, execution is fully automated, giving users a simple one-click experience that requires virtually no oversight.
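A yield optimizer of this kind repeatedly decides whether moving capital to a higher-yielding venue is worth the cost of the move. The following is a minimal sketch of one such decision rule, assuming a flat per-move cost and a fixed holding horizon; the pool names, APYs, cost, and horizon are hypothetical, and nothing here describes how ARMA actually decides.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str   # hypothetical venue label
    apy: float  # current supply APY as a decimal, e.g. 0.05 for 5%


def choose_pool(
    current: Pool,
    candidates: list[Pool],
    capital: float,
    rebalance_cost: float,
    horizon_days: float,
) -> Pool:
    """Return the pool to hold for the next horizon.

    Move to the highest-APY candidate only if the extra interest it is expected
    to earn over the horizon exceeds the one-off cost of moving (gas, swap fees,
    slippage); otherwise stay in the current pool.
    """
    best = max(candidates, key=lambda p: p.apy)
    # Simple-interest approximation of the extra yield over the horizon (no compounding).
    extra_yield = capital * (best.apy - current.apy) * horizon_days / 365
    return best if extra_yield > rebalance_cost else current


# Hypothetical inputs: two venues, a $50 cost to move, and a 7-day horizon.
pools = [Pool("pool_a", 0.045), Pool("pool_b", 0.062)]
print(choose_pool(pools[0], pools, capital=1_000_000, rebalance_cost=50.0, horizon_days=7).name)  # pool_b
print(choose_pool(pools[0], pools, capital=10_000, rebalance_cost=50.0, horizon_days=7).name)     # pool_a
```

Larger positions clear the cost threshold more easily, so the same yield gap can justify a move for a $1,000,000 position but not for a $10,000 one.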
ARMA's measured performance is strong: it delivers an annualized yield exceeding 9.75% on USDC. Even after accounting for rebalancing fees and the agent's 10% performance fee, this yield still surpasses standard lending yields on Aave or Morpho. Scalability, however, remains a critical challenge, as these agents have yet to be stress-tested in real-world conditions at the scale of major DeFi protocols.
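The fee comparison comes down to simple arithmetic. The sketch below computes a net yield from a gross yield, a performance fee charged on that yield, and an annualized rebalancing drag. It assumes the 9.75% figure is gross of the 10% performance fee, which the report does not state explicitly, and the 0.25% rebalancing drag is a placeholder rather than a measured number.

```python
def net_apy(gross_apy: float, performance_fee: float, rebalancing_cost_apy: float) -> float:
    """Net annual yield after a performance fee on gross yield and an annualized rebalancing drag.

    Assumes the performance fee is taken on gross yield before rebalancing costs;
    real fee schedules may differ.
    """
    return gross_apy * (1 - performance_fee) - rebalancing_cost_apy


# Assumed inputs: 9.75% gross, 10% performance fee, hypothetical 0.25% annual rebalancing drag.
print(f"{net_apy(0.0975, 0.10, 0.0025):.3%}")
```

Under these assumptions the net figure is about 8.5%; that net number, not the headline yield, is the one to set against a lending rate on Aave or Morpho.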
Trading: Humans Hold a Large Lead
For more complex activities like trading, outcomes vary far more widely. Conventional trading models operate on human-defined inputs and generate outputs according to preset rules. Machine learning extends this paradigm by letting models update their behavior in response to new information without explicit reprogramming, elevating them to co-pilot roles. As fully autonomous agents enter the arena, the trading landscape is set to change dramatically.
Several trading competitions, both agent-vs-agent and human-vs-agent, have already taken place, and results vary significantly across models. Trade XYZ hosted a human-vs-agent competition in stocks listed on its platform. Each account started with $10,000, with no restrictions on leverage or trading frequency. Results overwhelmingly favored humans: the top-performing humans outperformed the top agents by more than fivefold.
Meanwhile, Nof1 held an agent-vs-agent trading competition pitting several models (Grok-4, GPT-5, DeepSeek, Kimi, Qwen3, Claude, and Gemini) against each other across risk profiles ranging from capital preservation to maximum leverage. The results point to several factors that help explain the performance differences:
Holding Time: Longer holding periods correlated strongly with better results. Models averaging 2–3 hours per position significantly outperformed those that flipped positions frequently.
Expected Value: This measures whether a model profits on average per trade. Notably, only the top three models achieved positive expected value, meaning most models lost money per trade on average.
Leverage: Moderate average leverage (6–8x) proved more effective than leverage above 10x, where losses accelerate rapidly.
Prompt Strategy: “Monk Mode” delivered the strongest performance to date, while “Situational Awareness” performed worst. The traits of these strategies suggest that prioritizing risk management and relying less on external inputs yields better results.
Base Model: Grok-4.20 outperformed all other models by more than 22% across prompt strategies and was the only model to achieve consistent average profitability.
Other factors—including long/short bias, trade size, and confidence scoring—lacked sufficient data or showed no statistically meaningful correlation with performance. Overall, results indicate agents tend to excel within clearly bounded constraints—reinforcing the continued need for human involvement in goal-setting and configuration.
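Two of these factors reduce to simple arithmetic. Expected value per trade depends on both win rate and the average size of wins and losses, so a model can win most of its trades and still lose money on average. Leverage sets how small an adverse price move wipes out a position's margin. The sketch below illustrates both; the trade statistics are invented for illustration, and the wipeout estimate deliberately ignores fees, funding, and maintenance margin.

```python
def expected_value(win_rate: float, avg_win: float, avg_loss: float) -> float:
    """Average profit per trade. avg_loss is the (positive) size of a typical losing trade."""
    return win_rate * avg_win - (1 - win_rate) * avg_loss


def adverse_move_to_wipeout(leverage: float) -> float:
    """Fractional price move against a position that erases all of its posted margin.

    Simplified: ignores trading fees, funding payments, and maintenance-margin
    requirements, all of which make real liquidations happen sooner.
    """
    return 1 / leverage


# Hypothetical model: wins 60% of its trades, but its average loss is twice its average win.
print(expected_value(win_rate=0.6, avg_win=100.0, avg_loss=200.0))  # negative: about -20 per trade

for lev in (6, 8, 10, 20):
    print(f"{lev}x: a {adverse_move_to_wipeout(lev):.1%} adverse move erases the margin")
```

Conversely, a model that wins only 40% of the time can have positive expected value if its wins are large enough relative to its losses.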

Evaluating Agents
Given that agents are at an early stage of development, no comprehensive evaluation framework yet exists. Historical performance is often used as a benchmark, but the foundational attributes below are stronger signals of robust agent performance.
Performance Across Volatility Regimes: Disciplined loss control as conditions deteriorate suggests that the agent can identify off-chain factors affecting trade profitability.
Transparency vs. Privacy: Both involve trade-offs. Transparent agents, whose trades can be actively copied, lose their strategic advantage. Private agents face internal extraction risk: creators can easily frontrun their own users.
Information Sources: The data sources an agent draws on determine how it makes decisions, so ensuring source credibility and avoiding single-point dependencies is essential.
Security: Smart contract audits and appropriate custody architecture are vital to ensure fallback mechanisms exist during black-swan events.
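One common way to avoid a single-point dependency on a data source is to query several independent sources and act only on a robust aggregate. The sketch below takes the median of the available quotes, discards quotes that stray too far from it, and returns nothing when too few sources agree. The feed names, the quorum of three, and the 2% tolerance are illustrative assumptions, not recommendations.

```python
from statistics import median
from typing import Optional


def aggregate_price(
    quotes: dict[str, Optional[float]],
    min_sources: int = 3,
    max_deviation: float = 0.02,
) -> Optional[float]:
    """Robust price from several independent sources, or None if the data can't be trusted.

    quotes maps a (hypothetical) source name to its latest price, or to None if
    that source failed to respond.
    """
    live = [p for p in quotes.values() if p is not None and p > 0]
    if len(live) < min_sources:
        return None  # too few sources responded: do not act on this data
    mid = median(live)
    # Drop quotes that stray too far from the median (stale or manipulated feeds).
    agreeing = [p for p in live if abs(p - mid) / mid <= max_deviation]
    if len(agreeing) < min_sources:
        return None  # sources disagree too much to trust any single value
    return median(agreeing)


# One feed reports an outlier; the other three agree, so the outlier is ignored.
print(aggregate_price({"feed_a": 1.000, "feed_b": 1.001, "feed_c": 0.999, "feed_d": 1.30}))  # 1.0
# Only two feeds respond, below the quorum of three, so no price is returned.
print(aggregate_price({"feed_a": 1.000, "feed_b": None, "feed_c": 1.001}))  # None
```

Returning None rather than a best guess forces the agent to pause when its inputs are thin or contradictory, which trades some missed opportunities for resistance to a single bad feed.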
What’s Next for Agents
Mass adoption still requires significant infrastructure work, centered on the core questions of agent trust and execution. Fully autonomous agents can act without guardrails, and cases of poor fund management have already emerged.
ERC-8004 launched in January 2026 as the first on-chain registry that lets autonomous agents discover each other, build verifiable reputations, and collaborate securely. This is a pivotal unlock for DeFi composability: trust scores embedded directly in smart contracts allow permissionless interaction between agents and protocols. However, the registry does not guarantee that agents behave honestly; vulnerabilities such as collusive reputation manipulation and Sybil attacks remain possible. Substantial gaps therefore persist in insurance, security, and economic staking mechanisms for agents.
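To see why Sybil attacks matter for reputation-based trust, compare a plain average of feedback scores with one weighted by the economic stake behind each rating. The sketch below is a hypothetical scoring rule, not the ERC-8004 interface: the reviewer labels, scores, and stake amounts are invented, and stake weighting is just one possible mitigation whose strength depends on how costly stake is to acquire.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    reviewer: str  # hypothetical reviewer label
    score: float   # rating in [0, 100]
    stake: float   # stake the reviewer has bonded (an assumption, not part of any standard)


def naive_reputation(feedback: list[Feedback]) -> float:
    """Unweighted mean: every identity counts equally, so fake identities are free votes."""
    return sum(f.score for f in feedback) / len(feedback)


def stake_weighted_reputation(feedback: list[Feedback]) -> float:
    """Mean weighted by bonded stake: inflating the score requires putting capital at risk."""
    total_stake = sum(f.stake for f in feedback)
    return sum(f.score * f.stake for f in feedback) / total_stake


honest = [Feedback("user_1", 20, 1_000), Feedback("user_2", 30, 1_000)]
# An attacker creates 8 Sybil identities with negligible stake, all rating the agent 100.
sybils = [Feedback(f"sybil_{i}", 100, 1) for i in range(8)]
reviews = honest + sybils
print(f"naive: {naive_reputation(reviews):.1f}")                    # 85.0
print(f"stake-weighted: {stake_weighted_reputation(reviews):.1f}")  # 25.3
```

The Sybils drag the naive score from 25 to 85 but barely move the stake-weighted one. This does not address collusion among well-staked reviewers, which is why staking alone leaves gaps.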
As agent activity expands across DeFi, strategy crowding emerges as a structural risk. Yield farms provide the clearest precedent: as strategies proliferate, returns compress. Similar dynamics may apply to agent trading. If large numbers of agents train on similar data and optimize toward identical objectives, they will converge on similar positions and exit signals.
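The yield-farm precedent is easy to quantify when a farm pays out a fixed reward budget: the APR each participant earns is that budget divided by all the capital chasing it. The sketch below assumes a fixed annual reward stream valued in USD and ignores compounding, token-price changes, and trading fees; the figures are illustrative only.

```python
def farm_apr(annual_rewards_usd: float, tvl_usd: float) -> float:
    """Simple APR when a fixed reward budget is shared pro rata across all deposited capital."""
    return annual_rewards_usd / tvl_usd


rewards = 1_000_000  # hypothetical fixed annual reward budget in USD
for tvl in (5_000_000, 10_000_000, 50_000_000):
    # The same rewards spread over more capital: APR falls as TVL rises.
    print(f"TVL ${tvl:,}: APR {farm_apr(rewards, tvl):.1%}")
```

The APR falls from 20% to 2% as TVL grows tenfold. A trading edge behaves loosely the same way: if many agents pursue the same finite opportunity, each one's share shrinks.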
A January 2026 Cornell University paper, *CoinAlg*, formally articulates a related problem: the trade-off between transparency and privacy. Transparent agents get exploited because their trades are predictable and susceptible to frontrunning. Private agents avoid this risk but introduce another: their creators retain an informational advantage over users and can extract value using the very internal knowledge that opacity was meant to protect.
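To make "predictable trades are susceptible to frontrunning" concrete, the sketch below simulates a sandwich on a constant-product (x·y = k) pool: an attacker who can see a pending buy trades just before it and sells just after. This is a generic illustration, not the model used in the *CoinAlg* paper; the reserves and trade sizes are hypothetical, and swap fees and gas are ignored, which overstates the attacker's profit relative to a real venue.

```python
from dataclasses import dataclass


@dataclass
class ConstantProductPool:
    x: float  # reserve of the asset being paid in (e.g. a stablecoin)
    y: float  # reserve of the token being bought

    def buy_y(self, dx: float) -> float:
        """Pay dx of X into the pool and receive Y, keeping x * y constant (no fee)."""
        k = self.x * self.y
        self.x += dx
        new_y = k / self.x
        dy = self.y - new_y
        self.y = new_y
        return dy

    def sell_y(self, dy: float) -> float:
        """Pay dy of Y into the pool and receive X, keeping x * y constant (no fee)."""
        k = self.x * self.y
        self.y += dy
        new_x = k / self.y
        dx = self.x - new_x
        self.x = new_x
        return dx


def simulate_sandwich(frontrun: float) -> tuple[float, float]:
    """Return (Y the victim receives, attacker profit in X) for a given frontrun size."""
    pool = ConstantProductPool(x=1_000_000.0, y=1_000_000.0)  # hypothetical reserves
    victim_in = 50_000.0  # hypothetical pending buy that the attacker can see in advance
    attacker_y = pool.buy_y(frontrun) if frontrun > 0 else 0.0       # 1. attacker buys first
    victim_y = pool.buy_y(victim_in)                                  # 2. victim buys at a worse price
    attacker_x = pool.sell_y(attacker_y) if attacker_y > 0 else 0.0  # 3. attacker sells right after
    return victim_y, attacker_x - frontrun


clean, _ = simulate_sandwich(0.0)
sandwiched, profit = simulate_sandwich(100_000.0)
print(f"victim receives {clean:,.0f} Y unsandwiched vs {sandwiched:,.0f} Y sandwiched")
print(f"attacker profit: {profit:,.0f} X")
```

The attacker's buy moves the price before the victim trades, and the attacker then sells into the price impact the victim created. Hiding the order flow removes this outside attack, but whoever still sees that flow, such as a private agent's creator, holds the same advantage.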
Agent activity will only accelerate from here, and the infrastructure laid down today will define how on-chain finance operates in its next phase. As usage grows, agents will iterate on themselves and become increasingly adept at adapting to user preferences. Ultimately, trustworthy infrastructure will be the key differentiator, and the platforms that provide it will command the largest market share.