
Anthropic Data: Nearly Half of AI Agent Calls Concentrated in Software Engineering; These 16 Vertical Domains Remain Blue Oceans
TechFlow Selected
Healthcare accounts for 1%, legal services for 0.9%, and education for 1.8%. These are not saturated markets but rather markets that barely exist.
Author: Garry's List
Translation: TechFlow
TechFlow Intro: Anthropic has just released the most comprehensive real-world study of AI Agent usage to date. Its core finding: software engineering accounts for nearly 50% of all AI Agent tool calls, while the other half is scattered across healthcare, law, education, and 13 other vertical domains, each individually representing under 5%.
This is not a signal of market saturation—but rather a map pointing to 300 vertical AI unicorns waiting to be built. Even more valuable is a counterintuitive insight cited in the article: models can now operate autonomously for nearly five hours, yet users only let them run for 42 minutes. This “trust deficit” itself represents the next product opportunity.
Full Text Below:
Software engineering accounts for nearly 50% of all AI Agent tool calls. Sixteen vertical domains—including healthcare, law, and finance—remain largely untouched, with none exceeding 5%. This signals the potential for 300 vertical AI unicorns yet to be built.
If I were launching a startup today, I’d stare at the red bar in that chart above until I saw my future.
Aaron Levie, founder of Box, commented:
“This chart powerfully reminds us how massive the opportunity is in the AI Agent space today.”
Horizontal opportunities abound, of course—but many workflows require deep domain expertise to meaningfully automate unique processes within specific verticals.
The template is clear: build Agent software that integrates proprietary data; enables effective human-Agent collaboration in workflow execution; applies deep, domain-specific context engineering; and drives change management on the customer side.
Today, vast gaps remain across numerous domains.
Software engineering dominates half of all AI Agent activity. The other half is scattered across 16 vertical domains, none exceeding 5%. Healthcare accounts for 1%, law for 0.9%, and education for 1.8%. These are not saturated markets; they are markets that barely exist.
Anthropic has just published the most comprehensive real-world study of AI Agent usage to date. Its central finding: software engineering accounts for 49.7% of Agent tool calls on its API. Buried beneath this headline lies the crucial conclusion: everything else remains wide open.
Deployment Lag
Here’s a data point that should excite founders: model capability has far outpaced user willingness to trust it.
METR’s capability evaluation shows Claude can complete tasks that take humans nearly five hours, yet in practice the 99.9th-percentile session duration is only about 42 minutes. That gap, the difference between what AI *can* do and what we *allow* it to do, is a massive opportunity.

Figure: Maximum session duration for Claude Code nearly doubled over three months, reflecting gains in both capability and trust.
From October 2025 to January 2026, the 99.9th percentile single-session duration nearly doubled—from under 25 minutes to over 45 minutes. Growth was steady across model versions. This reflects not just stronger models, but users progressively extending their trust through repeated use.
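The percentile metric cited above is straightforward to compute from session logs. A minimal sketch, using Python's standard library and synthetic data (the `p999` function name and the sample durations are illustrative, not Anthropic's actual telemetry):

```python
# Estimate the 99.9th-percentile session duration from a list of
# per-session lengths in minutes. The data here is synthetic.
from statistics import quantiles

def p999(durations_min):
    """Return the 99.9th percentile via linear interpolation."""
    # quantiles(n=1000) returns 999 cut points; the last one is P99.9.
    return quantiles(durations_min, n=1000, method="inclusive")[-1]

# Most sessions are short; a thin tail runs far longer.
sessions = [2] * 900 + [10] * 90 + [42] * 9 + [60]
print(round(p999(sessions), 1))
```

A tail percentile like P99.9 is the right lens here: it tracks the longest runs users are willing to tolerate, which is exactly the trust frontier the article describes.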
“From August to December, Claude Code’s success rate on the most challenging internal tasks doubled—while average human interventions per session fell from 5.4 to 3.3.”
The capability is already here. Deployment hasn’t caught up yet. That’s not a problem—it’s a product opportunity.
How Trust Evolves
Among new users, 20% automatically approve Claude Code’s actions. By the 750th session, over 40% of sessions run entirely in auto-approval mode. Yet there’s a counterintuitive finding: experienced users intervene *more*, not less. New users intervene in 5% of turns; veterans do so in 9%.

Figure: Trust is a skill that accumulates over time. New users auto-approve 20% of sessions. By 750 sessions, that share exceeds 40%.
Image: Anthropic
This isn’t contradictory—it reflects a shift in supervision strategy. Beginners approve step-by-step before actions occur; veterans grant broad authority upfront and intervene only when issues arise—they’ve moved from pre-approval to active monitoring.
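The shift from pre-approval to monitoring can be sketched as a simple gate. Everything below is hypothetical: the `ApprovalGate` class and its fields are illustrative, not Claude Code's actual permission mechanism.

```python
# Two supervision styles for agent tool calls: per-step pre-approval
# (beginner style) vs. blanket auto-approval plus an audit log for
# after-the-fact monitoring (veteran style).
from dataclasses import dataclass, field

@dataclass
class ApprovalGate:
    auto_approve: bool = False          # veterans grant this up front
    audit_log: list = field(default_factory=list)

    def request(self, action, approve_fn=lambda a: False):
        """Return True if the agent may execute the action."""
        if self.auto_approve:
            self.audit_log.append(action)   # monitored, not pre-approved
            return True
        return approve_fn(action)           # asked before every step

# Beginner: each action is confirmed before it happens.
beginner = ApprovalGate()
beginner.request("edit main.py", approve_fn=lambda a: a.startswith("edit"))

# Veteran: actions run immediately and are reviewed from the log.
veteran = ApprovalGate(auto_approve=True)
veteran.request("edit main.py")
veteran.request("run tests")
print(veteran.audit_log)
```

The design choice mirrors the data: the veteran path trades a gate before each action for visibility after it, which is why experienced users can intervene in more turns overall while approving far fewer of them individually.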
Here’s a safety-relevant finding: on complex tasks, Claude Code proactively requests clarification over twice as often as humans intervene. Agents pause to confirm—not rush ahead blindly. This is a feature, not a bug.
“The core insight of this research is that autonomy exercised by Agents in practice is co-constructed by model, user, and product. When uncertain, Claude pauses to ask questions—deliberately constraining its independence. Users build trust through collaboration and adapt their supervision strategies accordingly.”
Levie’s Vertical AI Playbook
Aaron Levie points to the immense value waiting to be unlocked: build Agent software that integrates proprietary data; solves real human problems; is densely packed with domain-specific context to maximize intelligent output; and, the part most founders overlook, drives change management on the customer side.
This final element is precisely why vertical AI is so hard to replicate. Anyone can wrap an API—but few can truly navigate the unique workflows, regulatory constraints, and organizational resistance embedded in medical billing, legal discovery, or building permit approvals.
SaaS grew tenfold every decade over the past several decades. Over the past 20 years, more than 40% of venture capital funding flowed into SaaS companies. The sector birthed over 170 SaaS unicorns. The logic is simple: each of those unicorns has a vertical AI counterpart waiting to emerge—and that AI version could be ten times larger, because it displaces not just software, but people.
The Co-Construction Imperative
Anthropic’s core finding deserves serious attention from anyone shaping AI policy. Autonomy is not an intrinsic property of models—it’s co-constructed by model, user, and product. Pre-deployment evaluations cannot capture this; you must measure it in real-world usage.
Anthropic stated officially:
“Software engineering accounts for roughly 50% of Agent tool calls on our API—but we’re also seeing emerging adoption across other industries. As the boundaries of risk and autonomy continue expanding, post-deployment monitoring becomes critical. We encourage other model developers to extend this research.”
The safety metrics are reassuring: 73% of tool calls involve a human in the loop, and only 0.8% of actions are irreversible. The highest-risk scenarios, such as API key leakage or autonomous crypto transactions, come mostly from security assessments, not production deployments.
“Regulatory requirements mandating specific interaction patterns—for example, requiring human approval for every action—create friction without necessarily improving safety.”
Policies forcing “approval for every action” kill productivity gains without enhancing safety. A better goal is ensuring humans can monitor and intervene—not prescribing rigid approval workflows.
Where Unicorns Hide
The map is drawn. Software engineering is already underway. Healthcare, law, finance, education, customer support, logistics—16 vertical domains, each with single-digit market share—are all waiting for someone to embed genuine domain expertise into Agents.
Hundreds of SaaS unicorns have already emerged. The next 300 vertical AI unicorns are imminent. Founders who select a vertical, deeply embed domain expertise into their Agent, and figure out how to drive change management will own the enterprise software market for the next decade.
Models can already work for five hours—yet users only let them run for 42 minutes. That’s the signal: we’re still extremely early, with enormous scope for building—and countless places where intelligent automation hasn’t yet been applied for even one minute.
Join the TechFlow official community to stay tuned
Telegram: https://t.me/TechFlowDaily
X (Twitter): https://x.com/TechFlowPost
X (Twitter) EN: https://x.com/BlockFlow_News