
IOSG: When your browser becomes an agent
The future of AI lies in agents capable of autonomously navigating web pages.
By: Mario Chow & Figo @IOSG
Introduction
Over the past 12 months, the relationship between web browsers and automation has undergone a dramatic shift. Nearly every major tech company is racing to build autonomous browser agents. The trend has been unmistakable since late 2024: OpenAI launched Operator (since folded into its Agent mode) in January 2025, Anthropic released a "Computer Use" feature for its Claude models, Google DeepMind introduced Project Mariner, Opera announced its agent-powered browser Neon, and Perplexity AI unveiled the Comet browser. The signal is clear: the future of AI lies in agents capable of autonomously navigating the web.
This trend isn't just about adding smarter chatbots to browsers—it represents a fundamental transformation in how machines interact with digital environments. Browser agents are AI systems that can "see" web pages and take actions: clicking links, filling forms, scrolling, typing—just like human users. This paradigm promises immense productivity and economic value by automating tasks that currently require manual labor or are too complex for traditional scripting.

▲ GIF demonstration: An AI browser agent following instructions, navigating to a target dataset page, automatically capturing screenshots and extracting required data.
Who Will Win the AI Browser War?
Almost all major tech companies (and some startups) are developing their own browser AI agent solutions. Below are the most representative projects:
OpenAI – Agent Mode
OpenAI's Agent mode (formerly known as Operator, launched in January 2025) is a browser-integrated AI agent capable of handling various repetitive online tasks—such as filling web forms, ordering groceries, and scheduling meetings—all through standard web interfaces used by humans.

▲ An AI agent scheduling meetings like a professional assistant: checking calendars, finding available time slots, creating events, sending confirmations, and generating .ics files for you.
Anthropic – Claude's "Computer Use"
At the end of 2024, Anthropic introduced a new "Computer Use" capability for Claude 3.5, enabling it to operate computers and browsers like a human. Claude can view screens, move cursors, click buttons, and type text. This was the first large-model agent tool of its kind to enter public beta, allowing developers to let Claude automatically navigate websites and applications. Anthropic positions this as an experimental feature primarily aimed at automating multi-step workflows on the web.

Perplexity – Comet
AI startup Perplexity, known for its question-answering engine, launched the Comet browser in mid-2025 as an AI-driven alternative to Chrome. At its core, Comet features a conversational AI search engine built into the address bar (omnibox), delivering instant answers and summaries instead of traditional search links.

In addition, Comet includes the built-in Comet Assistant—an agent residing in the sidebar that can automate daily tasks across websites. For example, it can summarize your open emails, schedule meetings, manage browser tabs, or browse and extract web content on your behalf.

By using a sidebar interface that allows the agent to perceive current webpage content, Comet aims to seamlessly integrate browsing with AI assistance.
Real-World Applications of Browser Agents
Earlier, we reviewed how major tech companies (OpenAI, Anthropic, Perplexity, etc.) are delivering browser-agent capabilities through different product forms. To better understand their value, let’s look at how these capabilities apply to real-life scenarios and enterprise workflows.
Daily Web Automation
#E-commerce & Personal Shopping
A highly practical use case involves delegating shopping and booking tasks to agents. An agent can populate your online cart from a standing shopping list, compare prices across retailers to find the lowest, and complete checkout on your behalf.

For travel, you could instruct an AI: “Book me a flight to Tokyo next month under $800, plus a hotel with free Wi-Fi.” The agent would handle the entire process—searching flights, comparing options, entering passenger details, and completing hotel bookings—entirely through airline and hotel websites. This level of automation goes far beyond existing travel bots: it doesn’t just recommend, it executes purchases directly.
#Boosting Workplace Productivity
Agents can automate many repetitive business operations people perform in browsers. Examples include organizing emails and extracting to-dos, checking availability across multiple calendars, and automatically scheduling meetings. Perplexity’s Comet Assistant already summarizes inbox content via the web interface or adds calendar events for you. With authorization, agents can log into SaaS tools to generate routine reports, update spreadsheets, or submit forms. Imagine an HR agent automatically posting jobs on various recruitment sites, or a sales agent updating lead data in a CRM system. These mundane tasks consume significant employee time, but AI can automate them through web form and page interactions.

Beyond single tasks, agents can chain together full workflows across multiple web systems: logging into dashboards to troubleshoot issues, or orchestrating processes such as onboarding a new employee (creating accounts across several SaaS platforms). Every step runs through a different web interface, which is exactly where browser agents excel. Essentially, any multi-step operation that currently requires navigating across multiple websites can be delegated to an agent.
Current Challenges and Limitations
Despite their vast potential, today’s browser agents still fall short of perfection. Current implementations reveal persistent technical and infrastructural challenges:
Architectural Mismatch
The modern web was designed for human-operated browsers and has evolved over time to actively resist automation. Data is often buried within HTML/CSS optimized for visual display, locked behind interactive gestures (hover, swipe), or accessible only via undocumented APIs.
On top of this, anti-bot and anti-fraud systems add artificial barriers. These tools combine IP reputation, browser fingerprinting, JavaScript challenge responses, and behavioral analysis (e.g., randomness in mouse movements, typing rhythm, dwell time). Ironically, the more "perfect" and efficient an AI agent behaves—such as instantly filling forms without errors—the more likely it is to be flagged as malicious automation. This can lead to hard failures: for example, OpenAI or Google’s agents may complete all checkout steps successfully but ultimately be blocked by CAPTCHA or secondary security filters.

The combination of human-optimized interfaces and robot-hostile defenses forces agents into fragile "human mimicry" strategies. These approaches are highly prone to failure, with low success rates (fewer than one in three full transactions succeed without human intervention).
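To make this fragility concrete, below is a minimal sketch of the "human mimicry" an agent is pushed into on a Web2 checkout flow, assuming Playwright; the URL and selectors are placeholders. Jittered pauses, incremental mouse movement, and per-keystroke typing delays make a session look less robotic, but none of it guarantees passing fingerprinting or behavioral models.

```typescript
// Illustrative sketch only: the kind of "human mimicry" a browser agent is
// forced into on a Web2 site. URL and selectors are placeholders, and the
// jittered delays are a common heuristic, not a guaranteed way past detection.
import { chromium } from "playwright";

const jitter = (min: number, max: number) => min + Math.random() * (max - min);

async function humanLikeCheckout() {
  const browser = await chromium.launch({ headless: false }); // headless sessions are easier to fingerprint
  const page = await browser.newPage();

  await page.goto("https://example-shop.test/checkout"); // placeholder URL
  await page.waitForTimeout(jitter(800, 2500)); // pause as a human would

  // Move the cursor in small steps instead of jumping straight to the field
  await page.mouse.move(jitter(100, 300), jitter(200, 400), { steps: 25 });

  // Type with per-keystroke delay rather than filling the field instantly
  await page.click("#email"); // placeholder selector
  await page.keyboard.type("user@example.com", { delay: jitter(60, 140) });

  await page.waitForTimeout(jitter(500, 1500));
  await page.click("button[type=submit]");

  await browser.close();
}

humanLikeCheckout();
```

Even with these tricks, a data-center IP or a too-consistent fingerprint can still get the session blocked, which is why end-to-end success rates stay low.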
Trust and Security Concerns
Granting agents full control typically requires access to sensitive information: login credentials, cookies, two-factor authentication tokens, and even payment details. This raises understandable concerns for both users and enterprises:
- What if the agent makes a mistake or is tricked by a malicious website?
- If an agent agrees to terms of service or completes a transaction, who is responsible?
Due to these risks, current systems generally adopt cautious approaches:
- Google’s Mariner won’t input credit card details or accept terms of service—it hands control back to the user.
- OpenAI’s Operator prompts users to take over for login or CAPTCHA challenges.
- Anthropic’s Claude-powered agent may outright refuse to log in due to security considerations.
The result: frequent pauses and handoffs between AI and humans, undermining the seamless automation experience.
Despite these obstacles, progress continues rapidly. Companies like OpenAI, Google, and Anthropic learn from failures with each iteration. As demand grows, a "co-evolution" may emerge: websites becoming more agent-friendly in beneficial scenarios, while agents improve their ability to mimic human behavior and bypass existing barriers.
Approaches and Opportunities
Current browser agents face two starkly different realities: on one hand, the hostile environment of Web2, filled with anti-bot and security defenses; on the other, the open landscape of Web3, where automation is often encouraged. This divergence shapes the direction of various solutions.
Although the challenges remain significant, new projects continue to emerge that aim to address them directly. Cryptocurrency and decentralized finance (DeFi) ecosystems are becoming natural testing grounds thanks to their openness, programmability, and lower hostility toward automation: open APIs, smart contracts, and on-chain transparency eliminate many of the friction points common in the Web2 world.
Broadly, these solutions fall into two camps: those helping agents cope with Web2’s hostile environment, and those natively built for Web3. Below are four categories, each addressing one or more core limitations of today’s browser agents:
Native Agent Browsers for On-Chain Operations
These browsers are built from the ground up for autonomous agent operation and deeply integrated with blockchain protocols. Unlike a traditional browser such as Chrome—which relies on add-on tools like Selenium, Playwright, or wallet extensions for on-chain automation—native agent browsers provide direct APIs and trusted execution paths for agents to invoke.
In decentralized finance, transaction validity depends on cryptographic signatures, not whether a user "acts like a human." Thus, in on-chain environments, agents can bypass common Web2 obstacles such as CAPTCHA, fraud detection scores, and device fingerprinting. However, when accessing Web2 sites like Amazon, these browsers cannot circumvent associated defenses and will still trigger standard anti-bot measures.
The value of agent browsers isn’t magical access to all websites, but rather:
- Native blockchain integration: Built-in wallets and signature support, eliminating MetaMask popups or DOM parsing of dApp frontends.
- Automation-first design: Stable high-level commands that map directly to protocol operations.
- Security model: Granular permission controls and sandboxing to keep private keys secure during automation.
- Performance optimization: Ability to parallelize multiple on-chain calls without browser rendering or UI latency.
#Example: Donut

Donut treats blockchain data and operations as first-class citizens. Users (or their agents) can hover to see real-time risk metrics for tokens, or simply enter natural language commands like "/swap 100 USDC to SOL". By skipping the hostile friction points of Web2, Donut enables agents to run at full speed in DeFi, enhancing liquidity, arbitrage, and market efficiency.
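As a rough illustration (not Donut’s actual API; every name below is hypothetical), here is the kind of surface a native agent browser might expose to an agent: structured, signable operations instead of DOM clicks.

```typescript
// Hypothetical sketch: the surface a native agent browser might expose to an
// agent, versus driving a dApp's UI. None of these names are Donut's actual
// API; they only illustrate the "direct protocol call" idea.
interface AgentWallet {
  address: string; // built-in wallet, no extension popups
  signAndSend(tx: { program: string; instruction: string; args: Record<string, unknown> }): Promise<string>;
}

interface AgentBrowser {
  wallet: AgentWallet;
  // High-level commands that map to protocol operations, not DOM clicks
  swap(params: { from: string; to: string; amount: number; maxSlippageBps: number }): Promise<string>;
  riskScore(tokenMint: string): Promise<number>;
}

async function rebalance(browser: AgentBrowser) {
  console.log("acting as", browser.wallet.address);

  // Check a token's risk metrics before acting (the "hover for risk" idea, as an API call)
  const risk = await browser.riskScore("<token-mint-address>"); // placeholder
  if (risk > 0.8) throw new Error("token flagged as high risk");

  // "/swap 100 USDC to SOL" expressed as a structured call instead of UI automation
  const txSignature = await browser.swap({
    from: "USDC",
    to: "SOL",
    amount: 100,
    maxSlippageBps: 50,
  });
  console.log("swap submitted:", txSignature);
}
```

The point is the shape of the interface: risk checks and swaps are first-class calls the agent can compose, rather than UI states it has to scrape.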
Verifiable and Trustworthy Agent Execution
Granting agents access to sensitive permissions carries significant risk. Solutions in this category use Trusted Execution Environments (TEEs) or Zero-Knowledge Proofs (ZKPs) to cryptographically verify an agent’s intended behavior before execution, enabling users and counterparties to validate agent actions without exposing private keys or credentials.
#Example: Phala Network
Phala uses TEEs (e.g., Intel SGX) to isolate and protect the execution environment, preventing Phala operators or attackers from spying on or tampering with agent logic and data. A TEE acts like a hardware-backed "secure vault," ensuring confidentiality (external parties can't see inside) and integrity (external parties can't modify contents).
For browser agents, this means they can log in, hold session tokens, or process payment information—all while sensitive data never leaves the secure vault. Even if the user’s machine, operating system, or network is compromised, the data remains protected. This directly addresses one of the biggest hurdles to agent adoption: trust in handling sensitive credentials and operations.
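A minimal conceptual sketch (not Phala’s actual SDK) of what that boundary looks like from the agent’s side: secrets live only inside the enclave, and the host only ever sees the result plus an attestation it can verify.

```typescript
// Conceptual sketch, not Phala's actual SDK: credentials stay inside the
// enclave; the host process only ever sees the attested result of an action.
interface EnclaveSession {
  // Runs inside the TEE; secrets never cross this boundary in plaintext
  loginAndFetch(task: { site: string; action: string }): Promise<{
    result: string;      // e.g. "3 unread emails summarized"
    attestation: string; // proof the code ran unmodified inside the enclave
  }>;
}

async function runSensitiveTask(enclave: EnclaveSession) {
  // The agent orchestrates, but never holds the session cookie or password itself
  const { result, attestation } = await enclave.loginAndFetch({
    site: "https://mail.example.com", // placeholder
    action: "summarize unread emails",
  });

  // A counterparty (or the user) can verify the attestation before trusting the result
  console.log(result, attestation.slice(0, 16) + "…");
}
```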
Decentralized Structured Data Networks
Modern anti-bot detection systems don’t just flag requests as "too fast" or "automated"—they also analyze IP reputation, browser fingerprints, JavaScript challenge responses, and behavioral signals (e.g., cursor movement, typing rhythm, session history). Agents originating from data center IPs or fully reproducible browsing environments are easily detected.
To solve this, such networks avoid scraping human-optimized web pages altogether, instead collecting and providing machine-readable data—or routing traffic through real human browsing environments. This bypasses the fragility of traditional crawlers in parsing and anti-scraping battles, offering cleaner, more reliable inputs for agents.
By proxying agent traffic through these real-world sessions, distributed networks allow AI agents to access web content like humans, avoiding immediate blocks.
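As a rough sketch of this routing idea, assuming a residential proxy endpoint is available (the endpoints and credentials below are placeholders), an agent can dispatch each request through a different residential exit using Node’s undici:

```typescript
// Sketch, assuming residential proxy endpoints exist: routing an agent's
// fetches through rotating residential exits via undici. Endpoints and
// credentials are placeholders; routing alone does not defeat behavioral
// or account-level checks.
import { fetch, ProxyAgent } from "undici";

// Each request can be dispatched through a different residential exit node
const exits = [
  "http://user:pass@exit1.residential.example:8080", // placeholder endpoints
  "http://user:pass@exit2.residential.example:8080",
];

async function fetchViaResidentialExit(url: string): Promise<string> {
  const endpoint = exits[Math.floor(Math.random() * exits.length)];
  const dispatcher = new ProxyAgent(endpoint);

  const res = await fetch(url, {
    dispatcher,
    headers: { "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" }, // a realistic UA, not a bot default
  });
  return res.text();
}

fetchViaResidentialExit("https://example.com/public-data")
  .then((html) => console.log(html.length, "bytes fetched"));
```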
#Examples
- Grass: A decentralized data / DePIN network where users share idle residential broadband, providing bot-friendly, geographically diverse access for public web data collection and model training.
- WootzApp: An open-source mobile browser supporting cryptocurrency payments, featuring background agents and zero-knowledge identity; it gamifies AI/data tasks for consumers.
- Sixpence: A distributed browser network routing AI agent traffic through global contributors’ browsing sessions.
However, this isn’t a complete solution. Behavioral detection (mouse/scroll patterns), account-level restrictions (KYC, account age), and fingerprint consistency checks can still trigger blocks. Therefore, distributed networks should be seen as a foundational obfuscation layer, best combined with human-like execution strategies for maximum effectiveness.
Agent-Friendly Web Standards (Forward-Looking)
Increasingly, tech communities and organizations are exploring how websites should securely and compliantly interact with automated agents—not just humans—in the future.
This has sparked discussions around emerging standards and mechanisms designed to let websites explicitly declare "I allow trusted agents to access," and provide secure channels for interaction—instead of today’s default approach of treating agents as "bot attacks."
- "Agent Allowed" tags: Just as search engines respect robots.txt, future web pages might include a tag signaling to browser agents: "Safe access permitted here." For instance, when booking a flight via an agent, the site wouldn’t serve CAPTCHAs but instead offer an authenticated API endpoint (see the sketch after this list).
- API gateways for verified agents: Websites could open dedicated entry points for authenticated agents—like a "fast lane." Instead of mimicking clicks and keystrokes, agents could use stable API paths to complete orders, payments, or data queries.
- W3C discussions: The World Wide Web Consortium (W3C) is already exploring standardized channels for "managed automation." This could lead to globally accepted rules enabling websites to recognize and accept trusted agents while maintaining security and accountability.
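Since no such standard exists yet, the following is purely speculative: a robots.txt-style declaration a site might serve, and the agent-side check that would prefer a declared API channel over UI mimicry. All names and paths are invented.

```typescript
// Purely hypothetical: no "agents.txt" standard exists today. This sketches
// what a robots.txt-style declaration and an agent-side check might look like.
//
// Imagined /.well-known/agents.txt served by a site:
//   allow-agents: verified
//   agent-api: https://example.com/agent/v1
//   auth: oauth2

async function findAgentChannel(origin: string): Promise<string | null> {
  const res = await fetch(`${origin}/.well-known/agents.txt`); // hypothetical location
  if (!res.ok) return null; // no declaration: fall back to UI automation (or give up)

  const text = await res.text();
  const apiLine = text
    .split("\n")
    .find((line) => line.startsWith("agent-api:"));

  // Prefer the declared "fast lane" endpoint over mimicking clicks and keystrokes
  return apiLine ? apiLine.slice("agent-api:".length).trim() : null;
}

findAgentChannel("https://airline.example").then((endpoint) => {
  console.log(endpoint ?? "no agent channel declared; falling back to the UI");
});
```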
While still early, these initiatives could dramatically improve human↔agent↔website relationships. Imagine no longer needing agents to simulate human mouse movements to "trick" risk controls—but instead openly completing tasks through an "officially allowed" channel.
On this path, crypto-native infrastructure may take the lead. Since on-chain apps inherently rely on open APIs and smart contracts, they’re naturally automation-friendly. In contrast, traditional Web2 platforms—especially those dependent on ads or fraud prevention—may remain defensive. But as users and businesses increasingly embrace the efficiency gains of automation, these standardization efforts could become key catalysts for the internet’s evolution toward "agent-first architecture."
Conclusion
Browser agents are evolving from simple conversational tools into autonomous systems capable of executing complex online workflows. This shift reflects a broader trend: embedding automation directly into the core interface of human-internet interaction. While the potential for productivity gains is immense, so are the challenges—including overcoming entrenched anti-bot mechanisms and ensuring security, trust, and responsible usage.
In the short term, improvements in agent reasoning, speed, tighter integration with existing services, and advances in distributed networks may gradually increase reliability. Long-term, we may see the gradual adoption of "agent-friendly" standards in scenarios where automation benefits both service providers and users. However, this transition won’t be uniform: adoption will be faster in automation-friendly environments like DeFi, and slower on Web2 platforms heavily reliant on human interaction control.
Going forward, competition among tech companies will increasingly center on how well their agents navigate real-world constraints, integrate securely into critical workflows, and deliver consistent results across diverse online environments. Whether this ultimately reshapes the "browser wars" depends not just on technical prowess, but on building trust, aligning incentives, and demonstrating tangible value in everyday use.