Hunting Airdrop "Farmers": How to Use AI to Identify 90% of Sybil Addresses?

2025.06.13

Binance's risk management team, in collaboration with academia, has proposed a new detection system based on "AI + blockchain graph analysis" for identifying Sybil addresses.

By Nicky, Foresight News

This article is compiled and summarized based on the research paper titled "Detecting Sybil Addresses in Blockchain Airdrops: A Subgraph-based Feature Propagation and Fusion Approach."

Recently, Binance's Risk Management team collaborated with Zand AI and ZEROBASE to publish a new academic paper on Sybil attacks. To help readers quickly grasp the core insights of this study, the author has summarized the key findings after thoroughly reviewing the paper.

In cryptocurrency airdrop campaigns, there exists a group of special participants operating in the shadows. They are not ordinary users but instead use automated scripts to generate hundreds or even thousands of fake addresses—these are the notorious "Sybil addresses." Like parasites, these addresses latch onto major airdrop events from well-known projects such as Starknet and LayerZero, draining project budgets, diluting rewards for genuine users, and undermining the foundational fairness of blockchain ecosystems.

To counter this ongoing cat-and-mouse game, Binance's risk control team and its academic partners developed an AI-powered detection system called "subgraph-based LightGBM." In tests on real-world data, it identified Sybil addresses with precision, recall, and F1-score all above 0.9.

The Three "ID Cards" of Sybil Addresses

Why can these cheating addresses be identified so precisely? By analyzing the transaction records of 193,701 real addresses (including 23,240 confirmed Sybil addresses), the research team found three behavioral patterns that Sybil addresses consistently leave behind:

Temporal Fingerprints are the first clue. Sybil addresses act with uncanny timing precision: the critical steps, from receiving the initial gas fee and sending the first transaction to participating in the airdrop, are often completed within an extremely short window. Legitimate users, by contrast, behave far more irregularly, and few genuine users create a new address solely for an airdrop and discard it as soon as the reward is claimed.
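
As a rough illustration of the timing signal, the minimal sketch below measures how long an address takes to go from its first gas top-up to its airdrop claim and flags very short windows. The function names and the one-hour cutoff are hypothetical; the summary does not say which time features or thresholds the paper actually uses.

```python
from datetime import datetime, timedelta

# Hypothetical cutoff: the summary does not state the paper's actual threshold.
SUSPICIOUS_WINDOW = timedelta(hours=1)


def activity_window(gas_funded_at: datetime, first_tx_at: datetime,
                    claimed_at: datetime) -> timedelta:
    """Time from the first gas top-up to the airdrop claim."""
    if not (gas_funded_at <= first_tx_at <= claimed_at):
        raise ValueError("events must be in chronological order")
    return claimed_at - gas_funded_at


def looks_scripted(gas_funded_at: datetime, first_tx_at: datetime,
                   claimed_at: datetime,
                   window: timedelta = SUSPICIOUS_WINDOW) -> bool:
    """True if every key step happened inside a very short window."""
    return activity_window(gas_funded_at, first_tx_at, claimed_at) <= window


t0 = datetime(2022, 9, 1, 12, 0)
# Funded, first transaction, and claim within minutes: scripted-looking.
print(looks_scripted(t0, t0 + timedelta(minutes=3), t0 + timedelta(minutes=9)))  # → True
# Months between funding and claim: more typical of an organic user.
print(looks_scripted(t0, t0 + timedelta(days=40), t0 + timedelta(days=200)))  # → False
```

A single hard threshold like this is easy to evade. A learned model would instead take the raw time gaps as input features and let training decide what counts as "too fast."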

Funding Trajectories reveal economic motives. These addresses maintain balances just above the minimum required threshold ("just enough to function"), minimizing capital costs. Once they receive rewards, funds are rapidly withdrawn. Even more telling, transfer amounts across multiple operations show high consistency, lacking the natural variability seen in authentic user transactions.
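
The two funding signals can be made concrete with simple statistics. The minimal sketch below assumes (hypothetically) that "consistency" is measured as the coefficient of variation of transfer amounts and that "just enough to function" is measured as the margin above a known minimum balance. Neither the function names nor these exact metrics come from the paper.

```python
from statistics import mean, pstdev


def amount_cv(amounts: list[float]) -> float:
    """Coefficient of variation (population std / mean) of transfer amounts.
    Values near 0 mean the amounts are almost identical."""
    if not amounts:
        raise ValueError("need at least one transfer")
    avg = mean(amounts)
    return pstdev(amounts) / avg if avg else 0.0


def balance_margin(balance: float, minimum_required: float) -> float:
    """Fraction by which a balance exceeds the minimum needed to transact."""
    return (balance - minimum_required) / minimum_required


# Hypothetical amounts: a scripted address vs. an organic one.
scripted = [0.0105, 0.0105, 0.0104, 0.0105]
organic = [0.3, 1.25, 0.02, 4.8]
print(f"scripted CV: {amount_cv(scripted):.3f}")
print(f"organic CV:  {amount_cv(organic):.3f}")
print(f"margin: {balance_margin(0.0011, 0.001):.2f}")  # → margin: 0.10
```

A near-zero CV together with a thin balance margin matches the "just enough to function" profile described above.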

Relationship Networks provide the strongest evidence. By constructing transaction graphs, the team identified three typical topological structures:

Star Network: A central "command hub" distributes funds to dozens of subordinate addresses.
Chain Structure: Funds are passed linearly between addresses like a relay baton, fabricating artificial activity records.
Tree Diffusion: Multi-layer branching structures are used in attempts to evade detection.

These patterns expose coordinated, programmatic behavior, which is precisely the feature that traditional detection methods find hardest to capture.
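
To make the three shapes concrete, the sketch below labels a small cluster of directed funding edges (sender, receiver) as a star, chain, or tree. This is a hypothetical rule-based heuristic for illustration only. The article describes a LightGBM classifier trained on graph features, not hand-written shape rules, and the function name is invented here.

```python
from collections import defaultdict, deque


def classify_cluster(edges: list[tuple[str, str]]) -> str:
    """Label a funding cluster of directed (sender, receiver) edges as
    'star', 'chain', 'tree', or 'other' (hypothetical heuristic)."""
    nodes = {addr for edge in edges for addr in edge}
    indeg: dict[str, int] = defaultdict(int)
    outdeg: dict[str, int] = defaultdict(int)
    children: dict[str, list[str]] = defaultdict(list)
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
        children[src].append(dst)

    # A funding tree has one source, and every other address is funded exactly once.
    roots = [n for n in nodes if indeg[n] == 0]
    if len(roots) != 1 or any(indeg[n] > 1 for n in nodes):
        return "other"
    root = roots[0]

    # Every address must be reachable from that source (rules out detached cycles).
    seen, queue = {root}, deque([root])
    while queue:
        for nxt in children[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    if seen != nodes:
        return "other"

    if len(nodes) > 2 and outdeg[root] == len(nodes) - 1:
        return "star"   # one hub funds every other address directly
    if all(outdeg[n] <= 1 for n in nodes):
        return "chain"  # funds relayed one address at a time
    return "tree"       # multi-level branching


print(classify_cluster([("hub", "a"), ("hub", "b"), ("hub", "c")]))            # → star
print(classify_cluster([("a", "b"), ("b", "c"), ("c", "d")]))                  # → chain
print(classify_cluster([("r", "a"), ("r", "b"), ("a", "c"), ("a", "d")]))      # → tree
```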

Two-Layer Networks: The AI Detective’s Investigation Tool

Tracking all blockchain transaction data is akin to finding a needle in a haystack. The research team adopted a two-layer subgraph model—similar to how detectives investigate not only the target individual (Address A) but also their direct contacts (addresses that sent funds to A or received transfers from A), as well as those contacts’ own connections (second-degree relationships).
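
The "target, contacts, and contacts' contacts" idea amounts to collecting every address within two hops of the target while ignoring transfer direction. The minimal sketch below assumes transfers arrive as hypothetical (sender, receiver, amount) tuples. The function name is illustrative and not taken from the paper.

```python
from collections import defaultdict


def two_hop_subgraph(transfers: list[tuple[str, str, float]], target: str) -> set[str]:
    """Addresses within two hops of `target`, ignoring transfer direction:
    the target, its direct counterparties, and their counterparties."""
    neighbors: dict[str, set[str]] = defaultdict(set)
    for src, dst, _amount in transfers:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    first_hop = neighbors[target] - {target}
    second_hop: set[str] = set()
    for contact in first_hop:
        second_hop |= neighbors[contact]
    return {target} | first_hop | second_hop


transfers = [("A", "B", 1.0), ("C", "A", 0.5), ("B", "D", 0.2), ("D", "E", 0.1)]
print(sorted(two_hop_subgraph(transfers, "A")))  # → ['A', 'B', 'C', 'D']
```

Address E is three hops from A, so it falls outside A's subgraph. In practice, the adjacency index would be built once and reused across targets. Building it inside the function only keeps the sketch short.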

More importantly, the team introduced an innovative "feature fusion" technique: the system aggregates the behavioral characteristics of neighboring addresses into a comprehensive "behavioral profile" for the target address. For example, it computes the minimum, maximum, average, and volatility of outgoing transfer amounts across all connected addresses to describe fund-flow patterns, and it uses in-degree and out-degree (the number of addresses linked to the target by incoming and outgoing transfers) to gauge how densely the address is connected. Because only each target's local subgraph is examined, this design can efficiently analyze more than 5.8 million transactions without the cost of scanning the entire network, a common pitfall of traditional approaches.
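
The fusion step can be sketched as turning a target's neighborhood into a fixed-length feature vector. In this minimal sketch, the neighborhood is passed in explicitly. The following are all assumptions rather than details from the paper: the feature names, zero-filling for empty neighborhoods, counting degree as distinct counterparties, and using population standard deviation for "volatility." Vectors like this would then be fed to the classifier.

```python
from collections import defaultdict
from statistics import mean, pstdev


def fused_features(transfers: list[tuple[str, str, float]], target: str,
                   neighborhood: set[str]) -> dict[str, float]:
    """Feature vector for `target`: statistics of outgoing transfer amounts
    pooled over its neighbors, plus its own in- and out-degree."""
    senders_to: dict[str, set[str]] = defaultdict(set)      # who sent to each address
    receivers_from: dict[str, set[str]] = defaultdict(set)  # whom each address sent to
    outgoing: dict[str, list[float]] = defaultdict(list)    # each address's outgoing amounts
    for src, dst, amount in transfers:
        senders_to[dst].add(src)
        receivers_from[src].add(dst)
        outgoing[src].append(amount)

    pooled = [amt for addr in neighborhood - {target} for amt in outgoing[addr]]
    return {
        # Degree counted as distinct counterparties (an assumption).
        "in_degree": float(len(senders_to[target])),
        "out_degree": float(len(receivers_from[target])),
        # Zero-fill when no neighbor has sent anything, so vectors keep a fixed shape.
        "nbr_out_min": min(pooled) if pooled else 0.0,
        "nbr_out_max": max(pooled) if pooled else 0.0,
        "nbr_out_mean": mean(pooled) if pooled else 0.0,
        "nbr_out_std": pstdev(pooled) if pooled else 0.0,
    }


transfers = [("hub", "A", 1.0), ("hub", "B", 1.0), ("A", "C", 0.5), ("B", "C", 0.5)]
print(fused_features(transfers, "A", {"A", "hub", "B", "C"}))
```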

Real-World Validation: Catching "Ghosts" in Binance Airdrops

The system was tested on real-world data from the airdrop of BAB (Binance Account Bound), the soulbound token Binance launched in 2022 to verify users who have completed KYC. This makes it an ideal testbed for detecting Sybil behavior.

The team first flagged suspicious addresses through manual analysis and clustering, then ran an appeal review process to confirm the final Sybil labels. During data cleaning, institutional addresses (e.g., exchange hot wallets), smart contracts, and addresses active for more than one year were excluded to keep the dataset clean. The last rule reflects the observation that Sybil operators typically abandon old addresses to avoid detection.
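
The three exclusion rules translate directly into a filter. In this minimal sketch, the record fields and placeholder addresses are hypothetical. "Active for more than one year" is read, as an assumption, as the span between an address's first and last observed activity.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "active for over one year" means first-to-last activity span.
MAX_ACTIVE_SPAN = timedelta(days=365)


@dataclass
class AddressRecord:
    address: str
    is_contract: bool       # smart-contract account
    is_institutional: bool  # e.g., a known exchange hot wallet
    first_seen: date
    last_seen: date


def keep_for_training(rec: AddressRecord) -> bool:
    """Apply the three exclusion rules described above."""
    if rec.is_contract or rec.is_institutional:
        return False
    return rec.last_seen - rec.first_seen <= MAX_ACTIVE_SPAN


records = [
    AddressRecord("0xaaa", False, False, date(2022, 9, 1), date(2022, 10, 1)),
    AddressRecord("0xbbb", False, True, date(2022, 9, 1), date(2022, 9, 2)),
    AddressRecord("0xccc", False, False, date(2020, 1, 1), date(2022, 9, 1)),
]
print([r.address for r in records if keep_for_training(r)])  # → ['0xaaa']
```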

The results showed high detection accuracy across all three types of cheating networks:

Star Network: 99% detection rate (previous best: 95%)
Chain Structure: 100% detection rate (previous best: 95%)
Tree Diffusion: 97% detection rate (previous best: 95%)

All four key evaluation metrics surpassed 0.9: Precision reached 0.943 (up from 0.796 in prior state-of-the-art models), Recall hit 0.918 (indicating over 91% of Sybil addresses were successfully caught), F1-score achieved 0.930, and AUC climbed to 0.981 (approaching perfect classification). This means projects can significantly reduce false positives affecting real users while effectively closing loopholes exploited by fraudsters.
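
The reported F1-score is consistent with the precision (P) and recall (R) figures, since F1 is their harmonic mean:

```latex
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.943 \times 0.918}{0.943 + 0.918} = \frac{1.7313}{1.861} \approx 0.930
```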

Technical Boundaries and the Future Battlefield

Currently, this technology is best suited to long-running airdrop scenarios (such as phased soulbound token distributions), where enough labeled data can accumulate for AI training. In terms of chain compatibility, it supports EVM-compatible chains (e.g., BNB Chain, Polygon) but not UTXO-model blockchains such as Bitcoin. However, the paper notes that high transaction fees make airdrops extremely rare on UTXO chains, so this restriction has little practical impact.

The research team emphasizes that the potential of this technology extends far beyond airdrop monitoring. Since it detects anomalies via transaction networks and behavioral patterns, it could equally apply to:

Detecting market manipulation (e.g., coordinated addresses involved in pump-and-dump schemes).
Assessing token liquidity risks (identifying fake trading pairs).
Building on-chain credit scoring systems.

As Sybil attack strategies continue evolving, the technological arms race to preserve fairness in Web3 will drive detection systems toward greater intelligence and broader applicability.
