
EthStorage Founder: How does data availability ensure Rollup security?
TechFlow Selected
This episode explores Rollup decentralization from the perspective of "data availability and decentralized storage."
Host: Franci
Guest: Qi Zhou, Founder of EthStorage
Introduction
This is the final episode of the decentralized rollup interview series. In this episode, we explore rollup decentralization from the perspective of "data availability and decentralized storage." We are honored to have Qi Zhou, founder of EthStorage, join us to discuss how DA reuses Ethereum's security properties, EIP-4844 and danksharding, comparative security across different DA models, and how EthStorage will integrate with EIP-4844 in the next Ethereum upgrade.

About the Guest
I'm excited to share our thoughts on Ethereum's DA technology and our work building a decentralized storage layer on top of it. I joined the Web3 industry full-time in 2018, previously working as an engineer at major tech companies like Google and Facebook. I also hold a PhD from the Georgia Institute of Technology. Since 2018, I’ve been deeply involved in Web3 infrastructure development—driven both by my prior experience in distributed systems and distributed storage at big tech firms, and by the belief that there’s significant room for innovation in blockchain infrastructure. From early efforts in execution sharding (Ethereum sharding 1.0), to today’s data sharding (sharding 2.0), and now data availability (DA), our focus has always been advancing core Web3 infrastructure through research and engineering.
We closely follow the Ethereum roadmap, conduct research, and actively participate in community-driven improvements. At the end of last year, we were honored to receive support from the Ethereum Foundation for our research on "data availability sampling"—contributing theoretical work on danksharding topics such as efficient data recovery. At the same time, we're building EthStorage—an Ethereum data layer based on Ethereum’s DA technology. Using Ethereum smart contracts, we enable large-scale verification of off-chain data storage, which brings meaningful value to Ethereum. Today, I’m thrilled to share insights into how EthStorage leverages DA technologies to build a more robust decentralized data storage network.
Interview Section
Part One: Defining Data Availability (DA)
How Does Data Availability (DA) Ensure Rollup Security?
In my research on DA, I've found that many people misunderstand its definition. It’s great to clarify this today—I’ve discussed DA extensively with members of the Ethereum Foundation, including Dankrad Feist, about its critical role in Ethereum L2 ecosystems.
As you may know, rollups move transactions off the main chain and then use proof mechanisms—fraud proofs or validity proofs—to convince L1 smart contracts that their execution results are valid.
A key goal here is to reuse Ethereum’s existing network security while significantly scaling its computational capacity. This scalability comes from moving computation off-chain—but how do we preserve Ethereum-level security?
Take Optimistic Rollups: to challenge a malicious sequencer, users must access the original off-chain transaction data. If that raw data isn’t available, no one can retrieve the records needed to submit a fraud proof against the sequencer. Therefore, DA ensures security by requiring that all off-chain transaction data be published on-chain.
Scaling Block Space
Even if Ethereum doesn't execute these transactions, it still needs to store the transaction data, and that adds up to a massive volume. The core problem DA addresses can therefore be seen as how to expand block space in a highly effective way. For those familiar with blockchain structure, each block contains numerous transactions; the capacity available within each block is known as "block space."
Currently, an Ethereum block is roughly 200–300 KB. This clearly won't suffice for future scaling demands. A quick calculation: 200 KB divided by roughly 100 bytes per transaction gives about 2,000 transactions per block, and dividing that by Ethereum’s 12-second block time yields a ceiling on the order of 100–200 TPS, far too low for Ethereum’s long-term growth.
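To make that arithmetic explicit, here is a minimal sketch using the approximate figures quoted above; the block size and average transaction size are rough assumptions, not protocol constants.

```python
# Back-of-envelope TPS ceiling from the rough figures quoted above
# (approximate values, not protocol constants).
block_space_bytes = 200 * 1024   # ~200 KB of block space
avg_tx_size_bytes = 100          # ~100 bytes per simple transaction
block_time_seconds = 12          # one block every 12 seconds post-Merge

txs_per_block = block_space_bytes // avg_tx_size_bytes
tps_ceiling = txs_per_block / block_time_seconds

print(f"~{txs_per_block} txs per block -> ~{tps_ceiling:.0f} TPS ceiling")
# ~2048 txs per block -> ~171 TPS ceiling
```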
Thus, Ethereum L2s focus on how to securely store vast amounts of block data within this limited block space, enabling both fraud and validity proofs to verify data directly on Ethereum. This way, the correctness and security of off-chain computations are ultimately guaranteed by Ethereum. This illustrates the fundamental relationship between DA and Ethereum’s security model.
Understanding DA Through Bandwidth and Storage Costs
The main costs associated with DA fall into two categories: network bandwidth and storage.
From a bandwidth perspective, current P2P networks like Bitcoin and Ethereum broadcast new blocks via gossip protocols, ensuring every node eventually receives a copy. While secure, this approach imposes high bandwidth and latency overheads.
After Ethereum’s transition to PoS, blocks are produced every 12 seconds. If block size grows too large, propagation delays could exceed this interval, causing missed blocks and degrading overall network performance. Thus, DA fundamentally addresses the challenge of uploading large volumes of data without overwhelming the network’s bandwidth capacity.
Secondly, there's the issue of storage cost. The Ethereum Foundation has actively discussed this. Their design principle avoids permanently storing DA-uploaded data on-chain.
This raises another question: once data is uploaded but later discarded by the protocol (e.g., after one or two weeks), how can we ensure its long-term preservation through better decentralized solutions?
This was one of our motivations in designing EthStorage. First, many rollups need longer data retention periods. Second, with persistent data access, we can build richer on-chain applications—such as fully on-chain NFTs, DApp frontends, or even social media content like articles and comments. These could leverage the DA network to upload vast amounts of data onto the blockchain at lower cost, while inheriting Ethereum L1’s security guarantees.
Through our research and discussions with core Ethereum contributors, we realized Ethereum needs a dedicated storage layer—one that is decentralized, modular, and does not require changes to Ethereum’s base protocol—to solve the problem of long-term data preservation.
Part Two: Comparing Different DA Solutions
The Relationship Between EIP-4844 and Danksharding, and Why EIP-4844 Is Necessary
Proto-danksharding, also known as EIP-4844, is one of Ethereum’s most significant upcoming upgrades. The reason for introducing EIP-4844 stems from the Ethereum Foundation’s estimate that full danksharding deployment would take several years—possibly three to five—starting from around 2020–2021.
During this period, they anticipated a surge in L2 rollups operating on Ethereum. However, danksharding introduces a completely new data interface, whereas current rollups rely on calldata. Without a transitional solution, applications would face costly and complex migrations when upgrading to danksharding.
At last year’s Devcon, Vitalik emphasized the importance of enabling Layer 2 projects to develop using the same interface that danksharding will eventually provide. That way, once danksharding launches, they can seamlessly benefit from its capabilities without needing to rewrite or retest their existing contracts.
EIP-4844 is essentially a simplified version of danksharding. It provides the same application-facing interface, including a new opcode called datahash and a new data object called Blob (Binary Large Object). These elements allow rollups to begin adopting danksharding-compatible data structures ahead of time. When danksharding rolls out, blobs and data hashes will function identically—so early adopters via EIP-4844 gain a smooth migration path.
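To make the new interface concrete: what the datahash opcode exposes to a contract is not the blob itself but a versioned hash derived from the blob's KZG commitment. Below is a minimal Python sketch of that construction as specified in EIP-4844; the example commitment is a placeholder, since real commitments are computed from the blob data.

```python
import hashlib

VERSIONED_HASH_VERSION_KZG = b"\x01"  # version byte defined by EIP-4844

def blob_versioned_hash(kzg_commitment: bytes) -> bytes:
    """Versioned hash of a blob's KZG commitment, the value that the
    datahash opcode exposes to contracts: sha256(commitment) with its
    first byte replaced by a version byte so the scheme can evolve."""
    assert len(kzg_commitment) == 48  # KZG commitments are 48-byte G1 points
    return VERSIONED_HASH_VERSION_KZG + hashlib.sha256(kzg_commitment).digest()[1:]

# Placeholder commitment purely for illustration; real ones come from the blob.
example_commitment = bytes(48)
print(blob_versioned_hash(example_commitment).hex())
```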
You can already see glimpses of danksharding’s future architecture in EIP-4844’s design—its precompiles, opcodes, and interfaces reflect how Ethereum plans to interact with applications post-upgrade.
From an application developer’s standpoint, EIP-4844 allows Ethereum to proactively support L2 innovation, letting teams scale efficiently without future upgrade burdens.
However, EIP-4844 does not solve the underlying block space scalability issue; that is reserved for full danksharding. Current Ethereum blocks are roughly 200 KB, while danksharding targets about 32 MB per block, an increase of more than a hundredfold. EIP-4844 alone does not address on-chain bandwidth limitations.
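To see why raw bandwidth becomes the binding constraint at that scale, here is a quick sketch using the figures quoted in this discussion; it ignores gossip amplification, which multiplies the real requirement further.

```python
# Sustained bandwidth needed just to receive full danksharding blocks
# (figures quoted above; ignores gossip amplification and protocol overhead).
block_bytes = 32 * 1024 * 1024   # ~32 MB target block
slot_seconds = 12                # one slot every 12 seconds

bytes_per_second = block_bytes / slot_seconds
print(f"~{bytes_per_second / 1e6:.1f} MB/s, "
      f"~{bytes_per_second * 8 / 1e6:.0f} Mbit/s sustained per full node")
# ~2.8 MB/s, ~22 Mbit/s sustained per full node
```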
How Danksharding Scales Block Space
Under EIP-4844, blob data is still propagated across the P2P network using traditional gossip protocols—same as calldata. This method remains constrained by physical bandwidth limits of peer-to-peer networks.
Danksharding, however, overhauls this model by introducing data availability sampling. Validators don’t need to download entire 32 MB blocks; they only sample small portions. Yet, through cryptographic guarantees, they can confirm that the full data is available and retrievable.
In a sense, this resembles zero-knowledge principles: I can cryptographically verify that the full 32 MB of data danksharding adds per slot is actually present in the network, without downloading all of it locally. High-end nodes with sufficient bandwidth and storage may choose to store everything, but ordinary validators are not required to.
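That intuition can be made quantitative with a toy calculation in a simplified one-dimensional model: the data is erasure-coded with a 2x extension, so a block that cannot be reconstructed must be missing more than half of its extended chunks, and each random sample then fails with probability greater than one half. A minimal sketch under those assumptions:

```python
# Toy data-availability-sampling estimate (simplified 1D model): with a 2x
# erasure-coding extension, an unrecoverable block is missing more than half
# of its extended chunks, so each independent random sample fails with
# probability > 1/2.
def max_fooling_probability(num_samples: int) -> float:
    """Upper bound on the chance that an unavailable block still answers
    every one of `num_samples` independent random samples."""
    return 0.5 ** num_samples

for k in (10, 20, 30):
    print(f"{k} samples -> fooled with probability <= {max_fooling_probability(k):.2e}")
# 10 samples -> <= 9.77e-04, 20 -> <= 9.54e-07, 30 -> <= 9.31e-10
```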
Experience Developing on EIP-4844 Testnets
We’ve recently launched our internal EIP-4844 testnet and successfully deployed and tested contracts involving blob data uploads, contract calls, and data verification. Once EIP-4844 goes live, we’ll be ready to deploy our contracts immediately.
Through collaboration with Ethereum developers and our own tooling, we aim to accelerate adoption for other rollups—providing resources, documentation, and development tools.
We’ve contributed extensive code to Ethereum’s ecosystem, particularly around EIP-4844 tooling—including new smart contracts supporting the datahash opcode, since Solidity currently lacks native support. All of this work is being coordinated with Ethereum Foundation developers.
Applications and Limitations of Data Availability Committees (DACs)
Currently, over 90% of user fees on L2s go toward data availability costs. To reduce upload expenses, some projects like ZKSync (with ZKPorter) and Arbitrum (with Arbitrum Nova) have implemented proprietary DACs—Data Availability Committees—as their data layers.
These DACs introduce additional trust assumptions to achieve security comparable to Ethereum. Typically, they include well-known entities—large cloud providers or reputable organizations—as committee members responsible for data storage.
Yet, this model faces criticism for violating decentralization and permissionless access principles. Most DACs consist of only a few organizations closely tied to the respective L2 project.
For example, Arbitrum Nova reportedly has six or seven nodes—hosted on Google Cloud or AWS—responsible for storing all execution data. This setup reduces transaction costs to about 0.1% of Ethereum’s, since data isn’t written to L1. But due to its centralized nature, high-value applications remain cautious—especially when managing millions or billions in assets, where trust in DAC availability becomes a critical risk.
In contrast, EthStorage operates without any DAC concept. Our design enables anyone to become a data provider in a permissionless manner. Providers must cryptographically prove they actually store the data.
Theoretically, a DAC claiming to have seven or eight nodes might actually run just one physical copy while appearing distributed. How can we prove sufficient physical redundancy to ensure data safety?
This is a key innovation in EthStorage—and a point we emphasized when presenting to the Ethereum Foundation’s ESP (Ecosystem Support Program). Using ZK-based cryptography, we ensure L2 data providers can join permissionlessly and cryptographically prove they maintain multiple physical copies, thereby enhancing data security.
We believe DACs are a temporary workaround to reduce L1 data costs. With EthStorage’s cryptographic techniques and Ethereum-based contract verification, we offer a superior long-term solution for secure, decentralized data storage. As EIP-4844 launches, we’ll actively share our innovations and real-world performance metrics.
How EthStorage Differs from DACs
EthStorage is essentially a storage rollup—a "storage layer" built atop Ethereum. Imagine a Layer 2 not executing EVM logic, but functioning as a massive database—a key-value store capable of handling tens, hundreds of terabytes, or even petabyte-scale data.
How do we ensure this database enjoys Ethereum-level security? Step one: publish all large-scale data via DA onto Ethereum L1, making it publicly accessible. However, we cannot guarantee permanent availability—Ethereum DA will discard this data after ~2–4 weeks.
Step two: after uploading, we distribute and preserve the data across our Layer 2 nodes. Unlike DACs, our storage nodes are permissionless—anyone can participate, prove storage, and earn rewards.
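Purely as a mental model (the names and behavior below are hypothetical, not EthStorage's actual interface), the L1-facing side of such a storage rollup can be pictured as a key-value contract: a write publishes the value as a blob and records its hash on-chain, while reads are served by L2 storage nodes that must later prove they still hold the data.

```python
import hashlib

class ToyStorageRollup:
    """Hypothetical mental model of a storage rollup's key-value interface.
    Names and behavior are illustrative only, not EthStorage's actual API."""

    def __init__(self):
        self.key_to_hash = {}     # what the L1 contract tracks: key -> data hash
        self.offchain_blobs = {}  # what L2 storage nodes actually hold

    def put(self, key: bytes, value: bytes) -> None:
        data_hash = hashlib.sha256(value).digest()
        self.key_to_hash[key] = data_hash        # on-chain commitment
        self.offchain_blobs[data_hash] = value   # data kept by storage nodes

    def get(self, key: bytes) -> bytes:
        value = self.offchain_blobs[self.key_to_hash[key]]
        # A real client would verify the returned bytes against the on-chain hash.
        assert hashlib.sha256(value).digest() == self.key_to_hash[key]
        return value

store = ToyStorageRollup()
store.put(b"article/1", b"fully on-chain content")
print(store.get(b"article/1"))
```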
This relies on a novel proof-of-storage mechanism inspired by Filecoin and Arweave, but specifically adapted to Ethereum’s DA framework and smart contract verification. We believe this represents a unique contribution—not only to Ethereum’s ecosystem but to decentralized storage as a whole.
How the Proof-of-Storage Mechanism Works
Most proof-of-storage systems, including Filecoin and Arweave, begin by encoding the user's data. Crucially, this encoding is tied to the storage provider’s address. Each provider must have a unique address, and after encoding, they store what’s called a "unique replica."
For example, in traditional databases or distributed systems, "hello world" might be copied four or five times across machines—each identical. In EthStorage, even if we store ten or twenty copies, each instance of "hello world" is uniquely encoded based on the provider’s address, resulting in distinct data stored in different locations.
This allows us to use cryptography to prove that multiple independent providers—with different addresses—have stored encoded versions of the data and submitted corresponding storage proofs. Like Filecoin and Arweave, but with a crucial difference: we focus on hot, dynamic DA data rather than static files, and our proofs are verifiable on Ethereum smart contracts.
Each encoded piece of data is provably unique because it depends on the provider’s address. This ensures true replication across the network—not just logical duplication of the same file.
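To illustrate the idea of a unique replica (a toy construction, not EthStorage's actual encoding or proof scheme): mask each chunk of data with a keystream derived from the provider's address, so two providers holding the same logical data store different bytes, while either one can decode back to the original.

```python
import hashlib

def encode_replica(data: bytes, provider_address: bytes, chunk_id: int) -> bytes:
    """Toy "unique replica" encoding: XOR the data with a keystream derived
    from the provider's address, so every provider stores different bytes.
    Purely illustrative, not EthStorage's actual masking scheme."""
    keystream = b""
    counter = 0
    while len(keystream) < len(data):
        keystream += hashlib.sha256(
            provider_address + chunk_id.to_bytes(8, "big") + counter.to_bytes(8, "big")
        ).digest()
        counter += 1
    return bytes(d ^ k for d, k in zip(data, keystream))

# Decoding is the same XOR, so the original data is always recoverable.
decode_replica = encode_replica

chunk = b"hello world"
replica_a = encode_replica(chunk, bytes.fromhex("aa" * 20), chunk_id=0)
replica_b = encode_replica(chunk, bytes.fromhex("bb" * 20), chunk_id=0)
assert replica_a != replica_b  # physically distinct copies of the same data
assert decode_replica(replica_a, bytes.fromhex("aa" * 20), 0) == chunk
```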
Our design improves upon existing decentralized storage paradigms while optimizing for Ethereum’s DA requirements—especially dynamic data updates, efficient gas usage, and scalable verification. Many cutting-edge research challenges remain, but we’re making strong progress.
How EthStorage Maintains Permissionless Proof-of-Storage
Ethereum has archive nodes that retain full historical transaction and state data. But danksharding poses a major challenge: it’s projected to generate ~80 TB of data annually. After 3–4 years, that could reach 200–300 TB and keep growing. Archive nodes currently lack economic incentives to sustain this burden.
EthStorage first addresses the incentive problem for permanent data storage. We adopt Arweave’s discounted cash flow model to create sustainable tokenomics, efficiently executable within smart contracts.
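The appeal of the discounted cash flow approach is that a single upfront payment can fund storage indefinitely, provided storage costs keep falling. A toy version of that arithmetic, with made-up cost and decline figures purely for illustration:

```python
# Toy discounted-cash-flow endowment for perpetual storage: if storing 1 GB
# for a year costs c0 today and that cost falls by a fixed fraction d every
# year, the total cost of storing it forever is the geometric series c0 / d.
# The numbers below are made up for illustration, not EthStorage parameters.
cost_per_gb_year = 0.02        # assumed starting cost in dollars per GB-year
annual_cost_decline = 0.10     # assumed 10% yearly decline in storage cost

perpetual_cost_per_gb = cost_per_gb_year / annual_cost_decline
print(f"upfront endowment: ~${perpetual_cost_per_gb:.2f} per GB, stored forever")
# upfront endowment: ~$0.20 per GB, stored forever
```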
Second, we maintain permissionless participation. Our incentive design encourages 10, 50, or even 100 nodes to store data. Any node can sync data from peers and become a storage provider—enabling organic network growth. Further optimizations in incentive design are underway.
Third, storing hundreds of terabytes—or eventually petabytes—is prohibitively expensive for individual nodes. To address this, we implement data sharding. Ordinary nodes only need ~4 TB of storage (currently designed at 4 TB, potentially upgradable to 8 TB) to store a portion of the archival data. Through incentives, we ensure the complete dataset is collectively preserved across the network.
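Combining the shard size with the data growth rate gives a feel for how many ordinary nodes the network needs. A rough sketch using the figures quoted in this discussion; the replication factor is an assumption for illustration:

```python
# Rough sizing of the storage network using the figures quoted above
# (32 MB per 12-second slot, 4 TB per node; replication factor is assumed).
blob_bytes_per_slot = 32 * 1024 * 1024
slots_per_year = 365 * 24 * 3600 // 12
tb_per_year = blob_bytes_per_slot * slots_per_year / 1e12   # ~88 TB if every slot is full

shard_size_tb = 4          # storage per ordinary node in the current design
replication_factor = 10    # assumed number of physical copies per shard

shards_per_year = -(-int(tb_per_year) // shard_size_tb)     # ceiling division
nodes_per_year = shards_per_year * replication_factor
print(f"~{tb_per_year:.0f} TB/year -> {shards_per_year} shards x "
      f"{replication_factor} copies = {nodes_per_year} nodes")
# ~88 TB/year -> 22 shards x 10 copies = 220 nodes
```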
We tackle multiple challenges: excessive data volume for archive nodes, incentive alignment, and permissionless access—all solvable via Ethereum smart contracts on L1. Our role is to provide the data network: anyone with sufficient hardware can download data, generate proofs, submit them to Ethereum, and earn rewards. Our contracts are nearly finalized and already undergoing testing on the EIP-4844 Devnet.