Deep Dive into Proof of Storage: Achieving Cross-Time, Cross-Chain Blockchain State Awareness

2023.10.13

Deep Dive into Proof of Storage: Achieving Cross-Time, Cross-Chain Blockchain State Awareness

Proof of storage enables Ethereum to emerge as a layer for identity and asset ownership, not just a settlement layer.

2023.10.13 - 02:44:51

Navigating Web3 tides with focused insights

Proof of storage enables Ethereum to emerge as a layer for identity and asset ownership, not just a settlement layer.

Authored by: LongHash Ventures

Translated by: TechFlow

What if you lost your memory every hour and had to constantly ask others what you’ve done? This is the current state of smart contracts. On blockchains like Ethereum, smart contracts cannot directly access states beyond 256 blocks. This problem becomes even more severe in a multi-chain ecosystem, where retrieving and verifying data across different execution layers is even more challenging.

In 2020, Vitalik Buterin and Tomasz Stanczak proposed a method for accessing data across time. Although this EIP proposal stalled, its need has reemerged in a rollup-centric multi-chain world. Today, proof of storage has become a frontier field, empowering smart contracts with awareness and memory.

Ways to Access On-Chain Data

Dapps can access data and state through multiple methods. All these methods require applications to trust humans/entities, cryptographic-economic security, or code to some extent, each involving trade-offs:

Trust Humans/Entities:

Archive Nodes: Operators can run their own archive nodes or rely on providers such as Alchemy and Infura to access all data from genesis block onward. They provide the same data as full nodes, including all historical state data of the entire blockchain. Off-chain services like Etherscan and Dune Analytics use archive nodes to access on-chain data. Off-chain participants can attest to the validity of this data, and on-chain smart contracts can verify that the data was signed by trusted parties or committees. However, the integrity of the underlying data cannot be verified. This approach requires dapps to trust archive node service providers to operate infrastructure correctly without malicious intent.

Trust Cryptographic-Economic Security:

Indexers: Indexing protocols organize all data on the blockchain, allowing developers to build and publish open APIs so applications can query data. A single indexer is a node operator that stakes tokens to provide indexing and query processing services. However, disputes may arise when incorrect data is provided, and the arbitration process takes time. Additionally, data from indexers like The Graph cannot be directly used in the business logic of smart contracts but are instead used in web2-based data analytics contexts.
Oracles: Oracle service providers aggregate data from many independent node operators. The challenge here is that data from oracles may not be frequently updated and has limited scope. Oracles like Chainlink typically maintain only specific states, such as price information, making them infeasible for application-specific or historical data. Moreover, this method introduces some bias into the data, requiring trust in node operators.

Trust Code:

Special Variables and Functions: Blockchains like Ethereum have special variables and functions mainly used to provide blockchain information or general utility functions. Smart contracts can only access block hashes from the most recent 256 blocks. For scalability reasons, block hashes are not available for all blocks. Being able to access historical block hashes would be extremely useful, as it would allow verification via proofs against them. There are no opcodes in the EVM execution environment that can access old block contents, previous transaction contents, or receipt outputs, allowing nodes to safely forget this content while still processing new blocks. This method is also limited to a single blockchain.

Given the challenges and limitations of these solutions, there is a clear need for on-chain storage and provision of block hashes. This is where proof of storage comes in. To better understand proof of storage, let’s briefly look at data storage in blockchains.

Data Storage in Blockchains

A blockchain is a public database updated and shared among many computers in a network. Data and state are stored in continuous groups of blocks, with each block cryptographically referencing its parent block by storing the hash of the previous block header.

Take an Ethereum block as an example. Ethereum uses a special Merkle tree called a "Merkle Patricia Tree" (MPT). An Ethereum block header contains the roots of four different Merkle-Patricia trees: the state tree, storage tree, receipts tree, and transactions tree. These four trees encode mappings containing all Ethereum data. Merkle trees are used due to their efficiency in data storage. Through recursive hashing, only the root hash needs to be stored, saving significant space. They allow anyone to prove the existence of elements in the tree by demonstrating that recursively hashing nodes leads to the same root hash. Merkle proofs enable lightweight clients on Ethereum to obtain answers to questions such as:

Does this transaction exist in a specific block?
What is my account's current balance?
Does this account exist?

Unlike downloading every transaction and every block, "light clients" can download just the chain of block headers and use Merkle proofs to verify information. This makes the entire process highly efficient.

See this blog research article by Vitalik and Maven11 for a deeper understanding of implementations, benefits, and challenges related to Merkle trees.

Proof of Storage

Proof of storage allows us to use cryptographic proofs to demonstrate that something was recorded in a database and is valid. If we can provide such a proof, it constitutes a verifiable claim that something occurred on the blockchain.

What can proof of storage enable?

Proof of storage enables two main capabilities:

Access historical on-chain data beyond the last 256 blocks, going all the way back to the genesis block
Access on-chain data (historical and current) from one blockchain on another blockchain, leveraging consensus validation or L2 bridges (for L2s)

How does proof of storage work?

Simply put, proof of storage checks whether a specific block is part of the canonical history of the blockchain, then verifies whether the requested specific data is part of that block. This can be achieved through:

On-chain Processing: A dapp can obtain an initial trusted block and pass previous blocks as calldata to traverse back to the genesis block. This requires massive on-chain computation and large amounts of calldata. Due to the enormous on-chain computational requirements, this method is entirely impractical. Aragon attempted using an on-chain approach in 2018 but found it unfeasible due to high on-chain costs.
Using Zero-Knowledge Proofs: Similar to on-chain processing, except zero-knowledge proofs offload complex computations off-chain.

Accessing data within the same chain: Zero-knowledge proofs can assert that any historical block header is an ancestor of one of the most recent 256 block headers accessible in the execution environment. Another approach is to index the complete history of the source chain and generate zero-knowledge proofs to demonstrate correct indexing. This proof is periodically updated as new blocks are added to the source chain.

Accessing cross-chain data: Providers collect block headers of the source chain on the target chain and use zero-knowledge consensus proofs to validate their correctness. Existing cross-chain messaging solutions such as Axelar, Celer, or LayerZero can also be used to query block headers.
Maintain a cache of block header hashes of the source chain on the target chain, or the root hash of an off-chain block hash accumulator. This cache is regularly updated and used to efficiently prove on-chain that a given block exists and has a cryptographic link to the most recently accessible block hash from state. This process is known as proving chain continuity. Alternatively, a dedicated blockchain can store block headers of all source chains.
Access historical data/blocks from off-chain indexed data or on-chain caches (depending on request complexity) based on requests from dapps on the target chain. Cached block header hashes are maintained on-chain, while actual data may be stored off-chain.
Check for data presence in the specified block via Merkle inclusion proofs and generate zero-knowledge proofs for this. The proof is combined with a zero-knowledge proof of correct indexing or a zero-knowledge consensus proof and submitted on-chain for trustless verification.
The dapp can then verify the proof on-chain and use the data to execute required operations. In addition to verifying the zero-knowledge proof, public parameters (such as block number and block hash) are also checked against the block header cache maintained on-chain.

Projects adopting this approach include Herodotus, Lagrange, Axiom, HyperOracle, Brevis Network, and nil Foundation. Despite significant efforts to make applications state-aware across multiple blockchains, IBC (Inter-Blockchain Communication) stands out as an interoperability standard supporting applications using ICQ (Interchain Queries) and ICA (Interchain Accounts). ICQ allows applications on Chain A to query the state of Chain B by including queries in simple IBC packets, while ICA enables one blockchain to securely control accounts on another blockchain. Combining them supports interesting cross-chain use cases. RaaS providers like Saga will default to offering these capabilities to all app chains using IBC.

Proof of storage can be optimized in various ways to achieve the best balance between memory consumption, proof generation time, verification time, computational efficiency, and developer experience. The entire process can be broadly divided into three main sub-processes:

Data access;
Data processing;
Zero-knowledge proof generation for data access and processing.

Data Access: In this sub-process, service providers natively access block headers of the source chain at the execution layer or maintain an on-chain cache. For cross-chain data access, source chain consensus must be validated on the target chain. Methods and optimizations adopted include:

Existing Ethereum Blockchain: Leverage the existing structure of the Ethereum blockchain, using zero-knowledge proofs to prove the value of any historical storage slot relative to the current block header. This can be viewed as a large inclusion proof—given a recent block header X at height b, there exists a block header Y at height b-k that is an ancestor of X. This relies on Ethereum consensus security and requires an efficient proof system. This is the approach taken by Lagrange.
On-chain Merkle Mountain Ranges (MMR) Cache: A Merkle Mountain Range can be seen as a list of Merkle trees that combine when two trees reach the same size. Individual Merkle trees in an MMR are combined by adding parent nodes to the previous root of the tree. MMRs are similar to Merkle trees but offer additional advantages such as efficient appending of elements and efficient data queries, especially for reading sequential data from large datasets. Appending new headers via Merkle trees requires passing all sibling nodes at each level. To efficiently append data, Axiom uses MMRs to maintain an on-chain cache of block header hashes. Herodotus stores the root hash of an on-chain MMR block hash accumulator. This allows them to check retrieved data against these block header hashes via inclusion proofs. This method requires regular cache updates and could face liveness issues if not decentralized.
To optimize efficiency and computational cost, Herodotus maintains two different MMRs. Depending on the specific blockchain or layer, accumulators can be customized with different hash functions. Poseidon hashing might be used when proving for Starknet, while Keccak hashing is used for EVM chains.
Off-chain MMR Cache: Herodotus maintains an off-chain cache of previously fetched queries and results to speed up retrieval when data is requested again. This requires more infrastructure than simply running archive nodes. Optimizations on off-chain infrastructure can potentially reduce costs for end users.
Dedicated Blockchain for Storage: Brevis relies on a dedicated zero-knowledge rollup (aggregation layer) to store block headers from all chains it proves. Without this aggregation layer, each chain would need to store block headers from every other chain, resulting in O(N²) "connections" for N blockchains. By introducing an aggregation layer, each blockchain only needs to store the rollup's state root, reducing overall connections to O(N). This layer is also used to aggregate proofs of multiple block headers/query results and submit a single verification proof on each connected blockchain.
L1-L2 Message Passing: Since L2s support native message passing to update L2 contracts via L1, source chain consensus validation can be avoided. The cache can be updated on Ethereum, and L1-L2 message passing can be used to send off-chain compiled block hashes or tree roots to other L2s. Herodotus is adopting this approach, though it is not feasible for alt L1s.

Data Processing:

Beyond data access, smart contracts should also be able to perform arbitrary computations on the data. While some use cases may not require computation, for many others, this is a crucial value-added service. Many service providers support performing computations on data in the form of zero-knowledge proofs and providing those proofs on-chain for validation. Because existing cross-chain messaging solutions like Axelar, LayerZero, and Polyhedra Network may be used for data access, data processing could become a key differentiator for proof-of-storage providers.

For example, HyperOracle allows developers to define custom off-chain computations using JavaScript. Brevis designed an open zero-knowledge query engine marketplace that accepts data queries from dapps and processes them using proven block headers. A smart contract sends a data query, which is picked up by provers in the marketplace. The prover generates a proof based on the query input, relevant block headers (from Brevis’ aggregation layer), and the result. Lagrange introduced a zero-knowledge big data tech stack to prove distributed programming models such as SQL, MapReduce, and Spark/RDD. These proofs are modular and can be generated from any block headers provided by existing cross-chain bridges and messaging protocols. Lagrange’s first product in its zero-knowledge big data tech stack is zero-knowledge MapReduce—a distributed computing engine (based on the famous MapReduce programming model) for proving computation results involving large volumes of multi-chain data. For instance, a single zero-knowledge MapReduce proof could prove liquidity changes across DEXs deployed on 4–5 chains within a specified time window. For relatively simple queries, computation can also be done directly on-chain, as Herodotus currently does.

Proof Generation:

Updatable Proofs: When computing and maintaining proofs over a stream of moving blocks, updatable proofs can be used. As new blocks are created, existing proofs can be efficiently updated—for example, to maintain a moving average proof of a contract variable (like token price)—without needing to recalculate the entire proof from scratch. To prove dynamic parallel computations on on-chain state, Lagrange builds a batched vector commitment called Recproof over parts of the MPT, updating it in real-time and performing dynamic computations on it. By recursively creating Verkle trees over the MPT, Lagrange can efficiently compute large volumes of dynamic on-chain state data.
Verkle Trees: Unlike Merkle trees, which require providing all nodes sharing a parent, Verkle trees only require the root path. This path is much smaller than all sibling nodes in a Merkle tree. Ethereum is also considering using Verkle trees in future versions to minimize the amount of state full nodes need to hold. Brevis leverages Verkle trees to store proven block headers and query results in its aggregation layer. This greatly reduces the size of data inclusion proofs, especially when the tree contains many elements, and supports efficient batch inclusion proofs.
MemPool Monitoring to Accelerate Proof Generation: Herodotus recently launched Turbo, allowing developers to add a few lines of code in smart contract code to specify data queries. Herodotus monitors the mempool for transactions interacting with Turbo contracts. As soon as a transaction enters the mempool, the proof generation process begins. Once the proof is generated and verified on-chain, the result is written to the on-chain Turbo Swap contract. Results can only be written to the Turbo Swap contract after authentication via proof of storage. Once this happens, a portion of the transaction fee is shared with sequencers or block producers, incentivizing them to wait longer to collect fees. For simple data queries, the requested data might already be available on-chain before the user’s transaction is included in a block.

Applications of State/Storage Proofs

State and storage proofs can unlock numerous new use cases for smart contracts at the application, middleware, and infrastructure layers. Some of these include:

Application Layer:

Governance:

Cross-chain voting: On-chain voting protocols can allow users on Chain B to prove ownership of assets on Chain A. Users don’t need to bridge their assets to gain voting rights on a new chain. Example: SnapshotX on Herodotus
Governance token distribution: Applications can distribute additional governance tokens to active users or early adopters. Example: RetroPGF on Lagrange.

Identity and Reputation:

Proof of Ownership: Users can prove ownership of an NFT, SBT, or asset on Chain A to perform certain actions on Chain B. For example, a gaming app chain could launch its NFT collection on existing liquid chains like Ethereum or any L2. This would allow the game to leverage existing liquidity elsewhere without actually requiring cross-chain NFTs.
Proof of Usage: Users can receive discounts or premium features based on their historical usage on a platform (e.g., proving they traded X volume on Uniswap).
OG Proof: Users can prove they own an active account older than X days.
On-chain Credit Scoring: A cross-chain credit scoring platform could aggregate data from multiple accounts of a single user to generate a credit score.

All of the above proofs can be used to deliver personalized experiences to users. Dapps can offer discounts or privileges to retain experienced traders or users, while providing simplified experiences for newcomers.

DeFi:

Cross-chain lending: Users can lock assets on Chain A and take out loans on Chain B without bridging tokens.
On-chain insurance: Faults can be determined by accessing historical on-chain data, and insurance payouts can be fully executed on-chain.
Time-weighted average price of assets in pools: Applications can calculate and retrieve the average price of an asset in AMM pools over a specified time period. Example: Uniswap TWAP oracle on Axiom.
Options pricing: On-chain options protocols can use the volatility of an asset over the past n blocks on decentralized exchanges to price options.

The last two use cases will require updating proofs as new blocks are added to the source chain.

Middleware:

Intent: Proof of storage will allow users to express intents more explicitly and clearly. While solvers' job is to execute necessary steps to fulfill user intent, users can specify conditions more precisely based on on-chain data and parameters. Solvers can also prove the validity of on-chain data used to find optimal solutions.
Account Abstraction: Users can leverage proof of storage to set rules based on data from other chains. For example: Every wallet has a nonce. We can prove that a year ago, the nonce was a specific number, and currently, it remains unchanged. This can be used to prove the wallet has never been used, enabling delegation of wallet access to another wallet.
On-chain Automation: Smart contracts can automatically execute certain actions based on predefined conditions dependent on on-chain data. Automation agents need to regularly call smart contracts to maintain optimal price flows in AMMs or keep lending protocols healthy by avoiding bad debt. HyperOracle supports both automation and on-chain data access.

Infrastructure

Trustless On-chain Oracles: Decentralized oracle networks aggregate responses from multiple individual oracle nodes within the network. Oracle networks can eliminate such redundancy by leveraging cryptographic security for on-chain data. Oracle networks can pool data from multiple chains (L1s, L2s, and alt L1s) onto a single chain and simply use proof of storage to prove existence elsewhere. Major DeFi solutions can also use custom solutions. For example, the largest liquid staking provider, Lido Finance, has partnered with Nil Foundation to fund the development of zkOracle. These solutions will enable trustless access to EVM historical data and protect Lido Finance’s $15 billion staked Ethereum liquidity.
Cross-chain Messaging Protocols: Existing cross-chain messaging solutions can enhance the expressiveness of their messages by collaborating with proof-of-storage service providers. This is the approach suggested by Lagrange in its modular paper.

Conclusion

Awareness enables tech companies to better serve customers. From user identity to purchase behavior to social relationships, tech companies leverage awareness to unlock precise targeting, customer segmentation, and viral marketing. Traditional tech companies require explicit user consent and must handle user data carefully. However, all user data on permissionless blockchains is public and doesn't necessarily reveal user identity. Smart contracts should be able to leverage publicly available data to better serve users. As specialized ecosystems evolve and gain adoption, state awareness across time and chains will become increasingly important. Proof of storage can position Ethereum as a layer for identity and asset ownership, not just settlement. Users can maintain their identity and key assets on Ethereum, usable across multiple blockchains without constantly bridging assets. We remain excited about the new possibilities and use cases this will unlock.

Join TechFlow official community to stay tuned

Telegram:https://t.me/TechFlowDaily

X (Twitter):https://x.com/TechFlowPost

X (Twitter) EN:https://x.com/BlockFlow_News

Source

Add to Favorites

Share to Social Media

Author

LongHash Ventures

@LongHashVC

Deep Dive into Proof of Storage: Achieving Cross-Time, Cross-Chain Blockchain State Awareness

TechFlow Selected TechFlow Selected