Vitalik: What’s the Key to Ethereum’s Next Phase?

2026.05.20

Vitalik: What’s the Key to Ethereum’s Next Phase?

“Code is law”—this was one of the earliest beliefs in the blockchain world. But what if the code itself contains bugs?

2026.05.20 - 09:00:03

以太坊

Navigating Web3 tides with focused insights

“Code is law”—this was one of the earliest beliefs in the blockchain world. But what if the code itself contains bugs?

Author: Vitalik Buterin

Translated by: Jiahuan, ChainCatcher

Special thanks to Yoichi Hirai, Justin Drake, Nadim Kobeissi, and Alex Hicks for feedback and review.

Over the past few months, a new programming paradigm has rapidly gained traction at Ethereum’s cutting-edge research frontier—and across many other corners of computing: writing code directly in very low-level languages (e.g., EVM bytecode, assembly) or in Lean, and verifying its correctness using automatically checkable mathematical proofs written in Lean.

Done right, this approach can not only produce extremely efficient code but also be vastly safer than previous programming methods. Yoichi Hirai calls this “the ultimate form of software development.”

This article aims to unpack the underlying principles, explore what formal verification of software can achieve, and examine its weaknesses and limitations—both in Ethereum and beyond.

What Is Formal Verification?

Formal verification means writing proofs of mathematical theorems in a way that can be automatically checked. To give a relatively simple yet still interesting example, consider a basic theorem about the Fibonacci sequence: every third number is even, and all others are odd.

1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 ...

A simple way to prove this is mathematical induction, stepping forward three numbers at a time.

First, the base case. Let F₁ = F₂ = 1, F₃ = 2. By inspection, we see the statement (“Fᵢ is even when i is divisible by 3, otherwise odd”) holds for i = 3.

Next, the inductive step. Assume the statement holds up to 3k+3—that is, we already know the parities of F₃ₖ₊₁, F₃ₖ₊₂, and F₃ₖ₊₃ are odd, odd, and even respectively. We can compute the parity of the next triple:

F₃ₖ₊₄ = F₃ₖ₊₂ + F₃ₖ₊₃ = odd + even = odd
F₃ₖ₊₅ = F₃ₖ₊₃ + F₃ₖ₊₄ = even + odd = odd
F₃ₖ₊₆ = F₃ₖ₊₄ + F₃ₖ₊₅ = odd + odd = even

Thus, from knowing the statement holds up to 3k+3, we deduce it holds up to 3k+6. Repeated application of this inference confirms the rule holds for all integers.

This argument suffices to convince a human. But what if you want to prove something one hundred times more complex—and be *extremely* confident you made no mistake? Well, you can supply a proof that a computer will accept.

Here’s how it looks:

-- Fibonacci with fib 0 = 0, fib 1 = 1, fib 2 = 1 (indices offset by 1)
def fib : Nat → Nat | 0 => 0 | 1 => 1 | n + 2 => fib (n + 1) + fib n
-- Claim: fib (3k+1) is odd, fib (3k+2) is odd, fib (3k+3) is even.
-- Equivalently: every third Fibonacci number starting from fib 3 is even.
-- We prove all three at once by induction on k, since each case
-- of the next block is built from the previous block.
theorem fib_triple (k : Nat) : fib (3 * k + 1) % 2 = 1 ∧ fib (3 * k + 2) % 2 = 1 ∧ fib (3 * k + 3) % 2 = 0 := by induction k with | zero => decide | succ k ih => -- Rewrite the new indices into the form (something) + 2 so fib unfolds. refine ⟨?_, ?_, ?_⟩ · show (fib (3 * k + 3) + fib (3 * k + 2)) % 2 = 1 omega · show (fib (3 * k + 3) + fib (3 * k + 2) + fib (3 * k + 3)) % 2 = 1 omega · show (fib (3 * k + 3) + fib (3 * k + 2) + fib (3 * k + 3) + (fib (3 * k + 3) + fib (3 * k + 2))) % 2 = 0 omega

This is the same reasoning logic, expressed in Lean—a programming language commonly used for writing and verifying mathematical proofs.

It looks different from the “human” proof above—and for good reason: what feels intuitive to a computer (in the traditional sense—that is, a deterministic program composed of if/then statements, not large language models) differs sharply from what feels intuitive to humans.

In the above proof, you don’t emphasize the fact that fib(3k+4) = fib(3k+3) + fib(3k+2); instead, you assert that fib(3k+3) + fib(3k+2) is odd, and Lean’s powerful built-in tactic called `omega` automatically combines this with its knowledge of the definition of fib(3k+4).

In more complex proofs, you sometimes must explicitly name which mathematical law justifies each step—and occasionally use obscure names like `Prod.mk.inj`.

On the other hand, you can expand huge polynomial expressions in one step, proving their validity with a single-line tactic like `omega` or `ring`.

This unintuitiveness and tedium largely explains why machine-checkable proofs—though nearly 60 years old—remain niche. Yet, thanks to rapid AI progress, many things once thought impossible are now becoming feasible.

When Mathematical Proofs Start Guarding Code

So far, you might think: okay, computers can verify proofs of mathematical theorems—so finally we can confirm which wild new claims about primes are true, and which are just errors buried in 100-page PDFs.

Maybe we’ll even settle whether Shinichi Mochizuki’s ABC conjecture arguments hold!

But setting aside curiosity—so what?

There are many possible answers. For me, one especially important answer is verifying the correctness of computer programs—particularly those performing cryptographic or security-critical tasks.

After all, computer programs are mathematical objects; thus, proving that a program behaves in a certain way is itself a mathematical theorem.

For example, suppose you want to prove whether Signal—the encrypted messaging software—is truly secure. You can mathematically define what “secure” means in that context.

At a high level, you’d prove that, assuming certain cryptographic assumptions hold, only someone possessing the private key can learn anything about the message content. In practice, many distinct security properties are critical.

It turns out there really *is* a team working precisely on this! One of their security theorems looks like this:

theorem passive_secrecy_le_ddh (g : G) (adv : PassiveAdversary G SK) : passiveSecrecyAdvantage (F := F) g adv ≤ ProbComp.boolDistAdvantage (DiffieHellman.ddhExpReal (F := F) g (ddhReduction adv)) (DiffieHellman.ddhExpRand (F := F) g (ddhReduction adv))

Here's Leanstral’s summary of its meaning:

The `passive_secrecy_le_ddh` theorem is a tight reduction showing that X3DH’s passive message secrecy is at least as hard as the DDH assumption in the random oracle model. If an adversary breaks X3DH’s passive message secrecy, they can also break DDH.

Since we assume DDH is hard to break, X3DH is secure against passive attackers. This theorem proves that if an adversary passively observes Signal’s key exchange messages, they cannot distinguish the resulting session key from a random key with non-negligible advantage.

Combined with a proof that the AES encryption implementation is correct, you get a proof that the Signal protocol is cryptographically secure against passive attackers.

Similar projects have proven the security of TLS implementations and other parts of browser cryptography.

With end-to-end full formal verification, you prove not just that some theoretical description of the protocol is secure—but that the concrete code users run is secure in practice.

From the user’s perspective, this dramatically increases trustlessness: to fully trust the code, you need not inspect the entire codebase—you only need to check the statements it was proven to satisfy.

Now, several major caveats bear remembering—especially regarding what the crucial word “security” actually means.

It’s easy to forget to prove the statements that truly matter. It’s easy to find cases where the statement to be proved has no simpler description than the code itself.

It’s easy to sneak in assumptions that ultimately don’t hold. And it’s easy to decide only one part of a system needs formal verification—only to be undone by severe vulnerabilities elsewhere (or even in hardware).

Even Lean’s own implementation may contain bugs. But before diving into all these frustrating details, let’s first explore the utopia formal verification could deliver—if done correctly and ideally.

Formal Verification for Security

Bugs in computer code are terrifying.

They become even more terrifying when you place cryptocurrency into immutable on-chain smart contracts—and North Korea can automatically drain your funds the moment a bug appears, with no recourse.

They become *even more* terrifying when wrapped in zero-knowledge proofs: if someone hacks the ZK system, they can extract all funds—and we won’t even know what went wrong (worse, we won’t know *when* it went wrong).

They become *even more* terrifying when powerful AI models—like Claude Mythos two years from now—can automatically discover such bugs.

Some react to this reality by advocating abandoning smart contracts altogether—or even arguing that cyberspace can never be a domain where defenders hold asymmetric advantages over attackers.

Some quotes:

“To harden a system, you need to spend more tokens discovering vulnerabilities than attackers spend exploiting them.”

And:

“Our industry is built on deterministic code: write it, test it, ship it, trust it runs—but in my experience, this contract is breaking.”

Among top operators at truly AI-native companies, codebases have become things you “trust” to run—not things whose success probability you can precisely state.

Worse still, some argue the only solution is abandoning open source.

That would be a bleak future for cybersecurity—especially for those of us who care about internet decentralization and freedom. It’s profoundly pessimistic.

The entire cypherpunk ethos rests fundamentally on the idea that, online, defenders hold the advantage: building a digital “castle” (be it encryption, signatures, or proofs) is far easier than destroying one.

If we lose that, internet security can only come from economies of scale—chasing potential attackers globally—and, broadly speaking, choosing between domination and destruction.

I disagree. I hold a more optimistic vision for cybersecurity’s future.

I believe the challenge posed by powerful AI vulnerability-finding capabilities is serious—but transitional. Once the dust settles and we reach a new equilibrium, we’ll enjoy an environment far more favorable to defenders than ever before.

Mozilla agrees. As they put it:

“You may need to reprioritize everything else, dedicating continuous, intense focus to this task—but there is light at the end of the tunnel.”

“We’re incredibly proud of how our team has risen to this challenge—and others will too. Our work isn’t done, but we’ve passed the hardest part and can glimpse a future that’s not merely keeping pace—but far brighter.”

“Defenders finally have a chance to win decisively… Vulnerabilities are finite—and we’re entering a world where we can finally find them all.”

Now, if you Ctrl+F Mozilla’s post for the words “formal” and “verification,” you’ll find zero matches. A positive cybersecurity future doesn’t hinge solely on formal verification—or any single technology.

What does it depend on? Fundamentally, this chart:

Declining CVE vulnerability count over time

For decades, many technologies have driven down vulnerability counts:

Type systems
Memory-safe languages
Improved software architecture (including sandboxing, permission controls, and broader explicit separation of the “trusted computing base” from “other code”)
Better testing methodologies
Growing knowledge bases of secure and insecure coding patterns
Increasingly pre-written and audited software libraries

AI-assisted formal verification shouldn’t be seen as a wholly new paradigm—but rather as a powerful accelerator for trends and paradigms already advancing.

Formal verification isn’t magic. But it shines where the goal is much simpler than the implementation. This is especially true for some extremely complex, thorny technologies we’ll deploy in Ethereum’s next major iteration: post-quantum signatures, STARKs, consensus algorithms, and ZK-EVMs.

STARKs are highly complex software. Yet their core security property is easy to understand and formalize: if you see a proof for program P, input x, output y, and hash H, then either (i) the hash algorithm used in the STARK is broken, or (ii) P(x) = y.

Thus, the Arklib project aims to create a fully formally verified STARK implementation (see VCV-io, providing foundational oracle computation infrastructure usable for formally verifying various other cryptographic protocols—including many STARK dependencies).

Even more ambitious is evm-asm: a project building a fully formally verified entire EVM implementation.

Here, the security property is less straightforward: essentially, the goal is to prove equivalence to another EVM implementation written in Lean—but one designed purely for maximum intuitiveness and readability, with no concern for runtime efficiency.

It’s possible we’ll end up with ten EVM implementations, all provably equivalent—and all containing the same fatal flaw allowing attackers to drain ETH from addresses they lack permission to touch.

But that’s far less likely than a single EVM implementation having such a flaw today. Another safety property we learned painfully to value—DoS resistance—is far easier to specify.

Two other critical areas:

Byzantine fault-tolerant consensus: formalizing all desired security properties is similarly difficult, but given how common bugs have been, it’s worth trying. Thus, ongoing Lean implementations and proofs of consensus protocols exist.
Smart contract programming languages: see formal verification in Vyper and Verity.

In all these cases, one major added value of formal verification is that these proofs are truly end-to-end. Often, the most insidious bugs are interaction bugs—lurking at the interface between two subsystems considered independently.

For humans, reasoning end-to-end about the whole system is too hard. Automated rule-checking systems can do it.

Formal Verification for Efficiency

Let’s revisit evm-asm. It’s an EVM implementation—but written directly in RISC-V assembly.

Literally.

Here’s the ADD opcode:

import EvmAsm.Rv64.Program
namespace EvmAsm.Evm64
open EvmAsm.Rv64
/-- 256-bit EVM ADD: binary, pops 2, pushes 1. Limb 0: LD, LD, ADD, SLTU (carry), SD (5 instructions). Limbs 1-3: LD, LD, ADD, SLTU (carry1), ADD (carryIn), SLTU (carry2), OR (carryOut), SD (8 each). Then ADDI sp, sp, 32. Registers: x12=sp, x7=acc, x6=operand, x5=carry, x11=carry1. -/def evm_add : Program := -- Limb 0 (5 instructions) LD .x7 .x12 0 ;; LD .x6 .x12 32 ;; ADD .x7 .x7 .x6 ;; SLTU .x5 .x7 .x6 ;; SD .x12 .x7 32 ;; -- Limb 1 (8 instructions) LD .x7 .x12 8 ;; LD .x6 .x12 40 ;; ADD .x7 .x7 .x6 ;; SLTU .x11 .x7 .x6 ;; ADD .x7 .x7 .x5 ;; SLTU .x6 .x7 .x5 ;; OR' .x5 .x11 .x6 ;; SD .x12 .x7 40 ;; -- Limb 2 (8 instructions) LD .x7 .x12 16 ;; LD .x6 .x12 48 ;; ADD .x7 .x7 .x6 ;; SLTU .x11 .x7 .x6 ;; ADD .x7 .x7 .x5 ;; SLTU .x6 .x7 .x5 ;; OR' .x5 .x11 .x6 ;; SD .x12 .x7 48 ;; -- Limb 3 (8 instructions) LD .x7 .x12 24 ;; LD .x6 .x12 56 ;; ADD .x7 .x7 .x6 ;; SLTU .x11 .x7 .x6 ;; ADD .x7 .x7 .x5 ;; SLTU .x6 .x7 .x5 ;; OR' .x5 .x11 .x6 ;; SD .x12 .x7 56 ;; -- sp adjustment ADDI .x12 .x12 32
end EvmAsm.Evm64

RISC-V was chosen because ZK-EVM provers under construction often operate by proving RISC-V and compiling Ethereum clients to RISC-V. So an EVM implementation written directly in RISC-V should be the fastest possible.

RISC-V can also be simulated extremely efficiently on ordinary computers (and RISC-V laptops exist).

Of course, for true end-to-end assurance, you must formally verify the RISC-V implementation (or the prover’s arithmetic encoding) itself—but don’t worry, work on that already exists.

Writing code directly in assembly is something we did fifty years ago. Since then, we’ve abandoned it in favor of high-level languages.

High-level languages sacrifice some efficiency—but in exchange, they let us write code faster, and—more importantly—understand others’ code faster, which is essential for security.

With formal verification combined with AI, we get a chance to “return to the future.”

Specifically, we can let AI write assembly code, then write a formal proof verifying that assembly code has the desired properties.

At minimum, those properties could simply be perfect equivalence to an implementation optimized for readability, written in some human-friendly high-level language.

We no longer need a single code artifact balancing readability and efficiency. Instead, we have two independent artifacts: one (the assembly implementation) optimized solely for efficiency—considering the specific demands of its execution environment; the other (the safety specification, or high-level implementation) optimized solely for readability—and we mathematically prove their equivalence.

Users can (automatically) verify the proof once—and thereafter simply run the fast version.

This approach is extremely powerful—so much so that Yoichi Hirai calls it “the ultimate form of software development” for good reason.

Formal Verification Is Not a Panacea

In cryptography and computer science, there’s a tradition nearly as old as formal methods themselves: criticizing formal methods (or more broadly, reliance on “proofs”).

This literature abounds with real-world examples. Let’s begin with handwritten proofs from early cryptography days—citing Menezes and Koblitz’s 2004 critique:

“In 1979, Rabin proposed an encryption function that was ‘provably’ secure in a certain sense—that is, it possessed a reductionist security property.

A reductionist security claim states that anyone who can recover message m from ciphertext y must also be able to factor n… Shortly after Rabin proposed his scheme, Rivest pointed out—ironically—that the very feature granting it extra security caused total collapse under a different attack: ‘chosen ciphertext.’

That is, if an attacker can somehow trick Alice into decrypting ciphertexts of their choice, the attacker can follow the same steps Sam used earlier to factor n.”

Menezes and Koblitz go on to give more examples. A common pattern emerges: designs aimed at making cryptographic protocols more “provable” often make them less “natural”—making them more likely to collapse in ways designers didn’t even consider.

Now, back to machine-checkable proofs and code. Here’s a 2011 paper finding a vulnerability in a formally verified C compiler:

“We found the second CompCert issue manifested in two bugs causing generation of this code: stwu r1, -44432(r1)—allocating a large PowerPC stack frame.

The problem is that the 16-bit displacement field overflowed. CompCert’s PPC semantics didn’t specify limits on immediate widths, assuming the assembler would catch out-of-range values.”

And here’s a 2022 paper:

“In CompCert-KVX, commit e2618b31 fixed a bug: the ‘nand’ instruction was printed as ‘and’; ‘nand’ was only used in rare ~ (a & b) patterns. The bug was discovered by compiling randomly generated programs.”

And today—in 2026—here’s Nadim Kobeissi describing vulnerabilities in Cryspen’s formally verified software:

“In November 2025, Filippo Valsorda independently reported that libcrux-ml-dsa v0.0.3 produced different public keys and signatures on different platforms given identical deterministic inputs.

The bug resided in the `_vxarq_u64` wrapper function implementing the XAR operation used in SHA-3’s Keccak-f permutation. The fallback mechanism passed incorrect parameters to shift operations, corrupting SHA-3 digests on ARM64 platforms without hardware SHA-3 support.

This is a Type I failure: the internal function was marked, but the entire NEON backend lacked proofs of runtime safety or correctness.”

And:

“The libcrux-psq library implements a post-quantum pre-shared key protocol. In the `decrypt_out` method, the AES-GCM 128 decryption path called `.unwrap()` on the decryption result instead of propagating the error. A malformed ciphertext could crash the process.”

All four issues fall into one of two categories:

Only part of the code was verified (because verifying the rest was too hard)—yet unverified code turned out to harbor more (and more catastrophic) vulnerabilities than authors imagined.
Authors forgot to specify critical properties needing proof.

Nadim’s article includes a taxonomy of formal verification failure modes—and cites other types (e.g., “the formal specification itself is wrong, or the proof contains false statements silently accepted by the system being built”).

Finally, consider formal verification failures at the software-hardware boundary. A common issue is verifying resistance to side-channel attacks.

Even with perfectly secure cryptographic primitives protecting your messages, you’re unsafe if someone meters electrical signal fluctuations a few meters away and extracts your private key after hundreds of thousands of encryptions.

Here’s an article on “differential power analysis”—a well-understood example of such techniques:

Differential power analysis is a common type of side-channel attack. Source: Wikipedia

People have long tried proving security against such attackers. Yet any such proof requires a mathematical model of the attacker—so you can prove security against it.

Sometimes the “d-probing model” is used: we assume attackers have a known limit on circuit positions they can query. But some leakage forms evade this model.

As observed in this article, a common problem is transient leakage: if you observe a signal depending not only on a value’s current state but also on its transition, this often lets you recover needed information from both old and new values—not just one.

This article gives a taxonomy of other leakage forms.

For decades, these critiques helped formal verification improve. Today, we’re better at guarding against such issues than before. But it’s still imperfect.

Viewed holistically, one main thread emerges: formal verification is powerful.

Yet no matter how marketing terms make formal verification sound like it delivers “provable correctness,” “provable correctness” fundamentally *cannot* prove software (or hardware) is “correct.”

To most humans, “correct” means something like: “the thing behaves as users understand the developer’s intent.”

And “secure” means something like: “the thing behaves without violating users’ expectations in ways harmful to their interests.”

In both cases, correctness and security reduce to comparing mathematical objects with human intent or expectations.

Human intent and expectations are technically mathematical objects too—after all, human brains are part of the universe, obeying physical laws simulatable given sufficient compute.

But they’re incredibly complex mathematical objects—neither computers nor ourselves can understand or even read them.

For all practical purposes, they’re black boxes; we only grasp our own intents and expectations because each of us has spent years observing our thoughts and inferring others’.

And since we cannot stuff raw human intent into computers, formal verification cannot prove comparison against human intent.

Thus, “provable correctness” and “provable security” don’t actually prove the “correctness” and “security” humans understand. Nothing can—unless we fully simulate the human brain.

So What Is It Good For?

I tend to view test suites, type systems, and formal verification as different implementations of the same underlying approach to programming language safety (perhaps the only reasonable one).

They’re all about redundantly specifying our intent in different ways—and automatically checking whether those specifications are mutually compatible.

Take this Python code:

def fib(n: int) -> int: if n < 0: raise Exception("Negative values not supported") elif 0 <= n < 2: return n else: return fib(n-1) + fib(n-2)
if __name__ == '__main__': assert [fib(i) for i in range(10)] == [0, 1, 1, 2, 3, 5, 8, 13, 21, 34] assert fib(15) == 610

Here, you express your intent three different ways:

Explicitly, by implementing the Fibonacci formula in code
Implicitly, via the type system (specifying inputs, outputs, and recursive intermediate steps are all integers)
Via the “example pack” method: test cases

Running the file checks the formula against examples. The type checker verifies type compatibility: adding two integers is a valid operation yielding another integer.

Type systems often serve as excellent “physics homework checkers”: if you compute acceleration but get units of meters/second instead of meters/second², you know you erred.

Test cases are instances of the “example pack” definition—often far more natural for humans handling concepts than direct explicit definitions.

The more different ways you can specify your intent—ideally requiring radically different mental approaches—the more likely you’ve actually captured what you truly want, once all specifications prove mutually compatible.

Secure programming lies in expressing intent in multiple distinct ways—and automatically verifying all expressions are mutually compatible.

Formal verification extends this method further. With formal verification, you can specify your intent in nearly infinite redundant ways—programs only verify if *all* are compatible.

You can specify a highly optimized implementation alongside an inefficient but human-readable one—and verify they match. You can ask ten friends for lists of mathematical properties your program should satisfy—and check them all.

If any fail, determine whether the program or the mathematical property is wrong. And you can do all this with extreme AI efficiency.

So How Do I Get Started?

Realistically, you won’t write proofs yourself. Formal methods haven’t caught on widely precisely because most people can’t decipher this arcane syntax. Can you explain what this code means?

/-- Helper: pointwise ≤ at the foldl level with an accumulator. -/private theorem foldl_acc_le (ds1 ds2 : List Nat) (w : Nat) (a b : Nat) (hAcc : a ≤ b) (hLE : Forall₂ (· ≤ ·) ds1 ds2) : List.foldl (λ acc d => acc * w + d) a ds1 ≤ List.foldl (λ acc d => acc * w + d) b ds2 := by match ds1, ds2, hLE with | [], [], .nil => exact hAcc | d1::ds1', d2::ds2', .cons hd htl => simp [List.foldl] refine foldl_acc_le ds1' ds2' w (a * w + d1) (b * w + d2) ?_ htl exact Nat.add_le_add (Nat.mul_le_mul hAcc (Nat.le_refl _)) hd

(If you’re curious, this is one of many sub-lemmas in a proof of a specific security claim for a SPHINCS signature variant.

Specifically, the claim is: unless a hash collision occurs, a signature for one message will require a higher value somewhere on a hash ladder than any other message’s signature—and thus contains information computable from neither that other signature.)

Rather than manually writing code and proofs, simply let AI write your program (either directly in Lean, or in assembly for speed)—and prove any desired properties during the process.

This task has the nice property of being self-verifying—so you don’t need supervision; just let AI run for several hours straight.

The worst outcome is it spins its wheels making no progress (or—as my leanstral once did—it spontaneously replaces the statement it was asked to prove to lighten its own workload).

The only thing you need to check at the end is whether the statement it proved matches your requirements.

In the SPHINCS signature variant, this is the final statement:

theorem wots_fullDigits_incomparable {dig1 dig2 : List Nat} {w l1 l2 : Nat} (hw : 0 < w) (hLen1 : dig1.length = l1) (hLen2 : dig2.length = l1) (hBound1 : ∀ d ∈ dig1, d < w) (hBound2 : ∀ d ∈ dig2, d < w) (hL2suff : l1 * (w - 1) < w ^ l2) (hNeq : dig1 ≠ dig2) : ¬ Forall₂ (· ≤ ·) (wotsFullDigits dig1 w l1 l2) (wotsFullDigits dig2 w l1 l2) ∧ ¬ Forall₂ (· ≤ ·) (wotsFullDigits dig2 w l1 l2) (wotsFullDigits dig1 w l1 l2)

This is actually borderline readable:

If digits generated from one hash digest (dig1) differ from those generated from another (dig2),

then neither of the following holds:

For all digits, dig1’s digit ≤ dig2’s digit
For all digits, dig2’s digit ≤ dig1’s digit

This applies also to the “extended digits” (wotsFullDigits) generated by adding checksums. That is, in dig1’s extension, some digits inevitably exceed those in dig2’s extension—and vice versa.

For writing proofs using large language models, I find both Claude and Deepseek 4 Pro capable. Leanstral—a smaller open-weight model specifically fine-tuned for Lean—is a promising alternative.

With 119B parameters and 6B activated per token, you can run it locally—albeit slowly (≈15 tok/sec on my laptop). Benchmarks show Leanstral outperforms much larger general-purpose models:

Based on my current personal experience, it’s slightly weaker than Deepseek 4 Pro—but still highly effective.

Formal verification won’t solve all our problems.

But if we want internet security models no longer based on trusting a few powerful organizations—and instead on trusting code, even against powerful AI adversaries—AI-assisted formal verification takes us a major step toward that goal.

Like blockchain and ZK-SNARKs, AI and formal verification are highly complementary technologies.

Blockchain trades privacy and scalability for open verifiability and censorship resistance—while ZK-SNARKs restore privacy and scalability (even beyond original levels).

AI trades accuracy for massive code-generation capability—while formal verification restores accuracy (even beyond original levels).

By default, AI will generate vast amounts of sloppy code—increasing bug counts.

Indeed, in some cases, tolerating increased bugs is the right tradeoff: if bugs are minor, buggy software may still be better than none.

But here, cybersecurity has an optimistic future: software will (continue to) split into “unsafe edge parts” surrounding a “secure core.”

Unsafe edge parts will run in sandboxes, granted only minimal permissions needed to do their jobs.

The secure core will manage everything. If the secure core crashes, everything collapses—including your personal data, money, etc. But if an unsafe edge part crashes, the secure core still protects you.

When it comes to the secure core, buggy code cannot proliferate. We’ll take aggressive action to keep the secure core small—and shrink it further.

Instead, we’ll channel all AI’s extra performance into making the secure core safer—so it withstands the immense trust burden we assign it in our hyper-digital society.

Operating system kernels (or at least parts thereof) will become such secure cores.

Ethereum will be another.

Hopefully, for all non-performance-intensive computation, your hardware will be the third.

IoT-related systems will be the fourth.

At least within these secure cores, the old adage “bugs are inevitable—you can only try to find them before attackers do” will be falsified—replaced by a more hopeful world where real security is attainable.

But if you willingly entrust your assets and data to shoddy software that might accidentally swallow them into a black hole—well, you certainly have that freedom.

Join TechFlow official community to stay tuned

Telegram:https://t.me/TechFlowDaily

X (Twitter):https://x.com/TechFlowPost

X (Twitter) EN:https://x.com/BlockFlow_News

Source

Add to Favorites

Share to Social Media

Author

Vitalik Buterin

@VitalikButerin

Vitalik: What’s the Key to Ethereum’s Next Phase?

TechFlow Selected TechFlow Selected