ZKVM 是如何工作的

Last updated on Aug 20, 2025

零知识虚拟机（ZKVM）是一种能够生成零知识证明（ZK-proof）来验证其正确执行了一组机器指令的虚拟机。这使我们能够获取一个程序（一组操作码）、一个虚拟机规范（虚拟机的行为方式、使用的操作码等），并证明所生成的输出是正确的。验证者不需要重新运行程序，只需检查生成的 ZK 证明——这使得验证变得简洁（succinct）。这种简洁性正是 ZK Layer 2 区块链能够实现扩容的基础。它还允许人们在不重新运行整个算法的情况下，检查机器学习算法是否如声称的那样运行了。

与名称相反，ZKVM 很少在保持计算隐私的意义上做到“零知识”。相反，它们使用 ZK 算法生成一个简洁证明，证明程序在特定输入上正确执行，从而使验证者能够以呈指数级减少的工作量来复核计算结果。尽管是否公开程序输入是可选的，但避免意外的数据泄露以及让多方就隐私状态达成一致是非常具有挑战性的工程问题，仍存在尚未解决的挑战和扩容限制。

ZKVM 会“计算”操作码中的每一步，然后约束该操作码被正确执行。约束的设计必须能够让我们处理任意但有效的操作码序列。

我们可以将 ZKVM 视为一系列的“状态”转换。“状态转换函数”接收前一个状态和当前要执行的操作码，并生成一个新状态。ZKVM 实现了“状态转换函数”以及对其行为进行建模的电路约束。请注意，“状态”可以包含像“program counter”（程序计数器）这类内容，或虚拟机正常运行所需的其他簿记信息。

本教程将构建一个极其简单的 ZKVM，它仅支持基础算术运算，但可针对其他操作码进行扩展。这里展示的虚拟机仅包含堆栈（stack），没有内存（memory）或存储（storage）。我们在文章末尾提供了进一步学习 ZKVM 的建议。

前置条件

本教程假设您对 EVM（或某些其他基于堆栈的架构，如 Java Virtual Machine）的工作原理有基本的了解。我们大量借鉴了上一章中的堆栈示例进行构建，因此假设您已掌握相关知识。

简单的基于堆栈的 ZKVM

我们将构建一个简单的基于堆栈的 ZKVM，其中包含一个用于保存计算结果的专用信号。该虚拟机接收一系列操作码和数字，然后将最终结果输出到一个我们称之为 out 的特殊信号中。

我们的 ZKVM 仅具有以下操作码：

PUSH（将第一个参数推入堆栈）
ADD（从堆栈中弹出顶部两个元素并将其总和推入堆栈）
MUL（从堆栈中弹出顶部两个元素并将其乘积推入堆栈）
NOP（无操作，不执行任何动作）

为简单起见，所有操作码都接受单个参数，但只有 PUSH 会使用该参数。其余指令会忽略该参数。我们向不使用参数的操作码提供参数的原因在于，这样就不必根据操作码来有条件地检查参数是否存在。

没有 STOP 或 RETURN 操作码（稍后将解释替代方案）。虚拟机接受一个 steps 参数，并在执行了 steps 个指令后返回堆栈底部的任意值。

以下动画给出了在此架构中将两个数字相加的简单示例：

在 Circom 中，循环不能是可变长度的，它们必须始终执行固定次数的迭代，因为底层的 Rank 1 Constraint System (R1CS) 本身必须是固定大小的。同样，程序也不能是可变大小的。但是，无论程序如何运行，它们都必须具有相同数量的操作码。

为了运行操作码数量少于固定数量的程序，我们只需用 NOP 对其进行填充，直到程序达到最大大小。为了知道何时“停止执行”，用户必须提供上述的 steps 参数，以确定何时返回堆栈底部的值。

关于我们架构的几点说明：

虚拟机是基于堆栈的，就像 EVM、Java Virtual Machine，或者（对于那些了解的人来说）Reverse Polish Notation calculator（逆波兰表示法计算器）。
没有跳转指令，因此程序计数器只会递增。
所有操作码都接受单个参数，但 ADD、MUL 和 NOP 忽略传递给它们的参数。这使得我们可以每次始终将程序计数器递增相同的量——我们不必为 PUSH 将程序计数器更新 2 且为 ADD 更新 1 等等。我们始终将计数器增加 2。
要读取 PUSH 的参数，我们只需从程序计数器“向前看”一个索引。
加法和乘法使用模算术（Circom 中的默认方式）进行。我们使用 Circom 默认域的阶作为我们的“字长”——我们不会尝试模拟具有 64 位或 256 位传统字长的虚拟机。模拟固定大小位数的计算是下一章的主题。

更新我们的堆栈代码

我们可以对上一章中的堆栈代码进行几个关键更改，以构建符合上述规范的 ZKVM。

我们移除了 POP 操作码，因为它不再需要了。
我们添加了 ADD 和 MUL 操作码。

回顾一下，先前用于复制上一个堆栈的规则是：

A. 如果 sp 为 1 或更大，并且列 j 是 sp 下方的第 1 个索引，且当前指令是 PUSH 或 NOP，我们应该复制列 j
B. 如果 sp 为 2 或更大，并且列 j 是 sp 下方的第 2 个索引，且当前指令是 POP，我们应该复制列 j

规则 A 保持不变，但 B 需要更新为以下内容：

B. 如果 sp 为 2 或更大，并且列 j 是 sp 下方的第 3 个索引，且当前指令是 ADD 或 MUL，我们应该复制列 j

这一变更的原因在于，之前的 POP 指令并没有改变堆栈中倒数第二顶部的元素，它仅仅移除了顶部元素。然而，ADD 实际上是将堆栈弹出两次并推入总和。同样，MUL 将堆栈弹出两次并推入乘积。

先前的堆栈实现仅将新值写入堆栈指针。然而，新的实现可以将总和或乘积写入堆栈指针下方两个索引的位置。例如，下方堆栈中的 12 在相加后将变为 15，并且该位置位于堆栈指针下方两个索引处：

加法之前：

[12 , 3, sp] (sp = 3)

加法之后：

[15, sp] (sp = 2)

这里，我们将 12 作为堆栈底部，并且 sp 指向堆栈上方的空白空间。

因此，我们需要一个信号来指示特定列位于堆栈指针下方两个元素处。

下面的代码在很大程度上源自上一章的堆栈，但它实现了本章描述的更新。具体来说：

我们已经将 NOP、PUSH 和 POP 替换为 NOP、PUSH、ADD 和 MUL。ADD 和 MUL 会使堆栈指针减 1，NOP 保持堆栈指针不变，而 PUSH 会使堆栈指针加 1 并将其参数复制到堆栈顶部。

pragma circom 2.1.6;

include "circomlib/comparators.circom";
include "circomlib/gates.circom";

template AND3() {
  signal input in[3];
  signal output out;
  
  signal temp;
  temp <== in[0] * in[1];
  out <== temp * in[2];
}

// i is the column number
// bits is how many bits we need
// for the LessEqThan component
template ShouldCopy(i, bits) {
  signal input sp;
  signal input is_push;
  signal input is_nop;
  signal input is_add;
  signal input is_mul;
  
  // out = 1 if should copy
  signal output out;
  
  // sanity checks
  is_add + is_mul + is_push + is_nop === 1;
  is_nop * (1 - is_nop) === 0;
  is_push * (1 - is_push) === 0;
  is_add * (1 - is_add) === 0;
  is_mul * (1 - is_mul) === 0;
  
  // it's cheaper to compute ≠ 0 than > 0 to avoid
  // converting the number to binary
  signal spEqZero;
  signal spGteOne;
  spEqZero <== IsZero()(sp);
  spGteOne <== 1 - spEqZero;
  
  // it's cheaper to compute ≠ 0 and ≠ 1 than ≥ 2
  signal spEqOne;
  signal spGteTwo;
  spEqOne <== IsEqual()([sp, 1]);
  spGteTwo <== 1 - spEqOne * spEqZero;
  
  // the current column is 1 or more 
  // below the stack pointer
  signal oneBelowSp <== LessEqThan(bits)([i, sp - 1]);
  
  // the current column is 3 or more
  // below the stack pointer
  signal threeBelowSP <== LessEqThan(bits)([i, sp - 3]);
  
  // condition A
  component a3A = AND3();
  a3A.in[0] <== spGteOne;
  a3A.in[1] <== oneBelowSp;
  a3A.in[2] <== is_push + is_nop;
  
  // condition B
  component a3B = AND3();
  a3B.in[0] <== spGteTwo;
  a3B.in[1] <== threeBelowSP;
  a3B.in[2] <== is_add + is_mul;
  
  component or = OR();
  or.a <== a3A.out;
  or.b <== a3B.out;  
  out <== or.out;
}

template CopyStack(m) {
  var nBits = 4;
    signal output out[m];
    signal input sp;
    signal input is_add;
    signal input is_mul;
    signal input is_push;
    signal input is_nop;

    component ShouldCopys[m];
    signal copy[m];
    
    // loop over the columns
    for (var i = 0; i < m; i++) {
      ShouldCopys[i] = ShouldCopy(i, nBits);
      ShouldCopys[i].sp <== sp;
      ShouldCopys[i].is_add <== is_add;
      ShouldCopys[i].is_mul <== is_mul;
      ShouldCopys[i].is_push <== is_push;
      ShouldCopys[i].is_nop <== is_nop;
      out[i] <== ShouldCopys[i].out;
    }
}

// n is how many instructions we can handle
// since all the instructions might be push,
// our stack needs capacity of up to n
template ZKVM(n) {
  var NOP = 0;
  var PUSH = 1;
  var ADD = 2;
  var MUL = 3;

  signal input instr[2 * n];
  
  // we add one extra row for sp because
  // our algorithm always writes to the
  // next row and we don't want to conditionally
  // check for an array-out-of-bounds
  signal output sp[n + 1];

  signal output stack[n][n];

  var IS_NOP = 0;
  var IS_PUSH = 1;
  var IS_ADD = 2;
  var IS_MUL = 3;
  var ARG = 4;
  signal metaTable[n][5];

  // first instruction must be PUSH or NOP
  (instr[0] - PUSH) * (instr[0] - NOP) === 0;

  signal first_op_is_push;
  first_op_is_push <== IsEqual()([instr[0], PUSH]);

  // if the first op is NOP, we are forcing the first
  // value to be zero, but this is where the stack
  // pointer is, so it doesn't matter
  stack[0][0] <== first_op_is_push * instr[1];

  // initialize the rest of the first stack to be zero
  for (var i = 1; i < n; i++) {
    stack[0][i] <== 0;
  }

  // we fill out the 0th elements to avoid
  // uninitialzed signals
  sp[0] <== 0;
  sp[1] <== first_op_is_push;
  metaTable[0][IS_PUSH] <== first_op_is_push;
  metaTable[0][IS_NOP] <== 1 - first_op_is_push;
  metaTable[0][IS_ADD] <== 0;
  metaTable[0][IS_MUL] <== 0;
  metaTable[0][ARG] <== instr[1];

  // spBranch is what we add to the previous stack pointer
  // based on the opcode. Could be 1, 0, or -1 depending on the
  // opcode. Since the first opcode cannot be POP, -1 is not
  // an option here.
  var SAME = 0;
  var INC = 1;
  var DEC = 2;
  signal spBranch[n][3];
  spBranch[0][INC] <== first_op_is_push * 1;
  spBranch[0][SAME] <== (1 - first_op_is_push) * 0;
  spBranch[0][DEC] <== 0;

  // populate the metaTable and the stack pointer
  component EqPush[n];
  component EqNop[n];
  component EqAdd[n];
  component EqMul[n];

  component eqSP[n][n];
  signal eqSPAndIsPush[n][n];
  for (var i = 0; i < n; i++) {
    eqSPAndIsPush[0][i] <== 0;
  }

  // signals and components for copying
  component CopyStack[n];
  signal previousCellIfShouldCopy[n][n];
  for (var i = 0; i < n; i++) {
    previousCellIfShouldCopy[0][i] <== 0;
  }

  component eqSPMinus2[n][n];
  signal eqSPMinus2AndIsAdd[n][n];
  signal eqSPMinus2AndIsMul[n][n];
  for (var i = 0; i < n; i++) {
    eqSPMinus2AndIsAdd[0][i] <== 0;
    eqSPMinus2AndIsMul[0][i] <== 0;
  }

  // (the current column = sp - 2 and is_add) * sum
  signal eqSPMinus2AndIsAddWithValue[n][n];
  signal eqSPMinus2AndIsMulWithValue[n][n];

  signal sum_result[n][n];
  signal mul_result[n][n];
  for (var i = 0; i < n; i++) {
    eqSPMinus2AndIsAddWithValue[0][i] <== 0;
    eqSPMinus2AndIsMulWithValue[0][i] <== 0;
    sum_result[0][i] <== 0;
    mul_result[0][i] <== 0; 
  }

  for (var i = 1; i < n; i++) {
    // check which opcode we are executing
    EqPush[i] = IsEqual();
    EqPush[i].in[0] <== instr[2 * i];
    EqPush[i].in[1] <== PUSH;
    metaTable[i][IS_PUSH] <== EqPush[i].out;

    EqNop[i] = IsEqual();
    EqNop[i].in[0] <== instr[2 * i];
    EqNop[i].in[1] <== NOP;
    metaTable[i][IS_NOP] <== EqNop[i].out;

    EqAdd[i] = IsEqual();
    EqAdd[i].in[0] <== instr[2 * i];
    EqAdd[i].in[1] <== ADD;
    metaTable[i][IS_ADD] <== EqAdd[i].out;

    EqMul[i] = IsEqual();
    EqMul[i].in[0] <== instr[2 * i];
    EqMul[i].in[1] <== MUL;
    metaTable[i][IS_MUL] <== EqMul[i].out;

    // carry out the sums and muls
    for (var j = 0; j < n - 1; j++) {
      sum_result[i][j] <== stack[i - 1][j] + stack[i - 1][j + 1];
      mul_result[i][j] <== stack[i - 1][j] * stack[i - 1][j + 1];
    }

    // these values cannot be used in practice because
    // the stack doesn't go that high.
    // However, we still need to initialize
    // them because every column checks
    // if it is sp - 1, even the last 2
    for (var j = n - 1; j < n; j++) {
      sum_result[i][j] <== 0;
      mul_result[i][j] <== 0;
    }

    // get the instruction argument
    metaTable[i][ARG] <== instr[2 * i + 1];

    // if it is a push, write to the stack
    // if it is a copy, write to the stack
    CopyStack[i] = CopyStack(n);
    CopyStack[i].sp <== sp[i];
    CopyStack[i].is_push <== metaTable[i][IS_PUSH];
    CopyStack[i].is_nop <== metaTable[i][IS_NOP];
    CopyStack[i].is_add <== metaTable[i][IS_ADD];
    CopyStack[i].is_mul <== metaTable[i][IS_MUL];
    for (var j = 0; j < n; j++) {
      previousCellIfShouldCopy[i][j] <== CopyStack[i].out[j] * stack[i - 1][j];

      eqSP[i][j] = IsEqual();
      eqSP[i][j].in[0] <== j;
      eqSP[i][j].in[1] <== sp[i];
      eqSPAndIsPush[i][j] <== eqSP[i][j].out * metaTable[i][IS_PUSH];

      // check if the column is two less
      // than the stack pointer
      // if so, we prepare to write the sum or
      // product here
      // if the current instruction is add or mul
      eqSPMinus2[i][j] = IsEqual();
      eqSPMinus2[i][j].in[0] <== j;
      eqSPMinus2[i][j].in[1] <== sp[i] - 2; // underflow doesn't matter

      eqSPMinus2AndIsAdd[i][j] <== eqSPMinus2[i][j].out * metaTable[i][IS_ADD];
      eqSPMinus2AndIsMul[i][j] <== eqSPMinus2[i][j].out * metaTable[i][IS_MUL];

      eqSPMinus2AndIsAddWithValue[i][j] <== eqSPMinus2AndIsAdd[i][j] * sum_result[i][j];
      eqSPMinus2AndIsMulWithValue[i][j] <== eqSPMinus2AndIsMul[i][j] * mul_result[i][j];
      // we will either
      // - PUSH 
      // - COPY or implicilty assign 0
      // - ADD
      // - MUL
      stack[i][j] <== eqSPAndIsPush[i][j] * metaTable[i][ARG] + previousCellIfShouldCopy[i][j] + eqSPMinus2AndIsAddWithValue[i][j] + eqSPMinus2AndIsMulWithValue[i][j];
    }

    // write to the next row's stack pointer
    spBranch[i][INC] <== metaTable[i][IS_PUSH] * (sp[i] + 1);
    spBranch[i][SAME] <== metaTable[i][IS_NOP] * (sp[i]);
    spBranch[i][DEC] <== (metaTable[i][IS_ADD] + metaTable[i][IS_MUL]) * (sp[i] - 1);
    sp[i + 1] <== spBranch[i][INC] + spBranch[i][SAME] + spBranch[i][DEC];
  }
}

component main = ZKVM(5);

/* INPUT = {
    "instr": [1,3,1,6,1,2,3,0,3,0]
} */

如果我们只使用一个操作码却为每个操作码保留一个约束，这难道不低效吗？

在我们的 ZKVM 中，我们为堆栈中的每个元素执行一次加法和乘法，即使我们实际上只使用了其中之一。对于像加法或乘法这样非常轻量级的操作来说，这并不重要。然而，如果我们有一个针对哈希（hashing）等繁重操作的操作码，这将产生显著更多的约束；我们必须为堆栈中的每一项填充一个哈希电路，即使只有堆栈顶部需要进行哈希运算。所有这些都会导致不必要的计算和高昂的计算开销。

我们可以通过使用一个（或两个）Quin selector 来确定堆栈上的哪些元素将作为操作码的输入，以此提高效率，但这仍然意味着堆栈的每次迭代都需要保留针对哈希的约束，即使并没有使用它们。

我们将其留作读者的练习：实现两个 Quin selector，使得仅对堆栈顶部的两个项进行加法或乘法运算，而不是整个堆栈。

这种不必要地重复未使用约束的低效性，是原生的 R1CS 的一个严重缺陷，因为它不允许有条件地使用约束。

提高效率的解决方案

显著提高效率的两种现代方法是基于查找表（lookup tables）的约束和递归证明（recursive proofs）。

查找表是一种算术化方案，其中只有实际用到的约束才是表的一部分，然后 ZK 证明去证明它在每条指令上使用了表中正确的条目。
递归证明会为每条指令创建一个单独的 ZK 证明，然后将该证明与另一个用于验证输入证明有效性的 ZK 证明结合起来。（试想，ZK 中的验证算法本身就可以用算术电路建模）。

这些改进是 ZK 书籍后续部分的主题。不过，为虚拟机中有效状态转换是什么样进行建模的这种思维过程，同样可以推广到现代虚拟机中。

了解有关 ZKVM 的更多信息

最早关于 ZKVM 的提案是 TinyRAM，提出于论文 Snarks for C: verifying Program Executions Succinctly and in Zero Knowledge 中。用作者的话来说，他们创建了“一个具有 Harvard architecture（哈佛架构）和字可寻址随机访问内存的极简 RISC（精简指令集计算机）随机访问机器”。TinyRAM specification（TinyRAM 规范）仅需要 29 个操作码。

由于这篇论文催生了对 ZKVM 的研究，它是一篇值得去理解的高影响力论文。

大多数现代 ZKVM 都是基于 RISC-V 架构的，但也存在 MIPS 架构的 ZKVM。ZK Layer 2 区块链通常使用它们自己的定制架构。

Ye Zhang 关于 Scroll 如何创建 ZKEVM 的视频也相当浅显易懂，适合进行高层次的理解。

Last updated on Aug 20, 2025

ZKVM 会“计算”操作码中的每一步，然后约束该操作码被正确执行。约束的设计必须能够让我们处理任意但有效的操作码序列。

前置条件

简单的基于堆栈的 ZKVM

我们的 ZKVM 仅具有以下操作码：

PUSH（将第一个参数推入堆栈）
ADD（从堆栈中弹出顶部两个元素并将其总和推入堆栈）
MUL（从堆栈中弹出顶部两个元素并将其乘积推入堆栈）
NOP（无操作，不执行任何动作）

没有 STOP 或 RETURN 操作码（稍后将解释替代方案）。虚拟机接受一个 steps 参数，并在执行了 steps 个指令后返回堆栈底部的任意值。

以下动画给出了在此架构中将两个数字相加的简单示例：

关于我们架构的几点说明：

虚拟机是基于堆栈的，就像 EVM、Java Virtual Machine，或者（对于那些了解的人来说）Reverse Polish Notation calculator（逆波兰表示法计算器）。
没有跳转指令，因此程序计数器只会递增。
所有操作码都接受单个参数，但 ADD、MUL 和 NOP 忽略传递给它们的参数。这使得我们可以每次始终将程序计数器递增相同的量——我们不必为 PUSH 将程序计数器更新 2 且为 ADD 更新 1 等等。我们始终将计数器增加 2。
要读取 PUSH 的参数，我们只需从程序计数器“向前看”一个索引。
加法和乘法使用模算术（Circom 中的默认方式）进行。我们使用 Circom 默认域的阶作为我们的“字长”——我们不会尝试模拟具有 64 位或 256 位传统字长的虚拟机。模拟固定大小位数的计算是下一章的主题。

更新我们的堆栈代码

我们可以对上一章中的堆栈代码进行几个关键更改，以构建符合上述规范的 ZKVM。

我们移除了 POP 操作码，因为它不再需要了。
我们添加了 ADD 和 MUL 操作码。

回顾一下，先前用于复制上一个堆栈的规则是：

A. 如果 sp 为 1 或更大，并且列 j 是 sp 下方的第 1 个索引，且当前指令是 PUSH 或 NOP，我们应该复制列 j
B. 如果 sp 为 2 或更大，并且列 j 是 sp 下方的第 2 个索引，且当前指令是 POP，我们应该复制列 j

规则 A 保持不变，但 B 需要更新为以下内容：

B. 如果 sp 为 2 或更大，并且列 j 是 sp 下方的第 3 个索引，且当前指令是 ADD 或 MUL，我们应该复制列 j

加法之前：

[12 , 3, sp] (sp = 3)

加法之后：

[15, sp] (sp = 2)

这里，我们将 12 作为堆栈底部，并且 sp 指向堆栈上方的空白空间。

因此，我们需要一个信号来指示特定列位于堆栈指针下方两个元素处。

下面的代码在很大程度上源自上一章的堆栈，但它实现了本章描述的更新。具体来说：

我们已经将 NOP、PUSH 和 POP 替换为 NOP、PUSH、ADD 和 MUL。ADD 和 MUL 会使堆栈指针减 1，NOP 保持堆栈指针不变，而 PUSH 会使堆栈指针加 1 并将其参数复制到堆栈顶部。

pragma circom 2.1.6;

include "circomlib/comparators.circom";
include "circomlib/gates.circom";

template AND3() {
  signal input in[3];
  signal output out;
  
  signal temp;
  temp <== in[0] * in[1];
  out <== temp * in[2];
}

// i is the column number
// bits is how many bits we need
// for the LessEqThan component
template ShouldCopy(i, bits) {
  signal input sp;
  signal input is_push;
  signal input is_nop;
  signal input is_add;
  signal input is_mul;
  
  // out = 1 if should copy
  signal output out;
  
  // sanity checks
  is_add + is_mul + is_push + is_nop === 1;
  is_nop * (1 - is_nop) === 0;
  is_push * (1 - is_push) === 0;
  is_add * (1 - is_add) === 0;
  is_mul * (1 - is_mul) === 0;
  
  // it's cheaper to compute ≠ 0 than > 0 to avoid
  // converting the number to binary
  signal spEqZero;
  signal spGteOne;
  spEqZero <== IsZero()(sp);
  spGteOne <== 1 - spEqZero;
  
  // it's cheaper to compute ≠ 0 and ≠ 1 than ≥ 2
  signal spEqOne;
  signal spGteTwo;
  spEqOne <== IsEqual()([sp, 1]);
  spGteTwo <== 1 - spEqOne * spEqZero;
  
  // the current column is 1 or more 
  // below the stack pointer
  signal oneBelowSp <== LessEqThan(bits)([i, sp - 1]);
  
  // the current column is 3 or more
  // below the stack pointer
  signal threeBelowSP <== LessEqThan(bits)([i, sp - 3]);
  
  // condition A
  component a3A = AND3();
  a3A.in[0] <== spGteOne;
  a3A.in[1] <== oneBelowSp;
  a3A.in[2] <== is_push + is_nop;
  
  // condition B
  component a3B = AND3();
  a3B.in[0] <== spGteTwo;
  a3B.in[1] <== threeBelowSP;
  a3B.in[2] <== is_add + is_mul;
  
  component or = OR();
  or.a <== a3A.out;
  or.b <== a3B.out;  
  out <== or.out;
}

template CopyStack(m) {
  var nBits = 4;
    signal output out[m];
    signal input sp;
    signal input is_add;
    signal input is_mul;
    signal input is_push;
    signal input is_nop;

    component ShouldCopys[m];
    signal copy[m];
    
    // loop over the columns
    for (var i = 0; i < m; i++) {
      ShouldCopys[i] = ShouldCopy(i, nBits);
      ShouldCopys[i].sp <== sp;
      ShouldCopys[i].is_add <== is_add;
      ShouldCopys[i].is_mul <== is_mul;
      ShouldCopys[i].is_push <== is_push;
      ShouldCopys[i].is_nop <== is_nop;
      out[i] <== ShouldCopys[i].out;
    }
}

// n is how many instructions we can handle
// since all the instructions might be push,
// our stack needs capacity of up to n
template ZKVM(n) {
  var NOP = 0;
  var PUSH = 1;
  var ADD = 2;
  var MUL = 3;

  signal input instr[2 * n];
  
  // we add one extra row for sp because
  // our algorithm always writes to the
  // next row and we don't want to conditionally
  // check for an array-out-of-bounds
  signal output sp[n + 1];

  signal output stack[n][n];

  var IS_NOP = 0;
  var IS_PUSH = 1;
  var IS_ADD = 2;
  var IS_MUL = 3;
  var ARG = 4;
  signal metaTable[n][5];

  // first instruction must be PUSH or NOP
  (instr[0] - PUSH) * (instr[0] - NOP) === 0;

  signal first_op_is_push;
  first_op_is_push <== IsEqual()([instr[0], PUSH]);

  // if the first op is NOP, we are forcing the first
  // value to be zero, but this is where the stack
  // pointer is, so it doesn't matter
  stack[0][0] <== first_op_is_push * instr[1];

  // initialize the rest of the first stack to be zero
  for (var i = 1; i < n; i++) {
    stack[0][i] <== 0;
  }

  // we fill out the 0th elements to avoid
  // uninitialzed signals
  sp[0] <== 0;
  sp[1] <== first_op_is_push;
  metaTable[0][IS_PUSH] <== first_op_is_push;
  metaTable[0][IS_NOP] <== 1 - first_op_is_push;
  metaTable[0][IS_ADD] <== 0;
  metaTable[0][IS_MUL] <== 0;
  metaTable[0][ARG] <== instr[1];

  // spBranch is what we add to the previous stack pointer
  // based on the opcode. Could be 1, 0, or -1 depending on the
  // opcode. Since the first opcode cannot be POP, -1 is not
  // an option here.
  var SAME = 0;
  var INC = 1;
  var DEC = 2;
  signal spBranch[n][3];
  spBranch[0][INC] <== first_op_is_push * 1;
  spBranch[0][SAME] <== (1 - first_op_is_push) * 0;
  spBranch[0][DEC] <== 0;

  // populate the metaTable and the stack pointer
  component EqPush[n];
  component EqNop[n];
  component EqAdd[n];
  component EqMul[n];

  component eqSP[n][n];
  signal eqSPAndIsPush[n][n];
  for (var i = 0; i < n; i++) {
    eqSPAndIsPush[0][i] <== 0;
  }

  // signals and components for copying
  component CopyStack[n];
  signal previousCellIfShouldCopy[n][n];
  for (var i = 0; i < n; i++) {
    previousCellIfShouldCopy[0][i] <== 0;
  }

  component eqSPMinus2[n][n];
  signal eqSPMinus2AndIsAdd[n][n];
  signal eqSPMinus2AndIsMul[n][n];
  for (var i = 0; i < n; i++) {
    eqSPMinus2AndIsAdd[0][i] <== 0;
    eqSPMinus2AndIsMul[0][i] <== 0;
  }

  // (the current column = sp - 2 and is_add) * sum
  signal eqSPMinus2AndIsAddWithValue[n][n];
  signal eqSPMinus2AndIsMulWithValue[n][n];

  signal sum_result[n][n];
  signal mul_result[n][n];
  for (var i = 0; i < n; i++) {
    eqSPMinus2AndIsAddWithValue[0][i] <== 0;
    eqSPMinus2AndIsMulWithValue[0][i] <== 0;
    sum_result[0][i] <== 0;
    mul_result[0][i] <== 0; 
  }

  for (var i = 1; i < n; i++) {
    // check which opcode we are executing
    EqPush[i] = IsEqual();
    EqPush[i].in[0] <== instr[2 * i];
    EqPush[i].in[1] <== PUSH;
    metaTable[i][IS_PUSH] <== EqPush[i].out;

    EqNop[i] = IsEqual();
    EqNop[i].in[0] <== instr[2 * i];
    EqNop[i].in[1] <== NOP;
    metaTable[i][IS_NOP] <== EqNop[i].out;

    EqAdd[i] = IsEqual();
    EqAdd[i].in[0] <== instr[2 * i];
    EqAdd[i].in[1] <== ADD;
    metaTable[i][IS_ADD] <== EqAdd[i].out;

    EqMul[i] = IsEqual();
    EqMul[i].in[0] <== instr[2 * i];
    EqMul[i].in[1] <== MUL;
    metaTable[i][IS_MUL] <== EqMul[i].out;

    // carry out the sums and muls
    for (var j = 0; j < n - 1; j++) {
      sum_result[i][j] <== stack[i - 1][j] + stack[i - 1][j + 1];
      mul_result[i][j] <== stack[i - 1][j] * stack[i - 1][j + 1];
    }

    // these values cannot be used in practice because
    // the stack doesn't go that high.
    // However, we still need to initialize
    // them because every column checks
    // if it is sp - 1, even the last 2
    for (var j = n - 1; j < n; j++) {
      sum_result[i][j] <== 0;
      mul_result[i][j] <== 0;
    }

    // get the instruction argument
    metaTable[i][ARG] <== instr[2 * i + 1];

    // if it is a push, write to the stack
    // if it is a copy, write to the stack
    CopyStack[i] = CopyStack(n);
    CopyStack[i].sp <== sp[i];
    CopyStack[i].is_push <== metaTable[i][IS_PUSH];
    CopyStack[i].is_nop <== metaTable[i][IS_NOP];
    CopyStack[i].is_add <== metaTable[i][IS_ADD];
    CopyStack[i].is_mul <== metaTable[i][IS_MUL];
    for (var j = 0; j < n; j++) {
      previousCellIfShouldCopy[i][j] <== CopyStack[i].out[j] * stack[i - 1][j];

      eqSP[i][j] = IsEqual();
      eqSP[i][j].in[0] <== j;
      eqSP[i][j].in[1] <== sp[i];
      eqSPAndIsPush[i][j] <== eqSP[i][j].out * metaTable[i][IS_PUSH];

      // check if the column is two less
      // than the stack pointer
      // if so, we prepare to write the sum or
      // product here
      // if the current instruction is add or mul
      eqSPMinus2[i][j] = IsEqual();
      eqSPMinus2[i][j].in[0] <== j;
      eqSPMinus2[i][j].in[1] <== sp[i] - 2; // underflow doesn't matter

      eqSPMinus2AndIsAdd[i][j] <== eqSPMinus2[i][j].out * metaTable[i][IS_ADD];
      eqSPMinus2AndIsMul[i][j] <== eqSPMinus2[i][j].out * metaTable[i][IS_MUL];

      eqSPMinus2AndIsAddWithValue[i][j] <== eqSPMinus2AndIsAdd[i][j] * sum_result[i][j];
      eqSPMinus2AndIsMulWithValue[i][j] <== eqSPMinus2AndIsMul[i][j] * mul_result[i][j];
      // we will either
      // - PUSH 
      // - COPY or implicilty assign 0
      // - ADD
      // - MUL
      stack[i][j] <== eqSPAndIsPush[i][j] * metaTable[i][ARG] + previousCellIfShouldCopy[i][j] + eqSPMinus2AndIsAddWithValue[i][j] + eqSPMinus2AndIsMulWithValue[i][j];
    }

    // write to the next row's stack pointer
    spBranch[i][INC] <== metaTable[i][IS_PUSH] * (sp[i] + 1);
    spBranch[i][SAME] <== metaTable[i][IS_NOP] * (sp[i]);
    spBranch[i][DEC] <== (metaTable[i][IS_ADD] + metaTable[i][IS_MUL]) * (sp[i] - 1);
    sp[i + 1] <== spBranch[i][INC] + spBranch[i][SAME] + spBranch[i][DEC];
  }
}

component main = ZKVM(5);

/* INPUT = {
    "instr": [1,3,1,6,1,2,3,0,3,0]
} */

如果我们只使用一个操作码却为每个操作码保留一个约束，这难道不低效吗？

我们将其留作读者的练习：实现两个 Quin selector，使得仅对堆栈顶部的两个项进行加法或乘法运算，而不是整个堆栈。

这种不必要地重复未使用约束的低效性，是原生的 R1CS 的一个严重缺陷，因为它不允许有条件地使用约束。

提高效率的解决方案

显著提高效率的两种现代方法是基于查找表（lookup tables）的约束和递归证明（recursive proofs）。

查找表是一种算术化方案，其中只有实际用到的约束才是表的一部分，然后 ZK 证明去证明它在每条指令上使用了表中正确的条目。
递归证明会为每条指令创建一个单独的 ZK 证明，然后将该证明与另一个用于验证输入证明有效性的 ZK 证明结合起来。（试想，ZK 中的验证算法本身就可以用算术电路建模）。

这些改进是 ZK 书籍后续部分的主题。不过，为虚拟机中有效状态转换是什么样进行建模的这种思维过程，同样可以推广到现代虚拟机中。

了解有关 ZKVM 的更多信息

由于这篇论文催生了对 ZKVM 的研究，它是一篇值得去理解的高影响力论文。

大多数现代 ZKVM 都是基于 RISC-V 架构的，但也存在 MIPS 架构的 ZKVM。ZK Layer 2 区块链通常使用它们自己的定制架构。

Ye Zhang 关于 Scroll 如何创建 ZKEVM 的视频也相当浅显易懂，适合进行高层次的理解。