The Groth16 algorithm enables a quadratic arithmetic program to be computed by a prover over elliptic curve points derived in a trusted setup, and quickly checked by a verifier. It uses auxiliary elliptic curve points from the trusted setup to prevent forged proofs.
We refer to an elliptic curve point belonging to the elliptic curve group as and an elliptic curve point belonging to the elliptic curve group as . A pairing between and is denoted as and produces an element in . Variables in bold such as are vectors, upper case bold letters such as are matrices, and field elements (sometimes informally referred to as “scalars”) are lower case letters such as . All arithmetic operations happen in a finite field with a characteristic that equals the order of the elliptic curve group.
From this, we can construct a Quadratic Arithmetic Program (QAP):
where
and
If a third party creates a structured reference string (srs) via a powers of tau ceremony, then the prover can evaluate sum terms (the terms) in the QAP at a hidden point . Let the structured reference strings be computed as follows:
We refer to as a polynomial evaluated on a structured reference string via the inner product:
or for srs:
is shorthand for the above expression, and produces an elliptic curve point. It does not mean the prover knows .
The prover can evaluate their QAP on the trusted setup by computing:
If the QAP is balanced, then the following equation holds:
Motivation
Simply presenting is not a convincing argument that the prover knows such that the QAP is balanced.
The prover can simply invent values , , where , compute
and present those as elliptic curve points , , to the verifier.
Thus, the verifier has no idea if were the result of a satisfied QAP or not.
We need to force the prover to be honest without introducing too much computational overhead. The first algorithm to accomplish this was “Pinocchio: Nearly Practical Verifiable Computation.” This was usable enough for ZCash to base the first version of their blockchain on it.
However, Groth16 was able to accomplish the same thing in much fewer steps, and the algorithm is still widely in use today, as no algorithm since has produced as efficient an algorithm for the verification step (though other algorithms have eliminated the trusted setup or significantly reduced the amount of work for the prover).
Update for 2024: A paper rather triumphantly titled “Polymath: Groth16 is not the limit” published in Cryptology demonstrates an algorithm that requires fewer verifier steps than Groth16. However, there are no known implementations of the algorithm at this time of writing.
Preventing forgery Part 1: Introducing and
An “unsolveable” verification formula
Suppose we update our verification formula to the following:
Note that we are using additive notation for the group for convenience.
Here, is an element from and has an unknown discrete logarithm.
We now show that it is impossible for a verifier to provide a solution to this equation, without knowing the discrete logarithm of .
Attack 1: Forging A and B and deriving C
Suppose the prover randomly selects and to produce and and tries to derive a value that is compatible with the verifier’s formula.
Knowing the discrete logarithms of and , the malicious prover tries to solve for by doing
The final line is requires the prover to solve for the discrete log of , so they cannot compute a valid discrete log for .
Attack 2: Forging C and deriving A and B
Here the prover picks a random point and computes . Because they know , they can try to discover a compatible combination of and such that
This requires the prover, given , to come up with an and that pair to produce .
Similar to the discrete log problem, we rely on unproven cryptographic assumptions that this computation (decomposing an element in into a and element) is infeasible. In this case, the assumption that we cannot decompose into and is called the Bilinear Diffie-Hellman Assumption. The interested reader can see a related discussion on the Decisional Diffie-Hellman Assumption.
(Unproven does not mean unreliable. If you can find a way to prove or disprove this assumption, fame and fortune awaits you! In practice, there is no known way to decompose into and and it is believed to be computationally infeasible.)
How and are used
In practice, Groth16 doesn’t use a term . Instead, the trusted setup generates two random scalars and and publishes the elliptic curve points computed as:
What we referred to as is simply .
Re-deriving the proving and verification formulas
To make the verification formula “solvable”, we need to alter our QAP formula to incorporate and .
Now consider what happens if we introduce terms and to the left hand side of the equation:
We can substitute the rightmost terms using the original QAP definition:
Now we can introduce an “expanded” QAP with the following definition:
As a sneak peak to where we are going, if we replace with and with , we get updated verification formula from earlier:
where
The prover can compute and without knowing , , or . Given the structured reference string (powers of ) and the elliptic curve points , the prover computes and as
Here, does not mean the prover knows . The prover is using the structure reference string to compute for and the srs for for .
However, it isn’t currently possible to compute without knowing and . The prover cannot pair with and with because that would create a point, whereas the prover needs a point for .
Instead, the trusted setup needs to precompute polynomials for the problematic term of the expanded QAP.
With some algebraic manipulation, we combine the sum terms into a single sum:
and factor out :
The trusted setup can create polynomials evaluated at from the boxed term above, and the prover can use that to compute the sum. The exact details are shown in the next section.
Summary of the algorithm so far
Trusted setup steps
Concretely, the trusted setup computes the following:
The trusted setup publishes
Prover steps
The prover computes
Note that we replaced the “problematic” polynomial
(the one that contained and ) with
Verifier steps
The verifier computes:
Supporting public inputs
The verifier formula so far does not support public inputs, i.e. making a portion of the witness public.
By convention, public portions of the witness are the first elements of the vector . To make those elements public, the prover simply reveals them:
For the verifier to test that those values were in fact used, verifier must carry out some of the computation that the prover was originally doing.
Specifically, the prover computes:
Note that only the computation of changed – the prover only uses the and terms to .
The verifier computes the first terms of the sum:
And the verification equation is:
Part 2: Separating the public inputs from the private inputs with or
Forging proofs by misuing for
The assumption in the equation above is that the prover is only using to to compute , but nothing stops a dishonest prover from using to to compute , leading to a forged proof.
For example, here is our current verification equation:
If we expand the C term under the hood, we get the following:
Suppose for example and without loss of generality that and . In that case, the public part of the witness is and the private part is .
The final equation would be as follows:
However, nothing stops the prover from creating an valid portion of the public witness as [1,2,0] and moving the zeroed out public portion to the private part of the computation as follows:
The equation above is valid, but the witness does not necessarily satisfy the original constraints.
Therefore, we need to prevent the prover from using to as part of the computation of .
Introducing and/or
To avoid the problem above, the trusted setup introduces a new scalar : or to force to to be separate from to . To do this, the trusted setup divides (multiplies by the modular inverse) the private terms (that constitute ) by and/or the public terms (that constitute , the sum the verifier computes) by .
Since the term is embedded in , those terms also need to be divided by . If either and have an unknown discrete logarithm, then the forgery described earlier along possible other methods are avoided. This method was used in Zcash’s Sapling’s‑based trusted setups where is simply left to and is still updated from to a random value at later trusted setup stages.
The trusted setup publishes
The prover steps are the same as before:
And the verifier steps now include pairing by and/or to cancel out the denominators:
Part 3: Enforcing true zero knowledge: r and s
Our scheme is not yet truly zero knowledge. If an attacker is able to guess our witness vector (which is possible if there is only a small range of valid inputs, e.g. secret voting from privileged addresses), then they can verify their guess is correct by comparing their constructed proof to the original proof.
As a trivial example, suppose our claim is and are both either or . The corresponding arithmetic circuit would be
An attacker only needs to guess four combinations to figure out what the witness is. That is, they guess a witness, generate a proof, and see if their answer matches the original proof.
To prevent guessing, the prover needs to “salt” their proof, and the verification equation needs to be modified to accommodate the salt.
The prover samples two random field elements and and adds them to and to make the witness unguessable – an attacker would have to guess both the witness and the salts and :
To derive the final verification formula, let’s temporarily ignore that we don’t know the discrete logs of the Greek letter terms and compute the left-hand-side of the verification equation :
Expanding the terms we get:
We can select out the original terms for
And combine them on the left, leaving the new terms on the right:
We further rearrange the underlined terms to write them in terms of and as follows. We also split into :
Group the and terms together:
Factor out and :
Substitute and :
So our final equation is
We now break it into the public and private portions:
We want the public portion and the private portion to be separated by and respectively:
cancels for some of the terms:
We now separate this equation in to the verifier and prover portions. The boxed terms are the verifier portion, the underbrace terms are the terms that the prover provides:
Groth16 Proof Algorithm
We are now ready to show the Groth16 algorithm end-to-end. The trusted setup and the verification steps remain unchanged from the previous example where we incorporated and . Only the prover’s calculation changes to incorporate and .
Trusted Setup
The trusted setup publishes
Prover Steps
Prover has a witness and generates random scalars and .
The prover publishes .
Verifier Steps
The verifier checks
Verifying Groth16 in Solidity
At this point, you have sufficient knowledge to understand the proof verification code in Solidity. Here is Tornado Cash’s proof verification code. The reader is encouraged to read the source code closely. If the reader is comfortable with Solidity assembly programming, then understanding this source code will not be difficult as the variable names are consistent with the ones in this article.
Intuitively, the attacker is multiplying and by , and cancels itself out in the pairing.
Hence, if the verification formula accepts
then it will also accept
The defense against this attack is described in the following section.
You can see a proof of concept of this attack in this article.
The prover can create an unlimited number of proofs for the same witness
This isn’t a “security issue” per se – it is necessary to achieve Zero Knowledge. However, the application needs a mechanism to track which facts have already been proven and cannot rely on the uniqueness of the proof to achieve that.
Learn more with RareSkills
Our ability to publish material like this free of charge depends on the continued support of our students. Consider signing up for our Zero Knowledge Bootcamp, Web3 Bootcamps, or getting a job on RareTalent.