Solidity Mutation Testing

Mutation testing is a method to check the quality of the test suite by intentionally introducing bugs into the code and ensuring the tests catch the bug.

The kind of bugs that get introduced are straightforward. Consider the following examples:

// original function
function mint() external payable {
    require(msg.value >= PRICE, "insufficient msg value");
}

// mutated function
function mint() external public {
    require(msg.value < PRICE, "insufficient msg value");
}

In the example above, the inequality operator was flipped. If the unit tests still pass, then the unit tests are simply offering false assurance.

It is important that the bugs be syntactically valid, i.e. still result in compilable solidity code. If the code doesn’t compile, then it won’t be possible to run the unit tests.

Line Coverage Without Testing

Let’s use the default example Foundry provides after running forge init and comment out the assert statements.

// SPDX-License-Identifier: UNLICENSED
pragma solidity ^0.8.13;

import "forge-std/Test.sol";
import "../src/Counter.sol";

contract CounterTest is Test {
    Counter public counter;

    function setUp() public {
        counter = new Counter();
        counter.setNumber(0);
    }

    function testIncrement() public {
        counter.increment();
        //assertEq(counter.number(), 1);
    }

    function testSetNumber(uint256 x) public {
        counter.setNumber(x);
        //assertEq(counter.number(), x);
    }
}

If we run forge coverage, we get the following table:

Supposedly, we have 100% line and branch coverage on Counter.sol despite having no assert statements! This means we can introduce bugs at will and the tests will still pass.

Now of course, this is a blatant example of what not to do. But it’s easy to accidentally make this mistake when optimizing for coverage. Coverage only tells you that you ran the code and it didn’t revert. You want to ensure that all the expected state changes are actually taking place (see our other post for more on solidity unit tesing best practices.

Kinds of Mutants

Here are some kinds of mutations that may be useful:

deleting function modifiers
inverting inequality comparisons
changing constant values or swapping string constants for empty strings
replace true with false
replace && with || and bitwise & with bitwise |
swapping arithmetic operators (e.g. + becomes -)
deleting lines
swapping lines

Automatic Mutation Testing

It would be rather tedious to manually mutate the code according to the rules above and then run the test suite. Thus, tools exist that do this automatically. They generate dozens of potential mutations, mutate the code, run the test suite, store the results, and generate a report afterwards. There can be three outcomes:

mutant survived
equivalent mutant
mutant killed

Mutant survived means that the code was changed and the test still passed. An equivalent mutant happens when the bytecode did not change after running the mutation. This can happen if a symbol is randomly replaced with the same symbol, or the mutation doesn’t alter the business logic and the compiler optimization ignores the change.

Here is an example of where an equivalent mutant could occur:

// before
x = x + 1;
y = y + 1;

// after
y = y + 1;
x = x + 1;

Under some circumstances, the compiler might produce the same bytecode after a mutation like this. This is an equivalent mutation. Equivalent mutants might signal unnecessary or dead code like in the following example:

require(false);
// anything that happens here doesn't matter

Finally, the mutant killed scenario is the desirable one. It means the code was mutated and the tests failed. Therefore, the tests can actually detect when something goes wrong. If a mutation results in non-compiling code, e.g. deleting a variable declaration that is used later, then the mutant is considered killed.

100% Line and Branch Coverage is Important for Mutation Testing

If a line or branch is not covered, then naturally mutating this line will not cause the test to fail.

Consider the following example:

function mint(address to_, string memory questId_) public onlyMinter {
	// business logic
}

There is an implied branch here with the onlyMinter modifier. If this is only tested in a situation where the minter was the one calling the function, then deleting onlyMinter will not cause the test to fail. If the onlyMinter modifier doesn’t block non-minters, then the unit tests won’t catch it.

By the way, as contrived as this example may seem, it is taken from a real codearena report.

Off by One Errors and Boundry Conditions

Mutation tests can be useful for catching off-by-one errors. Consider the following mutation:

uint256 public LIMIT = 5;

// original
function mint(uint256 amount) external {
	require(amount < LIMIT, "exceeds limit");
}

// mutation
function mint(uint256 amount) external {
	require(amount <= LIMIT, "exceeds limit");
}

If our unit tests set the amount to be 3 and 8, the code will have a 100% branch coverage with respect to this test. However, the mutation tests will fail because the strict inequality was replaced with an inequality and the test still passed. This is because the tests do not accurately express the intended functionality. Specifically, the tests should enforce if the upper limit is 4 or 5. Testing values for amount like 3 or 8 do not fully define the smart contract specification for this function.

Vertigo-rs

RareSkills actively maintains a mutation testing tool for Solidity, vertigo-rs. This was forked from the vertigo repo which is no longer maintained. Support for the Foundry framework has been added. The tool works with Foundry, Hardhat, and Truffle. Instructions to run the tool are in the Readme. No modifications to the Solidity codebase or tests are required. Simply close the repository, install the dependencies, then run it in the Solidity project that you are are testing.

Other Mutation Testing Tools

Although vertigo-rs is the only tool that automatically runs the test suit, there are other noteable tools for generating mutations (but they don’t support automatically re-running the test suite and summarizing the results).

Gambit by Certora
Universal Mutator by sambucha

There are other tools, but they apparently are no longer maintained.

Mutation Score

Tools for languages besides Solidity sometimes provide a mutation score. This the the percentage of mutants that were killed. If 100% of the mutants were killed, then the unit tests can be relied upon to detect unwanted or accidental changes in the codebase.

For very large codebases, having a 100% score may be impractical. Solidity smart contracts are quite small compared to traditional codebases, such as most backend and frontend applications. Aiming for a 100% mutation score for codebases that large may be infeasible. But because Solidity smart contracts are relatively small, and bugs are catastrophic, surviving mutants should be scrutanized carefully.

Limitations of Mutation Testing

Because mutation testing tests the quality of unit tests, and unit tests are generally stateless, mutation testing cannot naturally illuminate that stateful business logic is testing properly.

Mutation testing can create hundreds of mutations, but for the sake of time, most tools only run a subset of them. This means important mutations that uncover bugs in the test suite may be missed.

Learn More

This material is part of our Solidity bootcamp. You can also learn Solidity for free with our free Solidity course.

Originally Published April 14, 2023