Fixing a fuzzed-up function

Fixing a fuzzed-up function

There are bugs in your program; I guarantee it.

We can’t prove anything in science. But the best that scientists can do is fail to disprove things while pointing to how hard they tried.

—Richard Dawkins, “The Greatest Show on Earth”

The trouble with testing your code for bugs, of course, is that you sometimes find them. Then what do you do?

Well, fixing them would be a start. In Target acquired, we wrote a fuzz test in Go for some function FirstRune that’s supposed to return the first rune from a string. Here’s the test:

func FuzzFirstRune(f *testing.F) {
    f.Add("Hello")
    f.Add("world")
    f.Fuzz(func(t *testing.T, s string) {
        got := runes.FirstRune(s)
        want, _ := utf8.DecodeRuneInString(s)
        if want == utf8.RuneError {
            t.Skip() // don't bother testing invalid runes
        }
        if want != got {
            t.Errorf("given %q (0x%[1]x): want '%c' (0x%[2]x)",
                s, want)
            t.Errorf("got '%c' (0x%[1]x)", got)
        }
    })
}

(Listing runes/1)

And we implemented FirstRunepoorly:

func FirstRune(s string) rune {
    return 0
}

(Listing runes/1)

If the test didn’t fail with this function, we might start to worry a bit about whether it would detect any bugs. But it does indeed report, reassuringly:

go test -fuzz .

fuzz: elapsed: 0s, gathering baseline coverage: 0/2 completed
failure while testing seed corpus entry: FuzzFirstRune/seed#1
fuzz: elapsed: 0s, gathering baseline coverage: 0/2 completed
--- FAIL: FuzzFirstRune (0.02s)
    --- FAIL: FuzzFirstRune (0.00s)
        runes_test.go:22: given "world" (0x776f726c64): want 'w'
        (0x77)
        runes_test.go:24: got '' (0x0)

In other words, asking for the first rune in the string "world" gives us the rune value 0, which is incorrect: it should be 'w'.

So we’ve found the bug: this function does nothing useful whatsoever! What can we do to fix this?

Well, here’s a slightly more plausible version. Is this okay?

func FirstRune(s string) rune {
    return rune(s[0])
}

(Listing runes/2)

We use the square bracket slice-index notation (s[0]) to extract the first byte of s, and we return it having converted it explicitly to the rune type.

Let’s see what the fuzzer says about this:

go test -fuzz .

fuzz: elapsed: 0s, gathering baseline coverage: 0/2 completed
fuzz: elapsed: 0s, gathering baseline coverage: 2/2 completed,
    now fuzzing with 8 workers
fuzz: minimizing 27-byte failing input file
fuzz: elapsed: 0s, minimizing
--- FAIL: FuzzFirstRune (0.16s)
    --- FAIL: FuzzFirstRune (0.00s)
        testing.go:1356: panic: runtime error: index out of range
        [0] with length 0
        [stack trace omitted]

What’s this “minimizing” about? When the fuzzer finds some input that fails the test, it tries smaller and smaller versions of it, looking for the minimal failing input:

fuzz: minimizing 27-byte failing input file
fuzz: elapsed: 0s, minimizing

In this example, we can see that it found a 27-byte input string that caused the fuzz target to fail. It then tries smaller and smaller versions of the input to see if they still fail. This process continues until it has found the minimal input that causes the fuzz target to fail.

And here it is:

--- FAIL: FuzzFirstRune (0.16s)
    --- FAIL: FuzzFirstRune (0.00s)
        testing.go:1356: panic: runtime error: index out of range
        [0] with length 0

It seems that, in this case, the fuzzer managed to shrink the failing input to a completely empty string (length 0). This is rather puzzling. What’s gone on here?

In fact, what’s probably happened is that the fuzzer generated a random 27-byte string whose first rune was multi-byte, causing the function to return the wrong answer. As the fuzzer tried shorter and shorter versions of the string, the same “wrong result” failure kept happening.

Finally, when it reduced the string to length zero, it got a different failure: an “index out of range” panic. It so happens that FirstRune doesn’t cope well with empty input strings:

func FirstRune(s string) rune {
    return rune(s[0]) // oops, slice overrun
}

(Listing runes/2)

As I’m sure you spotted right away, the slice index expression s[0] will panic when s is empty, and indeed that’s what we’re seeing. Let’s fix that:

func FirstRune(s string) rune {
    if s == "" {
        return utf8.RuneError
    }
    return rune(s[0])
}

(Listing runes/3)

It seems reasonable to follow the example set by the standard library, which is to return utf8.RuneError in this case. Now the function won’t panic when given an empty input.

Detecting more subtle bugs

The fuzzer has already proved its worth, but how do we know there aren’t still more bugs lurking in FirstRune? Well, let’s fuzz around and find out.

go test -fuzz .

fuzz: elapsed: 0s, gathering baseline coverage: 0/3 completed
fuzz: elapsed: 0s, gathering baseline coverage: 3/3 completed, now
fuzzing with 8 workers
fuzz: minimizing 32-byte failing input file
fuzz: elapsed: 0s, minimizing
--- FAIL: FuzzFirstRune (0.13s)
    --- FAIL: FuzzFirstRune (0.00s)
        fuzz_test.go:22: given "ʭ" (0xcaad): want 'ʭ' (0x2ad),
        got 'Ê' (0xca)

Again, interpreting this output, we can see that the fuzzer found a 32-byte failing input, which it then managed to shrink to just 2 bytes:

given "ʭ" (0xcaad)

That is, this input string consists of the rune ʭ, which in UTF-8 is encoded as two bytes, 0xca and 0xad. If FirstRune had correctly decoded this string, it would have returned the rune ʭ, the same result as the oracle.

Instead, it mistakenly interpreted the first byte of the string (0xca) as a single-byte rune, in this case the letter Ê.

A victory for fuzzing! Even if we didn’t know about the distinction between runes and bytes in Go, and even if we might not have thought of using a string like “ʭ” as an example input, we were still able to find the bug.

Fixing the implementation

How should we have implemented FirstRune correctly, then? Well, that’s not really important for the discussion of fuzz testing, but I don’t like to leave anyone in suspense. Let’s see if we can fix this bug.

There are a variety of ways we could write FirstRune correctly, including the rather dull one of simply calling utf8.DecodeRuneInString, as the test does.

But here’s a slightly more interesting version, just for fun:

func FirstRune(s string) rune {
    for _, r := range s {
        return r
    }
    return utf8.RuneError
}

(Listing runes/4)

Why does this work? As you probably know, the range operator iterates over a string by runes, not by bytes. So, on each successive execution of this loop, r will be the next rune in the string, starting with the first.

But on this very first iteration, we hit the return statement inside the loop body, and we return the first rune straight away. After all, that’s the right answer!

In the case where s is empty, the for loop will execute zero times; that is to say, not at all. We’ll go straight to the end of the function and return the value 0.

I think this is correct, but let’s see if the fuzzer agrees. Or, rather, we’ll give the fuzzer a chance to find some counterexamples, and see if it succeeds:

go test -fuzz .

fuzz: elapsed: 0s, gathering baseline coverage: 0/4 completed
fuzz: elapsed: 0s, gathering baseline coverage: 4/4 completed, now
fuzzing with 8 workers
fuzz: elapsed: 3s, execs: 130517 (43494/sec), new interesting: 5
(total: 13)
fuzz: elapsed: 6s, execs: 268452 (45962/sec), new interesting: 5
(total: 13)
fuzz: elapsed: 9s, execs: 396881 (42797/sec), new interesting: 5
(total: 13)
...

And, in fact, I left this to run for a minute or two, but it didn’t find any more failing inputs. That doesn’t guarantee that there aren’t any, of course. Even the fuzzer can’t guarantee that.

What it can do is increase our confidence that we’ve found most of the low-hanging bugs. If we fuzz a function for a decent length of time and it doesn’t fail, that’s encouraging. If it does fail, we’ve generated a new test case that will help us catch regressions in the future. Either way, it’s fuzzing useful.

My book The Power of Go: Tests is a compendium of hard-won tips, tricks, and techniques for testing Go programs, including the largest and scariest codebases. Even more importantly, it’s all about how to use tests to drive great design, and build reliable programs that you can be proud of. I hope you’ll take a look.

You can do this: surviving day one

You can do this: surviving day one

Target acquired

Target acquired