Target acquired
Got your bytes and runes twisted? See a doctor immediately.
By its nature, testing is never complete. As the influential computer scientist Edsger Dijkstra put it, “Testing shows the presence, not the absence of bugs.” No quantity of tests can ever prove a package free of bugs. At best, they increase our confidence that the package works well in a wide range of important scenarios.
—Alan Donovan & Brian Kernighan, “The Go Programming Language”
If you feel a bit fuzzy about your Go tests, you’re not alone. Even if you diligently write tests for all your code, there’s always that sneaking doubt: what if there’s some test case you didn’t think of?
In previous posts, we’ve talked about ways to generate test cases randomly, and how to use Go’s fuzz testing facility to automate this process. Now let’s see how to apply fuzz testing to a realistic function, and use it to find a very common Go bug, by writing a fuzz target.
Suppose we want to write some function FirstRune
that
takes a string and returns its first rune. Given the string
"Hello"
, let’s say, it should return the rune
'H'
. That sounds doable, doesn’t it?
In a programming language where there’s a one-to-one correspondence
between runes and bytes, this would be straightforward. If the string is
s
, for example, we can simply index the first byte of the
string with s[0]
and return it. Job done!
But that won’t work in Go, because that byte may be merely the first part of a multi-byte UTF-8 sequence. Indeed, it can take up to four bytes to represent some Unicode characters. We just won’t know what we’ve got until we look at the actual bytes.
Let’s see if we can use fuzz testing to catch this kind of error,
then. We’ll use the fuzzer to generate random inputs, and see if any of
them produce the wrong answer from FirstRune
.
Here’s an example of such a test:
func FuzzFirstRune(f *testing.F) {
.Add("Hello")
f.Add("world")
f.Fuzz(func(t *testing.T, s string) {
f:= runes.FirstRune(s)
got , _ := utf8.DecodeRuneInString(s)
wantif want == utf8.RuneError {
.Skip() // don't bother testing invalid runes
t}
if want != got {
.Errorf("given %q (0x%[1]x): want '%c' (0x%[2]x)",
t, want)
s.Errorf("got '%c' (0x%[1]x)", got)
t}
})
}
It might look a little alarming at first, but we’ll break it down, line by line.
The general idea is to call FirstRune
with some string
(randomly generated by the fuzzer), and compare the result against what
the “oracle” (in this case, the standard
utf8.DecodeRuneInString
function) says it should be.
If our FirstRune
function works correctly, it should
give the same rune result as DecodeRuneInString
for all
inputs. But if there’s a bug in the function, its result won’t match the
oracle’s, and the fuzz target will mark this as a failing test case.
My book The Power of Go: Tests is all about testing in Go, and it covers fuzz testing, mutation testing, property-based testing, and all sorts of advanced techniques that most Go programmers don’t even know about. If you’re enjoying this excerpt, do check out the full book.
Adding training data with
f.Add
While, as we’ve seen, the fuzzer is quite capable of generating
random values from scratch, it can be helpful to give it some starting
inputs as hints. Remember the mysterious warning it gave us when we ran
the fuzz test for the Guess
function?
warning: starting with empty corpus
The fuzzer’s corpus is the set of training data we give it, if you like. It can still fuzz even with an empty corpus, as we saw, but it helps if we can provide a few example values for it to start with. These will be supplied to our fuzz target and tested. The fuzzer will then generate more inputs from them by mutating them randomly.
Here’s how we add training data to the corpus:
.Add("Hello")
f.Add("world") f
We can call f.Add
to add more example values as many
times as we like. The more training data we can give the fuzzer to
suggest possible lines of attack, the better.
The corpus also includes any previously-generated failing test cases
stored in the testdata/fuzz
folder, like the “21” case in
our previous example. So over time we’ll build up a bigger and bigger
corpus, which gives the fuzzer more to work with.
A more sophisticated fuzz target
Next comes the call to f.Fuzz
with our fuzz target:
.Fuzz(func(t *testing.T, s string) {
f:= runes.FirstRune(s)
got , _ := utf8.DecodeRuneInString(s)
wantif want == utf8.RuneError {
.Skip() // don't bother testing invalid runes
t}
if want != got {
... // fail
}
})
First, we’ll call the function and save its result to a variable
got
, so we can compare it with the expected result. What
is the expected result, though, since we don’t know in advance
what s
is?
Well, we can’t predict it, but we can ask the oracle.
DecodeRuneInString
will give us the right answer, and we’ll
store it in the want
variable.
Now, we know that the fuzzer is going to generate lots of random
strings for us to use as input, but are all possible strings actually
valid input to FirstRune
?
No, because not all strings encode valid UTF-8 runes. For example,
the empty string doesn’t. And there are many sequences of arbitrary
bytes that don’t happen to be valid UTF-8 data. So we need to do a
little pre-filtering on our input, before checking whether
FirstRune
actually handled it correctly.
Suppose the fuzzer calls our fuzz target with some string that
doesn’t contain valid runes. What will our oracle function
DecodeRuneInString
return in this case?
The answer is the constant utf8.RuneError
. If we get
that result, then this particular input value is invalid, and we should
just skip it. That’s what the call to t.Skip
does:
if want == utf8.RuneError {
.Skip() // don't bother testing invalid runes
t}
A skipped test neither fails nor succeeds; the fuzzer just ignores it and moves on to the next randomly-generated input.
Why don’t we skip the test before calling
FirstRune
with the input? Because even though we don’t
require FirstRune
to give the right answer for
invalid strings, we do require that it doesn’t panic. So if any
fuzz input makes FirstRune
panic, we want to find that out
before we skip.
Finally, just like in any test, we compare want
and
got
, and fail the test if they’re not equal. The failure
message is a little more complicated than any we’ve written so far:
.Errorf("given %q (0x%[1]x): want '%c' (0x%[2]x)", s, want)
t.Errorf("got '%c' (0x%[1]x)", got) t
Let’s break it down. In the event of a failure, what information would we like to print? Certainly the input string that caused the failure, so that’s the first piece of data:
given %q
Even though this string is guaranteed to be valid UTF-8, we can’t guarantee that all its characters will be printable, or have a corresponding glyph in the display font we’re using.
So just in case the string contains some non-printing runes, we’d also like to print out its raw bytes, as numbers. How can we do that?
As you probably know, the fmt
verb %x
will
print a piece of data as a hexadecimal value. And we can give the reader
a clue that what follows is a hex value by prefixing it with “0x”, which
denotes a hex literal in Go.
So a format string like this would do the job:
.Errorf("given %q (0x%x) ... ", s, s) t
For example, if s
were the string “hello”, and the test
failed, this would print:
given "hello" (0x68656c6c6f) ...
But notice that we had to supply the argument s
twice, which seems wasteful. As you know, every %
verb in the format string has to correspond to a piece of data, supplied
as a subsequent argument to t.Errorf
.
So if there are two verbs (%q
and %x
), then
we need two data arguments, even though they’re actually the
same value. Is there a way to avoid repeating s
in this
case, then?
Yes. We can use what’s called an explicit argument index,
which is a number in square brackets that goes between the
%
and the x
in %x
:
%[1]x
This is interpreted by the formatting machinery as meaning “print the
Nth data value”. In this case, that’ll be number 1, the first data
value, which is s
.
We can use the explicit-index trick to repeat a given argument any number of times in our format string:
.Printf("%s %[1]s %[1]s %[1]s", "hello")
fmt// Output:
// hello hello hello hello
In the fuzzing example, we need to use s
twice, once to
print it out as a quoted string (%q
), and again as a hex
value (%x
). So here’s how we do that:
.Errorf("given %q (0x%[1]x) ...", s, ...) t
So far, so good. What about the next part of the format string? Since
both want
and got
in this case are characters,
we’ll need to use the verb %c
to display them.
And, again, we’ll use the explicit argument index (in this case,
index 2) to re-use the same data, but this time printing it as a hex
value (%x
).
.Errorf("... want '%c' (0x%[2]x)", ..., want) t
Here’s what that would look like if the test failed:
... want 'w' (0x77)
The final part of the failure message prints got
in the
same way:
.Errorf("got '%c' (0x%[1]x)", got) t
And the result:
got 'v' (0x76)
Using the fuzzer to detect a panic
So we’re ready to write a null implementation of
FirstRune
and try out our test. We know it won’t pass, and
that’s the point. As you’ll recall, if we’re trying to build a bug
detector, we need to see it detect a bug.
Let’s write one:
func FirstRune(s string) rune {
return 0
}
This is definitely wrong, so let’s see if the test agrees:
go test -fuzz .
fuzz: elapsed: 0s, gathering baseline coverage: 0/2 completed
failure while testing seed corpus entry: FuzzFirstRune/seed#1
fuzz: elapsed: 0s, gathering baseline coverage: 0/2 completed
--- FAIL: FuzzFirstRune (0.02s)
--- FAIL: FuzzFirstRune (0.00s)
runes_test.go:22: given "world" (0x776f726c64): want 'w'
(0x77)
runes_test.go:24: got '' (0x0)
That’s encouraging, so let’s see what our carefully-constructed failure message actually reports:
given "world" (0x776f726c64): want 'w' (0x77)
got '' (0x0)
So the failing input is “world”, which was one of the cases we
manually added to the corpus using f.Add
. What caused the
failure, then?
The oracle thinks that the first rune of "world"
is
w
, and I agree. But what FirstRune
actually returned in this case was the rune value
0x0
, which corresponds to a non-printing character.
This is exactly what we expected to happen, since we deliberately
wrote FirstRune
so that it always returns zero, no
matter what.
Great! We now know that the test can detect at least this
bug, where the function does nothing at all. That’s a good start. In the
next post, we’ll fix that bug by writing a real implementation of
FirstRune
, but we’ll also introduce a new, more subtle bug,
and see if we can fix this fuzzed-up
function. Catch you later!