Oct 18

Files in test scripts

The txtar format is an ingenious way to supply arbitrary files and folder structures to test scripts.

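In Part 1 of this series, we got started with the testscript package, and in Part 2 we learned how to test Go CLI tools using a test script. Now let’s find out some more neat stuff: how to supply files and input to the programs our scripts run, how to work with files and folders, and how test scripts differ from shell scripts.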

The `txtar` format: constructing test data files

We’ve already seen in this series that we can create files in the script’s work directory, using this special syntax to indicate a named file entry:

-- golden.txt --
... file contents ...

(Listing hello/1)

The line beginning --, called a file marker line, tells testscript that everything following this line (until the next file marker) should be treated as the contents of golden.txt.

A file marker line must begin with two hyphens and a space, as in our example, and end with a space and two hyphens. The part in between these markers specifies the filename, which will be stripped of any surrounding whitespace.
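As a rough illustration of that rule, here’s a small Go function that decides whether a line is a file marker and, if it is, extracts the file name. It’s a sketch written for this article, not the txtar package’s own code. The name markerName is hypothetical, and we assume that a marker naming no file at all doesn’t count as a marker.

```go
package main

import (
	"fmt"
	"strings"
)

// markerName reports whether line is a txtar-style file marker and, if so,
// returns the file name it declares. Hypothetical helper: a sketch of the
// rule described above, not the txtar package's own code.
func markerName(line string) (string, bool) {
	const start, end = "-- ", " --"
	// The line must be long enough to hold both delimiters without overlap.
	if len(line) < len(start)+len(end) {
		return "", false
	}
	if !strings.HasPrefix(line, start) || !strings.HasSuffix(line, end) {
		return "", false
	}
	// Strip the delimiters, then any whitespace around the name.
	name := strings.TrimSpace(line[len(start) : len(line)-len(end)])
	// Assumption: a marker that names no file doesn't count as a marker.
	return name, name != ""
}

func main() {
	for _, line := range []string{
		"-- golden.txt --",
		"--   spaced.txt   --",
		"-- missing-end",
		"exec cat a.txt",
	} {
		name, ok := markerName(line)
		fmt.Printf("%q -> marker=%v name=%q\n", line, ok, name)
	}
}
```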

In fact, we can create as many additional files as we want, simply by adding more file entries delimited by marker lines:

exec cat a.txt b.txt c.txt

-- a.txt --
...

-- b.txt --
...

-- c.txt --
...

(Listing hello/1)

Each marker line denotes the beginning of a new file, followed by zero or more lines of content, and ending at the next file marker line, if there is one. All these files will be created in the work directory before the script starts, so we can rely on them being present.
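To make “up to the next marker” concrete, here’s a sketch of the splitting logic that uses only the standard library. The split and markerName functions and the file type are hypothetical names chosen for illustration. The real txtar package does a little more: for example, it makes sure each file’s contents end with a newline. Everything before the first marker comes back as a separate comment, and in a test script, that comment is the script itself.

```go
package main

import (
	"fmt"
	"strings"
)

// file is one entry in a text archive. Hypothetical type for this sketch.
type file struct {
	name string
	data string
}

// markerName returns the file name declared by a marker line such as
// "-- a.txt --", and false if the line isn't a marker.
func markerName(line string) (string, bool) {
	if len(line) < 6 || !strings.HasPrefix(line, "-- ") || !strings.HasSuffix(line, " --") {
		return "", false
	}
	name := strings.TrimSpace(line[3 : len(line)-3])
	return name, name != ""
}

// split divides archive text into the comment (everything before the
// first marker) and the files. Each file's contents run from the line
// after its marker up to the next marker, or to the end of the text.
func split(archive string) (comment string, files []file) {
	var cur *file // nil until we've seen the first marker
	var buf strings.Builder
	flush := func() {
		if cur == nil {
			comment = buf.String()
		} else {
			cur.data = buf.String()
			files = append(files, *cur)
		}
		buf.Reset()
	}
	// SplitAfter keeps each line's trailing newline, so contents are preserved.
	for _, line := range strings.SplitAfter(archive, "\n") {
		if name, ok := markerName(strings.TrimSuffix(line, "\n")); ok {
			flush()
			cur = &file{name: name}
			continue
		}
		buf.WriteString(line)
	}
	flush()
	return comment, files
}

func main() {
	archive := "exec cat a.txt b.txt\n\n-- a.txt --\nhello\n\n-- b.txt --\nworld\n"
	comment, files := split(archive)
	fmt.Printf("comment: %q\n", comment)
	for _, f := range files {
		fmt.Printf("%s: %q\n", f.name, f.data)
	}
}
```

One consequence worth knowing: a blank line between two entries, as in the listing above, belongs to the preceding file. So a.txt ends with an extra empty line.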

If we need to create folders, or even whole trees of files and folders, we can do that by using slash-separated paths in the file names:

-- misc/a.txt --
...

-- misc/subfolder/b.txt --
...

-- extra/c.txt --
...

(Listing hello/1)

When the script runs, the following tree of files will be created for it to use:

$WORK/
    misc/
        a.txt
        subfolder/
            b.txt
    extra/
        c.txt

This is a very neat way of constructing an arbitrary set of files and folders for the test, rather than having to create them using Go code, or copying them from somewhere in testdata. Instead, we can represent any number of text files as part of a single file, the test script itself.

This representation, by the way, is called txtar format, short for “text archive”, and that’s also the file extension we use for such files.

While this format is very useful in conjunction with testscript, a txtar file doesn’t have to be a test script. It can also be used independently, as a simple way of combining multiple folders and text files into a single file.

To use the txtar format directly in your own programs, import the txtar package (golang.org/x/tools/txtar; the go-internal module that provides testscript also includes a copy at github.com/rogpeppe/go-internal/txtar). For example, you could use it to write a tool that reads data from such archives, or one that lets users supply their input as txtar files.

Supplying input to programs using `stdin`

Now that we know how to create arbitrary files in the script’s work directory, we have another fun trick available. We can use one of those files to supply input to a program, as though it were typed by an interactive user, or piped into the program using a shell.

Let’s see if we can use this idea to test a more sophisticated version of our hello program. This time, we’ll prompt the user to enter their name, and we’ll use that name as part of the greeting we print.

First, we’ll set up a delegate main function that returns an exit status value, as we did with the hello program:

package greet

import (
    "bufio"
    "fmt"
    "os"
)

func Main() int {
    fmt.Println("Your name? ")
    scanner := bufio.NewScanner(os.Stdin)
    if !scanner.Scan() {
        return 1
    }
    fmt.Printf("Hello, %s!\n", scanner.Text())
    return 0
}

(Listing greet/6)

Just as with hello, we’ll use the map passed to testscript.RunMain to associate our new custom program greet with the greet.Main function:

func TestMain(m *testing.M) {
    os.Exit(testscript.RunMain(m, map[string]func() int{
        "greet": greet.Main,
    }))
}

(Listing greet/6)

We’ll add the usual parent test that calls testscript.Run:

func TestGreet(t *testing.T) {
    testscript.Run(t, testscript.Params{
        Dir: "testdata/script",
    })
}

(Listing greet/6)

And here’s a test script that runs the greet program, supplies it with fake “user” input via stdin, and checks its output:

stdin input.txt
exec greet
stdout 'Hello, John!'

-- input.txt --
John

(Listing greet/6)

First, the stdin statement specifies that standard input for the next program run by exec will come from input.txt (defined at the end of the script file, using a txtar file entry).

Next, we exec the greet command and verify that its output matches Hello, John!. (The argument to stdout is a regular expression, and it only has to match somewhere in the output.) It’s very simple, but it’s a powerful way to simulate any amount of user input for testing. Indeed, we could simulate a whole “conversation” with the user.

Suppose the program asks the user for their favourite food as well as their name, then prints a customised dining invitation:

func Main() int {
    fmt.Println("Your name? ")
    scanner := bufio.NewScanner(os.Stdin)
    if !scanner.Scan() {
        return 1
    }
    name := scanner.Text()
    fmt.Println("Your favourite food? ")
    if !scanner.Scan() {
        return 1
    }
    food := scanner.Text()
    fmt.Printf("Hello, %s. Care to join me for some %s?\n", name,
        food)
    return 0
}

(Listing greet/7)

How can we test this with a script? Because the program scans user input a line at a time, we can construct our “fake input” file to contain the user’s name and favourite food on consecutive lines:

stdin input.txt
exec greet
stdout 'Hello, Kim. Care to join me for some barbecue?'

-- input.txt --
Kim
barbecue

(Listing greet/7)
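Two consecutive lines work because each call to scanner.Scan consumes exactly one line of input. Here’s a sketch that shows this in isolation. readAnswers is a hypothetical helper, and it reads from a strings.Reader rather than os.Stdin so that it can run without a terminal.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// readAnswers returns successive lines from input, one per Scan call, just
// as greet's scanner consumes one line per prompt. A strings.Reader stands
// in for os.Stdin here (an assumption so the sketch runs anywhere).
func readAnswers(input string) []string {
	scanner := bufio.NewScanner(strings.NewReader(input))
	var answers []string
	for scanner.Scan() {
		answers = append(answers, scanner.Text())
	}
	return answers
}

func main() {
	// The contents of input.txt from the script above.
	answers := readAnswers("Kim\nbarbecue\n")
	fmt.Printf("name=%q food=%q\n", answers[0], answers[1])
}
```

If input.txt contained only the line Kim, the second Scan in greet would return false and Main would return 1. The exec greet line would then fail, because exec expects a zero exit status unless it’s prefixed with !.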

We can go even further with stdin. We’re not restricted to supplying input from a file; we can also use the output of a previous exec. This could be useful when one program is supposed to generate output which will be piped to another, for example:

exec echo hello
stdin stdout
exec cat
stdout 'hello'

First, we execute echo hello. Next, we say stdin stdout, meaning that the input for the next exec should be the output of the previous exec (in this case, that input will be the string hello).

Finally, we execute the cat command, which copies its input to its output, producing the final result of this “pipeline”: hello. You can chain programs together using stdin stdout as many times as necessary.

This isn’t quite like a shell pipeline, though, because there’s no concurrency involved. Each exec runs to completion, and only then is its captured output handed to the next program as input. That’s fine for most scripts, but be aware that the script won’t proceed until the previous exec has finished.

File operations

Just as in a traditional shell script, we can copy one file to another using cp:

cp a.txt b.txt

However, the first argument to cp can also be stdout or stderr, indicating that we want to copy the output of a previous exec to some file:

exec echo hello
cp stdout tmp.txt

We can also use mv to move a file (that is, rename it) instead of copying:

mv a.txt b.txt

To create a directory, we can use mkdir, and then copy several files into it at once with cp:

mkdir data
cp a.txt b.txt c.txt data

The cd statement changes the current directory for the rest of the script, so relative paths in subsequent commands, and programs run by exec, use the new directory:

cd data

To delete a file or directory, use the rm statement:

rm data

When used with a directory, rm acts recursively, like the shell’s rm -rf: it deletes all contained files and subdirectories before deleting the directory itself.

To create a symbolic link from one file or directory to another, we can use the symlink statement, like this:

mkdir target
symlink source -> target

Note that the -> is required, and it indicates the “direction” of the symlink. In this example, the link source will be created, pointing to the existing directory target.

Differences from shell scripts

While test scripts look a lot like shell scripts, they don’t have any kind of control flow statements, such as loops or functions. Failing assertions will bail out of the script early, but that’s it. On the plus side, this makes test scripts pretty easy to read and understand.

There is a limited form of conditional statement, as we’ll see later in this series, but let’s first look at a few other ways in which test scripts differ from shell scripts.

For one thing, they don’t actually use a shell: commands run by exec are invoked directly, without being parsed by a shell first. So some of the familiar facilities of shell command lines, such as globbing (using wildcards such as * to represent multiple filenames), aren’t available to us directly.

That’s not necessarily a problem, though. If we need to expand a glob expression, we can simply ask the shell to do it for us:

# List all files whose names begin with '.'
exec sh -c 'ls .*'

Similarly, we can’t use the pipe character (|) to send the output of one command to another, as in a shell pipeline. Actually, we already know how to chain programs together in a test script using stdin stdout. But, again, if we’d rather invoke the shell to do this, we can:

# count the number of lines printed by 'echo hello'
exec sh -c 'echo hello | wc -l'

Comments and phases

A # character in a test script begins a comment, as we saw in the examples in the previous section. Everything from the # to the end of the line is part of the comment and is ignored. That means we can add comments after commands, too:

exec echo hello # this comment will not appear in output

Actually, comments in scripts aren’t entirely ignored. They also delimit distinct sections, or phases, in the script.

For example, we can use a comment to tell the reader what’s happening at each stage of a script that does several things:

# run an existing command: this will succeed
exec echo hello

# try to run a command that doesn't exist
exec bogus

This is informative, but that’s not all. If the script fails at any point, testscript will print the detailed log output, as we’ve seen in previous examples. However, only the log of the current phase is printed; previous phases are suppressed. Only their comments appear, to show that they succeeded.

Here’s the result of running the two-phase script shown in the previous example:

# run an existing command: this will succeed (0.003s)
# try to run a command that doesn't exist (0.000s)
> exec bogus
[exec: "bogus": executable file not found in $PATH]

Notice that the exec statement from the first phase (running echo hello) isn’t shown here. Instead, we see only its associated comment, which testscript treats as a description of that phase, followed by the time it took to run:

# run an existing command: this will succeed (0.003s)

Separating the script into phases using comments like this can be very helpful for keeping the failure output concise, and of course it makes the script more readable too.
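The phase rule is simple enough to sketch: every line that begins with # starts a new phase, and when something fails, only the failing phase’s commands are shown. The code below is a hypothetical model of that behaviour, not testscript’s implementation. It ignores timings and simply lists every command in the failing phase. A trailing comment, as in exec echo hello # ..., doesn’t start a new phase, because the line doesn’t begin with #.

```go
package main

import (
	"fmt"
	"strings"
)

// phase is one section of a script: the comment that opened it (if any)
// and the commands that follow. Hypothetical type for this sketch.
type phase struct {
	comment  string
	commands []string
}

// phases groups script lines so that each line beginning with # starts a
// new phase. A trailing comment after a command does not start a phase.
func phases(script string) []phase {
	var out []phase
	var cur phase
	for _, line := range strings.Split(script, "\n") {
		line = strings.TrimSpace(line)
		switch {
		case line == "":
			// Blank lines are ignored.
		case strings.HasPrefix(line, "#"):
			// Close off the previous phase, if it had anything in it.
			if cur.comment != "" || len(cur.commands) > 0 {
				out = append(out, cur)
			}
			cur = phase{comment: line}
		default:
			cur.commands = append(cur.commands, line)
		}
	}
	if cur.comment != "" || len(cur.commands) > 0 {
		out = append(out, cur)
	}
	return out
}

// report imitates the shape of testscript's failure output: phases before
// the failing one show only their comment, while the failing phase also
// lists its commands. failed must be a valid index into ps.
func report(ps []phase, failed int) string {
	var b strings.Builder
	for i, p := range ps[:failed+1] {
		if p.comment != "" {
			b.WriteString(p.comment + "\n")
		}
		if i == failed {
			for _, c := range p.commands {
				b.WriteString("> " + c + "\n")
			}
		}
	}
	return b.String()
}

func main() {
	script := `# run an existing command: this will succeed
exec echo hello

# try to run a command that doesn't exist
exec bogus
`
	// Pretend the command in the second phase (index 1) failed.
	fmt.Print(report(phases(script), 1))
}
```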

That’s it for Part 3; in Part 4 we’ll get to grips with conditions in scripts, supplying environment variables to programs, and running programs in the background. See you there!
