
Jan 3 The magic function

(photo by Jeff Jackowski, licensed under Creative Commons)

Yes, he had a clean desk. But that was because he was throwing all the paperwork away.
—Terry Pratchett, “The Fifth Elephant”

Coding is easy, but programming is hard—at least, if you’re doing it right. That’s because good programmers are not just trying to solve a specific problem: they’re trying to build an abstraction that solves a general class of problems (including this one).

Counting lines

Creating abstractions—that is, making your solution more general—is the whole art of software design. And that’s why it’s a bit of a challenge at first. When people ask questions like “How should I structure my Rust project?” or “When should I split up my program into multiple modules or crates?”, this is what they’re really asking: what abstractions should I create, and how should they work?

In my new book The Secrets of Rust: Tools, we’ll focus on using good abstraction design to build user-friendly APIs and command-line tools in Rust. I’m a simple man who likes simple programs, and that’s the kind we’ll be writing together.

So in this excerpt from the book, let’s practise our abstraction skills a little by designing a small but useful command-line tool in Rust: one that can count lines in its input. You might have used the Unix program wc, for example, which has a line-counting mode:

wc -l bigfile.txt

A prototype line counter

Could we build something in Rust which does more or less the same thing? Let’s try.

cargo new count

We’ll start by putting everything in the src/main.rs file. Even though we’re planning to build an importable library crate, we can’t do that before we even know what it should do. So let’s get something up and running first, and then think about how to turn it into a reusable abstraction.

Here’s a rough prototype that could work:

use std::io::{stdin, BufRead};

fn main() {
    let input = stdin().lock();
    let lines = input.lines().count();
    println!("{lines} lines");
}

(Listing count_1)

The first cut

Let’s break it down. First, we call stdin() to get a handle to read the standard input. Then we call lock() on the result, to get an exclusive lock on the Stdin object.

This isn’t strictly necessary here, but it’s good practice. In principle, a program could have multiple threads of execution that want to read from standard input, and these concurrent reads might conflict. Obtaining this mutex (“mutual exclusion”) lock ensures that no other thread can read from the input as long as we hold the lock.

The StdinLock object we get also conveniently implements the BufRead trait, which gives us a buffered reader on the input. It’s not so much that we need to buffer the input in this case, though that usually helps performance. It’s more that the BufRead trait also gives us the lines() method, which will iterate over the input data one line at a time.
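To see that claim in action, here’s a small sketch. The helper names assert_implements_bufread and collect_lines are ours, purely for illustration. The first asks the compiler to confirm that StdinLock implements BufRead; the second uses lines() on an in-memory reader, so the program doesn’t sit waiting for keyboard input.

```rust
use std::io::{BufRead, Cursor, StdinLock};

// Hypothetical helper: it does nothing at run time, but a call to it
// only compiles if the type argument implements BufRead.
fn assert_implements_bufread<T: BufRead>() {}

// Hypothetical helper: collect every line from any buffered reader.
fn collect_lines(input: impl BufRead) -> Vec<String> {
    input
        .lines() // yields io::Result<String>, one item per line
        .map(|line| line.expect("failed to read line"))
        .collect()
}

fn main() {
    // The locked standard input handle is a BufRead...
    assert_implements_bufread::<StdinLock<'static>>();

    // ...and so is an in-memory Cursor, which we use here instead of stdin.
    let lines = collect_lines(Cursor::new("alpha\nbeta\n"));
    println!("{lines:?}"); // prints ["alpha", "beta"]
}
```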

And that’s what we do: we call lines() to get that iterator, and then we use the standard count() method to count the number of items the iterator produces. That’s the answer we want, so we print it out.

As it happens, this is a relatively inefficient way to count lines. The lines() iterator gives us each line of input as a newly allocated String, only for us to throw it away immediately. We can probably come up with better ways to do this if we think really hard, but at the moment that’s not the point.
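If you’re curious what one of those better ways might look like, here’s a sketch (the function name count_lines_reusing_buffer is hypothetical) that reuses a single byte buffer via BufRead::read_until instead of allocating a new String per line. For ordinary text it should give the same count as lines().count(), including a final line with no trailing newline, and it doesn’t require the input to be valid UTF-8.

```rust
use std::io::{BufRead, Cursor};

// Hypothetical alternative: count lines while reusing one buffer,
// rather than allocating a fresh String for every line.
fn count_lines_reusing_buffer(mut input: impl BufRead) -> std::io::Result<usize> {
    let mut buf = Vec::new();
    let mut count = 0;
    loop {
        buf.clear();
        // read_until appends bytes up to and including b'\n' to buf,
        // returning how many bytes it read (0 means end of input).
        let n = input.read_until(b'\n', &mut buf)?;
        if n == 0 {
            break;
        }
        count += 1;
    }
    Ok(count)
}

fn main() -> std::io::Result<()> {
    let n = count_lines_reusing_buffer(Cursor::new("line 1\nline 2\nno newline"))?;
    println!("{n} lines"); // prints "3 lines"
    Ok(())
}
```

This is still a long way from a tuned line counter, but it shows where the per-line allocation goes away.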

What do users want?

The point is that lines().count() is a very straightforward and intuitive way to count lines, and we’re not really interested in the implementation so much as the API. What would be a nice API for a library crate that provided line counting? And would it also be convenient for testing?

We’ll come to that, but first let’s see if it gives us the same answer as wc, as a quick sanity check. Let’s try it on the same bigfile.txt we counted earlier, which wc told us contains 8,582 lines. Do we get the same result?

cargo run <bigfile.txt

8582 lines

It doesn’t look as though we’ve got things too badly wrong so far. Of course, we don’t have a library yet, so that’s the next job.

The “magic function” approach

We’d like to turn this code into a reusable library package that other people can import and use in their own programs: in Rust, this is called a crate. We’re going to move all the “behaviour” code—all the code that does stuff, as opposed to just presenting the results—into a library function.

And straight away we have a challenge, because there are lots of different ways we could write the API of that function: from its name to its parameters and what it returns. How should we approach this design problem?

We could try to guess what might be convenient for users, and we might be right—but most likely we won’t. It’s too easy to get sidetracked by what’s convenient for us as implementers. So to avoid that, let’s rewrite our main function to call this hypothetical line-counting function, as though it already existed.

I call this the “magic function” approach: pretend you have some wonderful, magical function that just does whatever it is you need doing. And then call it!

You’ll see from the way you want to call it what the API of the function has to be. All you need to do then is write it: you’ve already done the necessary design thinking.

So, if we had a suitable magic count_lines function, what would we like to write in order to call it? Something like this, perhaps:

use std::io::stdin;

use count::count_lines;

fn main() {
    let lines = count_lines(stdin().lock());
    println!("{lines} lines");
}

(Listing count_2)

What do you think? It’s hard to imagine how we could write less here and still be able to do what we want. That’s what I want from a library crate: an API where I have to do the least possible paperwork in order to use it.

Now let’s take a look at the lib.rs we need to implement to make this work.

Testing count_lines

First of all, let’s write a test, because that puts even stronger constraints on how count_lines must be implemented. It will have to satisfy both the real user (in this case main), and the test (which will be passing in some kind of reader that’s not standard input).
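One signature that would satisfy both callers, sketched here under our own assumptions rather than as the book’s final code, is a function generic over any BufRead that returns a usize. The body simply reuses the prototype’s lines().count(); the point is that StdinLock and an in-memory reader both fit the same parameter type.

```rust
use std::io::{stdin, BufRead, Cursor};

/// Counts the lines produced by any buffered reader.
/// A sketch of one possible signature, not necessarily the crate's final API.
pub fn count_lines(input: impl BufRead) -> usize {
    input.lines().count()
}

// The real caller, as in main: locked standard input. It's defined but not
// called here, so running the example doesn't wait for input; it still
// proves at compile time that StdinLock fits the parameter type.
#[allow(dead_code)]
fn count_stdin_lines() -> usize {
    count_lines(stdin().lock())
}

fn main() {
    // The test's caller: an in-memory reader built from a string.
    let lines = count_lines(Cursor::new("line 1\nline 2\n"));
    println!("{lines} lines"); // prints "2 lines"
}
```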

So, what does the test need to do? Well, call count_lines, necessarily, and check the result. So what can we pass to count_lines as the “fake” input here?

We need something that implements BufRead, because we’re going to call its lines method. But we’d like to be able to construct it from a string so that we can control how many lines it contains (let’s say 2).

There’s a nice type called std::io::Cursor that’s perfect for this. It takes anything that can be viewed as a slice of bytes (a string is fine) and turns it into a reader (that is, something that implements Read). Handily for our purposes, Cursor also implements BufRead, which is exactly what count_lines is expecting.
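Here’s a quick sketch of that flexibility: Cursor::new accepts a string slice, an owned String, or a byte vector, because each of those implements AsRef<[u8]>, and the resulting cursor works both as a Read and as a BufRead.

```rust
use std::io::{BufRead, Cursor, Read};

fn main() {
    // A &str works because &str implements AsRef<[u8]>.
    let mut str_cursor = Cursor::new("hello\nworld\n");
    let mut text = String::new();
    // Read: pull all the data out in one go.
    str_cursor.read_to_string(&mut text).unwrap();
    assert_eq!(text, "hello\nworld\n");

    // An owned String and a Vec<u8> work too.
    let string_cursor = Cursor::new(String::from("one\ntwo\n"));
    let bytes_cursor = Cursor::new(b"one\ntwo\nthree\n".to_vec());

    // BufRead: iterate line by line.
    println!("{}", string_cursor.lines().count()); // prints 2
    println!("{}", bytes_cursor.lines().count()); // prints 3
}
```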

Here we go, then:

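#[cfg(test)]
mod tests {
    use std::io::Cursor;

    // Bring count_lines (defined in lib.rs) into scope for the test.
    use super::*;

    #[test]
    fn count_lines_fn_counts_lines_in_input() {
        let input = Cursor::new("line 1\nline 2\n");
        let lines = count_lines(input);
        assert_eq!(lines, 2, "wrong line count");
    }
}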

(Listing count_2)

Seems reasonable, doesn’t it? We create a Cursor containing two lines of text, and pass it to count_lines. We assert that the answer is 2, and fail the test otherwise.
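A single test case is a good start, but line counting has a couple of edge cases worth pinning down too. Here’s a sketch of a few extra cases, assuming count_lines takes any BufRead and returns a usize; the stand-in implementation just reuses the prototype’s lines().count() so the example compiles on its own. Note that lines() counts a final line even without a trailing newline, whereas wc -l counts newline characters, so the two can disagree on such input.

```rust
use std::io::{BufRead, Cursor};

/// Stand-in implementation so this example is self-contained.
pub fn count_lines(input: impl BufRead) -> usize {
    input.lines().count()
}

#[cfg(test)]
mod edge_case_tests {
    use super::count_lines;
    use std::io::Cursor;

    #[test]
    fn count_lines_fn_returns_zero_for_empty_input() {
        assert_eq!(count_lines(Cursor::new("")), 0, "wrong line count");
    }

    #[test]
    fn count_lines_fn_counts_final_line_without_trailing_newline() {
        // wc -l would report 1 here, since it counts newline characters.
        assert_eq!(count_lines(Cursor::new("line 1\nline 2")), 2, "wrong line count");
    }
}

fn main() {
    println!("{}", count_lines(Cursor::new("line 1\nline 2"))); // prints 2
}
```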

In the next post, we’ll verify this test by deliberately writing a terrible implementation of count_lines that doesn’t work at all. See you then!
