The magic function
Yes, he had a clean desk. But that was because he was throwing all the paperwork away.
—Terry Pratchett, “The Fifth Elephant”
Coding is easy, but programming is hard—at least, if you’re doing it right. That’s because good programmers are not just trying to solve a specific problem: they’re trying to build an abstraction that solves a general class of problems (including this one).
Counting lines
Creating abstractions—that is, making your solution more general—is the whole art of software design. And that’s why it’s a bit of a challenge at first. When people ask questions like “How should I structure my Rust project?” or “When should I split up my program into multiple modules or crates?”, this is what they’re really asking: what abstractions should I create, and how should they work?
In my new book The Secrets of Rust: Tools, we’ll focus on using good abstraction design to build user-friendly APIs and command-line tools in Rust. I’m a simple man who likes simple programs, and that’s the kind we’ll be writing together.
So in this excerpt from the book, let’s practise our abstraction skills a little by designing a small but useful command-line tool in Rust: one that can count lines in its input. You might have used the Unix program wc, for example, which has a line-counting mode:
echo hello | wc -l
1
A prototype line counter
Could we build something in Rust which does more or less the same thing? Let’s try.
cargo new count
We’ll start by putting everything in the src/main.rs file. Even though we’re planning to build an importable library crate, we can’t do that before we even know what it should do. So let’s get something up and running first, and then think about how to turn it into a reusable abstraction.
Here’s a rough prototype that could work:
use std::io::{stdin, BufRead};

fn main() {
    let input = stdin().lock();
    let lines = input.lines().count();
    println!("{lines}");
}
The first cut
Let’s break it down. First, we call stdin() to get a handle to read the standard input. Then we call lock() on the result, to get an exclusive lock on the Stdin object.
This isn’t strictly necessary here, but it’s good practice. In principle, a program could have multiple threads of execution that want to read from standard input, and these concurrent reads might conflict. Obtaining this mutex (“mutual exclusion”) lock ensures that no other thread can read from the input as long as we hold the lock.
The StdinLock object we get also conveniently implements the BufRead trait, which gives us a buffered reader on the input. It’s not so much that we need to buffer the input in this case, though that usually helps performance. It’s more that the BufRead trait also gives us the lines() method, which will iterate over the input data one line at a time.
And that’s what we do: we call lines() to get that iterator, and then we use the standard count() method to count the number of items the iterator produces. That’s the answer we want, so we print it out.
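To see what lines() actually yields, here is a small sketch that feeds it an in-memory byte slice instead of standard input (a byte slice also implements BufRead). The sample text is made up for illustration.

```rust
use std::io::BufRead;

fn main() {
    // A byte slice implements BufRead, so it can stand in for standard input.
    let input: &[u8] = b"first\nsecond\n";

    for line in input.lines() {
        // Each item is an io::Result<String>, with the trailing newline removed.
        let line = line.expect("reading from memory should not fail");
        println!("{line:?}");
    }

    // count() simply consumes the iterator and tallies its items.
    let count = input.lines().count();
    println!("{count}");
}
```

Each item is a Result because a read can fail part-way through the input. count() tallies the items without inspecting them, so it doesn’t distinguish successful lines from errors.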
As it happens, this is a relatively inefficient way to count lines. The lines() iterator gives us each line of the input as a newly allocated String, but we simply throw each string away. We can probably come up with better ways to do this if we think really hard, but at the moment that’s not the point.
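For the curious, here is one possible lower-allocation variant, offered only as a sketch: it reuses a single byte buffer via BufRead::read_until rather than building a fresh String per line. The function name count_lines_reusing_buffer is hypothetical, and this is not the approach the rest of the article takes.

```rust
use std::io::{self, BufRead, Cursor};

/// Hypothetical helper: counts lines while reusing one buffer for every line.
fn count_lines_reusing_buffer(mut input: impl BufRead) -> io::Result<usize> {
    let mut buf = Vec::new();
    let mut count = 0;
    // read_until returns Ok(0) once the input is exhausted.
    while input.read_until(b'\n', &mut buf)? > 0 {
        count += 1;
        // Keep the allocation, discard the contents.
        buf.clear();
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // The final line has no trailing newline, but still counts as a line.
    let input = Cursor::new("line 1\nline 2\nlast line");
    let lines = count_lines_reusing_buffer(input)?;
    println!("{lines}");
    Ok(())
}
```

Unlike lines(), this version never checks that the input is valid UTF-8, and it reports read errors to the caller instead of counting them as items.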
What do users want?
The point is that lines().count() is a very straightforward and intuitive way to count lines, and we’re not really interested in the implementation so much as the API. What would be a nice API for a library crate that provided line counting? And would it also be convenient for testing?
We’ll come to that, but first let’s see if it gives us the same answer as wc, for a quick sanity check:
echo hello | cargo run
1
Good start. Of course, the program might have a bug that causes it to always answer “1”, so this isn’t a very good test. I have a text file here that I prepared earlier, and that I happen to know contains 8,582 lines. So let’s see if the program agrees:
cat bigfile.txt | cargo run
8582
It doesn’t look as though we’ve got things too badly wrong so far. Of course, we don’t have a library yet, so that’s the next job.
The “magic function” approach
We’d like to turn this code into a reusable library package that other people can import and use in their own programs: in Rust, this is called a crate. We’re going to move all the “behaviour” code—all the code that does stuff, as opposed to just presenting the results—into a library function.
And straight away we have a challenge, because there are lots of different ways we could write the API of that function: from its name to its parameters and what it returns. How should we approach this design problem?
We could try to guess what might be convenient for users, and we might be right, but most likely we won’t. It’s too easy to get sidetracked by what’s convenient for us as implementers. So to avoid that, let’s rewrite our main function to call this hypothetical line-counting function, as though it already existed.
I call this the “magic function” approach: pretend you have some wonderful, magical function that just does whatever it is you need doing. And then call it!
You’ll see from the way you want to call it what the API of the function has to be. All you need to do then is write it: you’ve already done the necessary design thinking.
So, if we had a suitable magic count_lines function, what would we like to write in order to call it? Something like this, perhaps:
use std::io::stdin;

use count::count_lines;

fn main() {
    let lines = count_lines(stdin().lock());
    println!("{lines}");
}
What do you think? It’s hard to imagine how we could write less here and still be able to do what we want. That’s what I want from a library crate: an API where I have to do the least possible paperwork in order to use it.
Now let’s take a look at the lib.rs we need to implement to make this work.
Testing count_lines
First of all, let’s write a test, because that puts even stronger constraints on how count_lines must be implemented. It will have to satisfy both the real user (in this case main), and the test (which will be passing in some kind of reader that’s not standard input).
So, what does the test need to do? Well, call count_lines, necessarily, and check the result. So what can we pass to count_lines as the “fake” input here?
We need something that implements BufRead, because we’re going to call its lines method. But we’d like to be able to construct it from a string so that we can control how many lines it contains (let’s say 2).
There’s a nice type called std::io::Cursor that’s perfect for this. It takes anything that looks like a byte array (a string is fine) and turns it into a reader (that is, something that implements Read). Handily for our purposes, Cursor is also BufRead, which is exactly what count_lines is expecting.
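As a quick illustration of Cursor wearing both hats, this sketch wraps a string and reads from it first through a BufRead method and then through a Read method. The sample text is arbitrary.

```rust
use std::io::{BufRead, Cursor, Read};

fn main() {
    // Cursor wraps in-memory data; a &str works because it can be viewed as bytes.
    let mut reader = Cursor::new("line 1\nline 2\n");

    // As a BufRead, it can hand back one line at a time (read_line keeps the newline).
    let mut first = String::new();
    reader.read_line(&mut first).expect("in-memory read should not fail");
    println!("{first:?}");

    // As a Read, it returns whatever is left after the current position.
    let mut rest = String::new();
    reader.read_to_string(&mut rest).expect("in-memory read should not fail");
    println!("{rest:?}");
}
```

The cursor remembers how far it has read, so the second call picks up where the first left off, much as a file or standard input would.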
Here we go, then:
use std::io::Cursor;

#[test]
fn count_lines_fn_counts_lines_in_input() {
    let input = Cursor::new("line 1\nline 2\n");
    let lines = count_lines(input);
    assert_eq!(lines, 2, "wrong line count");
}
Seems reasonable, doesn’t it? We create a Cursor containing two lines of text, and pass it to count_lines. We assert that the answer is 2, and fail the test otherwise.
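If you’d like to run the test right away, it needs some count_lines to call. The sketch below is a stand-in under stated assumptions: the signature (any BufRead in, usize out) is inferred from the two call sites, and the body is borrowed from the prototype. It is not necessarily the implementation this series arrives at.

```rust
use std::io::{BufRead, Cursor};

// Assumed signature, inferred from how main and the test call the function.
pub fn count_lines(input: impl BufRead) -> usize {
    // Stand-in body, borrowed from the prototype.
    input.lines().count()
}

#[test]
fn count_lines_fn_counts_lines_in_input() {
    let input = Cursor::new("line 1\nline 2\n");
    let lines = count_lines(input);
    assert_eq!(lines, 2, "wrong line count");
}

fn main() {
    // The same check as the test, so the sketch also runs as a plain binary.
    let lines = count_lines(Cursor::new("line 1\nline 2\n"));
    println!("{lines}");
}
```

Taking impl BufRead is what lets the same function accept both stdin().lock() and a Cursor without either caller doing any extra paperwork.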
In the next post, we’ll verify this test by deliberately writing a terrible implementation of count_lines that doesn’t work at all. See you then!