Iterators in Go
The latest iteration of Go looks interesting
A Golang iterator is a function that “yields” one result at a time,
instead of computing a whole set of results and returning them all at
once. When you write a for
loop where the
range
expression is an iterator, the loop will be executed
once for each value returned by the iterator.
That sounds complicated, but it’s actually very simple. It’s like a short-order cook who makes a dish for each customer on demand, as they come in, rather than cooking a whole day’s worth of meals in advance.
In my book Know Go, I explain all about iterators in Go, how to use them, and why you might want to. This is a key feature of modern Go, so if you want to understand Go, you’ll need to know about iterators.
Let’s break it down, then. First of all, what’s wrong with just returning a slice?
Why are iterators useful?
As you know, you can use range
to loop over a bunch of
different kinds of thing in Go: slices, maps, strings, channels. The
most common case is probably looping over a slice.
This slice could be a local variable, or it could be the result of calling some function that returns a slice. For example:
for _, v := range Items() {
.Println("item", v)
fmt}
We could write the Items
function something like
this:
type Item int
func Items() (items []Item) {
... // generate items
return items
}
But there are some disadvantages to this way of doing it. First, we have to wait until the whole slice has been generated before we can start to process the first element of it.
Second, we have to allocate memory for the whole slice even though we’re only using one element at a time. Third, if we don’t end up using all the elements, then we’ve wasted time and memory computing stuff we didn’t use. How annoying!
So what if there was something like a slice, but it produced just one element at a time, as needed? Let’s call this handy little gadget an iterator, as in “thing you can iterate over”.
In other words, a for
loop over such an iterator would
execute once for each result that the iterator produces—each
Item
in our example.
What is an iterator, then? How does it work under the hood?
The signature of an iterator function
An iterator is just a function, but with a particular signature. For
example, an iterator of Item
would be:
func (yield func(Item) bool)
As you can see, an iterator takes a function as a parameter,
conventionally called yield
. It’s the function that the
iterator will call to yield each successive value to the
range
loop.
Once the iterator has yielded all the items available, it returns, signalling “iteration complete”.
For example, here’s how we could write an iterator of
Item
:
func iterateItems(yield func(Item) bool) {
:= []Item{1, 2, 3}
items for _, v := range items {
if !yield(v) {
return
}
}
}
Every time you want to produce a result, you pass it to
yield
, and control goes back to the loop statement, which
would look like this:
for v := range iterateItems {
...
}
Assuming the loop keeps executing, yield
will keep
returning true
, meaning “more values please”. But if the
loop ever terminates (maybe with break
or
return
, for example), then yield
will return
false
. That gives Items
a chance to do any
necessary cleanup.
Notice that if yield
does return false
(“stop iteration, the loop is done”), the iterator must detect this and
bail out, rather than waste time computing more results that won’t be
needed. If it tries to yield again, the runtime will panic:
range function continued iteration after exit
Using iterators in programs
A function like iterateItems
is simple enough for us to
get the idea, but in real Go programs, we probably won’t range directly
over a function like this. Instead, the most useful way to create
iterators is to return them from a function call.
Single-value iterators:
iter.Seq
For example, suppose we rewrote Items
so that, instead
of returning a slice of Item
, it returned an
iterator of Item
. How would we declare such a
result type in our function signature?
We already know that the signature of an Item
iterator
is:
func (yield func(Item) bool)
But it would be annoying to write this out in full as the result type
of a function like Items
, so instead we can use the
standard library iter
package, which defines the type
Seq
for us.
Now it’s easy to describe the result of Items
, as simply
iter.Seq[Item]
:
import "iter"
func Items() iter.Seq[Item] {
return func(yield func(Item) bool) {
:= []Item{1, 2, 3}
items for _, v := range items {
if !yield(v) {
return
}
}
}
}
Notice that Items
is not an iterator itself. It
doesn’t have the required signature. Instead, it returns an
iterator.
Since we only receive only one value in our for .. range
statement, our loop now looks like this:
for v := range Items() {
.Println("item", v)
fmt}
In other words, what we’re ranging over is not some function, like
iterateItems
, but the result of calling a
function: Items()
. And that result is an iterator;
specifically, an iter.Seq[Item]
.
Two-value iterators:
iter.Seq2
Notice that we only receive one value in our example
for ... range
statement, not two as in the “loop over
slice” version. That value, straightforwardly enough, is the item
itself: we don’t need the index here.
But what if we do want the index? For example, we might want
to print a numbered list of items. In this case, we can write exactly
the same two-valued for ... range
statement that we would
with a slice:
for i, v := range Items() {
.Println("item" i, "is",v)
fmt}
With our existing Items
function, this won’t work,
because it only yields one value at a time:
range over Items() (value of type iter.Seq[Item]) permits only one iteration variable
What do we need to change about the iterator to make this work? Well,
the signature of its yield
function changes, since it must
yield two items instead of one:
func (int, Item) bool
Makes sense, doesn’t it? Each time round the loop, this
yield
function will produce another index-Item
pair.
This is a different kind of iterator signature, so let’s call it an
iter.Seq2[int, Item]
, because it yields two values. Here’s
how we could write that:
func Items() iter.Seq2[int, Item] {
return func(yield func(int, Item) bool) {
:= []Item{1, 2, 3}
items for i, v := range items {
if !yield(i, v) {
return
}
}
}
}
Let’s try our loop again to see if it works now:
for i, v := range Items() {
.Println("item" i, "is",v)
fmt}
item 0 is 1
item 1 is 2
item 2 is 3
Neat! By the way, we don’t have to receive that index when
ranging over Items
, even though it’s a Seq2
.
If we didn’t need the index in a particular loop, we could ignore it
with the blank identifier _
, just as we do when ranging
over a slice.
And in loops where we don’t care about either the index or the value, we don’t have to receive anything at all. We can write just:
for range Items() {
.Println("this prints once for each item")
fmt}
Dealing with errors
The iterators we’ve talked about so far are what we might call infallible: in other words, they’ll always produce as many results as they’re asked for, or as many as there are available.
A good example of this is something like iterating over a collection
type. For example, suppose we had a Set[E comparable]
type,
of the kind we developed in a previous chapter. Instead of the
All
method returning a slice, it would be nice if we could
return an iterator:
func (s *Set[E]) All() iter.Seq[E]
But what about when the iterator has to do something to
generate each item, and that something could fail? For example, reading
something from an io.Reader
or a
bufio.Scanner
. We know this could produce an error, so what
should we do with it?
Well, the iterator could just return in this case, since it has no more items to yield, but then we’ve lost the error. The calling loop has no way to know that the iterator ran into a problem and had to stop early.
Instead, let’s take advantage of the fact that we can yield
two results (if we’re an iter.Seq2
, that is). With
a slice, these two things are usually the index and the value, but here
they need not be. We could yield the value and an error
instead, for example:
func Lines(file string) iter.Seq2[string, error]
When we successfully read a line, we can return it along with a
nil
error, just like the plain old Go functions we love so
well. On the other hand, when there’s an error, we can use that
error
result to signal what it was.
That does mean that the calling loop will need to check the
iterator’s error, but this seems reasonable. It’s no different to
calling a regular function that returns a slice and error
,
but this way, we get the benefits of an iterator.
Cleaning up
We’ve seen that iterators need to check the result of calling their
yield
function, because that’s how they know when to stop.
Why does that matter? I mean, when the calling loop is done, why does
yield
need to return a false
result? Why does
it return a result at all, indeed?
The answer is that maybe the iterator uses some resources that need to be cleaned up. For example, it might be holding on to a database connection that’s the source of the items.
In this case, we can treat a false
result from
yield
as a signal: “Clean up, you’re done”.
Embracing iterators
Now that we know something about how iterators work, let’s talk about what it means for us as programmers. How does this new feature change the way we write programs, and design APIs? Here are a few ideas.
Composing iterators
We’ve talked about functions that return iterators, but what about functions that take iterators, too? For example, we could write something like:
func PrintAll[V any](seq iter.Seq[V]) {
for v := range seq {
.Println(v)
fmt}
}
Notice that we don’t even care what seq
is an iterator
of. It doesn’t matter. We can range
over it
regardless.
And if we can pass iterators to functions that return other iterators, then we can compose iterators:
func PrintPrimes() {
for p := range Primes(Integers()) {
.Println(p)
fmt}
}
func Primes(seq iter.Seq[int]) iter.Seq[int] {
return func(yield func(int) bool) {
for n := range seq {
if isPrime(n) {
if !yield(n) {
return
}
}
}
}
}
func Integers() iter.Seq[int] {
return func(yield func(int) bool) {
for i := range math.MaxInt {
if !yield(i) {
return
}
}
}
}
Okay, it’s a contrived example, but you get the point.
When iterators beat channels
Even before the advent of iterators, we could get a somewhat similar behaviour by calling a function that returns a channel of its results instead of a slice.
Looping over this channel with range
behaves exactly the
way you’d expect, which is to say just the same as with an iterator. But
there are at least two things about this solution that aren’t ideal. One
is that your program is now concurrent, which means it’s
probably incorrect. (Prove me wrong, Gophers. Prove me wrong.)
Concurrent programming is just hard, and there are many easy-to-make mistakes that can cause a concurrent program to deadlock, crash, and so on. It’s not a tool we deploy lightly, and it wouldn’t be worth the extra complexity just to iterate over a slice.
The second problem is about what happens if the loop exits early. If you stop listening to a channel, whoever’s trying to send to it will just block (in the case of a buffered channel, when the buffer fills up). In other words, the sending goroutine will leak: it just hangs around, taking up memory, and it has no way to ever exit. In a long-running program, leaks are something to avoid.
Sure, we do have ways to deal with that, like passing contexts that can be cancelled, and so on. And channels are very useful in all sorts of other situations. I’m not saying the problems with using channels in this way are insoluble, just that they’re annoying, and iterators do this particular job in a much more user-friendly way.
On the other hand, if your program is concurrent, using channels rather than iterators probably makes sense in most contexts.
Standard library changes
Now that iterators are part of Go, the standard library has acquired
some new APIs in order to take advantage of the new feature. It makes
sense that the existing slices
and maps
packages should be able to return iterators, since they’re about
collection types, and indeed that’s the case.
Slices
The new slices.All
and slices.Values
functions return iterators of index-value pairs, and values,
respectively:
:= []int{1, 2, 3}
s for i, v := range slices.All(s) {
.Println("item", i, "is", v)
fmt}
// Output:
// item 0 is 1
// item 1 is 2
// item 2 is 3
for v := range slices.Values(s) {
.Println(v)
fmt}
// Output:
// 1
// 2
// 3
Since we often want to deal with the results of an iterator
as a slice, the new slices.Collect
function
provides an easy way to do that:
:= slices.Collect(slices.Values([]int{1, 2, 3}))
s .Println(s)
fmt// Output:
// [1 2 3]
Maps
Similarly, we now have maps.All
, maps.Keys
,
and maps.Values
for iterating over maps:
:= map[string]float64{
m "ε": 8.854187817620389e-12,
"π": math.Pi,
"e": math.E,
"ħ": 1.054571817e-34,
}
for k, v := range maps.All(m) {
.Println("constant", k, "is", v)
fmt}
// Unordered output:
// constant e is 2.718281828459045
// constant ħ is 1.054571817e-34
// constant ε is 8.854187817620389e-12
// constant π is 3.141592653589793
for k := range maps.Keys(m) {
.Println(k)
fmt}
// Unordered output:
// π
// e
// ħ
// ε
for v := range maps.Values(m) {
.Println(v)
fmt}
// Unordered output:
// 3.141592653589793
// 2.718281828459045
// 1.054571817e-34
// 8.854187817620389e-12