Jul 25

Jul 25 Iterators in Go

The latest iteration of Go looks interesting

A Golang iterator is a function that “yields” one result at a time, instead of computing a whole set of results and returning them all at once. When you write a for loop where the range expression is an iterator, the loop will be executed once for each value returned by the iterator.

That sounds complicated, but it’s actually very simple. It’s like a short-order cook who makes a dish for each customer on demand, as they come in, rather than cooking a whole day’s worth of meals in advance.

In my book Know Go, I explain all about iterators in Go, how to use them, and why you might want to. This is a key feature of modern Go, so if you want to understand Go, you’ll need to know about iterators.

Let’s break it down, then. First of all, what’s wrong with just returning a slice?

Why are iterators useful?

As you know, you can use range to loop over a bunch of different kinds of thing in Go: slices, maps, strings, channels. The most common case is probably looping over a slice.

This slice could be a local variable, or it could be the result of calling some function that returns a slice. For example:

for _, v := range Items() {
  fmt.Println("item", v)
}

We could write the Items function something like this:

type Item int

func Items() (items []Item) {
    ... // generate items
    return items
}

But there are some disadvantages to this way of doing it. First, we have to wait until the whole slice has been generated before we can start to process the first element of it.

Second, we have to allocate memory for the whole slice even though we’re only using one element at a time. Third, if we don’t end up using all the elements, then we’ve wasted time and memory computing stuff we didn’t use. How annoying!

So what if there was something like a slice, but it produced just one element at a time, as needed? Let’s call this handy little gadget an iterator, as in “thing you can iterate over”.

In other words, a for loop over such an iterator would execute once for each result that the iterator produces—each Item in our example.

What is an iterator, then? How does it work under the hood?

The signature of an iterator function

An iterator is just a function, but with a particular signature. For example, an iterator of Item would be:

func (yield func(Item) bool)

As you can see, an iterator takes a function as a parameter, conventionally called yield. It’s the function that the iterator will call to yield each successive value to the range loop.

Once the iterator has yielded all the items available, it returns, signalling “iteration complete”.

For example, here’s how we could write an iterator of Item:

func iterateItems(yield func(Item) bool) {
  items := []Item{1, 2, 3}
  for _, v := range items {
    if !yield(v) {
        return
    }
  }
}

Every time you want to produce a result, you pass it to yield, and control goes back to the loop statement, which would look like this:

for v := range iterateItems {
  ...
}

Assuming the loop keeps executing, yield will keep returning true, meaning “more values please”. But if the loop ever terminates (maybe with break or return, for example), then yield will return false. That gives Items a chance to do any necessary cleanup.

Notice that if yield does return false (“stop iteration, the loop is done”), the iterator must detect this and bail out, rather than waste time computing more results that won’t be needed. If it tries to yield again, the runtime will panic:

range function continued iteration after exit

Using iterators in programs

A function like iterateItems is simple enough for us to get the idea, but in real Go programs, we probably won’t range directly over a function like this. Instead, the most useful way to create iterators is to return them from a function call.

Single-value iterators: `iter.Seq`

For example, suppose we rewrote Items so that, instead of returning a slice of Item, it returned an iterator of Item. How would we declare such a result type in our function signature?

We already know that the signature of an Item iterator is:

func (yield func(Item) bool)

But it would be annoying to write this out in full as the result type of a function like Items, so instead we can use the standard library iter package, which defines the type Seq for us.

Now it’s easy to describe the result of Items, as simply iter.Seq[Item]:

import "iter"

func Items() iter.Seq[Item] {
  return func(yield func(Item) bool) {
    items := []Item{1, 2, 3}
    for _, v := range items {
      if !yield(v) {
          return
      }
    }
  }
}

Notice that Items is not an iterator itself. It doesn’t have the required signature. Instead, it returns an iterator.

Since we only receive only one value in our for .. range statement, our loop now looks like this:

for v := range Items() {
  fmt.Println("item", v)
}

In other words, what we’re ranging over is not some function, like iterateItems, but the result of calling a function: Items(). And that result is an iterator; specifically, an iter.Seq[Item].

Two-value iterators: `iter.Seq2`

Notice that we only receive one value in our example for ... range statement, not two as in the “loop over slice” version. That value, straightforwardly enough, is the item itself: we don’t need the index here.

But what if we do want the index? For example, we might want to print a numbered list of items. In this case, we can write exactly the same two-valued for ... range statement that we would with a slice:

for i, v := range Items() {
  fmt.Println("item" i, "is",v)
}

With our existing Items function, this won’t work, because it only yields one value at a time:

range over Items() (value of type iter.Seq[Item]) permits only one iteration variable

What do we need to change about the iterator to make this work? Well, the signature of its yield function changes, since it must yield two items instead of one:

func (int, Item) bool

Makes sense, doesn’t it? Each time round the loop, this yield function will produce another index-Item pair.

This is a different kind of iterator signature, so let’s call it an iter.Seq2[int, Item], because it yields two values. Here’s how we could write that:

func Items() iter.Seq2[int, Item] {
  return func(yield func(int, Item) bool) {
    items := []Item{1, 2, 3}
    for i, v := range items {
      if !yield(i, v) {
          return
      }
    }
  }
}

Let’s try our loop again to see if it works now:

for i, v := range Items() {
  fmt.Println("item" i, "is",v)
}

item 0 is 1
item 1 is 2
item 2 is 3

Neat! By the way, we don’t have to receive that index when ranging over Items, even though it’s a Seq2. If we didn’t need the index in a particular loop, we could ignore it with the blank identifier _, just as we do when ranging over a slice.

And in loops where we don’t care about either the index or the value, we don’t have to receive anything at all. We can write just:

for range Items() {
  fmt.Println("this prints once for each item")
}

Dealing with errors

The iterators we’ve talked about so far are what we might call infallible: in other words, they’ll always produce as many results as they’re asked for, or as many as there are available.

A good example of this is something like iterating over a collection type. For example, suppose we had a Set[E comparable] type, of the kind we developed in a previous chapter. Instead of the All method returning a slice, it would be nice if we could return an iterator:

func (s *Set[E]) All() iter.Seq[E]

But what about when the iterator has to do something to generate each item, and that something could fail? For example, reading something from an io.Reader or a bufio.Scanner. We know this could produce an error, so what should we do with it?

Well, the iterator could just return in this case, since it has no more items to yield, but then we’ve lost the error. The calling loop has no way to know that the iterator ran into a problem and had to stop early.

Instead, let’s take advantage of the fact that we can yield two results (if we’re an iter.Seq2, that is). With a slice, these two things are usually the index and the value, but here they need not be. We could yield the value and an error instead, for example:

func Lines(file string) iter.Seq2[string, error]

When we successfully read a line, we can return it along with a nil error, just like the plain old Go functions we love so well. On the other hand, when there’s an error, we can use that error result to signal what it was.

That does mean that the calling loop will need to check the iterator’s error, but this seems reasonable. It’s no different to calling a regular function that returns a slice and error, but this way, we get the benefits of an iterator.

Cleaning up

We’ve seen that iterators need to check the result of calling their yield function, because that’s how they know when to stop. Why does that matter? I mean, when the calling loop is done, why does yield need to return a false result? Why does it return a result at all, indeed?

The answer is that maybe the iterator uses some resources that need to be cleaned up. For example, it might be holding on to a database connection that’s the source of the items.

In this case, we can treat a false result from yield as a signal: “Clean up, you’re done”.

Embracing iterators

Now that we know something about how iterators work, let’s talk about what it means for us as programmers. How does this new feature change the way we write programs, and design APIs? Here are a few ideas.

Composing iterators

We’ve talked about functions that return iterators, but what about functions that take iterators, too? For example, we could write something like:

func PrintAll[V any](seq iter.Seq[V]) {
    for v := range seq {
        fmt.Println(v)
    }
}

Notice that we don’t even care what seq is an iterator of. It doesn’t matter. We can range over it regardless.

And if we can pass iterators to functions that return other iterators, then we can compose iterators:

func PrintPrimes() {
  for p := range Primes(Integers()) {
    fmt.Println(p)
  }
}

func Primes(seq iter.Seq[int]) iter.Seq[int] {
  return func(yield func(int) bool) {
    for n := range seq {
      if isPrime(n) {
        if !yield(n) {
          return
        }
      }
    }
  }
}

func Integers() iter.Seq[int] {
  return func(yield func(int) bool) {
    for i := range math.MaxInt {
      if !yield(i) {
        return
      }
    }
  }
}

Okay, it’s a contrived example, but you get the point.

When iterators beat channels

Even before the advent of iterators, we could get a somewhat similar behaviour by calling a function that returns a channel of its results instead of a slice.

Looping over this channel with range behaves exactly the way you’d expect, which is to say just the same as with an iterator. But there are at least two things about this solution that aren’t ideal. One is that your program is now concurrent, which means it’s probably incorrect. (Prove me wrong, Gophers. Prove me wrong.)

Concurrent programming is just hard, and there are many easy-to-make mistakes that can cause a concurrent program to deadlock, crash, and so on. It’s not a tool we deploy lightly, and it wouldn’t be worth the extra complexity just to iterate over a slice.

The second problem is about what happens if the loop exits early. If you stop listening to a channel, whoever’s trying to send to it will just block (in the case of a buffered channel, when the buffer fills up). In other words, the sending goroutine will leak: it just hangs around, taking up memory, and it has no way to ever exit. In a long-running program, leaks are something to avoid.

Sure, we do have ways to deal with that, like passing contexts that can be cancelled, and so on. And channels are very useful in all sorts of other situations. I’m not saying the problems with using channels in this way are insoluble, just that they’re annoying, and iterators do this particular job in a much more user-friendly way.

On the other hand, if your program is concurrent, using channels rather than iterators probably makes sense in most contexts.

Standard library changes

Now that iterators are part of Go, the standard library has acquired some new APIs in order to take advantage of the new feature. It makes sense that the existing slices and maps packages should be able to return iterators, since they’re about collection types, and indeed that’s the case.

Slices

The new slices.All and slices.Values functions return iterators of index-value pairs, and values, respectively:

s := []int{1, 2, 3}
for i, v := range slices.All(s) {
  fmt.Println("item", i, "is", v)
}
// Output:
// item 0 is 1
// item 1 is 2
// item 2 is 3
for v := range slices.Values(s) {
  fmt.Println(v)
}
// Output:
// 1
// 2
// 3

Since we often want to deal with the results of an iterator as a slice, the new slices.Collect function provides an easy way to do that:

s := slices.Collect(slices.Values([]int{1, 2, 3}))
fmt.Println(s)
// Output:
// [1 2 3]

Maps

Similarly, we now have maps.All, maps.Keys, and maps.Values for iterating over maps:

m := map[string]float64{
  "ε": 8.854187817620389e-12,
  "π":  math.Pi,
  "e":  math.E,
  "ħ":  1.054571817e-34,
}
for k, v := range maps.All(m) {
  fmt.Println("constant", k, "is", v)
}
// Unordered output:
// constant e is 2.718281828459045
// constant ħ is 1.054571817e-34
// constant ε is 8.854187817620389e-12
// constant π is 3.141592653589793
for k := range maps.Keys(m) {
  fmt.Println(k)
}
// Unordered output:
// π
// e
// ħ
// ε
for v := range maps.Values(m) {
  fmt.Println(v)
}
// Unordered output:
// 3.141592653589793
// 2.718281828459045
// 1.054571817e-34
// 8.854187817620389e-12