Iterators in Go

Iterators in Go

The next iteration of Go looks interesting

This article describes an experimental extension to the Go language: range over func. The design of Go iterators is evolving, and this tutorial is a work in progress, so don’t forget to check back later for the latest developments.

A Golang iterator is a function that “yields” one result at a time, instead of computing a whole set of results and returning them all at once. When you write a for loop where the range expression is an iterator, the loop will be executed once for each value returned by the iterator.

That sounds complicated, but it’s actually very simple. It’s like a short-order cook who makes a dish for each customer on demand, as they come in, rather than cooking a whole day’s worth of meals in advance.

Let’s break it down, then. First of all, what’s wrong with just returning a slice?

Why are iterators useful?

As you know, you can use range to loop over a bunch of different kinds of thing in Go: slices, maps, strings, channels. The most common case is probably looping over a slice.

This slice could be a local variable, or it could be the result of calling some function that returns a slice. For example:

for _, v := range Items() {
  ... // do something with v
}

We could write the Items function something like this, traditionally, to return a slice of some hypothetical type Item:

func Items() (items []Item) {
    ... // generate items
    return items
}

But there are some disadvantages to this way of doing it. First, we have to wait until the whole slice has been generated before we can start to process the first element of it.

Second, we have to allocate memory for the whole slice even though we’re only using one element at a time. Third, if we don’t end up using all the elements, then we’ve wasted time and memory computing stuff we didn’t use. How annoying!

So what if there was something like a slice, but it produced just one element at a time, as needed? Let’s call this handy little gadget an iterator, as in “thing you can iterate over”.

In other words, a for loop over such an iterator would execute once for each result that the iterator produces—each Item in our example.

What is an iterator, then? How does it work under the hood?

The signature of an iterator function

An iterator is just a function, but with a particular signature. For example, an iterator of Item would be:

func (yield func(Item) bool)

As you can see, an iterator takes a function as a parameter, conventionally called yield. It’s the function that the iterator will call to yield each successive value to the range loop.

Once the iterator has yielded all the items available, it returns, signalling “iteration complete”.

For example, here’s how we could write an iterator of Item:

func iterateItems(yield func(Item) bool) {
  items := []Item{1, 2, 3}
  for _, v := range items {
    if !yield(v) {
        return
    }
  }
}

Every time you want to produce a result, you pass it to yield, and control goes back to the loop statement, which would look like this:

for v := range iterateItems {
  ...
}

Assuming the loop keeps executing, yield will keep returning true, meaning “more values please”. But if the loop ever terminates (maybe with break or return, for example), then yield will return false. That gives Items a chance to do any necessary cleanup.

Notice that if yield does return false (“stop iteration, the loop is done”), the iterator must detect this and bail out, rather than waste time computing more results that won’t be needed. If it tries to yield again, the runtime will panic:

range function continued iteration after exit

Returning an iterator

In practice, the most useful way to create iterators is to return them from a function call. For example, suppose we rewrote Items so that, instead of returning a slice of Item, it returned an iterator of Item. How would we declare such a result type in our function signature?

We already know that the signature of an Item iterator is:

func (yield func(Item) bool)

But it would be annoying to write this out in full as the result type of a function like Items, so let’s imagine we have a standard library generic type iter.Seq[V any] to reduce boilerplate.

Now it’s easy to describe the result of Items, as simply iter.Seq[Item]:

func Items() iter.Seq[Item] {
  return func(yield func(Item) bool) {
    items := []Item{1, 2, 3}
    for _, v := range items {
      if !yield(v) {
          return
      }
    }
  }
}

Notice that Items is not an iterator itself. It doesn’t have the required signature. Instead, it returns an iterator.

Our loop statement now becomes:

for v := range Items() {
  ...
}

In other words, what we’re ranging over is not some function, like iterateItems, but the result of calling a function: Items(). And that result is an iterator; specifically, an iter.Seq[Item].

Two-result iterators

Notice that we only receive one value in our example for ... range statement, not two as in the “loop over slice” version. That value, straightforwardly enough, is the item itself: we don’t need the index here.

But what if we do want the index? For example, we might want to print a numbered list of items. In this case, we can write exactly the same two-valued for ... range statement that we would with a slice:

for i, v := range Items() {
  fmt.Println(i, v)
}

What do we need to change about the iterator to make this work? Well, since it will now be yielding two values instead of one, the signature of its yield function becomes:

func (int, Item) bool

Makes sense, doesn’t it? Each time round the loop, this yield function will produce another index-Item pair.

This is a different kind of iterator signature, so let’s call it an iter.Seq2[int, Item], because it yields two values. Here’s how we could write that:

func Items() iter.Seq2[int, Item] {
  return func(yield func(int, Item) bool) {
    items := []Item{1, 2, 3}
    for i, v := range items {
      if !yield(i, v) {
          return
      }
    }
  }
}

We don’t have to receive that index when ranging over Items. If we didn’t need it for a particular loop, we could ignore it with the blank identifier _, just as we do when ranging over a slice.

And in loops where we don’t care about either the index or the value, we don’t have to receive anything at all. We can write just:

for range Items() {
  ...
}

What about errors?

The iterators we’ve talked about so far are what we might call infallible: in other words, they’ll always produce as many results as they’re asked for, or as many as there are available.

A good example of this is something like iterating over a collection type. For example, suppose we had a Set[V any] type (wouldn’t that be nice?). There might be an All method that returns an iterator over the elements of the set, and that’s straightforward, as we’ve already seen:

func (s *Set[V]) All() iter.Seq[V]

But what about when the iterator has to do something to generate each item, and that something could fail? For example, reading something from an io.Reader or a bufio.Scanner. We know this could produce an error, so what should we do with it?

Well, the iterator could just return in this case, since it has no more items to yield, but then we’ve lost the error. The calling loop has no way to know that the iterator ran into a problem and had to stop early.

Instead, let’s take advantage of the fact that we can yield two results (if we’re an iter.Seq2, that is). With a slice, these two things are usually the index and the value, but here they need not be. We could yield the value and an error instead, for example:

func Lines(file string) iter.Seq2[string, error]

When we successfully read a line, we can return it along with a nil error, just like the plain old Go functions we love so well. On the other hand, when there’s an error, we can use that error result to signal what it was.

That does mean that the calling loop will need to check the iterator’s error, but this seems reasonable. It’s no different to calling a regular function that returns a slice and error, but this way, we get the benefits of an iterator.

When yield returns false

We’ve seen that iterators need to check the result of calling their yield function, because that’s how they know when to stop. Why does that matter? I mean, when the calling loop is done, why does yield need to return a false result? Why does it return a result at all, indeed?

The answer is that maybe the iterator uses some resources that need to be cleaned up. For example, it might be holding on to a database connection that’s the source of the items.

In this case, we can treat a false result from yield as a signal: “Clean up, you’re done”.

Composing iterators

We’ve talked about functions that return iterators, but what about functions that take iterators? For example, we could write something like:

func PrintAll[V any](seq iter.Seq[V]) {
    for v := range seq {
        fmt.Println(v)
    }
}

Notice that we don’t even care what seq is an iterator of. It doesn’t matter. We can range over it regardless.

And if we can pass iterators to functions that return other iterators, then we can compose iterators:

func PrintPrimes() {
  for p := range Primes(Integers()) {
    fmt.Println(p)
  }
}

func Primes(seq iter.Seq[int]) iter.Seq[int] {
  return func(yield func(int) bool) {
    for n := range seq {
      if isPrime(n) {
        if !yield(n) {
          return
        }
      }
    }
  }
}

func Integers() iter.Seq[int] {
  return func(yield func(int) bool) {
    for i := 0; i <= math.MaxInt; i++ {
      if !yield(i) {
        return
      }
    }
  }
}

Okay, it’s a contrived example, but you get the point.

Does this affect the standard library?

For sure. It makes sense that the existing slices and maps packages should be able to return iterators, since they’re about collection types.

Those trusty brothers bytes and strings stand ready to take full advantage of iterators too. And it seems reasonable that regexp should do the same.

We’ve seen that defining shorthand types such as Seq and Seq2 would be useful, so the iter package is for them, together with any other iterator-related bits and pieces.

For doing things with iterators, other than just looping over them, there’s the x/exp/iter package, which, while not part of the standard library, is definitely stdlib-adjacent. It’s where new things can be played with, beaten in and out of shape, and refined until they’re considered grown-up enough to join the family of packages that ship as part of Go itself.

Why not channels?

A completely understandable reaction to a proposal like this is to say “But I can already do this with a channel. Why do we need iterators as well?”

It’s true: Go has goroutines (this is known), and they can communicate over channels. So we could get something very like the behaviour we’ve outlined simply by calling a function that returns a channel of its results instead of a slice.

Looping over this channel with range behaves exactly the way you’d expect, which is to say just the same as with an iterator.

But there are at least two things about this solution that aren’t ideal. One is that your program is now concurrent, which means it’s probably incorrect. (Prove me wrong, Gophers. Prove me wrong.)

Concurrent programming is just hard, and there are many easy-to-make mistakes that can cause a concurrent program to deadlock, crash, and so on. It’s not a tool we deploy lightly, and it wouldn’t be worth the extra complexity just to iterate over a slice.

The second problem is about what happens if the loop exits early. If you stop listening to a channel, whoever’s trying to send to it will just block (in the case of a buffered channel, when the buffer fills up).

In other words, the sending goroutine will leak: it just hangs around, taking up memory, and it has no way to ever exit. In a long-running program, leaks are something to avoid.

Sure, we do have ways to deal with that, like passing contexts that can be cancelled, and so on. And channels are very useful in all sorts of other situations. I’m not saying the problems with using channels in this way are insoluble, just that they’re annoying, and iterators do this particular job in a much more user-friendly way.

In summary, then, iterators are neat.

Enabling the experimental support for iterators

To turn on the “rangefunc” experiment so you can play with iterators in your own programs, use the GOEXPERIMENT environment variable, like this:

GOEXPERIMENT=rangefunc go build ...

This enables you to import the new iter package we talked about, and write for loops where the target of the range clause is an iterator.

You can read more about the experimental features, and how to use them, on the Go wiki:

Read more

Master of my domain

Master of my domain

Cryptography in Go: AES explained

Cryptography in Go: AES explained

0