Iterators in Go
The next iteration of Go looks interesting
This article describes an experimental extension to the Go language: range over func. The design of Go iterators is evolving, and this tutorial is a work in progress, so don’t forget to check back later for the latest developments.
A Golang iterator is a function that “yields” one result at a time, instead of computing a whole set of results and returning them all at once. When you write a for
loop where the range
expression is an iterator, the loop will be executed once for each value returned by the iterator.
That sounds complicated, but it’s actually very simple. It’s like a short-order cook who makes a dish for each customer on demand, as they come in, rather than cooking a whole day’s worth of meals in advance.
Let’s break it down, then. First of all, what’s wrong with just returning a slice?
Why are iterators useful?
As you know, you can use range
to loop over a bunch of different kinds of thing in Go: slices, maps, strings, channels. The most common case is probably looping over a slice.
This slice could be a local variable, or it could be the result of calling some function that returns a slice. For example:
for _, v := range Items() {
... // do something with v
}
We could write the Items
function something like this, traditionally, to return a slice of some hypothetical type Item
:
func Items() (items []Item) {
... // generate items
return items
}
But there are some disadvantages to this way of doing it. First, we have to wait until the whole slice has been generated before we can start to process the first element of it.
Second, we have to allocate memory for the whole slice even though we’re only using one element at a time. Third, if we don’t end up using all the elements, then we’ve wasted time and memory computing stuff we didn’t use. How annoying!
So what if there was something like a slice, but it produced just one element at a time, as needed? Let’s call this handy little gadget an iterator, as in “thing you can iterate over”.
In other words, a for
loop over such an iterator would execute once for each result that the iterator produces—each Item
in our example.
What is an iterator, then? How does it work under the hood?
The signature of an iterator function
An iterator is just a function, but with a particular signature. For example, an iterator of Item
would be:
func (yield func(Item) bool)
As you can see, an iterator takes a function as a parameter, conventionally called yield
. It’s the function that the iterator will call to yield each successive value to the range
loop.
Once the iterator has yielded all the items available, it returns, signalling “iteration complete”.
For example, here’s how we could write an iterator of Item
:
func iterateItems(yield func(Item) bool) {
:= []Item{1, 2, 3}
items for _, v := range items {
if !yield(v) {
return
}
}
}
Every time you want to produce a result, you pass it to yield
, and control goes back to the loop statement, which would look like this:
for v := range iterateItems {
...
}
Assuming the loop keeps executing, yield
will keep returning true
, meaning “more values please”. But if the loop ever terminates (maybe with break
or return
, for example), then yield
will return false
. That gives Items
a chance to do any necessary cleanup.
Notice that if yield
does return false
(“stop iteration, the loop is done”), the iterator must detect this and bail out, rather than waste time computing more results that won’t be needed. If it tries to yield again, the runtime will panic:
range function continued iteration after exit
Returning an iterator
In practice, the most useful way to create iterators is to return them from a function call. For example, suppose we rewrote Items
so that, instead of returning a slice of Item
, it returned an iterator of Item
. How would we declare such a result type in our function signature?
We already know that the signature of an Item
iterator is:
func (yield func(Item) bool)
But it would be annoying to write this out in full as the result type of a function like Items
, so let’s imagine we have a standard library generic type iter.Seq[V any]
to reduce boilerplate.
Now it’s easy to describe the result of Items
, as simply iter.Seq[Item]
:
func Items() iter.Seq[Item] {
return func(yield func(Item) bool) {
:= []Item{1, 2, 3}
items for _, v := range items {
if !yield(v) {
return
}
}
}
}
Notice that Items
is not an iterator itself. It doesn’t have the required signature. Instead, it returns an iterator.
Our loop statement now becomes:
for v := range Items() {
...
}
In other words, what we’re ranging over is not some function, like iterateItems
, but the result of calling a function: Items()
. And that result is an iterator; specifically, an iter.Seq[Item]
.
Two-result iterators
Notice that we only receive one value in our example for ... range
statement, not two as in the “loop over slice” version. That value, straightforwardly enough, is the item itself: we don’t need the index here.
But what if we do want the index? For example, we might want to print a numbered list of items. In this case, we can write exactly the same two-valued for ... range
statement that we would with a slice:
for i, v := range Items() {
.Println(i, v)
fmt}
What do we need to change about the iterator to make this work? Well, since it will now be yielding two values instead of one, the signature of its yield
function becomes:
func (int, Item) bool
Makes sense, doesn’t it? Each time round the loop, this yield
function will produce another index-Item
pair.
This is a different kind of iterator signature, so let’s call it an iter.Seq2[int, Item]
, because it yields two values. Here’s how we could write that:
func Items() iter.Seq2[int, Item] {
return func(yield func(int, Item) bool) {
:= []Item{1, 2, 3}
items for i, v := range items {
if !yield(i, v) {
return
}
}
}
}
We don’t have to receive that index when ranging over Items
. If we didn’t need it for a particular loop, we could ignore it with the blank identifier _
, just as we do when ranging over a slice.
And in loops where we don’t care about either the index or the value, we don’t have to receive anything at all. We can write just:
for range Items() {
...
}
What about errors?
The iterators we’ve talked about so far are what we might call infallible: in other words, they’ll always produce as many results as they’re asked for, or as many as there are available.
A good example of this is something like iterating over a collection type. For example, suppose we had a Set[V any]
type (wouldn’t that be nice?). There might be an All
method that returns an iterator over the elements of the set, and that’s straightforward, as we’ve already seen:
func (s *Set[V]) All() iter.Seq[V]
But what about when the iterator has to do something to generate each item, and that something could fail? For example, reading something from an io.Reader
or a bufio.Scanner
. We know this could produce an error, so what should we do with it?
Well, the iterator could just return in this case, since it has no more items to yield, but then we’ve lost the error. The calling loop has no way to know that the iterator ran into a problem and had to stop early.
Instead, let’s take advantage of the fact that we can yield two results (if we’re an iter.Seq2
, that is). With a slice, these two things are usually the index and the value, but here they need not be. We could yield the value and an error instead, for example:
func Lines(file string) iter.Seq2[string, error]
When we successfully read a line, we can return it along with a nil
error, just like the plain old Go functions we love so well. On the other hand, when there’s an error, we can use that error
result to signal what it was.
That does mean that the calling loop will need to check the iterator’s error, but this seems reasonable. It’s no different to calling a regular function that returns a slice and error
, but this way, we get the benefits of an iterator.
When yield
returns false
We’ve seen that iterators need to check the result of calling their yield
function, because that’s how they know when to stop. Why does that matter? I mean, when the calling loop is done, why does yield
need to return a false
result? Why does it return a result at all, indeed?
The answer is that maybe the iterator uses some resources that need to be cleaned up. For example, it might be holding on to a database connection that’s the source of the items.
In this case, we can treat a false
result from yield
as a signal: “Clean up, you’re done”.
Composing iterators
We’ve talked about functions that return iterators, but what about functions that take iterators? For example, we could write something like:
func PrintAll[V any](seq iter.Seq[V]) {
for v := range seq {
.Println(v)
fmt}
}
Notice that we don’t even care what seq
is an iterator of. It doesn’t matter. We can range
over it regardless.
And if we can pass iterators to functions that return other iterators, then we can compose iterators:
func PrintPrimes() {
for p := range Primes(Integers()) {
.Println(p)
fmt}
}
func Primes(seq iter.Seq[int]) iter.Seq[int] {
return func(yield func(int) bool) {
for n := range seq {
if isPrime(n) {
if !yield(n) {
return
}
}
}
}
}
func Integers() iter.Seq[int] {
return func(yield func(int) bool) {
for i := 0; i <= math.MaxInt; i++ {
if !yield(i) {
return
}
}
}
}
Okay, it’s a contrived example, but you get the point.
Does this affect the standard library?
For sure. It makes sense that the existing slices
and maps
packages should be able to return iterators, since they’re about collection types.
Those trusty brothers bytes
and strings
stand ready to take full advantage of iterators too. And it seems reasonable that regexp
should do the same.
We’ve seen that defining shorthand types such as Seq
and Seq2
would be useful, so the iter
package is for them, together with any other iterator-related bits and pieces.
For doing things with iterators, other than just looping over them, there’s the x/exp/iter
package, which, while not part of the standard library, is definitely stdlib-adjacent. It’s where new things can be played with, beaten in and out of shape, and refined until they’re considered grown-up enough to join the family of packages that ship as part of Go itself.
Why not channels?
A completely understandable reaction to a proposal like this is to say “But I can already do this with a channel. Why do we need iterators as well?”
It’s true: Go has goroutines (this is known), and they can communicate over channels. So we could get something very like the behaviour we’ve outlined simply by calling a function that returns a channel of its results instead of a slice.
Looping over this channel with range
behaves exactly the way you’d expect, which is to say just the same as with an iterator.
But there are at least two things about this solution that aren’t ideal. One is that your program is now concurrent, which means it’s probably incorrect. (Prove me wrong, Gophers. Prove me wrong.)
Concurrent programming is just hard, and there are many easy-to-make mistakes that can cause a concurrent program to deadlock, crash, and so on. It’s not a tool we deploy lightly, and it wouldn’t be worth the extra complexity just to iterate over a slice.
The second problem is about what happens if the loop exits early. If you stop listening to a channel, whoever’s trying to send to it will just block (in the case of a buffered channel, when the buffer fills up).
In other words, the sending goroutine will leak: it just hangs around, taking up memory, and it has no way to ever exit. In a long-running program, leaks are something to avoid.
Sure, we do have ways to deal with that, like passing contexts that can be cancelled, and so on. And channels are very useful in all sorts of other situations. I’m not saying the problems with using channels in this way are insoluble, just that they’re annoying, and iterators do this particular job in a much more user-friendly way.
In summary, then, iterators are neat.
Enabling the experimental support for iterators
To turn on the “rangefunc” experiment so you can play with iterators in your own programs, use the GOEXPERIMENT
environment variable, like this:
GOEXPERIMENT=rangefunc go build ...
This enables you to import the new iter
package we talked about, and write for
loops where the target of the range
clause is an iterator.
You can read more about the experimental features, and how to use them, on the Go wiki: