Go go goroutines
I try to take one day at a time, but sometimes several days attack me at once.
—Ashleigh Brilliant
Your brain is limited. I don’t mean any offence, you understand: your brain is a wonderful instrument, and it’s just as good as anyone else’s, or perhaps slightly better. But even the smartest brain can still only do one thing at once, despite our constant and ineffectual attempts to multi-task with it…
…sorry, I was scrolling on my phone and lost the thread. But what if you could do two things at once, or even more? What would that feel like? What if you had two brains, for example? How would you organise them to work on independent tasks? And how would you stop them interfering with each other and making you very confused?
Well, your computer does have two brains. Actually, mine has—checks notes—eighteen, though yours probably has more. And, even though I only have eighteen “processing units”, I can run a lot more than eighteen independent tasks: thousands, or even millions. So how does that work? And how would you ever write a program like that?
In this series we’ll explain the basics of concurrency, see what concurrent programming looks like, explore some of the common pitfalls and how to avoid them, and learn why concurrency is a key feature of Go. It’s not terribly complicated, but it’s probably a little unfamiliar to most of us, so let’s work up to it by easy stages.
Single-tasking
In the beginning, each computer could run one program at a time, and it ran from the beginning to the end and that was that. But the earliest computers were extremely expensive and, it turns out, they actually spent a lot of their valuable time idle, for one reason or another.
For example, a typical task might be something like “read a few million data records from punched cards or paper tape, compute some statistics about them, and then write a report to a printer”. And, since physical peripherals such as card readers and printers work very slowly compared to the computer’s central processing unit (CPU) itself, a large part of the running time of this program would be spent doing nothing but waiting for these input / output (I/O) operations to finish. The actual computation part would be over in a flash, relatively speaking.
So it made financial as well as practical sense to come up with a scheme to increase the CPU’s duty cycle: the proportion of the total running time it spends doing useful work, as opposed to hanging around. Ideally, when the CPU has nothing to do for a while on the current task, it should work on some other task instead.
So one great way to get more compute for your money is multi-tasking: running more than one task at once. How might this work, then?
Time-sharing
The simplest and most obvious multi-tasking scheme is time-sharing: maintain a queue of tasks, and give each task an equal chunk of time on the CPU. When its time is up, the task gets paused, and added to the back of the queue, whence it will eventually run again once all other tasks have had a go.
We can implement this idea with a pleasantly simple scheduler program. The scheduler’s only task is to figure out which task should run next, and run it for its assigned time-slice. When that has elapsed, a timer interrupt will pause the user task, and invoke the scheduler again to pick and run the next task.
The scheduler’s logic is something like this:
- Is there a queued task ready to run?
- If so, start it.
- If not, wait a while, and then go back to step 1.
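To make the idea concrete, here's a toy round-robin loop in Go. The task type and its fields are invented purely for illustration; a real scheduler lives in the operating system (or, as we'll see, the Go runtime), not in ordinary user code.

package main

import "fmt"

// task is a made-up stand-in for a unit of work the scheduler manages.
type task struct {
    name string
    work int // time-slices still needed before the task is done
}

func main() {
    queue := []task{{"A", 2}, {"B", 1}, {"C", 3}}
    for len(queue) > 0 {
        // Take the task at the front of the queue...
        t := queue[0]
        queue = queue[1:]
        // ...give it one time-slice...
        fmt.Println("running", t.name)
        t.work--
        // ...and if it's not finished, send it to the back of the queue.
        if t.work > 0 {
            queue = append(queue, t)
        }
    }
}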
Such a round-robin algorithm is easy to implement, but not very efficient, because as we already saw, most tasks spend a lot of time blocked (that is, waiting) on I/O operations. We don’t take any account of this, so even though round-robining tasks reduces latency (the time a task waits before it gets some CPU time), it doesn’t do much for throughput (the number of tasks completed in a given time).
In fact, throughput might even get worse, because multi-tasking adds a scheduling overhead: the time that the CPU spends executing scheduler code, and switching between different tasks, as opposed to actually running task code.
Can we do better?
A co-operative scheduler
One nice improvement to our simple scheduler would be co-operative multi-tasking: allowing a task to yield the CPU of its own accord when it doesn’t have any useful computation to do for a while (for example, if it just issued a request to read some data from disk).
The scheduler program now gets a little more complicated, because it has two queues to maintain, not one. Let’s call them the “ready queue” and the “blocked queue”.
The ready queue contains tasks that could use the CPU if it were available. When a new task is created (for example, by running a program), it will first of all be added to the back of the ready queue.
The blocked queue, on the other hand, contains tasks that have no further use to make of the CPU for the time being: they’re blocked. These tasks won’t be ready to run until some external event happens and unblocks them: perhaps a timer elapses, or an in-progress I/O request completes. When this happens, the scheduler will move them to the ready queue.
Meanwhile, the scheduler continues picking tasks from the ready queue and giving them CPU time. Once a task has run to completion, the scheduler can forget about it, freeing up the memory it was using to store the state of that task.
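Here's a sketch of that two-queue idea in Go, just to pin it down. Again, the task type, its fields, and the unblock event are all invented for illustration; a real scheduler tracks far more state than this.

package main

import "fmt"

// task is a made-up unit of work: it needs a few time-slices of CPU, and
// may issue an I/O request that leaves it blocked until the I/O completes.
type task struct {
    name     string
    workLeft int  // time-slices of computation still needed
    wantsIO  bool // will this task block on I/O after its next slice?
}

var ready, blocked []*task

// runNext gives the task at the front of the ready queue one time-slice,
// then files it into whichever queue its new state requires.
func runNext() {
    if len(ready) == 0 {
        return // nothing ready: wait for an unblock event
    }
    t := ready[0]
    ready = ready[1:]
    fmt.Println("running", t.name)
    t.workLeft--
    switch {
    case t.workLeft <= 0:
        fmt.Println(t.name, "finished") // done: the scheduler forgets about it
    case t.wantsIO:
        blocked = append(blocked, t) // parked until its I/O completes
    default:
        ready = append(ready, t) // still runnable: back of the ready queue
    }
}

// unblock is what an external event (say, an I/O completion) would trigger:
// the task moves from the blocked queue to the back of the ready queue.
func unblock(t *task) {
    for i, b := range blocked {
        if b == t {
            blocked = append(blocked[:i], blocked[i+1:]...)
            t.wantsIO = false
            ready = append(ready, t)
            return
        }
    }
}

func main() {
    a := &task{name: "A", workLeft: 2}
    b := &task{name: "B", workLeft: 2, wantsIO: true}
    ready = []*task{a, b}
    runNext()  // A gets a slice
    runNext()  // B gets a slice, then blocks on I/O
    runNext()  // A runs again and finishes
    unblock(b) // B's I/O completes
    runNext()  // B runs again and finishes
}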
Concurrent tasks
You could see this scheme as implementing a little state machine, where any given task is always in one of three possible states:
- Running
- Blocked
- Ready
New tasks start their life in the ready queue. If a task is running, but becomes blocked, it moves to the blocked queue. When a blocked task becomes unblocked, it moves to the ready queue. When it reaches the front of the ready queue and a CPU is available, the task starts running. When a task completes, the next ready task is scheduled in its place.
The life cycle of a concurrent task
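In Go terms, you could sketch those three states as a simple type. The names here are made up for illustration, not part of any real scheduler's API:

type taskState int

const (
    stateReady   taskState = iota // queued: would run if a CPU were free
    stateRunning                  // currently executing on a CPU
    stateBlocked                  // waiting for an external event, such as I/O
)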
The scheduler’s job is to move each task from one state to the next as required. Indeed, the simpler the scheduler the better, because then we’re not wasting too much valuable CPU time merely figuring out which task should run next.
Concurrency in Go
This is roughly how most concurrent computer systems work, and you’ll be pleased to know that it’s the same in Go. When you compile and execute your program, what’s actually happening is that the Go runtime is scheduling a number of different goroutines—Go’s name for concurrent tasks—to make the best use of your CPU.
In my book The Deeper Love of Go, you’ll learn all about concurrency in Go and how to use it. Theory is all very well, but it doesn’t mean much unless you can apply it to practical projects, and throughout the book you’ll do just that. Together we’ll build a concurrent, distributed database system with Go for a cheerful bookstore named Happy Fun Books.
As part of that, you’ll get a thorough understanding of how goroutines work and how they’re scheduled. Let’s start by writing a really simple concurrent program in Go, and see if we can figure out why it behaves the way it does.
Goroutines
You might be surprised to learn that, in fact, all your Go programs so far have been concurrent: they have multiple goroutines. At least one of them is executing the Go code you’ve written, such as reading and listing the book catalog. Others are doing housekeeping operations such as garbage collection: identifying and recycling bits of memory that are no longer required for your program.
We can ignore these “background” goroutines for now, and we’ll focus only on the “main” goroutine: the one that executes your Go code. In this little example, let’s call it “goroutine A”:
package main

import "fmt"

func main() {
    for i := range 10 {
        fmt.Println("Hello from goroutine A!", i)
    }
}
Ranging over integers
This is a use of for ... range that we haven’t seen before. Previously, we’ve used the range keyword to create a loop that executes once for each element of a slice or map, which is a pretty common thing to want to do.

Here, though, we have a loop that executes a fixed number of times (sometimes called “range over int”, since we use an integer as the argument to range). In this case, it’s 10:

for i := range 10 {

Each time round the loop, i takes on the next value, so it goes 0, 1, 2, 3… 9, executing 10 times in total.
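If you’re more used to the classic three-clause form, this shorthand (introduced in Go 1.22) behaves the same as:

for i := 0; i < 10; i++ {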
Here’s the output:
Hello from goroutine A! 0
Hello from goroutine A! 1
Hello from goroutine A! 2
Hello from goroutine A! 3
...
Hello from goroutine A! 9
The go statement
As we’ve seen, your first goroutine is free, and you get it automatically just by running the program. But you can create new goroutines yourself, so that more than one Go function can be running concurrently.
The keyword to do this is easy to remember, because it’s right there in the name of the language: go. Let’s see what happens when we add another goroutine to our example, using a go statement:
package main

import (
    "fmt"
)

func goroutineB() {
    for i := range 10 {
        fmt.Println("Hello from goroutine B!", i)
    }
}

func main() {
    go goroutineB()
    for i := range 10 {
        fmt.Println("Hello from goroutine A!", i)
    }
}
Here’s the go statement that creates a new goroutine:

go goroutineB()

In Go, goroutines are function calls. So, to create one, we put the go keyword in front of a function call such as goroutineB().
The goroutine that never was
It looks as though this program will have two goroutines, A and B, each printing its own set of ten messages. What will we see when we run it? Will we see all of A’s messages, followed by all of B’s messages, or vice versa, or will we see them interleaved? Let’s find out:
go run
Hello from goroutine A! 0
Hello from goroutine A! 1
Hello from goroutine A! 2
Hello from goroutine A! 3
...
Hello from goroutine A! 9
Well, that was weird! It looks like goroutine B didn’t run at all. So why not?
In the next post, we’ll answer that question, but see if you can figure it out for yourself in the meantime!