Go go goroutines
I try to take one day at a time, but sometimes several days attack me at once.
—Ashleigh Brilliant
Your brain is limited. I don’t mean any offence, you understand: your brain is a wonderful instrument, and it’s just as good as anyone else’s, or perhaps slightly better. But even the smartest brain can still only do one thing at once, despite our constant and ineffectual attempts to multi-task with it…
…sorry, I was scrolling on my phone and lost the thread. But what if you could do two things at once, or even more? What would that feel like? What if you had two brains, for example? How would you organise them to work on independent tasks? And how would you stop them interfering with each other and making you very confused?
Well, your computer does have two brains. Actually, mine has—checks notes—eighteen, though yours probably has more. And, even though I only have eighteen “processing units”, I can run a lot more than eighteen independent tasks: thousands, or even millions. So how does that work? And how would you ever write a program like that?
In this series we’ll explain the basics of concurrency, see what concurrent programming looks like, explore some of the common pitfalls and how to avoid them, and learn why concurrency is a key feature of Go. It’s not terribly complicated, but it’s probably a little unfamiliar to most of us, so let’s work up to it by easy stages.
Single-tasking
In the beginning, each computer could run one program at a time, and it ran from the beginning to the end and that was that. But the earliest computers were extremely expensive and, it turns out, they actually spent a lot of their valuable time idle, for one reason or another.
For example, a typical task might be something like “read a few million data records from punched cards or paper tape, compute some statistics about them, and then write a report to a printer”. And, since physical peripherals such as card readers and printers work very slowly compared to the computer’s central processing unit (CPU) itself, a large part of the running time of this program would be spent doing nothing but waiting for these input / output (I/O) operations to finish. The actual computation part would be over in a flash, relatively speaking.
So it made financial as well as practical sense to come up with a scheme to increase the CPU’s duty cycle: the proportion of the total running time it spends doing useful work, as opposed to hanging around. Ideally, when the CPU has nothing to do for a while on the current task, it should work on some other task instead.
So one great way to get more compute for your money is multi-tasking: running more than one task at once. How might this work, then?
Time-sharing
The simplest and most obvious multi-tasking scheme is time-sharing: maintain a queue of tasks, and give each task an equal chunk of time on the CPU. When its time is up, the task gets paused, and added to the back of the queue, whence it will eventually run again once all other tasks have had a go.
We can implement this idea with a pleasantly simple scheduler program. The scheduler’s only task is to figure out which task should run next, and run it for its assigned time-slice. When that has elapsed, a timer interrupt will pause the user task, and invoke the scheduler again to pick and run the next task.
The scheduler’s logic is something like this:
- Is there a queued task ready to run?
- If so, start it.
- If not, wait a while, and then go back to step 1.
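To make the idea concrete, here's a toy round-robin loop in Go. The task type and its fields are invented purely for illustration; a real scheduler lives in the operating system (or, as we'll see, the Go runtime), not in ordinary user code.

package main

import "fmt"

// task is a made-up stand-in for a unit of work the scheduler manages.
type task struct {
    name string
    work int // time-slices still needed before the task is done
}

func main() {
    queue := []task{{"A", 2}, {"B", 1}, {"C", 3}}
    for len(queue) > 0 {
        // Take the task at the front of the queue...
        t := queue[0]
        queue = queue[1:]
        // ...give it one time-slice...
        fmt.Println("running", t.name)
        t.work--
        // ...and if it's not finished, send it to the back of the queue.
        if t.work > 0 {
            queue = append(queue, t)
        }
    }
}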
Such a round-robin algorithm is easy to implement, but not very efficient, because as we already saw, most tasks spend a lot of time blocked (that is, waiting) on I/O operations. We don’t take any account of this, so even though round-robining tasks reduces latency (the time a task waits before it gets some CPU time), it doesn’t do much for throughput (the number of tasks completed in a given time).
In fact, throughput might even get worse, because multi-tasking adds a scheduling overhead: the time that the CPU spends executing scheduler code, and switching between different tasks, as opposed to actually running task code.
Can we do better?
A co-operative scheduler
One nice improvement to our simple scheduler would be co-operative multi-tasking: allowing a task to yield the CPU of its own accord when it doesn’t have any useful computation to do for a while (for example, if it just issued a request to read some data from disk).
The scheduler program now gets a little more complicated, because it has two queues to maintain, not one. Let’s call them the “ready queue” and the “blocked queue”.
The ready queue contains tasks that could use the CPU if it were available. When a new task is created (for example, by running a program), it will first of all be added to the back of the ready queue.
The blocked queue, on the other hand, contains tasks that have no further use to make of the CPU for the time being: they’re blocked. These tasks won’t be ready to run until some external event happens and unblocks them: perhaps a timer elapses, or an in-progress I/O request completes. When this happens, the scheduler will move them to the ready queue.
Meanwhile, the scheduler continues picking tasks from the ready queue and giving them CPU time. Once a task has run to completion, the scheduler can forget about it, freeing up the memory it was using to store the state of that task.
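Here's a sketch of that two-queue idea in Go, just to pin it down. Again, the task type, its fields, and the unblock event are all invented for illustration; a real scheduler tracks far more state than this.

package main

import "fmt"

// task is a made-up unit of work: it needs a few time-slices of CPU, and
// may issue an I/O request that leaves it blocked until the I/O completes.
type task struct {
    name     string
    workLeft int  // time-slices of computation still needed
    wantsIO  bool // will this task block on I/O after its next slice?
}

var ready, blocked []*task

// runNext gives the task at the front of the ready queue one time-slice,
// then files it into whichever queue its new state requires.
func runNext() {
    if len(ready) == 0 {
        return // nothing ready: wait for an unblock event
    }
    t := ready[0]
    ready = ready[1:]
    fmt.Println("running", t.name)
    t.workLeft--
    switch {
    case t.workLeft <= 0:
        fmt.Println(t.name, "finished") // done: the scheduler forgets about it
    case t.wantsIO:
        blocked = append(blocked, t) // parked until its I/O completes
    default:
        ready = append(ready, t) // still runnable: back of the ready queue
    }
}

// unblock is what an external event (say, an I/O completion) would trigger:
// the task moves from the blocked queue to the back of the ready queue.
func unblock(t *task) {
    for i, b := range blocked {
        if b == t {
            blocked = append(blocked[:i], blocked[i+1:]...)
            t.wantsIO = false
            ready = append(ready, t)
            return
        }
    }
}

func main() {
    a := &task{name: "A", workLeft: 2}
    b := &task{name: "B", workLeft: 2, wantsIO: true}
    ready = []*task{a, b}
    runNext()  // A gets a slice
    runNext()  // B gets a slice, then blocks on I/O
    runNext()  // A runs again and finishes
    unblock(b) // B's I/O completes
    runNext()  // B runs again and finishes
}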
Concurrent tasks
You could see this scheme as implementing a little state machine, where any given task is always in one of three possible states:
- Running
- Blocked
- Ready
New tasks start their life in the ready queue. If a task is running, but becomes blocked, it moves to the blocked queue. When a blocked task becomes unblocked, it moves to the ready queue. When it reaches the front of the ready queue and a CPU is available, the task starts running. When a task completes, the next ready task is scheduled in its place.
The life cycle of a concurrent task
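In Go terms, you could sketch those three states as a simple type. The names here are made up for illustration, not part of any real scheduler's API:

type taskState int

const (
    stateReady   taskState = iota // queued: would run if a CPU were free
    stateRunning                  // currently executing on a CPU
    stateBlocked                  // waiting for an external event, such as I/O
)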
The scheduler’s job is to move each task from one state to the next as required. Indeed, the simpler the scheduler the better, because then we’re not wasting too much valuable CPU time merely figuring out which task should run next.
Concurrency in Go
This is roughly how most concurrent computer systems work, and you’ll be pleased to know that it’s the same in Go. When you compile and execute your program, what’s actually happening is that the Go runtime is scheduling a number of different goroutines—Go’s name for concurrent tasks—to make the best use of your CPU.
In my book The Deeper Love of Go, you’ll learn all about concurrency in Go and how to use it. Theory is all very well, but it doesn’t mean much unless you can apply it to practical projects, and throughout the book you’ll do just that. Together we’ll build a concurrent, distributed database system with Go for a cheerful bookstore named Happy Fun Books.
As part of that, you’ll get a thorough understanding of how goroutines work and how they’re scheduled. Let’s start by writing a really simple concurrent program in Go, and see if we can figure out why it behaves the way it does.
Goroutines
You might be surprised to learn that, in fact, all your Go programs so far have been concurrent: they have multiple goroutines. At least one of them is executing the Go code you’ve written, such as reading and listing the book catalog. Others are doing housekeeping operations such as garbage collection: identifying and recycling bits of memory that are no longer required for your program.
We can ignore these “background” goroutines for now, and we’ll focus only on the “main” goroutine: the one that executes your Go code. In this little example, let’s call it “goroutine A”:
package main

import "fmt"

func main() {
    for i := range 10 {
        fmt.Println("Hello from goroutine A!", i)
    }
}
Ranging over integers
This is a use of for ... range that we haven’t seen before. Previously, we’ve used the range keyword to create a loop that executes once for each element of a slice or map, which is a pretty common thing to want to do.

Here, though, we have a loop that executes a fixed number of times (sometimes called “range over int”, since we use an integer as the argument to range). In this case, it’s 10:

for i := range 10 {

Each time round the loop, i takes on the next value, so it goes 0, 1, 2, 3… 9, executing 10 times in total.
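If you’re more used to the classic three-clause form, this shorthand (introduced in Go 1.22) behaves the same as:

for i := 0; i < 10; i++ {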
Here’s the output:
Hello from goroutine A! 0
Hello from goroutine A! 1
Hello from goroutine A! 2
Hello from goroutine A! 3
...
Hello from goroutine A! 9
The go statement
As we’ve seen, your first goroutine is free, and you get it automatically just by running the program. But you can create new goroutines yourself, so that more than one Go function can be running concurrently.
The keyword to do this is easy to remember, because it’s right there in the name of the language: go. Let’s see what happens when we add another goroutine to our example, using a go statement:
package main

import (
    "fmt"
)

func goroutineB() {
    for i := range 10 {
        fmt.Println("Hello from goroutine B!", i)
    }
}

func main() {
    go goroutineB()
    for i := range 10 {
        fmt.Println("Hello from goroutine A!", i)
    }
}
Here’s the go statement that creates a new goroutine:

go goroutineB()

In Go, goroutines are function calls. So, to create one, we put the go keyword in front of a function call such as goroutineB().
The goroutine that never was
It looks as though this program will have two goroutines, A and B, each printing its own set of ten messages. What will we see when we run it? Will we see all of A’s messages, followed by all of B’s messages, or vice versa, or will we see them interleaved? Let’s find out:
go run
Hello from goroutine A! 0
Hello from goroutine A! 1
Hello from goroutine A! 2
Hello from goroutine A! 3
...
Hello from goroutine A! 9
Well, that was weird! It looks like goroutine B didn’t run at all. So why not?
In the next post, we’ll answer that question, but see if you can figure it out for yourself in the meantime!