It's a lock: sync.Mutex in Go

It's a lock: sync.Mutex in Go

NIGEL: Don’t touch it!
MARTY: I wasn’t going to touch it, I was just pointing at it.
NIGEL: Well, don’t point, even.
“This is Spinal Tap”

We saw in Racing with disaster that when two goroutines go to war, a data race is all that you can score. That is, when your program concurrently reads and writes some piece of shared data, the results are non-deterministic, and that’s generally unacceptable in distributed systems.

Here’s the problematic program:

func main() {
    copies := 1
    go func() {
        if copies > 0 {
            copies--
            fmt.Println("Customer A got the book")
        }
    }()
    go func() {
        if copies > 0 {
            copies--
            fmt.Println("Customer B got the book")
        }
    }()
    time.Sleep(time.Second)
}

(Listing copies/1)

Goroutines A and B both read and try to decrement the shared copies variable, and the result is that we can’t predict which one will get there first, and thus which customer will get the book. In a distributed system where the copies variable is held on a remote computer, we could easily end up with a situation where both customers think they’ve got the last copy of the book.

For example, suppose that “goroutine A” is actually Amina, a Happy Fun Books customer happiness operative who’s trying to put through a purchase at her sales terminal. Concurrently, the part of “goroutine B” is played by Bhavana, who’s also trying to process purchasing the same book at her own terminal.

How can Amina and Bhavana co-ordinate to avoid a data race?

One very simple, and low-tech, idea would be to have Amina shout out “Hey, y’all, I’m about to modify book abc. Don’t do anything with abc until I say it’s okay!” And then later: “All good, I’m done with abc now”.

An audio-based protocol like this would work, sort of. It would mean a lot of shouting, though, and that’s not really ideal: in the best bookstores, a quiet, studious atmosphere should prevail.

A better scheme might be to have a big whiteboard in the middle of the store where everyone can see it. When Amina wants to make a change to book abc, she looks at the board to see if “abc” is already up there. If not, she writes it on the board. Then she makes her changes, and when she’s done, she rubs “abc” off the board so other users know it’s once again safe to modify abc.

What if Bhavana wants to make a change to abc while Amina still has it “locked” like this? Well, she has to wait. Yes, it might be slightly annoying for other users to wait until Amina releases the lock, but it’s better than the catalog data getting corrupted: that would annoy everybody.

And under normal circumstances, no one will need to lock a book for very long, so most books will spend most of the time unlocked. It’s unlikely that anyone will have to wait for the lock on a given book, and even if they do, the waiting time won’t be significant. We don’t expect much contention for these locks, in the jargon.

It’s a workable scheme, with some caveats. For example, Amina has to be careful to remember to obtain the lock—that is, to check the board to see if the book’s ID is up there, and add it herself if not. If she forgets to do this before modifying the book information, there’ll be a data race.

Also, she has to remember to release the lock afterwards: to rub the ID off the board. Otherwise, Bhavana will be stuck waiting for the lock forever, and more and more users will gradually get stuck waiting on the same lock over time, until eventually nobody can do anything at all (this is called a deadlock).

Despite these potential problems, locking protocols like this are the simplest way to prevent data races in real-life concurrent environments. They don’t usually involve a big whiteboard, but the principle is the same.

In my book The Deeper Love of Go, we’ll use Happy Fun Books as an example application to help us learn Go programming from scratch. Starting with fundamental concepts like functions, data, and variables, we’ll gradually build up a suite of Go software to store and manage book information. Along the way we’ll talk about writing command-line tools, HTTP servers and clients, designing REST APIs, storing and retrieving data as JSON, and all sorts of other exciting topics. I hope you’ll take a look!

Mutexes in Go

A lock system like the one we’ve described ensures mutually exclusive access to some data value, such as book abc. In other words, if I have access to it, that excludes you having access to it, and vice versa. In the jargon, then, the lock object that ensures this mutual exclusion is called a mutex.

Go’s standard library includes one of these, as part of the sync package: it’s called sync.Mutex. Here’s an example of how it’s commonly used:

var mutex sync.Mutex
mutex.Lock()
// do something with data
mutex.Unlock()

We needn’t worry at all about how the mutex works internally. We can treat it as a black box with two methods: Lock and Unlock. We call Lock to get the lock, we do whatever we need to do with the protected data, and then we call Unlock to release the lock when we’re finished.

So what happens if someone else already has the lock when we call Lock? The answer is that Lock will wait until the lock becomes available, and then obtain it for us. This can take an arbitrary amount of time (indeed, the lock might never be released), but if or when Lock does return, it guarantees that we now have the lock.

When someone holds the lock on the mutex, it blocks everybody else from accessing the protected data: that’s a mutex’s job, after all. So we should only acquire the lock right before we need it, and release it as soon as possible afterwards. We should also avoid doing anything too time-consuming during this period, to avoid unnecessarily blocking the lock for other users.

Both reads and writes need to be lock-aware

It sounds like a mutex will help us to ensure safe concurrent access to the catalog, but before we do that, there’s one more point we need to talk about. When do we need to acquire the mutex lock? Is it only when we want to change the data, or is it also when we’re reading it?

You might be tempted to say that it’s only when we’re changing the data. After all, if we’re just reading, we’re just reading! Why would we need to get a lock on something that we’re not changing?

But that idea doesn’t work, because even though we’re just reading, someone else might concurrently be writing. While someone has the “write lock” on the data, it’s not safe for anyone else to read it. By “not safe” I mean that the data we’re looking at it might be in the middle of being changed, and we could see partial or invalid results because of this.

In other words, a data race exists when two or more users (or different parts of the same program) are accessing the same piece of data concurrently, and at least one of those accesses is a write. So all accesses to mutex-protected data need to be lock-aware, even reads.

Here’s an updated version of our example program, this time with a sync.Mutex which we’ll use to protect the copies variable from concurrent access:

func main() {
    var mu sync.Mutex
    copies := 1
    go func() {
        mu.Lock()
        defer mu.Unlock()
        if copies > 0 {
            copies--
            fmt.Println("Customer A got the book")
        }
    }()
    go func() {
        mu.Lock()
        defer mu.Unlock()
        if copies > 0 {
            copies--
            fmt.Println("Customer B got the book")
        }
    }()
    time.Sleep(time.Second)
}

(Listing copies/2)

The “lock, defer unlock” pattern

Everything is the same as before except that now we have an extra variable mu, the mutex. Here’s the customer A goroutine (customer B’s is identical except for the printed message):

mu.Lock()
defer mu.Unlock()
if copies > 0 {
    copies--
    fmt.Println("Customer A got the book")
}

If the mutex scheme is going to work, then each goroutine has to obtain the lock before trying to read the copies variable. That’s what mu.Lock() does.

If the lock is available, great: we’ll obtain it and proceed straight away. If it’s in use by someone else, the call to mu.Lock() will wait until it’s released (that is, the calling goroutine will move to the blocked queue until the lock becomes available).

Once the goroutine has got the lock (and made it back to the front of the ready queue, if it had to wait in the meantime), it can proceed. We then defer the call to mu.Unlock(), ensuring that whatever happens in this goroutine, we release the lock when the function exits.

Taking the token

The sequence of events for each goroutine will be:

  1. Obtain lock (blocking if necessary).
  2. Read copies.
  3. If the book is still available, set copies to zero.
  4. Release lock.

This time, at step 3, the book can only be available to exactly one of the goroutines. The two critical sections (steps 2-3 in each goroutine) can no longer overlap: that’s what the mutex prevents.

It’s a little like a single-track railway line, where only one train can safely occupy a given section of line at a time. If two trains try to share it, especially when they’re travelling in opposite directions, it’s usually a bad day for everyone concerned. To prevent this, we need a way to enforce mutual exclusion of trains in each block (the equivalent of our critical section).

One typical scheme relies on a physical token that the train driver has to obtain before entering the block. On reaching the other end of the block, the driver hands the token back to a signaller, so that it can be used by the next train, and so on. Mechanical interlocking prevents the points and signals being set for any train movement until the corresponding token is in place.

The mutex in our example program is the digital equivalent of the token: each goroutine must wait until the token is available, signalling that it can enter the critical section. Once it leaves, it gives up the token so that other goroutines can use it to enter their own critical sections.

A race-free program with locking

In our book example, the critical section starts where the goroutine reads copies, and ends after it (optionally) writes to the same variable. So, before the goroutine enters this section, it must obtain the “token”, by calling mu.Lock(). After it leaves, it gives up the token by calling mu.Unlock() (via a defer statement).

One of two things has to happen, then: either the customer A routine gets the lock first and updates copies, meaning that customer B misses out, or the customer B goroutine gets the lock first, and customer A misses out.

Access to the copies variable is guaranteed by the lock to be mutually exclusive, eliminating the data race. Let’s see if the race detector agrees:

go run -race main.go

Customer B got the book

Well, that’s great news for customer B, and it’s good news for us, too: we no longer have a data race. Sadly, this is little comfort to customer A, who missed out. Maybe they’ll come back tomorrow.

Zero Cost to Love

Zero Cost to Love

0