Mar 14

Mar 14 A generic Set type in Go

You wanted generics. You got them. Now it’s time to pay the price.

In order to be irreplaceable, one must always be different.
—Coco Chanel

In Functional programming in Go, we looked at some interesting generic functions that we can write on slices (and, by extension, maps): for example, Map, Reduce, and Filter. Maps and slices are so useful and ubiquitous in Go precisely because they can be used as containers: values that store a collection of elements of any type.

Before the introduction of generics in Go 1.18, though, we were more or less limited to just maps and slices, because it wasn’t possible to write user-defined containers of arbitrary types.

Now that we can, what container types, or other generic data structures, would it be interesting to write?

Sets

A set is like a kind of compromise between a slice and a map. Like a slice, a set is a collection of elements, all of the same type. Like a map, there’s no defined order, and elements must be unique.

We very often need something like a set in Go, but there’s no built-in or standard library set type, so we tend to roll our own using a map. For example:

var validCategory = map[string]bool{
    "Autobiography":       true,
    "Large Print Romance": true,
    "Particle Physics":    true,
}

The value type of the map doesn’t matter here, since we’re only really interested in the keys. But it’s convenient to use bool values, since it means we can use them in conditional expressions:

if validCategory[category] {

As you probably know, a map index expression like this returns the zero value if the given key is not in the map. Since the zero value for bool is false, this means the if condition is false if category is not in the “set”.

As neat as this is, it feels a little hacky, so it’s nice that we can now write a proper Set type using generics. Let’s try.

Operations on sets

What kind of things would we like to be able to do with a set? There are many possibilities, but perhaps the most basic are adding an element, and checking whether the set contains a given element.

In other words, we’d like to be able to write something like this:

s := NewSet[int]()
s.Add(1)
fmt.Println(s.Contains(1))
// true

Let’s get to work!

Designing the `Set` type

We need something to keep the actual data in. A map would make sense, since it gives us the “unique key” property for free.

And we know the map keys will need to be of arbitrary (but comparable) type, since they represent the elements of our Set. What about the value type?

We could use bool again, but maybe struct{} is a better choice in this case, since it takes up no space.

So suppose we wrote some type definition like this:

type Set[E comparable] map[E]struct{}

In other words, for some comparable type E, a Set[E] is a map of E to empty struct.

If you’re already familiar with Go’s generics syntax, you’ll recognise E as a type parameter. It means that we’re not just defining a single type Set: instead, we’re defining a Set of something. A Set of int, for example, or perhaps string.

In fact, we can use any element type E, provided it satisfies the constraint we put on it: comparable. That’s a shorthand way of saying that E must be a type that you can use the == operator with. That makes sense, since we’re using a map as the underlying type for Set, and map keys must be comparable.

Since we can’t do much with a map until it’s been initialised, let’s provide a constructor function NewSet:

func NewSet[E comparable]() Set[E] {
    return Set[E]{}
}

The `Add` method

Let’s write an Add method on this type. It should take some value and store it in the set. We can do that by assigning to the map:

func (s Set[E]) Add(v E) {
    s[v] = struct
}

Easy, right? It’s just like adding a new key to a map, which happens to be exactly what we’re doing. We just don’t have to explicitly mention the map’s key type, because it doesn’t matter—so long as it’s comparable. The compiler will check this for us when someone tries to instantiate a Set of that type, and complain if it doesn’t satisfy the constraint.

The `Contains` method

The next job is to write Contains. How should it behave? That seems straightforward: given some value, it should return true if the value is in the set, or false otherwise.

You probably know that a map index expression in Go can return two values. The second, conventionally named ok, tells us whether the key was in fact in the map. And that’s the information we want here:

func (s Set[E]) Contains(v E) bool {
    _, ok := s[v]
    return ok
}

Let’s try out some simple set operations:

s := NewSet[string]()
s.Add("hello")
fmt.Println(s.Contains("hello"))
// true

Great! This is already useful. But let’s see what we can do to make it even better.

Initialising with multiple values

It seems as though it would be convenient to initialise a set with a bunch of elements supplied to NewSet, rather than having to create the set first and then add each element one by one. In other words, we’d like to write, for example:

s := NewSet(1, 2, 3) // no need to say “[int]”

Not only does this save time, but now we don’t need to instantiate NewSet explicitly, because the compiler can infer the type of E from the values we pass to it.

Let’s modify NewSet to take any number of E values:

func NewSet[E comparable](vals ...E) Set[E] {
    s := Set[E]{}
    for _, v := range vals {
        s[v] = struct
    }
    return s
}

While we’re at it, we’ll make the same change to Add, so that we can add any number of new members in one go:

func (s Set[E]) Add(vals ...E) {
    for _, v := range vals {
        s[v] = struct
    }
}

We’re ready to roll:

s := NewSet(true, true, true)
s.Add(true, true)
fmt.Println(s.Contains(false))
// false

What else could we do?

Getting a set’s members

It’ll almost certainly be convenient to get all the members of the set as a slice. One of the more annoying limitations of using maps as sets in Go is that it’s not straightforward to do this.

Let’s make it straightforward!

func (s Set[E]) Members() []E {
    result := make([]E, 0, len(s))
    for v := range s {
        result = append(result, v)
    }
    return result
}

Now we can write:

s := NewSet(1, 2, 3)
fmt.Println(s.Members())
// [1 2 3]

A `String` method

In fact, let’s add a String method so that we can print the set in a nice way:

func (s Set[E]) String() string {
    return fmt.Sprintf("%v", s.Members())
}

Now we don’t need to call Members to print out the set, and we can write simply:

fmt.Println(s)
// [2 3 1]

Notice that the members came out in a different order this time, and that’s okay!

It’s part of the definition of a set that its members aren’t in any special order, just like a map. And because we’re using a map to store the elements, we get that behaviour for free.

The union of two sets

Let’s do something more fun. A common operation on sets is to ask for the union of two sets: that is, the set that contains all the elements of both sets. How could we implement that?

One way would be to create a new set, using the members of set 1, then add the members of set 2. Here’s what that looks like:

func (s Set[E]) Union(s2 Set[E]) Set[E] {
    result := NewSet(s.Members()...)
    result.Add(s2.Members()...)
    return result
}

Making NewSet and Add variadic has paid off already, since it makes Union much easier to write, with no looping.

If this works, then we should be able to find the union of two small sets of integers:

s1 := NewSet(1, 2, 3)
s2 := NewSet(3, 4, 5)
fmt.Println(s1.Union(s2))
// [1 2 3 4 5]

Excellent! As we’d expect, Union has not just added together the two sets, it’s also removed any duplicates. We didn’t have to write any code to do this: again, it came free with the map.

An `Intersection` method

Let’s write Intersection, too, just for completeness. The intersection of two sets is the set of members that they have in common.

This is a little more difficult to write, but I think we can do it:

func (s Set[E]) Intersection(s2 Set[E]) Set[E] {
    result := NewSet[E]()
    for _, v := range s.Members() {
        if s2.Contains(v) {
            result.Add(v)
        }
    }
    return result
}

In other words, for each member of s, we check whether it is also present in s2. If so, we add it to the result set.

Does it work? Here goes:

s1 := NewSet(1, 2, 3)
s2 := NewSet(3, 4, 5)
fmt.Println(s1.Intersection(s2))
// [3]

Encouraging!

Logic on sets

Let’s try out our new Set type in a semi-realistic application: checking job applications against the required set of skills for a vacancy.

Since we’ve just written Intersection, this should be easy. We just need to define the set of required skills, and find its intersection with the candidate’s skills.

If there are any members in this set, then the candidate has at least one of the required skills, so we can hire them.

For example, suppose there’s a job available which requires either Go or Java skills, but I happen to know Go and Ruby. Would I qualify?

Let’s find out:

jobSkills := NewSet("go", "java")
mySkills := NewSet("go", "ruby")
matches := jobSkills.Intersection(mySkills)
if len(matches) > 0 {
    fmt.Println("You're hired!")
}
// You're hired!

If only it were that easy.

Challenge: Stack overflow

Now it’s your turn! See if you can solve this programming challenge using Go generics. Your task is to implement a stack: an abstract data type which stores items in a last-in-first-out way. The only thing you can do with a stack is “push” a new item onto it, or “pop” the top item off it.

You’ll be building a generic Stack[E] type that holds an ordered collection of values of as broad a range of types as possible.

You’ll need to write a Push method that can append any number of items to the stack, in order, and a Pop method that retrieves (and removes) the last item from the stack.

Pop should also return a second ok value indicating whether an item was actually popped. If the stack is empty, then Pop should return false for this second value, but true otherwise.

Also, you should provide a Len method that returns the number of items in the stack.

For example:

s := Stack[int]{}
fmt.Println(s.Pop())
// 0 false
s.Push(5, 6)
fmt.Println(s.Len())
// 2
v, ok := s.Pop()
fmt.Println(v)
// 6
fmt.Println(ok)
// true

If you’d like some hints, check out one possible solution.

What will you build next?

In pre-generics days, we had to write, or generate, lots of duplicated code to implement data structures on multiple specific types, so there was an obvious disincentive to put too much effort into it. Since it was impractical to provide general-purpose libraries of such data structures, we tended to hack together our own versions as needed.

That being so, they generally weren’t very sophisticated: no one wants to maintain N different versions of the same sophisticated data structure. So we often went without neat features like concurrency safety, high performance, state-of-the-art algorithms, and so on.

That’s no longer the case, and we can write generic abstract data types that are as sophisticated as we like. There are some examples in my book Know Go: Generics that may help to inspire you. See what you can come up with, and have fun!

Mar 14 A generic Set type in Go

Sets

Operations on sets

Designing the Set type

The Add method

The Contains method