Range Semantics in Go

Introduction

A few months ago, famous Gopher Dave Cheney posted the following question on Twitter:

The answer, for the impatient, is that it does terminate. He and I went back and forth about why and it got me thinking about the semantics of the range statement. I promised Dave that I would write up the results of my experiments and I am finally doing so in this blog post.

The range clause controls a for loop and is used when the programmer wants to loop through all the values of an array, slice, map or channel. In this post I will deal only with the range clause over an array or slice.

Implicit in Dave’s question is how the range clause applied to an array behaves when the loop body modifies that array. In pseudo code, that looks something like this:

for index, value of all indexes and values of A {
    if there is an element of A at (index + 1)
        A[index + 1] = value + 1;
    print value
}

That may seem contrived, but its a relatively straightforward way to illustrate the semantics of the range clause.

For the purposes of this post, there will be two arrays:

a = {1,2,3}
b = {4,5,6}

The first thing to remember is that the “[t]he range expression is evaluated once before beginning the loop”. Yes, there is an exception to that rule. No, I am not going to focus on that now. The second important thing to remember is that assignments in go are done by copy. That means that functions receive copies of the parameters placed in the caller’s invocation. That means that a caller gets a copy of the return values from a function it invokes. And, with respect to the range clause, that means that the range clause creates a copy of the input variable to use during iteration.

By Value Range Clause

The first example explores the semantics of, what I will call, the “by value range clause”.

package main

import (
    "fmt"
)

func array_by_value() {
    var a, b [3]int

    a = [3]int{1, 2, 3}
    b = [3]int{4, 5, 6}
    
    fmt.Printf("Ref: 1 2 3\n")
    fmt.Printf("Act: ")
    for i, v := range a {
        fmt.Printf("%v ", v)
        if i < len(a)-1 {
            a[i+1] = b[i+1]
        }
    }
    fmt.Printf("\n")
}

func main() {
    array_by_value()
}

To run this example, you can go to the playground https://play.golang.org/p/fzKuCGhC4n

The output from this example is

Ref: 1 2 3
Act: 1 2 3

So, what happened to our assignments? Why aren’t the updated values printed at each iteration?

When the range statement is evaluated (again, remember that the range clause is evaluated exactly once at the start of the loop), the range clause copies a (again, remember that everything in Go is a copy). Because a is the array, its contents are copied into a temporary. That temporary value is indexed at each iteration of the loop to determine the value v.

See Figure 1 for a visual representation of the contents of memory at the start of the 0th iteration and the start of the 1st iteration. The array values highlighted in red are those indexed at each iteration to produce the value v.

Range by Value

What is important to notice about the above is how Go is making a copy of the contents of the array at the beginning of the loop. This will not be a problem when the array contains a few elements. There are, however, very interesting implications for this when the array has many elements. I will discuss the implications of range clause semantics for application performance below.

By Reference Range Clause

The second example explores the semantics of, what I will call, the “by reference range clause”. I call it “by reference” because passing an address or a pointer is how programmers achieve “by reference” semantics in “by value” languages like C[2] (and Go).

package main

import (
    "fmt"
)

func array_by_ref() {
    var a, b [3]int

    a = [3]int{1, 2, 3}
    b = [3]int{4, 5, 6}
    
    fmt.Printf("Ref: 1 5 6\n")
    fmt.Printf("Act: ")
    for i, v := range &a {
        fmt.Printf("%v ", v)
        if i < len(a)-1 {
            a[i+1] = b[i+1]
        }
    }
    fmt.Printf("\n")
}

func main() {
    array_by_ref()
}

To run this example, you can go to the playground
https://play.golang.org/p/TFYKoq0csJ

The output from this example is

Ref: 1 5 6
Act: 1 5 6

And, wow. In this version we do get the updated values! So, what’s going on?

When the range statement is evaluated (again, remember that the range clause is evaluated exactly once at the start of the loop), the range clause copies &a (again, remember that everything in Go is a copy). Because &a is just the address of the array, the address is copied into a temporary. That temporary value is indexed at each iteration of the loop to determine the value v.

See the figure below for a visual representation of what is happening in this “by reference” range clause example at the start of the 0th iteration and the start of the 1st iteration. The array values highlighted in red are those indexed at each iteration to produce the value v.

Range by Reference

To reiterate, Go is copying the reference and not the array to the temporary at the start of the loop. I will discuss the performance implications of this later.

By Slice Value Range Clause

The third example explores the semantics of, what I will call, the “by slice value range clause”. I think the name will be self explanatory when you see the code. The name reflects the fact that the range clause copies a  slice by value.

package main

import (
    "fmt"
)

func slice_by_value() {
    var a, b [3]int

    a = [3]int{1, 2, 3}
    b = [3]int{4, 5, 6}
    c := a[:]

    fmt.Printf("Ref: 1 5 6\n")
    fmt.Printf("Act: ")
    for i, v := range c {
        fmt.Printf("%v ", v)
        if i < len(a)-1 {
            a[i+1] = b[i+1]
        }
    }
    fmt.Printf("\n")
}

func main() {
    slice_by_value()
}

To run this example, you can go to the playground
https://play.golang.org/p/1OYBLTd30P

The output from this example is

Ref: 1 5 6
Act: 1 5 6 

And, wow again! We are seeing updated values despite the “by value” copy of the slice. What is going on here?

The slice is very much like a reference. Above we saw that a reference is a simple pointer to an underlying object. The slice is a pointer and metadata. The pointer is to the underlying array that supplies the slice’s values and the metadata is the length and capacity of the slice. For more information on the internals of slices in Go, see [3].

In the above example, the slice is copied by value. In other words, the pointer and the metadata are copied and not the array itself. For a visual representation, see the figure below. Again, the array values highlighted in red are those indexed at the start of iteration 0 and iteration 1 to produce the value v.

Range by Slice

To reiterate, Go is copying the slice and not the array to the temporary at the start of the loop. I will discuss the performance implications of the three different range semantics next.

Performance Implications

As mentioned earlier, each of the different semantics of the range clause have implications on performance. To understand these implications, let’s consider a scenario where the program is going to repeatedly execute a loop over an array that contains a large number of elements.

For our example, we will have an array called base that we are going to loop over 50000 times. base will contain 50000 entries. In this example the inner loop will be relatively contrived.[4]

Here is the Go version of the code that we will test:

package main

import (
    "fmt"
    "time"
)

func speed_of_value_range() {
    const size int = 50000
    const iterations int = 50000
    var base [size]int
    var start_time time.Time
    var k int

    for i := 0; i < size; i++ {
        base[i] = i
    }

    start_time = time.Now()
    for i := 0; i < iterations; i++ {
        for _, j := range base {
            j++
            k = j
        }
    }
    fmt.Printf("%v Speed: %v\n", k, time.Now().Sub(start_time))
}

func main() {
    speed_of_value_range()
}

On my machine, this code takes 2.153868183s to execute.

Let’s compare that with the runtime for two almost identical functions. The only difference will be the way that we use the range clause.

In the next example, we will execute the range clause by reference.

package main

import (
    "fmt"
    "time"
)

func speed_of_ref_range() {
    const size int = 50000
    const iterations int = 50000
    var base [size]int
    var start_time time.Time
    var k int

    for i := 0; i < size; i++ {
        base[i] = i
    }

    start_time = time.Now()
    for i := 0; i < iterations; i++ {
        for _, j := range &base {
            j++
            k = j
        }
    }
    fmt.Printf("%v Speed: %v\n", k, time.Now().Sub(start_time))
}

func main() {
    speed_of_ref_range()
}

On my machine, this code takes 1.323862716s to execute. Wow! That’s half the time!

Let’s try a third and final example where we will execute the range clause by slice value.

package main

import (
    "fmt"
    "time"
)

func speed_of_slice_range() {
    const size int = 50000
    const iterations int = 50000
    var base [size]int
    var start_time time.Time
    var k int

    c := base[:]

    for i := 0; i < size; i++ {
        base[i] = i
    }

    start_time = time.Now()
    for i := 0; i < iterations; i++ {
        for _, j := range c {
            j++
            k = j
        }
    }
    fmt.Printf("%v Speed: %v\n", k, time.Now().Sub(start_time))
}

func main() {
    speed_of_slice_range()
}

On my machine, this takes 1.279790841s to execute. Wow, even faster still (albeit by a smaller amount).

It seems obvious that we should execute range clauses by reference or by slice value whenever possible in order to gain as much performance as possible. Why is this?

If we look at the object code generated by the Go compiler for each of these loops, the answer is clear!

Here is the relevant section of object code for the first example, speed_of_value_range():

  486296: 31 db                 xor    ebx,ebx
  486298: 31 f6                 xor    esi,esi
  48629a: 48 81 fb 50 c3 00 00  cmp    rbx,0xc350
  4862a1: 7d 59                 jge    4862fc 
  4862a3: 48 8d bc 24 e0 1a 06  lea    rdi,[rsp+0x61ae0]
  4862aa: 00
  4862ab: 49 89 f0              mov    r8,rsi
  4862ae: 48 8d 74 24 60        lea    rsi,[rsp+0x60]
  4862b3: 41 89 c9              mov    r9d,ecx
  4862b6: 48 c7 c1 50 c3 00 00  mov    rcx,0xc350
  4862bd: f3 48 a5              rep movs QWORD PTR es:[rdi],QWORD PTR ds:[rsi]
  4862c0: 31 c9                 xor    ecx,ecx
  4862c2: 48 8d b4 24 e0 1a 06  lea    rsi,[rsp+0x61ae0]
  4862c9: 00
  4862ca: 48 81 f9 50 c3 00 00  cmp    rcx,0xc350
  4862d1: 7d 17                 jge    4862ea 
  4862d3: 4c 8b 16              mov    r10,QWORD PTR [rsi]
  4862d6: 48 83 c6 08           add    rsi,0x8
  4862da: 48 ff c1              inc    rcx
  4862dd: 4d 8d 42 01           lea    r8,[r10+0x1]
  4862e1: 48 81 f9 50 c3 00 00  cmp    rcx,0xc350
  4862e8: 7c e9                 jl     4862d3 
  4862ea: 48 ff c3              inc    rbx
  4862ed: 44 89 c9              mov    ecx,r9d
  4862f0: 4c 89 c6              mov    rsi,r8
  4862f3: 48 81 fb 50 c3 00 00  cmp    rbx,0xc350
  4862fa: 7c a7                 jl     4862a3

And here is the relevant section of object code for the second example, speed_of_ref_range():

  4864d6: 31 db                 xor    ebx,ebx
  4864d8: 31 f6                 xor    esi,esi
  4864da: 48 81 fb 50 c3 00 00  cmp    rbx,0xc350
  4864e1: 7d 33                 jge    486516
  4864e3: 31 ff                 xor    edi,edi
  4864e5: 4c 8d 44 24 60        lea    r8,[rsp+0x60]
  4864ea: 48 81 ff 50 c3 00 00  cmp    rdi,0xc350
  4864f1: 7d 17                 jge    48650a
  4864f3: 4d 8b 08              mov    r9,QWORD PTR [r8]
  4864f6: 49 83 c0 08           add    r8,0x8
  4864fa: 48 ff c7              inc    rdi
  4864fd: 49 8d 71 01           lea    rsi,[r9+0x1]
  486501: 48 81 ff 50 c3 00 00  cmp    rdi,0xc350
  486508: 7c e9                 jl     4864f3
  48650a: 48 ff c3              inc    rbx
  48650d: 48 81 fb 50 c3 00 00  cmp    rbx,0xc350
  486514: 7c cd                 jl     4864e3

Although there are a number of differences between the two snippets of object code, there seems to be one major difference. The speed_of_value_range() contains this instruction:

rep movs QWORD PTR es:[rdi],QWORD PTR ds:[rsi]

How is that instruction causing us such a problem? It’s easy to see once we think about the semantics of that instruction. The rep modifier to the instruction means that we will repeat the movs operation the number of times contained in the count register (e/r)cx. The instruction before the rep movs is

mov    rcx,0xc350

In other words, we are moving 0xc350 quad words (four bytes) every time we execute the inner loop. That means throughout the life of the speed_of_value_range() function the CPU will perform 0xc350 * 0xc350 moves that it will not do in the speed_of_ref_range() function. [5]

That’s where we get the speedup!

Conclusion

I hope that this post helped you understand the different semantics of the range clause. As a programmer when you update the values used in a range clause during the body of the loop, the type of range clause will affect what values you see in future iterations.

There are also performance implications. The simple advice for the best performing code is to use by value range semantics sparingly!

I’d love to hear what you think about these results and whether you agree or disagree. Please let me know in the comments below!

 

References

[1] This comes from the Go Programming Language Specification at https://golang.org/ref/spec.

[2] For an explanation of this, see http://denniskubes.com/2012/08/20/is-c-pass-by-value-or-reference/

[3] Go slices: Usage and internals https://blog.golang.org/go-slices-usage-and-internals

[4] The inner loop body is contrived just so that we can trick the compiler. The compiler aggressively optimizes loops when the body does not use the range value and index variables. This will affect our timing results.

[5] It will not do those moves in the speed_by_slice_range() either. Proving this is left as an exercise for the reader! I love when people say that!