Escape Analysis in Go and a dive into Memory Management
I came across this interesting article and went off on a learning tangent about memory management. I found another article on the topic, also worth linking.
1. What is Escape Analysis?
In Go (or in any other language), variables can be allocated on the stack or the heap. In manually memory-managed languages like C or C++, the distinction is quite clear: unless you use `malloc` or `new` explicitly, you are using the stack, and if you do use manual allocation, you are allocating objects on the heap.
Now, in memory-managed languages like Go, allocation of memory is entirely up to the compiler/runtime, and so is the freeing of said memory via garbage collection. That said, the overhead of garbage collection can be reduced if certain objects are allocated on the stack itself, instead of the heap. This is where "Escape Analysis" comes in: a compile-time procedure to figure out whether some variables (with strictly local scope) can be allocated on the stack. Exact algorithms and trade-offs vary, but the idea is simple.
The linked blogs go into more detail about Escape Analysis in Go.
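To make the idea concrete, here is a minimal sketch (function names are my own) of the classic case: a local whose address outlives the function must go to the heap, while a purely local value can stay on the stack.

```go
package main

// newInt returns the address of a local variable, so x "escapes":
// it must be heap-allocated to remain valid after the call returns.
func newInt() *int {
	x := 42
	return &x
}

// local never lets y leave the function, so y can live on the stack.
func local() int {
	y := 42
	return y + 1
}

func main() {
	p := newInt()
	println(*p, local())
}
```

Running `go build -gcflags '-m -l'` on this should report `moved to heap: x` for the first function and nothing for the second.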
2. How do I examine the analysis for my program?
```
go build -gcflags '-m -l'
```
The `-l` flag disables inlining of functions and is optional. You should see output like the following:
```
./main.go:15:13: ... argument does not escape
./main.go:15:13: a escapes to heap
```
where a variable `a` supposedly escaped to the heap, i.e., could not be allocated on the stack.
3. Escapes due to fmt.Println
The Analysis reveals a surprising result for a simple program like the following:
```go
package main

import (
	"fmt"
)

func foo() {
	a := 10
	fmt.Println(a)
}
```
Running the analysis for this reveals:
```
$ go build -gcflags '-m'
./main.go:32:13: inlining call to fmt.Println
./main.go:32:13: ... argument does not escape
./main.go:32:13: a escapes to heap
```
Why should a local variable like `a` here escape to the heap?
Just changing the `fmt.Println(a)` to the built-in `println(a)` will remove the escape line. Herein lies the secret (one that I have not completely grasped): `fmt.Println` takes an `interface{}` as the argument, and this means anything can happen in the callee (such as reflection), so such arguments always escape to the heap.
This is a known long-time issue in Go: https://github.com/golang/go/issues/8618
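To see the `interface{}` effect in isolation, here is a minimal sketch (the function name `box` is my own) where the callee does nothing but accept and return an interface value:

```go
package main

import "fmt"

// box's parameter leaks to its result, so the compiler must assume the
// boxed value outlives the call; `go build -gcflags '-m -l'` reports the
// argument at the call site escaping to the heap, just as with fmt.Println.
func box(v interface{}) interface{} {
	return v
}

func main() {
	a := 10
	fmt.Println(box(a))
}
```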
Does this mean you should use the built-ins and not the `fmt` functions? Absolutely not! If nothing else, the built-ins aren't actually guaranteed to stay around.
But, there is another problem…
4. fmt.Println is actually more performant
Now, just to confirm what we think is happening, let us try to benchmark this using a function like the following:
```go
package main

import (
	"testing"
)

func BenchmarkFoo(b *testing.B) {
	for i := 0; i < b.N; i++ {
		foo()
	}
}
```
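For reference, the two variants of `foo` being compared look like this (I simply edited `foo` between runs; they are shown side by side with distinct names of my own for clarity):

```go
package main

import "fmt"

// Variant 1: the built-in println; escape analysis keeps a on the stack.
func fooBuiltin() {
	a := 10
	println(a)
}

// Variant 2: fmt.Println, whose interface{} parameter makes the
// escape analysis report "a escapes to heap".
func fooFmt() {
	a := 10
	fmt.Println(a)
}

func main() {
	fooBuiltin()
	fooFmt()
}
```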
with both the variants, running the benchmark as follows:
```
go test -bench '^Benchmark' -benchmem
```
The `-benchmem` flag should get us some memory-related stats.
What we see is something like the following:
```
println(a)       866140   1452 ns/op    0 B/op   0 allocs/op
fmt.Println(a)  1293528   913.7 ns/op   0 B/op   0 allocs/op
```
(This is obviously a simplistic run. Each version is run for 1 second. A better option is to average such numbers across multiple runs.) What does this mean?
- The `println` version ran around 866k times in one second, taking around 1452 ns per run. "Op" here refers to each run.
- The `fmt.Println` version is actually faster, around 913 ns in this example. On repeated runs, this pattern holds: the `fmt` version takes around 2/3rds the time taken by the built-in.
- Why is this? At this point I don't know.
- Most surprisingly, both versions show 0 allocations per op. How does this mesh with what we just learnt above from the escape analysis output?
5. Is anything actually allocated on the heap?
So, I repeated the experiment above with `fmt.Println` and a few other data types (instead of `int`). `bool` did not get allocated (much like `int`). So, for standard data types, the escape analysis output is a bit misleading, and I don't really understand it yet.
Strings did get allocated, with `16 B/op 1 alloc/op`. This means that for each run of the benchmark (one call of the function `foo`), we have one object allocated of around 16 bytes. I varied the size of the string: "haha", "hello world" and "hello world - how is this possible?" all showed only 16 bytes. How does this make sense?
String literals are actually stored in the data segment of the binary. We can verify this by disassembling the program using `go tool compile -S main.go`. Near the bottom, we see lines like:
```
go.string."haha" SRODATA dupok size=4
	0x0000 68 61 68 61    haha
```
So, the thing getting allocated is then just the interface that points to the string. We can verify that an interface value is 16 bytes long with the following check:
```go
package main

import (
	"fmt"
	"unsafe"
)

func main() {
	var empty interface{}
	fmt.Println("Size of empty interface: ", unsafe.Sizeof(empty))
}
```
Moving on to structs with a simple int field leads to a similarly puzzling outcome as with bools and ints: the escape analysis says the variable escapes, but there are 0 allocations.
There is one clue to what we may be missing (and this is mentioned in the linked blog post as well): when you try to print out the address of such a variable, i.e. use `fmt.Println(&a)`, the message in the escape analysis changes to `moved to heap: a`, something stronger than the usual `a escapes to heap`. I believe the missing piece is somewhere around here.
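For reference, the minimal program that triggers the stronger message looks like this (the line numbers in the compiler output will of course differ):

```go
package main

import "fmt"

func main() {
	a := 10
	// Taking the address and handing it to fmt.Println forces a itself
	// onto the heap; `go build -gcflags '-m -l'` reports "moved to heap: a"
	// for this line rather than the weaker "a escapes to heap".
	fmt.Println(&a)
}
```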
When we try with slices and `fmt.Println`, we get something like the following:
```
a := []int{1}        810776   1385 ns/op   40 B/op   3 allocs/op
a := []int{1, 2}     717288   1596 ns/op   56 B/op   4 allocs/op
a := []int{1, 2, 3}  807025   1514 ns/op   72 B/op   5 allocs/op
```
The patterns are clear here, though I am not going to go deeper into what all this means.
One side point to note here: using `println` with slices leads to it printing addresses instead of the slice contents. Just FYI.
6. Can we avoid unneeded escaping?
There is some prior work from academia on this for Go: such as this work from ICSE'20.
But, scanning this paper makes it clear that Escape Analysis is a long-studied field, especially for multi-threaded programming and Java, going back to 1997. Surprisingly, one of the highly cited papers in this field is from Manish Gupta - well known in Indian research circles.
7. More memory management things
This function from the golang runtime can be used to read memory stats.
So, clearly Java has a similar approach, given the existence of prior art. This is a nice blog on the Java memory model, which goes into the "generational" approach to memory and garbage collection used in Java. The core principle is to divide memory upfront into regions for "young" and "old" generations of objects, where items are moved from the former to the latter if they survive multiple rounds of GC.
Though the official documentation is a bit unclear, there is this nice blog post on memory management in Rust.
There is this exhaustive collection of prior art in academia on memory management, well worth as a starting point.
It so happens that "Garbage Collection" was also introduced by John McCarthy in the original LISP paper from 1960!