Towards a new pointer analysis library for Go

We are working on pointer analysis in Go. This journey started out with #45735 and my frustration running godoc with ‘-analysis=pointer’. Luckily, recently I also had an opportunity to study the topic of pointer analysis – although in a context which uses a completely different technology.

In this post, I will outline some lessons learned so far in our work on pointer analysis in go, focusing on design guiding principles.

From Complex to Simple

Let’s be frank, pointer/memory analysis tends to be too complex. A myriad of Phd theses, dedicated logics, comments in issues such as the above and elsewhere leaves many of us perplexed or confused. Where there is complexity like that, there are bugs, and it takes a lot of valuable time to track down and fix the bugs.

We need something much simpler. But we also need something incremental. Incremental, simple pointer analysis does not seem to be available as of yet in any context, but we are indeed pursuing just that.

Getting there might be like Go’s generics – it may require several design revisions.

However, at least we are off to a good start, because Go has more restrictive usage of pointers than other languages. These restrictions can make the analysis much simpler.

Data Alignment

Luckily, in safe Go code (no import “unsafe”, no cgo, pointers are always aligned with the type to which they point, they never point in the middle of something as one would find in C and the like:

int i;
int *p = &i;
char *q = ((char *) p) + 1;
// q no longer points to the beginning of the location of 'i'.
// it points somewhere properly inside the location of 'i'.

This simplifies a pointer analysis a great deal. In particular, we no longer need to reduce everything to bytes. Every pointer is always aligned with the type to which it points.

Escape analysis

In Go, escaped pointers are moved to the heap. This means that we do not need to consider undefined behavior when derefencing a pointer that escapes – it just points to whatever it did when it was created.

GC

The garbage collection relieves us of the need to model double free’s or classical alloc/free memory heaps. This is in line with classical pointer analysis literature, where allocations can be considered unique. Without Garbage collection, where there are free()’s, a pointer to leaked memory may equal a subsequent allocation. We do not have this problem in pointer analysis for Go.

Comparison

Go permits equality comparison of pointers, but not other comparison

var i int = 0
var p = &i
var q = &i
if p == q {...} // This is OK.
if p < q {...} // This is a compile error.

This frees us of the need to model when one pointer is less than another.

Pain Points and Decisions

What IR

There are lots of IRs out there, so we have decided to organise ‘pal’ so that it is relatively independent of of the program representation. This allows

Having a separate clean memory model.
Being able to apply ‘pal’ to different IRs, such as go/ssa, the gc compiler ssa IR, or staticcheck’s IR, or other IRs.

Symbolic and Meta Symbolic

What is the common factor between existing tools, such as Infer and Gillian, which are incremental? If you squint a bit, perhaps you will see it as I do, and it is also in other success stories such as semgrep or coccinelle.

All these systems consider variables which represent program variables. I call this meta symbolic, because the program variables are symbolic, and having variables whose values are symbols, is, well, meta-symbolic. In systems using biabduction, the metasymbolic part comes in when the system guesses invariants; implicitly the domain over which guesses occur is symbolic. In systems such as semgrep and coccinelle, it is more evident – ‘$’ means anything in the program more or less.

This idea also can be extended to memory models. Our memory model is just a set of disjoint memory locations, each with a type and possibly sublocations. The locations have a points-to relation between them. This level is symbolic: each location symbolically represents a set of possible locations in some execution of the code under analysis.

To make that meta-symbolic, we just mark special meta-symbolic locations as opaque, or uninterpreted. An opaque location represents the set of all symbolic memory locations known or as of yet to be discovered, one level meta above symbolic locations.
These nodes are like ‘$’ in semgrep.

Once we have such opaque locations, they can be assigned to parameters of functions whose calls are as of yet unknown, or assigned to other variables. A standard pointer analysis can be run, giving us a points-to relation where some of the locations are meta-variables. Adding a mechanism to instantiate the meta-variables to a given analysis, as their possible values are disovered, should permit an incremental analysis and make it clear what is known when.

Values and bounds

One critical application of pointer analysis is to determine when things are out of bounds. For example, consider this code

var i, j int
tab := make([]*int, 3, 7)
tab[0] = &i
tab[1] = &j
p := &tab[3] // in or out of bounds?
p0 := &tab[0]
p1 := &tab[1]

Line 5 is out of bounds. ‘p0’ and ‘p1’ point to different objects.

In order to support this kind of analysis, the analyzer will need to have some understanding of integer expressions in Go. This pulls in a whole host of problems, including casting between integral types.

Decision.

We have decided to make the analysis parametric on an integral value analyzer, and to supply some very simple value analyzers to choose from.

(In turns out that doing this can benefit a great deal from Go’s upcoming type parameters. So we’ve decided to keep a dev.typeparams branch in our pal repo).

Conclusion

These are some high level thoughts about next steps for pointer analysis for Go. It is still exploratory.

If you are interested in co-developing or co-designing our first working prototype of ‘pal’ (of course in a very open open source way), drop us a line on

More heads are better than one.

If you are interested in sponsoring this work, it would help us help you a great deal. Please visit us here.

Thanks!

References

[1] Pointer Analysis. Foundations and Trends in Programming Languages Vol. 2, No. 1 (2015) 1–69 2015 Y. Smaragdakis and G. Balatsouras DOI: 10.1561/2500000014 (https://yanniss.github.io/points-to-tutorial15.pdf)

[2] Infer Compositional Analysis by means of bi-abduction Journal of the ACM Volume 58 Issue 6 December 2011 Article No.: 26pp 1–66https://doi.org/10.1145/2049697.2049700

[3] Andersen, Lars Ole (1994). Program Analysis and Specialization for the C Programming Language (PDF) (PhD thesis).

[4] Steensgaard, Bjarne (1996). “Points-to analysis in almost linear time” (PDF). POPL ‘96: Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages. New York, NY, USA: ACM. pp. 32–41. doi:10.1145/237721.237727. ISBN 0-89791-769-3.

[5] Petar Maksimović and José Fragoso Santos and Sacha-Élie Ayoun and Philippa Gardner (2021). Gillian: A Multi-Language Platform for Unified Symbolic Analysis.

[6] Zyrianov, Vlas; Newman, Christian D.; Guarnera, Drew T.; Collard, Michael L.; Maletic, Jonathan I. (2019). “srcPtr: A Framework for Implementing Static Pointer Analysis Approaches” (PDF). ICPC ‘19: Proceedings of the 27th IEEE International Conference on Program Comprehension. Montreal, Canada: IEEE.

[7] golang.org/x/tools/go/pointer.

[8] semgrep. https://semgrep.dev.

[9] coccinelle. https://coccinelle.gitlabpages.inria.fr/website/papers.html.