Closed7

Go Memory Model (1.19)

yohhoyyohhoy

The Go Memory Model (Version of June 6, 2022):

Advice
If you must read the rest of this document to understand the behavior of your program, you are being too clever.
Don't be clever.

Go 1.19 Release Notes

Memory Model
The Go memory model has been revised to align Go with the memory model used by C, C++, Java, JavaScript, Rust, and Swift. Go only provides sequentially consistent atomics, not any of the more relaxed forms found in other languages. Along with the memory model update, Go 1.19 introduces new types in the sync/atomic package that make it easier to use atomic values, such as atomic.Int64 and atomic.Pointer[T].

yohhoyyohhoy

Part 1: Hardware Memory Models

"Sequential Consistency" by Leslie Lamport, "How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs," 1979

  • x86 TSO(total store order)
  • x86 TLO+CC(total lock order + causal consistency)
  • ARM/POWER Relaxed Memory Model

It appears that all Intel processors did implement x86-TSO from the start, even though it took a decade for Intel to decide to commit to that.

"data-race-free (DRF)" by Sarita Adve , Mark D. Hill, "Weak Ordering - A New Definition," 1990

This idea that a system guarantees to data-race-free programs the appearance of sequential consistency is often abbreviated DRF-SC.

DRF-SC marked a turning point in hardware memory models, providing a clear strategy for both hardware designers and software authors, at least those writing software in assembly language.

yohhoyyohhoy

Part 2: Programming Language Memory Models

Programming language memory models answer the question of what behaviors parallel programs can rely on to share memory between their threads.

Even today there is significant variation among languages on second-order questions, including:

  • What are the ordering guarantees for atomic variables themselves?
  • Can a variable be accessed by both atomic and non-atomic operations?
  • Are there synchronization mechanisms besides atomics?
  • Are there atomic operations that don't synchronize?
  • Do programs with races have any guarantees at all?

Processors that provide DRF-SC (all of them, these days) guarantee that programs without data races behave as if they were running on a sequentially consistent architecture. This is the fundamental guarantee that makes it possible to write correct multithreaded assembly programs on modern processors.

The programming language memory model is an attempt to precisely answer these questions about which optimizations are allowed and which are not.

Original Java Memory Model (1996)

Unfortunately, this attempt, in the first edition of the Java Language Specification (1996), had at least two serious flaws.

The first flaw was that volatile atomic variables were non-synchronizing, so they did not help eliminate races in the rest of the program.

The orginal Java memory model was also too strong: mandating coherence—once a thread had read a new value of a memory location, it could not appear to later read the old value—disallowed basic compiler optimizations.

New Java Memory Model (2004)

Jeremy Manson, Bill Pugh, Sarita Adve, "The Java Memory Model," 2005

The new model follows the DRF-SC approach: Java programs that are data-race-free are guaranteed to execute in a sequentially consistent manner.

The new Java memory model, like the original, defined the behavior of racy programs, for a number of reasons:

  • To support Java’s general security and safety guarantees.
  • To make it easier for programmers to find mistakes.
  • To make it harder for attackers to exploit problems, because the damage possible due to a race is more limited.
  • To make it clearer to programmers what their programs do.

[...] That said, it’s also still not quite right: there are problems with this use of happens-before for trying to define the semantics of racy programs.

Happens-before does not rule out incoherence

The first problem with happens-before for defining program semantics has to do with coherence (again!) [....]

[...] But the fact that the memory model allows different reads shows that it is, in a certain technical way, not precisely describing real Java implementations.

Jaroslav Sevcik and David Aspinall, "On Validity of Program Transformations in the Java Memory Model," 2007

Happens-before does not rule out acausality

[...] In this example, 42 is called an out-of-thin-air value, because it appeared without any justification but then justified itself with circular logic. [...]

By Sarita V. Adve, Hans-J. Boehm, "Memory Models: A Case For Rethinking Parallel Languages and Hardware," 2010

Prohibiting such causality violations in a way that does not also prohibit other desired optimizations turned out to be surprisingly difficult. … After many proposals and five years of spirited debate, the current model was approved as the best compromise. … Unfortunately, this model is very complex, was known to have some surprising behaviors, and has recently been shown to have a bug.

C++11 Memory Model (2011)

ISO/IEC 14882:2011 Programming Languages -- C++ (draft ver.)

First, C++ makes no guarantees at all for programs with data races, which would seem to remove the need for much of the complexity of the Java model. Second, C++ provides three kinds of atomics: strong synchronization (“sequentially consistent”), weak synchronization (“acquire/release”, coherence-only), and no synchronization (“relaxed”, for hiding races).

Unlike Java, C++ gives no guarantees to programs with races. Any program with a race anywhere in it falls into "undefined behavior." [...] This is often called "DRF-SC or Catch Fire": if the program is data-race free it runs in a sequentially consistent manner, and if not, it can do anything at all, including catch fire.

Briefly, there are four common justifications for this position:

  • C and C++ are already rife with undefined behavior, corners of the language where compiler optimizations run wild and users had better not wander or else. What's the harm in one more?
  • Existing compilers and libraries were written with no regard to threads, breaking racy programs in arbitrary ways. It would be too difficult to find and fix all the problems, or so the argument goes, although it is unclear how those unfixed compilers and libraries are meant to cope with relaxed atomics.
  • Programmers who really know what they are doing and want to avoid undefined behavior can use the relaxed atomics.
  • Leaving race semantics undefined allows an implementation to detect and diagnose races and stop execution.

In essence, modern C and C++ compilers assume no programmer would dare attempt undefined behavior.

C, Rust and Swift Memory Models

C11 adopted the C++11 memory model as well, making it the C/C++11 memory model.

Rust 1.0.0 in 2015 and Swift 5.3 in 2020 both adopted the C/C++ memory model in its entirety, with DRF-SC or Catch Fire and all the atomic types and atomic fences.

JavaScript Memory Model (2017)

ECMAScript 2017 Language Specification (8th Ed.), §27 Memory Model

Of course, having shared writable memory also requires defining atomic operations for synchronization and a memory model. JavaScript deviates from C++ in three important ways:

  • First, it limits the atomic operations to just sequentially consistent atomics. Other atomics can be compiled to sequentially consistent atomics with perhaps a loss in efficiency but no loss in correctness, and having only one kind simplifies the rest of the system.
  • Second, JavaScript does not adopt "DRF-SC or Catch Fire." Instead, like Java, it carefully defines the possible results of racy accesses. The rationale is much the same as Java, in particular security. Allowing a racy read to return any value at all allows (arguably encourages) implementations to return unrelated data, which could lead to leaking private data at run time.
  • Third, in part because JavaScript provides semantics for racy programs, it defines what happens when atomic and non-atomic operations are used on the same memory location, as well as when the same memory location is accessed using different-sized accesses.

It is encouraging that at least for now JavaScript has avoided adding any other atomics besides the sequentially consistent ones and has resisted "DRF-SC or Catch Fire." The result is a memory model valid as a C/C++ compilation target but much closer to Java.

Conclusions

  • They all provide sequentially consistent synchronizing atomics for coordinating the non-atomic parts of a parallel program.
  • They all aim to guarantee that programs made data-race-free using proper synchronization behave as if executed in a sequentially consistent manner.
  • Java resisted adding weak (acquire/release) synchronizing atomics until Java 9 introduced VarHandle. JavaScript has avoided adding them as of this writing.
  • They all provide a way for programs to execute “intentional” data races without invalidating the rest of the program. [...]
  • None of the languages have found a way to formally disallow paradoxes like out-of-thin-air values, but all informally disallow them.
yohhoyyohhoy

Part 3: Updating the Go Memory Model

The current Go language memory model was written in 2009, with minor updates since. It is clear that there are at least a few details that we should add to the current memory model, among them an explicit endorsement of race detectors and a clear statement of how the APIs in sync/atomic synchronize programs.

Go's Design Philosophy

Go aims not just for understandable programs but also for an understandable language and understandable package APIs. Complex or subtle language features or APIs contradict that goal.

Another aspect of Go being a useful programming environment is having well-defined semantics for the most common programming mistakes, which aids both understandability and debugging. This idea is hardly new.

The common sense of having well-defined semantics for buggy programs is not as common as one might expect. In C/C++, undefined behavior has evolved into a kind of compiler writer's carte blanche to turn slightly buggy programs into very differently buggy programs, in ever more interesting ways. Go does not take this approach: there is no “undefined behavior.” In particular, bugs like null pointer dereferences, integer overflow, and unintentional infinite loops all have defined semantics in Go.

carte blanche(仏語): 自由裁量、全権委任。白紙委任状。

Go's Memory Model Today

Programs that modify data being simultaneously accessed by multiple goroutines must serialize such access.

To serialize access, protect the data with channel operations or other synchronization primitives such as those in the sync and sync/atomic packages.

This remains good advice. The advice is also consistent with other languages' encouragement of DRF-SC: synchronize to eliminate data races, and then programs will behave as if sequentially consistent, leaving no need to understand the remainder of the memory model.

Changes to Go's Memory Model

Document Go's overall approach

Other programming languages typically take one of two approaches to programs containing data races. The first, exemplified by C and C++, is that programs with data races are invalid: a compiler may break them in arbitrarily surprising ways. The second, exemplified by Java and JavaScript, is that programs with data races have defined semantics, limiting the possible impact of a race and making programs more reliable and easier to debug. Go's approach sits between these two. Programs with data races are invalid in the sense that an implementation may report the race and terminate the program. But otherwise, programs with data races have defined semantics with a limited number of outcomes, making errant programs more reliable and easier to debug.

[...] Defining enough of a memory model to guide programmers and compiler writers is important, but defining one in complete formality—correctly!—seems still just beyond the grasp of the most talented researchers. It should suffice for Go to continue to say the minimum needed to be useful.

Document happens-before for sync libraries

New APIs have been added to the sync package since the memory model was written. We need to add them to the memory model (issue #7948).

sync.Cond, sync.Map, sync.Pool, sync.WaitGroup

Document happens-before for sync/atomic

Atomic operations are missing from the memory model. We need to add them (issue #5045)

Maybe: Add a typed API to sync/atomic

Whether a particular value should be accessed with atomics is therefore a property of the value and not of a particular access. Because of this, most languages put this information in the type system, like Java's volatile int and C++'s atomic<int>. Go's current APIs do not, meaning that correct usage requires careful annotation of which fields of a struct or global variables are expected to only be accessed using atomic APIs.

Maybe: Add unsynchronized atomics

All other modern programming languages provide a way to make concurrent memory reads and writes that do not synchronize the program but also don't invalidate it (don't count as a data race). C, C++, Rust, and Swift have relaxed atomics. Java has VarHandle's “plain” mode. JavaScript has non-atomic accesses to the SharedArrayBuffer (the only shared memory). Go has no way to do this. Perhaps it should. I don't know.

(Go 1.19には導入されない)

robpike, https://github.com/golang/go/discussions/47141#discussioncomment-996275 (2021-07-13)

I would argue strongly against adding unsynchronized atomics, to keep with the philosophy of enough is enough, and the subtlety of adding more operations with different semantics.

Document disallowed compiler optimizations

The Go memory model restricts compiler optimizations as much as it does Go programs. Some compiler optimizations that would be valid in single-threaded programs are not valid in Go programs. In particular, a compiler must not introduce a data race in a race-free program. It must not allow a single read to observe multiple values. And it must not allow a single write to write multiple values.

Conclusion

Go's general approach of being conservative in its memory model has served us well and should be continued. There are, however, a few changes that are overdue, including defining the synchronization behavior of new APIs in the sync and sync/atomic packages. The atomics in particular should be documented to provide sequentially consistent behavior that creates happens-before edges synchronizing the non-atomic code around them. This would match the default atomics provided by all other modern systems languages.

Perhaps the most unique part of the update is the idea of clearly stating that programs with data races may be stopped to report the race but otherwise have well-defined semantics. This constrains both programmers and compilers, and it prioritizes the debuggability and correctness of concurrent programs over convenience for compiler writers.

このスクラップは2023/01/14にクローズされました