Provenance-aware Memory Object Model for C
ISO/IEC TS 6010 A Provenance-aware Memory Object Model for C
Draft WG14/N3005 (WG21/P2318R2?)
WG14
- Defect Report #260
- N2311 Exploring C Semantics and Pointer Provenance
- N2363 C provenance semantics: examples
- N2364 C provenance semantics: detailed semantics
Articles
- 旧石器時代のポインタをご利用の皆様へ ~provenance入門~
- C/C++の「ポインタ」とは
- The Tower of Weakenings: Memory Models For Everyone
- Pointers Are Complicated II, or: We need better language specs
- Pointers Are Complicated III, or: Pointer-integer casts exposed
Rust
ISO/IEC TS 6010 (N3005), §1 Introduction
In a committee discussion from 2004 concerning DR260, WG14 confirmed the concept of provenance of pointers, introduced as means to track and distinguish pointer values that represent storage instances with same address but non-overlapping lifetimes.
Implementations started to use that concept, in optimisations relying on provenancebased alias analysis, without it ever being clearly or formally defined, and without it being integrated consistently with the rest of the C standard.This Technical Specification provides a solution for this: a provenance-aware memory object model for C to put C programmers and implementers on a solid footing in this regard. This draft Technical Specification is based on, and incorporates the content of, three earlier WG14 documents:
- N2362 Moving to a provenance-aware memory model for C: proposal for C2x by the memory object model study group. Jens Gustedt, Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Martin Uecker. This introduced the proposal and gives the proposed change to the standard text, presented as change-highlighted pages of the standard. Here, as appropriate for a Technical Specification, we instead present the proposed changes with respect to ISO/IEC 9899:2018.
- N2363 C provenance semantics: examples. Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Jens Gustedt, Martin Uecker. This explains the proposal and its design choices with discussion of a series of examples.
- N2364 C provenance semantics: detailed semantics. Peter Sewell, Kayvan Memarian, Victor B. F. Gomes. This gives a detailed mathematical semantics for the proposal
In the first draft of this Technical Specification, the latter two parts have identical text to those earlier N-papers. Later, we integrated the following papers into the specification of this TS:
- N2861 Indeterminate Values and Trap Representations. Martin Uecker, Jens Gustedt. This paper has already been accepted by WG14 for ISO/IEC 9899:2023. It clarifies the previous contradictory terminology for what was then called "indeterminate values", but that described a property of an object representation.
- N2888 Exact-width Integer Type Interfaces. Jens Gustedt. This paper has also been accepted by WG14 for ISO/IEC 9899:2023. It clarifies some issues about integer types and is the basis for the integration of the following paper.
- N2889 Pointers and integer types. Jens Gustedt. Although this paper has not been accepted for ISO/IEC 9899:2023, WG14 voted in favor to integrate it in this TS. It makes the type
uintptr_tmandatory and thereby eases the specifications that are proposed here.In addition:
- At http://cerberus.cl.cam.ac.uk/cerberus we provide an executable version of the semantics, with a web interface that allows one to explore and visualise the behaviour of small test programs. Following N2363, we include the results of this for the example programs and for some major compilers.
The proposal has been developed in discussion among the C memory object model study group, including the authors listed above, Hubert Tong, Martin Sebor, and Hal Finkel. It has been discussed with WG14 (in multiple meetings) and at the March 2019 Cologne meeting of WG21, in SG12 UB & Vulnerabilities. Both of these have approved the overall direction, subject to implementation experience. It has also been discussed with the Clang/LLVM and GCC communities, with presentations and informal conversations at EuroLLVM and the GNU Tools Cauldron in 2018.
To the best of our knowledge and ability, the proposal reconciles the various demands of existing implementations and the corpus of existing C code.
This Technical Specification does not address subobject provenance.
- PVI (provenance-via-integers) model
- PNVI (provenance-not-via-integer) model
- PNVI plain
- PNVI-ae (address-exposed)
- PNVI-ae-udi (address-exposed user-disambiguation) (TS 6010)
The Tower of Weakenings: Memory Models For Everyone, §4 Strict Provenance: No More Getting Lucky
Ah ok, but the C folks seem to have cracked this one. If you hop on over to this 123 page paper, you’ll see the PNVI-ae-udi model, which at a high level roughly says:
- The Abstract Machine has a global list of all "exposed" allocations
- Allocations come into the world unexposed (unaliased)
- Casting a pointer to an integer exposes the allocation for that pointer’s provenance
- Casting an integer to a pointer does a global hittest on all exposed allocations
- If you hit an exposed allocation, congrats! You get provenance to it.
- If you hit two exposed allocations (because of one-past-the-end), be sad and tell the programmer to "just be consistent" and only access one of them.
This makes a lot of sense! Basically it’s saying if you cast a pointer to an integer, the compiler has to throw up its hands in disgust and just accept that any integer gets to be cast into a pointer into that allocation. That’s… pretty tame? Assuming you don’t need to allow other operations to expose.