This is the mail archive of the
mailing list for the GCC project.
Re: Questions about C as used/implemented in practice
- From: Peter Sewell <Peter dot Sewell at cl dot cam dot ac dot uk>
- To: Joseph Myers <joseph at codesourcery dot com>
- Cc: "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
- Date: Mon, 25 May 2015 15:02:39 +0100
- Subject: Re: Questions about C as used/implemented in practice
- Authentication-results: sourceware.org; auth=none
- References: <CAHWkzRS5Pfedgre-b93g1yfRT4CeMkEkm+qzok9DFwYGVVZSeg at mail dot gmail dot com> <alpine dot DEB dot 2 dot 10 dot 1504252059210 dot 30042 at digraph dot polyomino dot org dot uk>
- Reply-to: Peter dot Sewell at cl dot cam dot ac dot uk
Many thanks for these responses. We'll want to discuss some of them
further, but, before we do, survey responses from any other GCC
developers would be very welcome, especially from those who know the
analysis and optimisation code. (So far GCC is relatively
under-represented in our data; we have more responses from Clang and
OS kernel developers). The survey is here:
It consists of 15 short questions about the sequential behaviour of C
memory and pointers.
On 25 April 2015 at 22:42, Joseph Myers <email@example.com> wrote:
> On Fri, 17 Apr 2015, Peter Sewell wrote:
>> [1/15] How predictable are reads from padding bytes?
>> If you zero all bytes of a struct and then write some of its members, do
>> reads of the padding return zero? (e.g. for a bytewise CAS or hash of
>> the struct, or to know that no security-relevant data has leaked into
> The padding may not be zero (both in practice, and as specified by C11
> 6.2.6.1#6). A plausible sequence of optimizations is to apply SRA,
> replacing the memset with a sequence of member assignments (discarding
> assignments to padding) in order to do so. To avoid leaks and to allow
> hashing etc., padding should be declared explicitly as named members.
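As an illustration of the "explicitly named padding" advice, here is a minimal C sketch. The type `struct record` and the helpers `record_init` and `record_hash` are hypothetical names, not from the thread, and the layout assumes that int is aligned to its own size so that the named `pad` member fills the whole gap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record type: the gap between `tag` and `value` is a named
   member rather than compiler-inserted padding. */
struct record {
    unsigned char tag;
    unsigned char pad[sizeof(int) - 1]; /* explicit, named padding */
    int value;
};

/* Assumption: int is aligned to its own size, so no unnamed padding is left. */
_Static_assert(sizeof(struct record) == 2 * sizeof(int),
               "unexpected implicit padding in struct record");

/* Zero the whole record, then set the data members. The named padding is an
   ordinary member, so stores to `tag` and `value` do not disturb it. */
void record_init(struct record *r, unsigned char tag, int value)
{
    memset(r, 0, sizeof *r);
    r->tag = tag;
    r->value = value;
}

/* Bytewise FNV-1a-style hash over the whole object representation. */
uint32_t record_hash(const struct record *r)
{
    const unsigned char *bytes = (const unsigned char *)r;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < sizeof *r; i++) {
        h ^= bytes[i];
        h *= 16777619u;
    }
    return h;
}
```

With this layout, two records initialised with the same tag and value should compare equal under memcmp and hash identically, because every byte of the representation belongs to a named member.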
>> [2/15] Uninitialised values
>> Is reading an uninitialised variable or struct member (with a current
>> mainstream compiler):
>> (This might either be due to a bug or be intentional, e.g. when copying
>> a partially initialised struct, or to output, hash, or set some bits of
>> a value that may have been partially initialised.)
> Going to give arbitrary, unstable values (that is, the variable assigned
> from the uninitialised variable itself acts as uninitialised and having no
> consistent value). (Quite possibly subsequent transformations will have
> the effect of undefined behavior.)
> Inconsistency of observed values is an inevitable consequence of
> transformations PHI (undefined, X) -> X (useful in practice for programs
> that don't actually use uninitialised variables, but where the compiler
> can't see that).
>> [3/15] Can one use pointer arithmetic between separately allocated C
>> objects?
>> If you calculate an offset between two separately allocated C memory
>> objects (e.g. malloc'd regions or global or local variables) by pointer
>> subtraction, can you make a usable pointer to the second by adding the
>> offset to the address of the first?
> This is not safe in practice even if the alignment is sufficient (and if
> the alignment of the type is less than its size, obviously such a
> subtraction can't possibly work even with a naive compiler).
>> [4/15] Is pointer equality sensitive to their original allocation sites?
>> For two pointers derived from the addresses of two separate allocations,
>> will equality testing (with ==) of them just compare their runtime
>> values, or might it take their original allocations into account and
>> assume that they do not alias, even if they happen to have the same
>> runtime value? (for current mainstream compilers)
> It is not safe to assume that equality has a stable result in such cases
> (either in practice, or in my view of the standard as discussed in bug
>> [5/15] Can pointer values be copied indirectly?
>> Can you make a usable copy of a pointer by copying its representation
>> bytes with code that indirectly computes the identity function on them,
>> e.g. writing the pointer value to a file and then reading it back, and
>> using compression or encryption on the way?
> Yes, it is valid to copy any object that way (of course, the original
> pointer must still be valid at the time it is read back in).
> It is not, however, valid or safe to manufacture a pointer value out of
> thin air by, for example, generating random bytes and seeing if the
> representation happens to compare equal to that of a pointer. See DR#260.
> Practical safety may depend on whether the compiler can see through how
> the pointer representation was generated.
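The valid case described above (copying a pointer through its representation bytes, with a transformation that nets out to the identity) might look like the following sketch. The XOR step is only a toy stand-in for compression or encryption, and `xor_bytes` and `roundtrip_pointer` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy reversible transform standing in for compression or encryption. */
void xor_bytes(unsigned char *buf, size_t n, unsigned char key)
{
    for (size_t i = 0; i < n; i++)
        buf[i] ^= key;
}

/* Copy a pointer by way of its representation bytes. Applying the same XOR
   twice restores the original bytes, so the copy has the same representation
   as `p`. The object `p` points to must still be live when the copy is used. */
int *roundtrip_pointer(int *p)
{
    unsigned char bytes[sizeof p];
    int *q;

    memcpy(bytes, &p, sizeof p);
    xor_bytes(bytes, sizeof bytes, 0x5a); /* "encrypt" */
    xor_bytes(bytes, sizeof bytes, 0x5a); /* "decrypt" */
    memcpy(&q, bytes, sizeof q);
    assert(q == p);
    return q;
}
```

This only illustrates the permitted direction: the bytes originate from a real pointer. It says nothing about pointers whose bytes were guessed or generated independently.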
>> [6/15] Pointer comparison at different types
>> Can one do == comparison between pointers to objects of different types
>> (e.g. pointers to int, float, and different struct types)?
> Such a comparison violates the constraints on equality operators (C11
> 6.5.9#2). If you use conversions to compatible types or pointers to void,
> it can only be expected to be safe if you restrict yourself to cases where
> 6.3.2.3 defines the value resulting from the conversion (aliasing rules
> are based on the limitations on when pointer conversions are defined, not
> just on 6.5#7, and comparisons can get optimised in practice based on
> those rules).
>> [7/15] Pointer comparison across different allocations
>> Can one do < comparison between pointers to separately allocated
>> objects?
> This is likely to work in practice (for e.g. implementing functions like
> memmove) although not permitted by ISO C.
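A memmove-style routine is the usual place where this question arises. The sketch below is one hedged way to write it: it compares integer addresses obtained through uintptr_t instead of applying < to the pointers directly, which makes the comparison implementation-defined rather than undefined. It assumes a flat address space where the integer order matches the address order. The name `copy_overlapping` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* memmove-like copy that tolerates overlapping source and destination.
   The copy direction is chosen from the integer values of the addresses
   (assumption: flat address space, as on common GCC targets). */
void *copy_overlapping(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination starts first: copy forwards. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* Destination starts later (or at the same place): copy backwards. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```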
>> [8/15] Pointer values after lifetime end
>> Can you inspect (e.g. by comparing with ==) the value of a pointer to an
>> object after the object itself has been free'd or its scope has ended?
> Such a comparison may not give meaningful or consistent results (although
> the consequences are likely to be bounded in practice).
>> [9/15] Pointer arithmetic
>> Can you (transiently) construct an out-of-bounds pointer value (e.g.
>> before the beginning of an array, or more than one-past its end) by
>> pointer arithmetic, so long as later arithmetic makes it in-bounds
>> before it is used to access memory?
> This is not safe; compilers may optimise based on pointers being within
> bounds. In some cases, it's possible such code might not even link,
> depending on the offsets allowed in any relocations that get used in the
> object files.
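One common source of transient out-of-bounds pointers is 1-based indexing done by forming `a - 1` as a base pointer. A sketch of an alternative that keeps the adjustment in the integer index, so that no out-of-bounds pointer value is ever formed (the helper names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* 1-based element access: the offset is applied to the index, so the only
   pointer arithmetic performed is a[i - 1], which stays within the array. */
int get_one_based(const int *a, size_t n, size_t i)
{
    assert(i >= 1 && i <= n);
    return a[i - 1];
}

/* Sum elements lo..hi, 1-based and inclusive. */
int sum_one_based(const int *a, size_t n, size_t lo, size_t hi)
{
    int total = 0;
    for (size_t i = lo; i <= hi; i++)
        total += get_one_based(a, n, i);
    return total;
}
```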
>> [10/15] Pointer casts
>> Given two structure types that have the same initial members, can you
>> use a pointer of one type to access the initial members of a value of the
> This is not safe in practice (unless a union is visibly used as described
> in 6.5.2.3#6).
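The union-based exception (the common initial sequence rule, C11 6.5.2.3#6) could be used along the following lines. The types `struct circle`, `struct rect`, `union shape` and the function `shape_kind` are hypothetical; the point of the sketch is that the access goes through a union whose complete declaration is visible at that point.

```c
#include <assert.h>
#include <stddef.h>

enum { KIND_CIRCLE = 1, KIND_RECT = 2 };

/* Two structure types sharing a common initial member `kind`. */
struct circle { int kind; double radius; };
struct rect   { int kind; double width; double height; };

/* The union type is declared before, and visible at, every access below. */
union shape {
    struct circle circle;
    struct rect rect;
};

/* Inspect the common initial member through the union, regardless of which
   of the two structures was stored last. */
int shape_kind(const union shape *s)
{
    assert(s != NULL);
    return s->circle.kind;
}
```

By contrast, casting a `struct rect *` to a `struct circle *` and reading `kind` through it, with no union involved, is the pattern the reply describes as unsafe.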
>> [11/15] Using unsigned char arrays
>> Can an unsigned character array be used (in the same way as a malloc'd
>> region) to hold values of other types?
> No, this is not safe (if it's visible to the compiler that the memory in
> question has unsigned char as its declared type).
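A hedged alternative when values of other types must live in an unsigned char array: move them in and out with memcpy instead of accessing the array through a pointer of another type. The helpers `store_int` and `load_int` are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write the representation of `value` into byte storage at `offset`.
   memcpy has no alignment requirement and does not access the array
   through an int lvalue. */
void store_int(unsigned char *storage, size_t capacity, size_t offset, int value)
{
    assert(offset <= capacity && capacity - offset >= sizeof value);
    memcpy(storage + offset, &value, sizeof value);
}

/* Read an int back out of byte storage at `offset`. */
int load_int(const unsigned char *storage, size_t capacity, size_t offset)
{
    int value;
    assert(offset <= capacity && capacity - offset >= sizeof value);
    memcpy(&value, storage + offset, sizeof value);
    return value;
}
```

Compilers commonly turn fixed-size memcpy calls like these into plain loads and stores, so the cost of writing it this way is usually small.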
>> [12/15] Null pointers from non-constant expressions
>> Can you make a null pointer by casting from an expression that isn't a
>> constant but that evaluates to 0?
> In practice this is safe with GCC (as a consequence of casting between
> pointers and integers working), although not guaranteed by ISO C.
>> [13/15] Null pointer representations
>> Can null pointers be assumed to be represented with 0?
> For all targets supported by GCC, yes.
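The two null-pointer answers above can be illustrated together. This sketch relies on the behaviour described for GCC targets and is not guaranteed by ISO C; it also assumes intptr_t is available. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert a run-time zero to a pointer. The value is read through a
   volatile pointer, so it is not an integer constant expression. */
void *pointer_from_runtime_zero(volatile int *zero_source)
{
    intptr_t z = *zero_source;
    assert(z == 0);
    return (void *)z;
}

/* Build a pointer whose representation bytes are all zero. */
void *pointer_from_zero_bytes(void)
{
    void *p;
    memset(&p, 0, sizeof p);
    return p;
}
```

On GCC targets both functions are expected to return a pointer that compares equal to NULL.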
>> [14/15] Overlarge representation reads
>> Can one read the byte representation of a struct as aligned words
>> without regard for the fact that its extent might not include all of the
>> last word?
> In practice this is safe with GCC except for possibly generating errors
> with sanitizers, valgrind etc. (but should be avoided except in special
> cases such as vectorized string operations).
>> [15/15] Union type punning
>> When is type punning - writing one union member and then reading it as a
>> different member, thereby reinterpreting its representation bytes -
>> guaranteed to work (without confusing the compiler analysis and
>> optimisation passes)?
> It should work in all cases, though in practice internal compiler errors
> have occasionally been known to occur for some of the less likely cases if
> they result in things the back end didn't expect to see, e.g.
> reinterpreting a pointer to a string constant as a floating-point number.
> This was defined as a GCC extension even before C99 TC3 added a footnote
> (non-normative) describing type punning.
> Joseph S. Myers
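For the union type punning answer, a minimal sketch of the common float/integer case. It assumes float is a 32-bit IEEE 754 binary32 type; `float_bits` and `float_from_bits` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: float and uint32_t have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float assumed to be 32 bits");

/* Write the float member, then read the integer member of the same union. */
uint32_t float_bits(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}

/* The reverse direction: reinterpret a bit pattern as a float. */
float float_from_bits(uint32_t u)
{
    union { float f; uint32_t u; } pun;
    pun.u = u;
    return pun.f;
}
```

Under the IEEE 754 assumption, `float_bits(1.0f)` is 0x3f800000 and `float_from_bits(0x40000000u)` is 2.0f.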