This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: GCC aliasing rules: more aggressive than C99?


Richard Guenther <richard.guenther <at> gmail.com> writes:
> On Sun, Jan 3, 2010 at 6:46 AM, Joshua Haberman <jhaberman <at> gmail.com> wrote:
> > The aliasing policies that GCC implements seem to be more strict than
> > what is in the C99 standard.  I am wondering if this is true or whether
> > I am mistaken (I am not an expert on the standard, so the latter is
> > definitely possible).
> >
> > The relevant text is:
> >
> >  An object shall have its stored value accessed only by an lvalue
> >  expression that has one of the following types:
> >
> >  * a type compatible with the effective type of the object,
> >  [...]
> >  * an aggregate or union type that includes one of the aforementioned
> >    types among its members (including, recursively, a member of a
> >    subaggregate or contained union), or 
>
> Literally interpreting this sentence the way you do removes nearly all
> advantages of type-based aliasing that you have when dealing with
> disambiguating a pointer dereference vs. an object reference
> and thus cannot be the desired interpretation (and thus we do not allow this).

Thank you for the information.  I am very interested in distilling this
issue into a concise and easy to understand guideline that C and C++
programmers can use to determine whether they are following the rules
correctly or not, especially since the warnings are not perfect.  The
GCC manpage gives a basic rule:
  
  In particular, an object of one type is assumed never to reside at the
  same address as an object of a different type, unless the types are
  almost the same.  For example, an "unsigned int" can alias an "int",
  but not a "void*" or a "double".  A character type may alias any other
  type.
  
However, this explanation does not address how the rule applies to
aggregates (structures and arrays) and unions.  Here is my attempt;
please correct anything that looks wrong.

The best explanation I have been given so far is that dereferencing an
"upcast" pointer is ok, but dereferencing a "downcast" pointer is not.
For the purposes of this explanation only, we define "upcasts" and
"downcasts" as:

  struct A { int x; } a;
  int i;

  int *pi = &a.x;  // upcast
  int foo = *pi;   // ok
    
  struct A *pa = (struct A*)&i;  // downcast
  int bar = pa->x;    // NOT ok
  struct A a2 = *pa;  // NOT ok

A distinguishing feature of the downcast is that it requires an actual
cast.  So in general, casts from one pointer type to another indicate
a likely problem.  Pointer casts *can* be valid, but only if you know
that the object really has (or was previously written as) the cast-to
type:

  struct A { int x; } a;

  int *pi = &a.x;  // upcast
  // this downcast is just "undoing" the previous upcast.
  struct A *pa = (struct A*)pi;
  int foo = pa->x;  // ok

This is why perfect warnings about this issue are not possible; if we
see a downcast in isolation, we do not know if it is undoing a previous
upcast or not.  Only a tool like valgrind could check this perfectly, by
observing reads and writes at runtime and checking the types of pointers
that were used to perform the read/write.
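
To make the valid round trip concrete, here is a minimal sketch.  The
function names (member_ptr, read_back, round_trip) are made up for
illustration.  It relies on the guarantee that a pointer to a struct,
suitably converted, points to its first member and vice versa.

```c
struct A { int x; };

/* "Upcast": take the address of a member; no cast is needed. */
int *member_ptr(struct A *pa) {
    return &pa->x;
}

/* "Downcast" that only undoes the upcast.  This is valid only because
   the caller guarantees pi points at the first member of a real
   struct A object. */
int read_back(int *pi) {
    struct A *pa = (struct A *)pi;
    return pa->x;
}

/* Store a value in a struct A, then read it back through the
   upcast/downcast round trip. */
int round_trip(int value) {
    struct A a;
    a.x = value;
    return read_back(member_ptr(&a));
}
```

Note that read_back looks identical to an invalid downcast when viewed
in isolation; only the caller knows whether pi came from a struct A.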

It is possible in C (not C++) to run into trouble even without pointer
casts, since void* can assign to any pointer type without a cast:

  int i;
  void *voidp = &i;
  // Effective downcast.
  struct A *pa = voidp;
  int foo = pa->x;  // NOT ok

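But since chars can alias anything, it is always allowed to read or
write an object's representation via char*.

  int i;
  char ch = *(char*)&i;  // ok

The exception only works in that direction.  An object that was
declared as a char array may not be read through a pointer to some
other type:

  char charray[sizeof(long)] = {...};
  long l = *(long*)charray;  // NOT ok: charray was declared as char
                             // (and may not be aligned for long)

So casts to/from char* are not always safe, for the same reason that we
have to watch out for void*: the object may have been declared or
previously written as a different type.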
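
A way to move between a char buffer and a long without any pointer
cast is memcpy, which copies the object representation byte by byte.
This is only a sketch; store_long, load_long and round_trip_long are
hypothetical helper names, not part of any library.

```c
#include <string.h>

/* Copy the representation of a long into a byte buffer that holds at
   least sizeof(long) bytes. */
void store_long(unsigned char *bytes, long value) {
    memcpy(bytes, &value, sizeof value);
}

/* Rebuild a long from a byte buffer.  No long lvalue ever refers to
   the char array, so the aliasing rule and alignment are not an issue. */
long load_long(const unsigned char *bytes) {
    long value;
    memcpy(&value, bytes, sizeof value);
    return value;
}

/* Store a long into a local byte buffer and load it back. */
long round_trip_long(long value) {
    unsigned char buf[sizeof(long)];
    store_long(buf, value);
    return load_long(buf);
}
```

Compilers commonly turn a fixed-size memcpy like this into a plain
load or store, so the cast-free version need not cost anything.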

Besides observing the upcast/downcast rule, the other major rule is that
pointers to union members may only be dereferenced for the *active*
union member, which can only be set by using the union directly.

  union U {
    int i;
    long l;
  } u;
  int *pi = &u.i;
  long *pl = &u.l;

  u.i = 5;
  int foo = *pi;   // ok, u.i is the active member.
  long bar = *pl;  // NOT ok, u.l is not the active member.
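
As a sketch of staying within this rule, the functions below only
dereference a member pointer after that member has been made active by
an assignment through the union itself.  The function names are
hypothetical and exist only for this example.

```c
union U {
    int i;
    long l;
};

/* Make u.i active, then read it through a pointer to that member. */
int read_active_int(int value) {
    union U u;
    int *pi = &u.i;
    u.i = value;   /* u.i is now the active member */
    return *pi;    /* ok: pi points at the active member */
}

/* Switch the active member by assigning through the union, then read
   the newly active member through its pointer. */
long read_active_long(long value) {
    union U u;
    long *pl = &u.l;
    u.i = 5;       /* u.i is active here, so *pl must not be read yet */
    u.l = value;   /* assigning via the union makes u.l active */
    return *pl;    /* ok: u.l is now the active member */
}
```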

Josh

