This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



GCC aliasing rules: more aggressive than C99?


The aliasing rules that GCC implements seem to be stricter than what
the C99 standard requires.  I am wondering whether this is true or
whether I am mistaken (I am not an expert on the standard, so the
latter is definitely possible).

The relevant text (C99 6.5, paragraph 7) is:

  An object shall have its stored value accessed only by an lvalue
  expression that has one of the following types:

  * a type compatible with the effective type of the object,
  [...]
  * an aggregate or union type that includes one of the aforementioned
    types among its members (including, recursively, a member of a
    subaggregate or contained union), or
  [...]

To me this allows the following:

  int i;
  union u { int x; } *pu = (union u*)&i;
  printf("%d\n", pu->x);

In this example, the object "i", which is of type "int", is having its
stored value accessed by an lvalue expression of type "union u", which
includes the type "int" among its members.

I have seen other articles that interpret the standard in this way.
See section "Casting through a union (2)" from this article, which
claims that casts of this sort are legal and that GCC's warnings
against them are false positives:
  http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html

However, this appears to be contrary to GCC's documentation.  From the
manpage:

  Similarly, access by taking the address, casting the resulting
  pointer and dereferencing the result has undefined behavior, even
  if the cast uses a union type, e.g.:

          int f() {
            double d = 3.0; 
            return ((union a_union *) &d)->i;
          }    
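For contrast, the GCC manual describes reading a different member through
the union object itself as permitted under -fstrict-aliasing.  Here is a
minimal sketch of that form.  The names float_or_bits and
float_bits_via_union are made up for illustration, and the sketch assumes
a 32-bit IEEE 754 float:

```c
#include <stdint.h>

/* Hypothetical union for illustration; not taken from the GCC manual. */
union float_or_bits {
  float    f;
  uint32_t bits;
};

/* Writes one member and reads the other through the union object itself,
   with no pointer cast.  Assumes float is 32-bit IEEE 754. */
uint32_t float_bits_via_union(float f) {
  union float_or_bits u;
  u.f = f;
  return u.bits;
}
```

Under those assumptions, float_bits_via_union(1.0f) yields 0x3f800000.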

I have also verified experimentally that GCC miscompiles the following
fragment, if we expect the behavior that I read the standard as specifying:
  int g;
  struct A { int x; };
  int foo(struct A *a) {
    if(g) a->x = 5;
    return g;
  }

With GCC 4.3.3 -O3 on x86-64 (Ubuntu), g is only loaded once:

0000000000000000 <foo>:
   0:   8b 05 00 00 00 00       mov    eax,DWORD PTR [rip+0x0]        # 6 <foo+0x6>
   6:   85 c0                   test   eax,eax
   8:   74 06                   je     10 <foo+0x10>
   a:   c7 07 05 00 00 00       mov    DWORD PTR [rdi],0x5
  10:   f3 c3                   repz ret

But this is incorrect if foo() is called as follows, because the store to
a->x then modifies g and the function should return 5:
  
  foo((struct A*)&g);
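A small driver along these lines can be used to observe which value comes
back.  It is only a sketch: call_foo_on_g is a hypothetical helper, the
noinline attribute is GCC-specific, and the result depends on the compiler
version and optimization level:

```c
int g;
struct A { int x; };

/* noinline (a GCC attribute) keeps foo() from being inlined into its
   caller, so the generated code for foo() itself is what gets exercised. */
__attribute__((noinline))
int foo(struct A *a) {
  if (g) a->x = 5;
  return g;
}

/* Hypothetical driver: makes a point at g, then reports what foo()
   returned.  Whether this call is valid is exactly the question here. */
int call_foo_on_g(void) {
  g = 1;
  return foo((struct A *)&g);
}
```

If g is reloaded after the store, call_foo_on_g() returns 5; if the first
load is reused, as in the disassembly above, it returns 1.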

Here is another example:
  
  struct A { int x; };
  struct B { int x; }; 
  int foo(struct A *a, struct B *b) { 
    if(a->x) b->x = 5;
    return a->x;
  }

When I compile this, a->x is only loaded once, even though foo()
could have been called like this:
  
  int i;
  foo((struct A*)&i, (struct B*)&i);
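The same kind of driver works for this case.  Again this is only a sketch:
call_foo_on_same_int is a hypothetical helper, i is initialized to 1 so
that the branch is taken, and the result depends on the compiler version
and optimization level:

```c
struct A { int x; };
struct B { int x; };

/* noinline (a GCC attribute) keeps foo() from being inlined into its
   caller. */
__attribute__((noinline))
int foo(struct A *a, struct B *b) {
  if (a->x) b->x = 5;
  return a->x;
}

/* Hypothetical driver: passes the address of the same int as both
   arguments.  Whether this call is valid is exactly the question here. */
int call_foo_on_same_int(void) {
  int i = 1;
  return foo((struct A *)&i, (struct B *)&i);
}
```

If a->x is reloaded after the store to b->x, the call returns 5; if the
first load is reused, it returns 1.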

From this I conclude that GCC diverges from the standard, in that it does not
allow casts of this sort.  In one sense this is good (because the policy GCC
implements is more aggressive, and yet still reasonable) but on the other hand
it means (if I am not mistaken) that GCC will incorrectly optimize strictly
conforming programs.

Clarifications are most welcome!

Josh

