Bug 46488 - [4.5 regression] server/core_filters.c from apache httpd 2.2.17 miscompiled at -O3
Summary: [4.5 regression] server/core_filters.c from apache httpd 2.2.17 miscompiled a...
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.5.1
: P3 major
Target Milestone: 4.5.2
Assignee: Eric Botcazou
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-11-15 18:50 UTC by Lance Kinley
Modified: 2010-12-01 22:14 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.4.3
Known to fail: 4.5.2, 4.6.0
Last reconfirmed: 2010-11-16 17:11:11


Attachments
Preprocessed server/core_filters.c (51.13 KB, application/octet-stream)
2010-11-15 18:50 UTC, Lance Kinley
Details
test case (800 bytes, text/x-c)
2010-11-29 21:26 UTC, Paulo César Pereira de Andrade
Details
Stand-alone testcase (1.21 KB, text/plain)
2010-11-30 08:14 UTC, Eric Botcazou
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Lance Kinley 2010-11-15 18:50:34 UTC
Created attachment 22403 [details]
Preprocessed server/core_filters.c

Compiling the latest Apache httpd (2.2.17) using GCC 4.5.1 on Solaris Sparc and 64-bit produces a non-functioning web server.

Attempts to connect to httpd result in an immediate hang-up with no logs.

Recompiling server/core_filters.c without -O[2,3] produces a working httpd.

GCC was built with the following options:
../configure --with-gnu-as --with-as=/sites/compiler/bin/as \
        --without-gnu-ld \
        --with-ld=/usr/ccs/bin/ld \
        --enable-shared --enable-languages=c,c++ \
        --with-gmp=/sites/compiler/64bit --with-mpfr=/sites/compiler/64bit \
        --prefix=/sites/compiler/64bit \
        --build=sparc64-sun-solaris2.10

GNU C (GCC) version 4.5.1 (sparc64-sun-solaris2.10)
        compiled by GNU C version 4.5.1, GMP version 4.3.2, MPFR version 2.4.2, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072

There are no compiler warnings when compiling httpd.

httpd configure command:

./configure --prefix=/sites/httpd-2.2.17-worker-niagara-64 --enable-mods-shared=all --with-mpm=worker --enable-logio --enable-headers --disable-autoindex --enable-cgi --disable-userdir --enable-so --enable-ssl --disable-ipv6 --with-ssl=/sites/OPENSSL/openssl-1.0.0-niagara-64 --enable-info --enable-nonportable-atomics=yes --enable-deflate --enable-unique-id --with-pcre=/sites/utils --enable-proxy --build=sparc64-sun-solaris2.10
CFLAGS are:
CFLAGS='-m64 -mcpu=niagara -mtune=niagara -O3'

Pre-processed file is attached.
Comment 1 Eric Botcazou 2010-11-16 08:28:32 UTC
Any clue as to which function/which lines are miscompiled exactly?  Does it work if you compile the file with -O -fno-delayed-branch?
Comment 2 Lance Kinley 2010-11-16 16:37:20 UTC
Adding -fno-delayed-branch does not resolve the issue.

I don't know which function is the culprit.  I'll try to dig into it more.
Comment 3 Lance Kinley 2010-11-16 16:48:11 UTC
My apologies, -O -fno-delayed-branch does fix it.

-O3 -fno-delayed-branch does not work, however.
Comment 4 Eric Botcazou 2010-11-16 17:11:11 UTC
> My apologies, -O -fno-delayed-branch does fix it.
> 
> -O3 -fno-delayed-branch does not work, however.

There might be several issues then.  Could you try the latest 4.5.x snapshot?
At least a couple of issues related to -fdelayed-branch were fixed in between.
Comment 5 Lance Kinley 2010-11-16 23:52:24 UTC
I compiled the latest snapshot:
gcc (GCC) 4.5.2 20101111 (prerelease)

and tried -O3 -fno-delayed-branch again on server/core_filters.c and that still doesn't work.

I retried -O -fno-delayed-branch and that continues to work with the new gcc build.
Comment 6 Eric Botcazou 2010-11-17 00:09:42 UTC
> I compiled the latest snapshot:
> gcc (GCC) 4.5.2 20101111 (prerelease)
> 
> and tried -O3 -fno-delayed-branch again on server/core_filters.c and that still
> doesn't work.
> 
> I retried -O -fno-delayed-branch and that continues to work with the new gcc
> build.

-fno-delayed-branch is supposed to be useless with this snapshot though.
Comment 7 Lance Kinley 2010-11-17 06:15:36 UTC
I compiled and used the latest SVN snapshot:
gcc (GCC) 4.6.0 20101116 (experimental) [trunk revision 166840]

And got the same exact results as before with regard to both -O and -O3 in combination with -fno-delayed-branch
Comment 8 Eric Botcazou 2010-11-17 07:41:44 UTC
> And got the same exact results as before with regard to both -O and -O3 in
> combination with -fno-delayed-branch

Sorry for being unclear: starting with the 4.5.2 snapshot, issues related to -fno-delayed-branch should be fixed so you should drop this workaround entirely.
Comment 9 Richard Biener 2010-11-25 15:59:18 UTC
Works with GCC version ... ?
Comment 10 Eric Botcazou 2010-11-25 16:34:32 UTC
Does the problem only happen at -O3 with the 4.5.x snapshot, i.e. do compilations with bare -O or -O2 generate working executables?
Comment 11 Lance Kinley 2010-11-26 22:10:16 UTC
Works with -O and -O2 on the 4.5 snapshot and the 4.6 snapshot.

-O3 is the culprit.

I have not tested with any other versions other than 4.5.1 and the two snapshots mentioned.
Comment 12 Richard Biener 2010-11-26 22:22:12 UTC
I you can't confirm that it worked with 4.4.x or earlier versions with the
options that break with 4.5 and 4.6 this isn't a regression.

Btw, in the original report it sounds like -O2 is broken as well, but in
reality only -O3 is broken?  Both for 4.5 and 4.6 snapshots?
Comment 13 Lance Kinley 2010-11-26 22:57:08 UTC
-O and -O2 are broken in 4.5.1 but work in the 4.5 and 4.6 snapshots.
Comment 14 Eric Botcazou 2010-11-27 12:19:09 UTC
> Works with -O and -O2 on the 4.5 snapshot and the 4.6 snapshot.
> 
> -O3 is the culprit.

OK, thanks for confirming.  Any idea as to which part of the code is miscompiled?
Comment 15 Frédéric Buclin 2010-11-27 23:18:58 UTC
At least OpenSUSE and Mandriva Linux are affected by this issue, see:

 https://qa.mandriva.com/show_bug.cgi?id=61384
 https://issues.apache.org/bugzilla/show_bug.cgi?id=50190

Also, the 2nd report says that it was working fine with gcc 4.4, so it really looks like a regression in gcc 4.5.x.
Comment 16 Eric Botcazou 2010-11-28 08:58:47 UTC
> At least OpenSUSE and Mandriva Linux are affected by this issue, see:
> 
>  https://qa.mandriva.com/show_bug.cgi?id=61384
>  https://issues.apache.org/bugzilla/show_bug.cgi?id=50190

This looks indeed similar, thus recategorizing.
Comment 17 Eric Botcazou 2010-11-28 12:28:27 UTC
It appears that inlining brigade_move into ap_core_input_filter and ap_core_output_filter at -O3 causes (one of) the latter functions to be miscompiled with -fstrict-aliasing.

The code uses clever type punning which seems to break C aliasing rules:

/**
 * The Ring Sentinel
 *
 * This is the magic pointer value that occurs before the first and
 * after the last elements in the ring, computed from the address of
 * the ring's head.  The head itself isn't an element, but in order to
 * get rid of all the special cases when dealing with the ends of the
 * ring, we play typecasting games to make it look like one.

[...]

#define APR_RING_SENTINEL(hp, elem, link)				\
    (struct elem *)((char *)(&(hp)->next) - APR_OFFSETOF(struct elem, link))

(hp)->next is struct elem* but it is accessed (its link field) as a struct elem through various macros:

#define APR_RING_NEXT(ep, link)	(ep)->link.next
#define APR_RING_PREV(ep, link)	(ep)->link.prev
Comment 18 Eric Botcazou 2010-11-28 13:21:51 UTC
Using

#define APR_RING_SENTINEL(hp, elem, link)				\
    (struct elem *)((char *)(hp) - APR_OFFSETOF(struct elem, link))

should be safer wrt strict aliasing.
Comment 19 Joe Orton 2010-11-29 11:02:26 UTC
(In reply to comment #18)
> Using
> 
> #define APR_RING_SENTINEL(hp, elem, link)                \
>     (struct elem *)((char *)(hp) - APR_OFFSETOF(struct elem, link))
> 
> should be safer wrt strict aliasing.

We changed the macro from that to the current definition to avoid strict aliasing warnings from gcc:

  http://svn.apache.org/viewvc?view=revision&revision=662299

Does the cast through a char * not have the desired effect of allowing the pointer to alias any other pointer regardless of type?  That is one of the exceptions in C99 6.5/par 7.

Why specifically does this result in an C99 aliasing violation anyway?  The pointers to which this macro evaluates are never dereferenced, only compared for equality.
Comment 20 Eric Botcazou 2010-11-29 11:15:58 UTC
> We changed the macro from that to the current definition to avoid strict
> aliasing warnings from gcc:
> 
>   http://svn.apache.org/viewvc?view=revision&revision=662299

Bummer. :-)

> Does the cast through a char * not have the desired effect of allowing the
> pointer to alias any other pointer regardless of type?  That is one of the
> exceptions in C99 6.5/par 7.

Classical misconception.  What matters is the type of the lvalue used to access the object, not the types through which the pointer value is casted, i.e:

  float f = 0.0;
  int t;

  t = *(int *)(char *)&f;

is as erroneous as:

  float f = 0.0;
  int t;

  t = *(int *)&f;

> Why specifically does this result in an C99 aliasing violation anyway?  The
> pointers to which this macro evaluates are never dereferenced, only compared
> for equality.

I think they are dereferenced by the macro APR_RING_{NEXT,PREV} in some cases.
Comment 21 Joe Orton 2010-11-29 11:38:44 UTC
Thanks for the explanation.

(In reply to comment #20)
> > Why specifically does this result in an C99 aliasing violation anyway?  The
> > pointers to which this macro evaluates are never dereferenced, only compared
> > for equality.
> 
> I think they are dereferenced by the macro APR_RING_{NEXT,PREV} in some cases.

The pointers are constructed explicitly to never be dereferenced, only compared for equality; if a dereference exists it would be a bug, but I don't see one.

In the absence of such a smoking gun, can an C99 aliasing issue occur merely in handling pointer equivalence?
Comment 22 Eric Botcazou 2010-11-29 12:09:30 UTC
> The pointers are constructed explicitly to never be dereferenced, only
> compared for equality; if a dereference exists it would be a bug, but
> I don't see one.

APR_RING_SPLICE_HEAD does such a dereference as far I can see:

#define APR_RING_SPLICE_HEAD(hp, ep1, epN, elem, link)			\
	APR_RING_SPLICE_AFTER(APR_RING_SENTINEL((hp), elem, link),	\
			     (ep1), (epN), link)

#define APR_RING_SPLICE_AFTER(lep, ep1, epN, link) do {			\
	APR_RING_PREV((ep1), link) = (lep);				\
	APR_RING_NEXT((epN), link) = APR_RING_NEXT((lep), link);	\
	APR_RING_PREV(APR_RING_NEXT((lep), link), link) = (epN);	\
	APR_RING_NEXT((lep), link) = (ep1);				\
    } while (0)

#define APR_RING_NEXT(ep, link)	(ep)->link.next
#define APR_RING_PREV(ep, link)	(ep)->link.prev

> In the absence of such a smoking gun, can an C99 aliasing issue occur merely
> in handling pointer equivalence?

That would be a real compiler bug. :-)


You seem to be quite familiar with the httpd code.  Can you pinpoint what is
miscompiled exactly in ap_core_input_filter or ap_core_output_filter?  I'm not saying that the miscompilation comes from this strict aliasing issue, only that it might at this point.
Comment 23 Richard Biener 2010-11-29 13:36:20 UTC
(In reply to comment #22)
> > The pointers are constructed explicitly to never be dereferenced, only
> > compared for equality; if a dereference exists it would be a bug, but
> > I don't see one.
> 
> APR_RING_SPLICE_HEAD does such a dereference as far I can see:
> 
> #define APR_RING_SPLICE_HEAD(hp, ep1, epN, elem, link)            \
>     APR_RING_SPLICE_AFTER(APR_RING_SENTINEL((hp), elem, link),    \
>                  (ep1), (epN), link)
> 
> #define APR_RING_SPLICE_AFTER(lep, ep1, epN, link) do {            \
>     APR_RING_PREV((ep1), link) = (lep);                \
>     APR_RING_NEXT((epN), link) = APR_RING_NEXT((lep), link);    \
>     APR_RING_PREV(APR_RING_NEXT((lep), link), link) = (epN);    \
>     APR_RING_NEXT((lep), link) = (ep1);                \
>     } while (0)
> 
> #define APR_RING_NEXT(ep, link)    (ep)->link.next
> #define APR_RING_PREV(ep, link)    (ep)->link.prev
> 
> > In the absence of such a smoking gun, can an C99 aliasing issue occur merely
> > in handling pointer equivalence?
> 
> That would be a real compiler bug. :-)
> 
> 
> You seem to be quite familiar with the httpd code.  Can you pinpoint what is
> miscompiled exactly in ap_core_input_filter or ap_core_output_filter?  I'm not
> saying that the miscompilation comes from this strict aliasing issue, only that
> it might at this point.

Btw, in 4.6 this was likely "fixed" by making all pointers inherit the
same alias set as void * (as this is a very common misconception that
you can alias all pointers with void *, or that all pointers do alias
in general):

2010-08-25  Richard Guenther  <rguenther@suse.de>

        * alias.c (get_alias_set): Assign a single alias-set to all pointers.
        * gimple.c (gimple_get_alias_set): Remove special handling
        for pointers.

it doesn't make the type-punning valid, of course, but in 4.6 we try
harder to not break users code just because we are allowed to.
Comment 24 Richard Biener 2010-11-29 13:39:02 UTC
(In reply to comment #18)
> Using
> 
> #define APR_RING_SENTINEL(hp, elem, link)                \
>     (struct elem *)((char *)(hp) - APR_OFFSETOF(struct elem, link))
> 
> should be safer wrt strict aliasing.

not really.  But using

typedef struct elem elem_ __attribute__((may_alias));
#define APR_RING_SENTINEL(hp, elem, link) \
    (struct elem_ *)((char *)(hp) - APR_OFFSETOF(struct elem, link))

would be safe wrt strict aliasing (if the pointer is only dereferenced
directly and not casted to another pointer type before, of course).
Comment 25 Joe Orton 2010-11-29 14:03:40 UTC
(In reply to comment #22)
> APR_RING_SPLICE_HEAD does such a dereference as far I can see:

You are right, sorry, I had forgotten exactly how this code works.

(In reply to comment #24)
> not really.  But using
> 
> typedef struct elem elem_ __attribute__((may_alias));
> #define APR_RING_SENTINEL(hp, elem, link) \
>     (struct elem_ *)((char *)(hp) - APR_OFFSETOF(struct elem, link))
> 
> would be safe wrt strict aliasing (if the pointer is only dereferenced
> directly and not casted to another pointer type before, of course).

I don't think we can guarantee this with the API unchanged; the sentinel pointer could well be stored in a "struct elem *" pointer.
Comment 26 rguenther@suse.de 2010-11-29 14:20:45 UTC
On Mon, 29 Nov 2010, jorton at redhat dot com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46488
> 
> --- Comment #25 from Joe Orton <jorton at redhat dot com> 2010-11-29 14:03:40 UTC ---
> (In reply to comment #22)
> > APR_RING_SPLICE_HEAD does such a dereference as far I can see:
> 
> You are right, sorry, I had forgotten exactly how this code works.
> 
> (In reply to comment #24)
> > not really.  But using
> > 
> > typedef struct elem elem_ __attribute__((may_alias));
> > #define APR_RING_SENTINEL(hp, elem, link) \
> >     (struct elem_ *)((char *)(hp) - APR_OFFSETOF(struct elem, link))
> > 
> > would be safe wrt strict aliasing (if the pointer is only dereferenced
> > directly and not casted to another pointer type before, of course).
> 
> I don't think we can guarantee this with the API unchanged; the sentinel
> pointer could well be stored in a "struct elem *" pointer.

You could stick the attribute on the elem struct declaration itself then.

Richard.
Comment 27 Paulo César Pereira de Andrade 2010-11-29 21:26:32 UTC
Created attachment 22572 [details]
test case

I learned most of apache/apr/apr-util internals while debugging this problem in mandriva cooker,
so, maybe the test case can be made simpler... The crash in mandriva is only confirmed to happen
in i586. Sample run of the test case:

[pcpa@localhost ~]$ gcc -O0 -fstrict-aliasing -D_LARGEFILE64_SOURCE -I/usr/include/apr-1 -g3 apr-test.c; LC_ALL=C ./a.out
It works!
[pcpa@localhost ~]$ gcc -O3 -fstrict-aliasing -D_LARGEFILE64_SOURCE -I/usr/include/apr-1 -g3 apr-test.c; LC_ALL=C ./a.out
a.out: apr-test.c:59: check: Assertion `e->link.next->link.prev == e' failed.
Abortado
[pcpa@localhost ~]$ gcc -O3 -fno-strict-aliasing -D_LARGEFILE64_SOURCE -I/usr/include/apr-1 -g3 apr-test.c; LC_ALL=C ./a.out
It works!

[pardon choosing the "It works!" string :-)]

It also appears to correct the issue with -fstrict-aliasing by adding temporaries, e.g:
-#define APR_RING_SPLICE_HEAD(hp, ep1, epN, elem, link)                 \
-       APR_RING_SPLICE_AFTER(APR_RING_SENTINEL((hp), elem, link),      \
-                            (ep1), (epN), link)
+#define APR_RING_SPLICE_HEAD(hp, ep1, epN, elem, link) do {            \
+       struct elem *lep = APR_RING_SENTINEL((hp), elem, link);         \
+       APR_RING_SPLICE_AFTER(lep, (ep1), (epN), link);                 \
+    } while (0)

The attribute noinline in the test case was to "somehow" make gcc not understand where
the pointers come from, otherwise, if removing any of them, it will also work.
Comment 28 Eric Botcazou 2010-11-30 07:40:00 UTC
> I learned most of apache/apr/apr-util internals while debugging this problem in
> mandriva cooker,
> so, maybe the test case can be made simpler... The crash in mandriva is only
> confirmed to happen in i586.

Thanks a lot, I can reproduce with 4.5.x and -O2 on i586.  Investigating.
Comment 29 Eric Botcazou 2010-11-30 08:14:30 UTC
Created attachment 22575 [details]
Stand-alone testcase

Fails at -O2 with 4.5.x on x86, doesn't fail with -fno-strict-aliasing or 4.6.0.
Comment 30 Richard Biener 2010-11-30 10:15:02 UTC
From a quick look I can see that with -fstrict-aliasing we will never consider
ptr->link.next to alias ptr->list.next (so if they are made to alias via
casting the testcase is invalid).

Which I can see here:

brigade_move(apr_bucket_brigade *b, apr_bucket_brigade *a, apr_bucket *e)
{
    apr_bucket *f;

    if (e != (struct apr_bucket *)((char *)(&(&(b)->list)->next) - ((long) (((char *) (&(((struct apr_bucket*)0)->link))) - ((char *) 0))))) 

(struct apr_bucket *)((char *)(&(&(b)->list)->next) does exactly this
(would be nice to rewrite the testcase to use offsetof btw).
Comment 31 Eric Botcazou 2010-11-30 12:57:29 UTC
> From a quick look I can see that with -fstrict-aliasing we will never consider
> ptr->link.next to alias ptr->list.next (so if they are made to alias via
> casting the testcase is invalid).

Yes, that's it.  The miscompilation happens very late (equiv memory location handling during register allocation) but ultimately indirect_refs_may_alias_p decides that:

(gdb) p debug_generic_expr(ref1)
D.2012_2->list.prev

(gdb) p debug_generic_expr(ref2)
D.2136_50->link.prev

don't alias by virtue of TBAA:

  /* Do type-based disambiguation.  */
  if (base1_alias_set != base2_alias_set
      && !alias_sets_conflict_p (base1_alias_set, base2_alias_set))
    return false;

PTR1 points to apr_bucket_brigade and PTR2 points to apr_bucket.

The aliasing violation is the dereference of APR_RING_SENTINEL which happens in APR_RING_SPLICE_HEAD and accesses an apr_bucket_brigade as an apr_bucket.

The violation is "sophisticated": the memory location have the same alias set

(gdb) p debug_rtx(mem)
(mem/s/v/f:SI (plus:SI (reg/f:SI 84 [ D.2136 ])
        (const_int 4 [0x4])) [2 D.2136_50->link.prev+0 S4 A32])

(gdb) p debug_rtx(x)
(mem/s/f:SI (plus:SI (reg/f:SI 97)
        (const_int 4 [0x4])) [2 D.2012_2->list.prev+0 S4 A32])

This may explain why the compiler doesn't warn with -Wstrict-aliasing.  It does warn with the more natural:

#define APR_RING_SENTINEL(hp, elem, link)				\
    (struct elem *)((char *)(hp) - APR_OFFSETOF(struct elem, link))

pr46488.c: In function 'brigade_move':
pr46488.c:107:2: warning: dereferencing type-punned pointer will break strict-aliasing rules
pr46488.c:107:2: warning: dereferencing type-punned pointer will break strict-aliasing rules
pr46488.c:107:2: warning: dereferencing type-punned pointer will break strict-aliasing rules
pr46488.c: In function 'test':
pr46488.c:158:5: warning: dereferencing type-punned pointer will break strict-aliasing rules
pr46488.c:158:5: warning: dereferencing type-punned pointer will break strict-aliasing rules
pr46488.c:158:5: warning: dereferencing type-punned pointer will break strict-aliasing rules
pr46488.c:159:5: warning: dereferencing type-punned pointer will break strict-aliasing rules
pr46488.c:159:5: warning: dereferencing type-punned pointer will break strict-aliasing rules
pr46488.c:159:5: warning: dereferencing type-punned pointer will break strict-aliasing rules
Comment 32 Eric Botcazou 2010-11-30 15:10:58 UTC
The problem appears to be deeply rooted in the Ring construct, more precisely in the HEAD trick.  IIUC the idea is to "attach" a doubly-linked list to another structure by means of a "virtual" member overlaid on top of the structure; the only thing they actually share is a special APR_RING_ENTRY (the APR_RING_HEAD).  But this overlay fundamentally violates the aliasing rules even if one try to narrow the accesses to just the shared part.

Richard, is that how the aliasing rules are implemented in the 4.5.x series?  Has this been changed in 4.6.0?
Comment 33 rguenther@suse.de 2010-11-30 15:41:36 UTC
On Tue, 30 Nov 2010, ebotcazou at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46488
> 
> --- Comment #32 from Eric Botcazou <ebotcazou at gcc dot gnu.org> 2010-11-30 15:10:58 UTC ---
> The problem appears to be deeply rooted in the Ring construct, more precisely
> in the HEAD trick.  IIUC the idea is to "attach" a doubly-linked list to
> another structure by means of a "virtual" member overlaid on top of the
> structure; the only thing they actually share is a special APR_RING_ENTRY (the
> APR_RING_HEAD).  But this overlay fundamentally violates the aliasing rules
> even if one try to narrow the accesses to just the shared part.
> 
> Richard, is that how the aliasing rules are implemented in the 4.5.x series? 

Yes, also in 4.4 and 4.3, but maybe you need to be more lucky to trigger
the problem there.

> Has this been changed in 4.6.0?

No.  But with 4.6 we can ignore pointer types when doing copy-propagation
and thus we probably see that they must-alias (in which case we will
not apply TBAA to be more nice to our users).

Richard.
Comment 34 Paulo César Pereira de Andrade 2010-11-30 23:15:16 UTC
(In reply to comment #32)
> The problem appears to be deeply rooted in the Ring construct, more precisely
> in the HEAD trick.  IIUC the idea is to "attach" a doubly-linked list to
> another structure by means of a "virtual" member overlaid on top of the
> structure; the only thing they actually share is a special APR_RING_ENTRY (the
> APR_RING_HEAD).  But this overlay fundamentally violates the aliasing rules
> even if one try to narrow the accesses to just the shared part.
> 
> Richard, is that how the aliasing rules are implemented in the 4.5.x series? 
> Has this been changed in 4.6.0?

I said in my comment that adding temporaries could correct the issue,
but they do not. I had some different test trees, and reviewing them,
what actually "corrected" in one of the builds was adding a volatile
modifier to APR_RING_HEAD, as in:

#define APR_RING_HEAD(head, elem)					\
    struct head {							\
	struct elem * volatile next;					\
	struct elem * volatile prev;					\
    }

to better match APR_RING_ENTRY.
Comment 35 Eric Botcazou 2010-11-30 23:25:57 UTC
> I said in my comment that adding temporaries could correct the issue,
> but they do not. I had some different test trees, and reviewing them,
> what actually "corrected" in one of the builds was adding a volatile
> modifier to APR_RING_HEAD, as in:
> 
> #define APR_RING_HEAD(head, elem)                    \
>     struct head {                            \
>     struct elem * volatile next;                    \
>     struct elem * volatile prev;                    \
>     }
> 
> to better match APR_RING_ENTRY.

Probably a reasonable workaround.  Having 2 different constructs APR_RING_ENTRY and APR_RING_HEAD for the same object is strange in any case.
Comment 36 rguenther@suse.de 2010-12-01 10:12:21 UTC
On Tue, 30 Nov 2010, ebotcazou at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46488
> 
> --- Comment #35 from Eric Botcazou <ebotcazou at gcc dot gnu.org> 2010-11-30 23:25:57 UTC ---
> > I said in my comment that adding temporaries could correct the issue,
> > but they do not. I had some different test trees, and reviewing them,
> > what actually "corrected" in one of the builds was adding a volatile
> > modifier to APR_RING_HEAD, as in:
> > 
> > #define APR_RING_HEAD(head, elem)                    \
> >     struct head {                            \
> >     struct elem * volatile next;                    \
> >     struct elem * volatile prev;                    \
> >     }
> > 
> > to better match APR_RING_ENTRY.
> 
> Probably a reasonable workaround.  Having 2 different constructs APR_RING_ENTRY
> and APR_RING_HEAD for the same object is strange in any case.

If you'd re-use the same type the issue would go away as well (I think).

Richard.
Comment 37 Eric Botcazou 2010-12-01 12:15:02 UTC
> If you'd re-use the same type the issue would go away as well (I think).

Yes, this works for the testcase, but probably only because the problematic fields are now volatile too (APR_RING_ENTRY has volatile fields, APR_RING_HEAD doesn't).  The warning is still emitted for the more natural:

#define APR_RING_SENTINEL(hp, elem, link)				\
    (struct elem *)((char *)(hp) - APR_OFFSETOF(struct elem, link))

with -Wstrict-aliasing so the underlying aliasing issue is probably still there.

Closing as "invalid" since the original SPARC issues were fixed elsewhere and the -O3 issue is an aliasing violation in the source code.
Comment 38 Paulo César Pereira de Andrade 2010-12-01 18:23:36 UTC
(In reply to comment #37)
> > If you'd re-use the same type the issue would go away as well (I think).
> 
> Yes, this works for the testcase, but probably only because the problematic
> fields are now volatile too (APR_RING_ENTRY has volatile fields, APR_RING_HEAD
> doesn't).  The warning is still emitted for the more natural:
> 
> #define APR_RING_SENTINEL(hp, elem, link)                \
>     (struct elem *)((char *)(hp) - APR_OFFSETOF(struct elem, link))
> 
> with -Wstrict-aliasing so the underlying aliasing issue is probably still
> there.
> 
> Closing as "invalid" since the original SPARC issues were fixed elsewhere and
> the -O3 issue is an aliasing violation in the source code.

  Maybe this should this be considered a problem in the sense that it
does not warn with -Wstrict-aliasing ? As in this case, httpd source
was patched to "correct" the warning, but the problem remained.
  I mean, otherwise it is a kind of "please use -fno-strict-aliasing"
for everything you do not want to risk having incorrect code generation.
Comment 39 Paulo César Pereira de Andrade 2010-12-01 18:25:50 UTC
(In reply to comment #38)
> (In reply to comment #37)
> > > If you'd re-use the same type the issue would go away as well (I think).
> > 
> > Yes, this works for the testcase, but probably only because the problematic
> > fields are now volatile too (APR_RING_ENTRY has volatile fields, APR_RING_HEAD
> > doesn't).  The warning is still emitted for the more natural:
> > 
> > #define APR_RING_SENTINEL(hp, elem, link)                \
> >     (struct elem *)((char *)(hp) - APR_OFFSETOF(struct elem, link))
> > 
> > with -Wstrict-aliasing so the underlying aliasing issue is probably still
> > there.
> > 
> > Closing as "invalid" since the original SPARC issues were fixed elsewhere and
> > the -O3 issue is an aliasing violation in the source code.
> 
>   Maybe this should this be considered a problem in the sense that it
> does not warn with -Wstrict-aliasing ? As in this case, httpd source
> was patched to "correct" the warning, but the problem remained.
>   I mean, otherwise it is a kind of "please use -fno-strict-aliasing"
> for everything you do not want to risk having incorrect code generation.

Damn, sorry for quickly replying to myself. It warns with both
-Wstrict-aliasing=1 and -Wstrict-aliasing=2
Comment 40 Eric Botcazou 2010-12-01 18:55:28 UTC
>   Maybe this should this be considered a problem in the sense that it
> does not warn with -Wstrict-aliasing ? As in this case, httpd source
> was patched to "correct" the warning, but the problem remained.

The level 3 warning isn't guaranteed to catch all the cases, especially when one explicitly obfuscates the issue. :-)  But, yes, worth taking a look.
Comment 41 Eric Botcazou 2010-12-01 21:53:23 UTC
The warning machinery is faced with:

  (struct apr_bucket *) &a->list.next

It doesn't warn because, although &a->list has a different type than struct apr_bucket, its field 'next' has the same type as the 'next' sub-field of the 'link' field of struct apr_bucket.  Too much sophistication for the machinery.
One would probably need to mimic the disambiguation rules of tree-ssa-alias.c.

Richard, any idea?
Comment 42 Richard Biener 2010-12-01 22:14:23 UTC
(In reply to comment #41)
> The warning machinery is faced with:
> 
>   (struct apr_bucket *) &a->list.next
> 
> It doesn't warn because, although &a->list has a different type than struct
> apr_bucket, its field 'next' has the same type as the 'next' sub-field of the
> 'link' field of struct apr_bucket.  Too much sophistication for the machinery.
> One would probably need to mimic the disambiguation rules of tree-ssa-alias.c.
> 
> Richard, any idea?

The frontend machinery uses get_alias_set on types but it should use it
on the references (but it also warns at places that are not memory references
but just pointer casts, which of course will result in false positives and
us not being able to do real analysis).

I am repeatedly about to remove all but the memory-reference cases from
the frontend warning code as with the many false positives people only
learn how to circumvent it via casts which is never a real fix.

If you want to experiment try using get_alias_set on the reference
or get_deref_alias_set on the ADDR_EXPRs.

The warn_strict_aliasing == 2 && !alias_sets_must_conflict_p check doesn't
make sense at all btw. and the alias_sets_conflict_p check should really
be set1 != set2 && !alias_set_subset_of (set2, set1).

The interface of strict_aliasing_warning is also not good.  Callers seem
to check the warning level dependent on if they are in a dereference or
just a casting context, instead they should pass a flag.  And really
the frontend should only warn about the dereference case.

We can do a warning in the middle-end whenever we disambiguate using
TBAA and could not do with only points-to info.   We could restrict this
to type-punned pointers if we can mark them somehow.  But again this
would result in very many false positives.  Warning about TBAA
violations is hard - you have to be better at alias-analysis than
the optimizer (which otherwise would see the alias and not optimize).