Bug 82224

Summary: Strict-aliasing not noticing valid aliasing of two unions with active members
Product: gcc Reporter: Melissa <myriachan>
Component: tree-optimizationAssignee: Richard Biener <rguenth>
Status: ASSIGNED ---    
Severity: normal CC: dimhen, dushistov, fw, sjames, victor.dyachenko
Priority: P3 Keywords: alias, wrong-code
Version: 7.2.0   
Target Milestone: ---   
See Also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110515
Host: Target:
Build: Known to work:
Known to fail: Last reconfirmed: 2017-09-18 00:00:00

Description Melissa 2017-09-15 20:29:44 UTC
Consider the following C/C++ code with -O3 -fstrict-aliasing:

struct s1 {unsigned short x;};
struct s2 {unsigned short x;};
union s1s2 { struct s1 v1; struct s2 v2; };

static int read_s1x(struct s1 *p) { return p->x; }
static void write_s2x(struct s2 *p, int v) { p->x=v;}

int test(union s1s2 *p1, union s1s2 *p2, union s1s2 *p3)
{
  if (read_s1x(&p1->v1))
  {
    unsigned short temp;
    temp = p3->v1.x;
    p3->v2.x = temp;
    write_s2x(&p2->v2,1234);
    temp = p3->v2.x;
    p3->v1.x = temp;
  }
  return read_s1x(&p1->v1);
}
int test2(int x)
{
  union s1s2 q[2];
  q->v1.x = 4321;
  return test(q,q+x,q+x);
}
#include <stdio.h>
int main(void)
{
  printf("%d\n",test2(0));
}

GCC (and Clang) generate code that outputs 4321 instead of the expected 1234.

I don't really understand things in terms of the C standard, but in terms of the C++ standard, it seems as if GCC and Clang are incorrect, and this code is well-defined.  (The output is 4321 in both C and C++ mode.)

According to [class.union]/5 in the C++17 draft N4659, the assignment expression "p3->v2.x = temp;" changes the active member of the union.  It's done through a union member access expression.  Thus the pointer &p2->v2 is valid here.

Even if I switch this to "p3->v2 = { x };", avoiding the nested case, the problem still happens.

Even if I explicitly change the active member of the union with placement new as "new(&p3.v2) s2;", the problem still happens.

Is it possible that Clang doesn't see the possibility that p2 and p3 point to the same object?

Clang cross-reference: https://bugs.llvm.org/show_bug.cgi?id=34632
Comment 1 Andrew Pinski 2017-09-15 21:25:42 UTC
See PR 14319 which I think this is a dup of.
Comment 2 Andrew Pinski 2017-09-15 21:26:55 UTC
And PR 65892.
Comment 3 Melissa 2017-09-15 21:59:35 UTC
(In reply to Andrew Pinski from comment #1)
> See PR 14319 which I think this is a dup of.

PR 14319 refers to a case in which you are allowed to read the common prefix of a structure when the structure is not the active member of the union.

In this report, however, it is never the case that an inactive member is accessed.  Instead, it appears that GCC isn't aware of aliasing of two pointers when it comes to tracking which is the active member.

Melissa
Comment 4 Richard Biener 2017-09-18 10:07:51 UTC
It's more related to PR77745.  With the testcase we are still removing the
stores that change the active union member.  They do have the same alias set
unfortunately given they use union s1s2 * in their access path.

The IL that we then up optimizing is

  <bb 2> [100.00%] [count: INV]:
  q[0].v1.x = 4321;
...
  _3 = &q + _2;
  _14 = MEM[(struct s1 *)&q].x;
...
  if (_15 != 0)
    goto <bb 3>; [33.00%] [count: INV]
  else
    goto <bb 4>; [67.00%] [count: INV]

  <bb 3> [33.00%] [count: INV]:
  MEM[(struct s2 *)_3].x = 1234;

  <bb 4> [100.00%] [count: INV]:
  _17 = MEM[(struct s1 *)&q].x;
...

where you can see all loads/stores being done through s1/s2 types which do
_not_ alias.

So we have

 q->v1.x = 4321; // make v1 active
 if ((&q->v1)->x) // read via pointer to active member, supposedly valid
   {
     q->v2.x = q->v1.x; // change v2 to be active, same value, optimized away
     (&q->v2)->x = 1234; // write via pointer to active member
     q->v1.x = q->v2.x; // change v1 to be active, same value, optimized away
   }
 if ((&q->v1)->x != 1234)
   abort ();

if we do not DSE the active member changing stores everything works as
desired.

Note that for GCC it isn't necessary to go through the union type when
changing the active type.  Note it definitely is desirable to elide
the load/store changing the active member in the end.

It's going to be tricky to find a solution that doesn't pessimize optimization
when unions are involved.  DOM is similarly affected, in the end it needs
sufficiently obfuscated IL to trick our "handle must-aliases conservatively"
code enough.
Comment 5 Melissa 2017-09-25 20:22:10 UTC
This originated from a Stack Overflow post "supercat" made (I'm the "Myria" there).

https://stackoverflow.com/questions/46205744/is-this-use-of-unions-strictly-conforming/
Comment 6 Alexander Cherepanov 2017-10-23 07:21:40 UTC
Here are simplified testcases. With a union (C and C++):

----------------------------------------------------------------------
#include <stdio.h>

union u {
  long x;
  long long y;
};

static long test(long *px, long long *py, union u *pu)
{
  pu->x = 0;            // make .x active member (for C++)
  *px = 0;              // access .x via pointer

  pu->y = pu->x;        // switch active member to .y (for C++)
  *py = 1;              // access .y via pointer

  pu->x = pu->y;        // switch active member back to .x
  return *px;           // access .x via pointer
}

int main(void)
{
  union u u;

  printf("%ld\n", test(&u.x, &u.y, &u));
}
----------------------------------------------------------------------

Results:

----------------------------------------------------------------------
$ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out
1
$ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
0
----------------------------------------------------------------------

And with allocated memory (C; add placement new's for C++):

----------------------------------------------------------------------
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

static long test(long *px, long long *py, void *pu)
{
  *px = 0;
  *py = 1;

  // change effective type from long long to long
  long tmp;
  memcpy(&tmp, pu, sizeof(tmp));
  memcpy(pu, &tmp, sizeof(tmp));

  return *px;
}

int main(void)
{
  void *p = malloc(10);

  printf("%ld\n", test(p, p, p));
}
----------------------------------------------------------------------

Results:

----------------------------------------------------------------------
$ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out
1
$ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
0
----------------------------------------------------------------------

gcc version: gcc (GCC) 8.0.0 20171023 (experimental)
Comment 7 Richard Biener 2017-10-27 09:41:14 UTC
(In reply to Alexander Cherepanov from comment #6) 
> And with allocated memory (C; add placement new's for C++):
> 
> ----------------------------------------------------------------------
> #include <stdlib.h>
> #include <string.h>
> #include <stdio.h>
> 
> static long test(long *px, long long *py, void *pu)
> {
>   *px = 0;
>   *py = 1;
> 
>   // change effective type from long long to long
>   long tmp;
>   memcpy(&tmp, pu, sizeof(tmp));
>   memcpy(pu, &tmp, sizeof(tmp));

I believe this one is invalid - memcpy transfers the dynamic
type and *pu is currently 'long long'.  So it's either not
changing the dynamic type because, well, the type transfers
through 'tmp' or you are accessing 'tmp' with declared type
long as 'long long'.

>   return *px;
> }
> 
> int main(void)
> {
>   void *p = malloc(10);
> 
>   printf("%ld\n", test(p, p, p));
> }
> ----------------------------------------------------------------------
> 
> Results:
> 
> ----------------------------------------------------------------------
> $ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out
> 1
> $ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out
> 0
> ----------------------------------------------------------------------
> 
> gcc version: gcc (GCC) 8.0.0 20171023 (experimental)
Comment 8 Alexander Cherepanov 2017-10-27 10:15:13 UTC
On 2017-10-27 12:41, rguenth at gcc dot gnu.org wrote:
>> And with allocated memory (C; add placement new's for C++):
>>
>> ----------------------------------------------------------------------
>> #include <stdlib.h>
>> #include <string.h>
>> #include <stdio.h>
>>
>> static long test(long *px, long long *py, void *pu)
>> {
>>    *px = 0;
>>    *py = 1;
>>
>>    // change effective type from long long to long
>>    long tmp;
>>    memcpy(&tmp, pu, sizeof(tmp));
>>    memcpy(pu, &tmp, sizeof(tmp));
> 
> I believe this one is invalid - memcpy transfers the dynamic
> type and *pu is currently 'long long'.  So it's either not
> changing the dynamic type because, well, the type transfers
> through 'tmp' or you are accessing 'tmp' with declared type
> long as 'long long'.
AIUI memcpy is always valid if there is enough space, no matter what 
types its source and destination have. It accesses both of them as 
arrays of chars -- C11, 7.24.1p1: "The header <string.h> declares one 
type and several functions, and defines one  macro useful for 
manipulating arrays of character type and other objects treated as 
arrays of character type."

OTOH memcpy transfers effective type from the source to the destination 
but only if the destination has no declared type -- C11, 6.5p6: "If a 
value is copied into an object having no declared type using memcpy or 
memmove, or is copied as an array of character type, then the effective 
type of the modified object for that access and for subsequent accesses 
that do not modify the value is the effective type of the object from 
which the value is copied, if it has one."
Placement new in C++ can change dynamic types of objects with declared 
type but I don't think there are such facilities in C.

So in this example the first memcpy copies the value of *pu to tmp but 
drops its effective type (long long) and the second memcpy copies the 
value and passes the effective type of 'tmp' (long) along.
Comment 9 Richard Biener 2017-10-27 10:41:29 UTC
One complication when deciding whether the downstream uses are fine is
that we assign the same alias set to union accesses u->x and u->y.

That said, the union read/write pairs we remove essentially act as
an optimization barrier - they "union" the alias sets of all components
of the unions (not only of those accessed).  I think that's what other
readings of the standard ask for (there's a bug about this as well).
A mere declaration of a union needs to union the alias-sets and the
following should be valid:

union { long l; long long ll; };

long foo (long *p)
{
  *p = 0;
  *(long long *)p = 1;
  return *p;
}

this means (from an implementation perspective) that iff we do not
want to go this far then accessing a union member via a non-union
type isn't valid.

Anyway ;)  I don't see an easy way to fix the testcases in this PR
without removing the optimization entirely.  It's currently guarded
like

          /* If the TBAA state isn't the same for downstream reads
             we cannot value-number the VDEFs the same.  */
              alias_set_type set = get_alias_set (lhs);
              if (! vnresult
                  || vnresult->set == set
                  || alias_set_subset_of (set, vnresult->set))

but we don't know whether vnresult is from a load or a store so we
don't know whether it fully specifies the "TBAA state".  So even
dumbing that down to

          if (vnresult->set != set
              || set != get_alias_set (TREE_TYPE (lhs)))
            resultsame = false;

where the latter test sees if the alias set for the access and for an
access using a pointer is the same might keep some holes open.

Some of the more interesting cases _did_ involve unions and the above
would likely make them not optimized again (I remember sth with vector
intrinsics and initialization / copying via unions).  Oh, even more
trivial union->x = union->x; will stop from being optimized.

Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c        (revision 254135)
+++ gcc/tree-ssa-sccvn.c        (working copy)
@@ -3800,11 +3800,11 @@ visit_reference_op_store (tree lhs, tree
       resultsame = expressions_equal_p (result, op);
       if (resultsame)
        {
-         /* If the TBAA state isn't compatible for downstream reads
+         /* If the TBAA state isn't the same for downstream reads
             we cannot value-number the VDEFs the same.  */
          alias_set_type set = get_alias_set (lhs);
          if (vnresult->set != set
-             && ! alias_set_subset_of (set, vnresult->set))
+             || set != get_alias_set (TREE_TYPE (lhs)))
            resultsame = false;
        }
     }
@@ -5628,9 +5628,9 @@ eliminate_dom_walker::before_dom_childre
                 at least all accesses the later one does or if the store
                 was to readonly memory storing the same value.  */
              alias_set_type set = get_alias_set (lhs);
-             if (! vnresult
-                 || vnresult->set == set
-                 || alias_set_subset_of (set, vnresult->set))
+             if (vnresult
+                 && vnresult->set == set
+                 && set == get_alias_set (TREE_TYPE (lhs)))
                {
                  if (dump_file && (dump_flags & TDF_DETAILS))
                    {
Index: gcc/tree-ssa-dom.c
===================================================================
--- gcc/tree-ssa-dom.c  (revision 254135)
+++ gcc/tree-ssa-dom.c  (working copy)
@@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.
 #include "gimplify.h"
 #include "tree-cfgcleanup.h"
 #include "dbgcnt.h"
+#include "alias.h"
 
 /* This file implements optimizations on the dominator tree.  */
 
@@ -1953,7 +1954,8 @@ dom_opt_dom_walker::optimize_stmt (basic
          gimple_set_vuse (new_stmt, gimple_vuse (stmt));
          cached_lhs = m_avail_exprs_stack->lookup_avail_expr (new_stmt, false,
                                                               false);
-         if (cached_lhs && operand_equal_p (rhs, cached_lhs, 0))
+         if (cached_lhs && operand_equal_p (rhs, cached_lhs, 0)
+             && get_alias_set (lhs) == get_alias_set (TREE_TYPE (lhs)))
            {
              basic_block bb = gimple_bb (stmt);
              unlink_stmt_vdef (stmt);

FAILs at least gcc.dg/tree-ssa/ssa-pre-26.c

as said I'm not 100% convinced the more strict handling avoids all of the
cases.
Comment 10 Richard Biener 2017-10-27 10:43:05 UTC
(In reply to Alexander Cherepanov from comment #8)
> On 2017-10-27 12:41, rguenth at gcc dot gnu.org wrote:
> >> And with allocated memory (C; add placement new's for C++):
> >>
> >> ----------------------------------------------------------------------
> >> #include <stdlib.h>
> >> #include <string.h>
> >> #include <stdio.h>
> >>
> >> static long test(long *px, long long *py, void *pu)
> >> {
> >>    *px = 0;
> >>    *py = 1;
> >>
> >>    // change effective type from long long to long
> >>    long tmp;
> >>    memcpy(&tmp, pu, sizeof(tmp));
> >>    memcpy(pu, &tmp, sizeof(tmp));
> > 
> > I believe this one is invalid - memcpy transfers the dynamic
> > type and *pu is currently 'long long'.  So it's either not
> > changing the dynamic type because, well, the type transfers
> > through 'tmp' or you are accessing 'tmp' with declared type
> > long as 'long long'.
> AIUI memcpy is always valid if there is enough space, no matter what 
> types its source and destination have. It accesses both of them as 
> arrays of chars -- C11, 7.24.1p1: "The header <string.h> declares one 
> type and several functions, and defines one  macro useful for 
> manipulating arrays of character type and other objects treated as 
> arrays of character type."
> 
> OTOH memcpy transfers effective type from the source to the destination 
> but only if the destination has no declared type -- C11, 6.5p6: "If a 
> value is copied into an object having no declared type using memcpy or 
> memmove, or is copied as an array of character type, then the effective 
> type of the modified object for that access and for subsequent accesses 
> that do not modify the value is the effective type of the object from 
> which the value is copied, if it has one."
> Placement new in C++ can change dynamic types of objects with declared 
> type but I don't think there are such facilities in C.
> 
> So in this example the first memcpy copies the value of *pu to tmp but 
> drops its effective type (long long) and the second memcpy copies the 
> value and passes the effective type of 'tmp' (long) along.

Hmm, ok.  For GCC internals the memcpy changes the dynamic type to
"anything" (it uses alias-set zero reads/writes) so for GCC internals
the testcase is valid in that the later read needs to consider the
memcpy store as aliasing.
Comment 11 Alexander Cherepanov 2017-10-28 21:11:06 UTC
When I was converting the original testcase to malloc I started with a 
simple assignment with a cast. Clang optimized it as well (and I posted 
it to the clang bug) bug gcc didn't. Essentially this example:

void f(long long *p)
{
   *(long *)p = *p;
}

tree optimization turns to

f (long long int * p)
{
   long long int _1;

   <bb 2> [100.00%] [count: INV]:
   _1 = *p_3(D);
   MEM[(long int *)p_3(D)] = _1;
   return;

}

The same happens with "new (p) long (*p);". So the question: why is 
that? If this result is intended then perhaps memcpys and assignment of 
union members from this bug report could be converted by gcc to the same 
form? It would solve the problem.
Comment 12 Richard Biener 2017-10-30 10:30:36 UTC
(In reply to Richard Biener from comment #9)
> One complication when deciding whether the downstream uses are fine is
> that we assign the same alias set to union accesses u->x and u->y.
> 
> That said, the union read/write pairs we remove essentially act as
> an optimization barrier - they "union" the alias sets of all components
> of the unions (not only of those accessed).  I think that's what other
> readings of the standard ask for (there's a bug about this as well).
> A mere declaration of a union needs to union the alias-sets and the
> following should be valid:
> 
> union { long l; long long ll; };
> 
> long foo (long *p)
> {
>   *p = 0;
>   *(long long *)p = 1;
>   return *p;
> }
> 
> this means (from an implementation perspective) that iff we do not
> want to go this far then accessing a union member via a non-union
> type isn't valid.
> 
> Anyway ;)  I don't see an easy way to fix the testcases in this PR
> without removing the optimization entirely.  It's currently guarded
> like
> 
>           /* If the TBAA state isn't the same for downstream reads
>              we cannot value-number the VDEFs the same.  */
>               alias_set_type set = get_alias_set (lhs);
>               if (! vnresult
>                   || vnresult->set == set
>                   || alias_set_subset_of (set, vnresult->set))
> 
> but we don't know whether vnresult is from a load or a store so we
> don't know whether it fully specifies the "TBAA state".  So even
> dumbing that down to
> 
>           if (vnresult->set != set
>               || set != get_alias_set (TREE_TYPE (lhs)))
>             resultsame = false;
> 
> where the latter test sees if the alias set for the access and for an
> access using a pointer is the same might keep some holes open.
> 
> Some of the more interesting cases _did_ involve unions and the above
> would likely make them not optimized again (I remember sth with vector
> intrinsics and initialization / copying via unions).  Oh, even more
> trivial union->x = union->x; will stop from being optimized.
> 
> Index: gcc/tree-ssa-sccvn.c
> ===================================================================
> --- gcc/tree-ssa-sccvn.c        (revision 254135)
> +++ gcc/tree-ssa-sccvn.c        (working copy)
> @@ -3800,11 +3800,11 @@ visit_reference_op_store (tree lhs, tree
>        resultsame = expressions_equal_p (result, op);
>        if (resultsame)
>         {
> -         /* If the TBAA state isn't compatible for downstream reads
> +         /* If the TBAA state isn't the same for downstream reads
>              we cannot value-number the VDEFs the same.  */
>           alias_set_type set = get_alias_set (lhs);
>           if (vnresult->set != set
> -             && ! alias_set_subset_of (set, vnresult->set))
> +             || set != get_alias_set (TREE_TYPE (lhs)))
>             resultsame = false;
>         }
>      }
> @@ -5628,9 +5628,9 @@ eliminate_dom_walker::before_dom_childre
>                  at least all accesses the later one does or if the store
>                  was to readonly memory storing the same value.  */
>               alias_set_type set = get_alias_set (lhs);
> -             if (! vnresult
> -                 || vnresult->set == set
> -                 || alias_set_subset_of (set, vnresult->set))
> +             if (vnresult
> +                 && vnresult->set == set
> +                 && set == get_alias_set (TREE_TYPE (lhs)))
>                 {
>                   if (dump_file && (dump_flags & TDF_DETAILS))
>                     {
> Index: gcc/tree-ssa-dom.c
> ===================================================================
> --- gcc/tree-ssa-dom.c  (revision 254135)
> +++ gcc/tree-ssa-dom.c  (working copy)
> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.
>  #include "gimplify.h"
>  #include "tree-cfgcleanup.h"
>  #include "dbgcnt.h"
> +#include "alias.h"
>  
>  /* This file implements optimizations on the dominator tree.  */
>  
> @@ -1953,7 +1954,8 @@ dom_opt_dom_walker::optimize_stmt (basic
>           gimple_set_vuse (new_stmt, gimple_vuse (stmt));
>           cached_lhs = m_avail_exprs_stack->lookup_avail_expr (new_stmt,
> false,
>                                                                false);
> -         if (cached_lhs && operand_equal_p (rhs, cached_lhs, 0))
> +         if (cached_lhs && operand_equal_p (rhs, cached_lhs, 0)
> +             && get_alias_set (lhs) == get_alias_set (TREE_TYPE (lhs)))
>             {
>               basic_block bb = gimple_bb (stmt);
>               unlink_stmt_vdef (stmt);
> 
> FAILs at least gcc.dg/tree-ssa/ssa-pre-26.c

And

FAIL: gnat.dg/opt39.adb scan-tree-dump-times optimized "MEM" 1 (found 9 times)

otherwise clean on x86_64-unknown-linux-gnu.

> as said I'm not 100% convinced the more strict handling avoids all of the
> cases.
Comment 14 Richard Biener 2021-09-10 07:57:52 UTC
Oh, and related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82224