Summary: | Strict-aliasing not noticing valid aliasing of two unions with active members | ||
---|---|---|---|
Product: | gcc | Reporter: | Melissa <myriachan> |
Component: | tree-optimization | Assignee: | Richard Biener <rguenth> |
Status: | ASSIGNED --- | ||
Severity: | normal | CC: | dimhen, dushistov, fw, sjames, victor.dyachenko |
Priority: | P3 | Keywords: | alias, wrong-code |
Version: | 7.2.0 | ||
Target Milestone: | --- | ||
See Also: | https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110515 | ||
Host: | Target: | ||
Build: | Known to work: | ||
Known to fail: | Last reconfirmed: | 2017-09-18 00:00:00 |
Description
Melissa
2017-09-15 20:29:44 UTC
(In reply to Andrew Pinski from comment #1) > See PR 14319 which I think this is a dup of. PR 14319 refers to a case in which you are allowed to read the common prefix of a structure when the structure is not the active member of the union. In this report, however, it is never the case that an inactive member is accessed. Instead, it appears that GCC isn't aware of aliasing of two pointers when it comes to tracking which is the active member. Melissa It's more related to PR77745. With the testcase we are still removing the stores that change the active union member. They do have the same alias set unfortunately given they use union s1s2 * in their access path. The IL that we then up optimizing is <bb 2> [100.00%] [count: INV]: q[0].v1.x = 4321; ... _3 = &q + _2; _14 = MEM[(struct s1 *)&q].x; ... if (_15 != 0) goto <bb 3>; [33.00%] [count: INV] else goto <bb 4>; [67.00%] [count: INV] <bb 3> [33.00%] [count: INV]: MEM[(struct s2 *)_3].x = 1234; <bb 4> [100.00%] [count: INV]: _17 = MEM[(struct s1 *)&q].x; ... where you can see all loads/stores being done through s1/s2 types which do _not_ alias. So we have q->v1.x = 4321; // make v1 active if ((&q->v1)->x) // read via pointer to active member, supposedly valid { q->v2.x = q->v1.x; // change v2 to be active, same value, optimized away (&q->v2)->x = 1234; // write via pointer to active member q->v1.x = q->v2.x; // change v1 to be active, same value, optimized away } if ((&q->v1)->x != 1234) abort (); if we do not DSE the active member changing stores everything works as desired. Note that for GCC it isn't necessary to go through the union type when changing the active type. Note it definitely is desirable to elide the load/store changing the active member in the end. It's going to be tricky to find a solution that doesn't pessimize optimization when unions are involved. DOM is similarly affected, in the end it needs sufficiently obfuscated IL to trick our "handle must-aliases conservatively" code enough. This originated from a Stack Overflow post "supercat" made (I'm the "Myria" there). https://stackoverflow.com/questions/46205744/is-this-use-of-unions-strictly-conforming/ Here are simplified testcases. With a union (C and C++): ---------------------------------------------------------------------- #include <stdio.h> union u { long x; long long y; }; static long test(long *px, long long *py, union u *pu) { pu->x = 0; // make .x active member (for C++) *px = 0; // access .x via pointer pu->y = pu->x; // switch active member to .y (for C++) *py = 1; // access .y via pointer pu->x = pu->y; // switch active member back to .x return *px; // access .x via pointer } int main(void) { union u u; printf("%ld\n", test(&u.x, &u.y, &u)); } ---------------------------------------------------------------------- Results: ---------------------------------------------------------------------- $ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out 1 $ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out 0 ---------------------------------------------------------------------- And with allocated memory (C; add placement new's for C++): ---------------------------------------------------------------------- #include <stdlib.h> #include <string.h> #include <stdio.h> static long test(long *px, long long *py, void *pu) { *px = 0; *py = 1; // change effective type from long long to long long tmp; memcpy(&tmp, pu, sizeof(tmp)); memcpy(pu, &tmp, sizeof(tmp)); return *px; } int main(void) { void *p = malloc(10); printf("%ld\n", test(p, p, p)); } ---------------------------------------------------------------------- Results: ---------------------------------------------------------------------- $ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out 1 $ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out 0 ---------------------------------------------------------------------- gcc version: gcc (GCC) 8.0.0 20171023 (experimental) (In reply to Alexander Cherepanov from comment #6) > And with allocated memory (C; add placement new's for C++): > > ---------------------------------------------------------------------- > #include <stdlib.h> > #include <string.h> > #include <stdio.h> > > static long test(long *px, long long *py, void *pu) > { > *px = 0; > *py = 1; > > // change effective type from long long to long > long tmp; > memcpy(&tmp, pu, sizeof(tmp)); > memcpy(pu, &tmp, sizeof(tmp)); I believe this one is invalid - memcpy transfers the dynamic type and *pu is currently 'long long'. So it's either not changing the dynamic type because, well, the type transfers through 'tmp' or you are accessing 'tmp' with declared type long as 'long long'. > return *px; > } > > int main(void) > { > void *p = malloc(10); > > printf("%ld\n", test(p, p, p)); > } > ---------------------------------------------------------------------- > > Results: > > ---------------------------------------------------------------------- > $ gcc -std=c11 -pedantic -Wall -Wextra test.c && ./a.out > 1 > $ gcc -std=c11 -pedantic -Wall -Wextra -O3 test.c && ./a.out > 0 > ---------------------------------------------------------------------- > > gcc version: gcc (GCC) 8.0.0 20171023 (experimental) On 2017-10-27 12:41, rguenth at gcc dot gnu.org wrote:
>> And with allocated memory (C; add placement new's for C++):
>>
>> ----------------------------------------------------------------------
>> #include <stdlib.h>
>> #include <string.h>
>> #include <stdio.h>
>>
>> static long test(long *px, long long *py, void *pu)
>> {
>> *px = 0;
>> *py = 1;
>>
>> // change effective type from long long to long
>> long tmp;
>> memcpy(&tmp, pu, sizeof(tmp));
>> memcpy(pu, &tmp, sizeof(tmp));
>
> I believe this one is invalid - memcpy transfers the dynamic
> type and *pu is currently 'long long'. So it's either not
> changing the dynamic type because, well, the type transfers
> through 'tmp' or you are accessing 'tmp' with declared type
> long as 'long long'.
AIUI memcpy is always valid if there is enough space, no matter what
types its source and destination have. It accesses both of them as
arrays of chars -- C11, 7.24.1p1: "The header <string.h> declares one
type and several functions, and defines one macro useful for
manipulating arrays of character type and other objects treated as
arrays of character type."
OTOH memcpy transfers effective type from the source to the destination
but only if the destination has no declared type -- C11, 6.5p6: "If a
value is copied into an object having no declared type using memcpy or
memmove, or is copied as an array of character type, then the effective
type of the modified object for that access and for subsequent accesses
that do not modify the value is the effective type of the object from
which the value is copied, if it has one."
Placement new in C++ can change dynamic types of objects with declared
type but I don't think there are such facilities in C.
So in this example the first memcpy copies the value of *pu to tmp but
drops its effective type (long long) and the second memcpy copies the
value and passes the effective type of 'tmp' (long) along.
One complication when deciding whether the downstream uses are fine is that we assign the same alias set to union accesses u->x and u->y. That said, the union read/write pairs we remove essentially act as an optimization barrier - they "union" the alias sets of all components of the unions (not only of those accessed). I think that's what other readings of the standard ask for (there's a bug about this as well). A mere declaration of a union needs to union the alias-sets and the following should be valid: union { long l; long long ll; }; long foo (long *p) { *p = 0; *(long long *)p = 1; return *p; } this means (from an implementation perspective) that iff we do not want to go this far then accessing a union member via a non-union type isn't valid. Anyway ;) I don't see an easy way to fix the testcases in this PR without removing the optimization entirely. It's currently guarded like /* If the TBAA state isn't the same for downstream reads we cannot value-number the VDEFs the same. */ alias_set_type set = get_alias_set (lhs); if (! vnresult || vnresult->set == set || alias_set_subset_of (set, vnresult->set)) but we don't know whether vnresult is from a load or a store so we don't know whether it fully specifies the "TBAA state". So even dumbing that down to if (vnresult->set != set || set != get_alias_set (TREE_TYPE (lhs))) resultsame = false; where the latter test sees if the alias set for the access and for an access using a pointer is the same might keep some holes open. Some of the more interesting cases _did_ involve unions and the above would likely make them not optimized again (I remember sth with vector intrinsics and initialization / copying via unions). Oh, even more trivial union->x = union->x; will stop from being optimized. Index: gcc/tree-ssa-sccvn.c =================================================================== --- gcc/tree-ssa-sccvn.c (revision 254135) +++ gcc/tree-ssa-sccvn.c (working copy) @@ -3800,11 +3800,11 @@ visit_reference_op_store (tree lhs, tree resultsame = expressions_equal_p (result, op); if (resultsame) { - /* If the TBAA state isn't compatible for downstream reads + /* If the TBAA state isn't the same for downstream reads we cannot value-number the VDEFs the same. */ alias_set_type set = get_alias_set (lhs); if (vnresult->set != set - && ! alias_set_subset_of (set, vnresult->set)) + || set != get_alias_set (TREE_TYPE (lhs))) resultsame = false; } } @@ -5628,9 +5628,9 @@ eliminate_dom_walker::before_dom_childre at least all accesses the later one does or if the store was to readonly memory storing the same value. */ alias_set_type set = get_alias_set (lhs); - if (! vnresult - || vnresult->set == set - || alias_set_subset_of (set, vnresult->set)) + if (vnresult + && vnresult->set == set + && set == get_alias_set (TREE_TYPE (lhs))) { if (dump_file && (dump_flags & TDF_DETAILS)) { Index: gcc/tree-ssa-dom.c =================================================================== --- gcc/tree-ssa-dom.c (revision 254135) +++ gcc/tree-ssa-dom.c (working copy) @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. #include "gimplify.h" #include "tree-cfgcleanup.h" #include "dbgcnt.h" +#include "alias.h" /* This file implements optimizations on the dominator tree. */ @@ -1953,7 +1954,8 @@ dom_opt_dom_walker::optimize_stmt (basic gimple_set_vuse (new_stmt, gimple_vuse (stmt)); cached_lhs = m_avail_exprs_stack->lookup_avail_expr (new_stmt, false, false); - if (cached_lhs && operand_equal_p (rhs, cached_lhs, 0)) + if (cached_lhs && operand_equal_p (rhs, cached_lhs, 0) + && get_alias_set (lhs) == get_alias_set (TREE_TYPE (lhs))) { basic_block bb = gimple_bb (stmt); unlink_stmt_vdef (stmt); FAILs at least gcc.dg/tree-ssa/ssa-pre-26.c as said I'm not 100% convinced the more strict handling avoids all of the cases. (In reply to Alexander Cherepanov from comment #8) > On 2017-10-27 12:41, rguenth at gcc dot gnu.org wrote: > >> And with allocated memory (C; add placement new's for C++): > >> > >> ---------------------------------------------------------------------- > >> #include <stdlib.h> > >> #include <string.h> > >> #include <stdio.h> > >> > >> static long test(long *px, long long *py, void *pu) > >> { > >> *px = 0; > >> *py = 1; > >> > >> // change effective type from long long to long > >> long tmp; > >> memcpy(&tmp, pu, sizeof(tmp)); > >> memcpy(pu, &tmp, sizeof(tmp)); > > > > I believe this one is invalid - memcpy transfers the dynamic > > type and *pu is currently 'long long'. So it's either not > > changing the dynamic type because, well, the type transfers > > through 'tmp' or you are accessing 'tmp' with declared type > > long as 'long long'. > AIUI memcpy is always valid if there is enough space, no matter what > types its source and destination have. It accesses both of them as > arrays of chars -- C11, 7.24.1p1: "The header <string.h> declares one > type and several functions, and defines one macro useful for > manipulating arrays of character type and other objects treated as > arrays of character type." > > OTOH memcpy transfers effective type from the source to the destination > but only if the destination has no declared type -- C11, 6.5p6: "If a > value is copied into an object having no declared type using memcpy or > memmove, or is copied as an array of character type, then the effective > type of the modified object for that access and for subsequent accesses > that do not modify the value is the effective type of the object from > which the value is copied, if it has one." > Placement new in C++ can change dynamic types of objects with declared > type but I don't think there are such facilities in C. > > So in this example the first memcpy copies the value of *pu to tmp but > drops its effective type (long long) and the second memcpy copies the > value and passes the effective type of 'tmp' (long) along. Hmm, ok. For GCC internals the memcpy changes the dynamic type to "anything" (it uses alias-set zero reads/writes) so for GCC internals the testcase is valid in that the later read needs to consider the memcpy store as aliasing. When I was converting the original testcase to malloc I started with a simple assignment with a cast. Clang optimized it as well (and I posted it to the clang bug) bug gcc didn't. Essentially this example: void f(long long *p) { *(long *)p = *p; } tree optimization turns to f (long long int * p) { long long int _1; <bb 2> [100.00%] [count: INV]: _1 = *p_3(D); MEM[(long int *)p_3(D)] = _1; return; } The same happens with "new (p) long (*p);". So the question: why is that? If this result is intended then perhaps memcpys and assignment of union members from this bug report could be converted by gcc to the same form? It would solve the problem. (In reply to Richard Biener from comment #9) > One complication when deciding whether the downstream uses are fine is > that we assign the same alias set to union accesses u->x and u->y. > > That said, the union read/write pairs we remove essentially act as > an optimization barrier - they "union" the alias sets of all components > of the unions (not only of those accessed). I think that's what other > readings of the standard ask for (there's a bug about this as well). > A mere declaration of a union needs to union the alias-sets and the > following should be valid: > > union { long l; long long ll; }; > > long foo (long *p) > { > *p = 0; > *(long long *)p = 1; > return *p; > } > > this means (from an implementation perspective) that iff we do not > want to go this far then accessing a union member via a non-union > type isn't valid. > > Anyway ;) I don't see an easy way to fix the testcases in this PR > without removing the optimization entirely. It's currently guarded > like > > /* If the TBAA state isn't the same for downstream reads > we cannot value-number the VDEFs the same. */ > alias_set_type set = get_alias_set (lhs); > if (! vnresult > || vnresult->set == set > || alias_set_subset_of (set, vnresult->set)) > > but we don't know whether vnresult is from a load or a store so we > don't know whether it fully specifies the "TBAA state". So even > dumbing that down to > > if (vnresult->set != set > || set != get_alias_set (TREE_TYPE (lhs))) > resultsame = false; > > where the latter test sees if the alias set for the access and for an > access using a pointer is the same might keep some holes open. > > Some of the more interesting cases _did_ involve unions and the above > would likely make them not optimized again (I remember sth with vector > intrinsics and initialization / copying via unions). Oh, even more > trivial union->x = union->x; will stop from being optimized. > > Index: gcc/tree-ssa-sccvn.c > =================================================================== > --- gcc/tree-ssa-sccvn.c (revision 254135) > +++ gcc/tree-ssa-sccvn.c (working copy) > @@ -3800,11 +3800,11 @@ visit_reference_op_store (tree lhs, tree > resultsame = expressions_equal_p (result, op); > if (resultsame) > { > - /* If the TBAA state isn't compatible for downstream reads > + /* If the TBAA state isn't the same for downstream reads > we cannot value-number the VDEFs the same. */ > alias_set_type set = get_alias_set (lhs); > if (vnresult->set != set > - && ! alias_set_subset_of (set, vnresult->set)) > + || set != get_alias_set (TREE_TYPE (lhs))) > resultsame = false; > } > } > @@ -5628,9 +5628,9 @@ eliminate_dom_walker::before_dom_childre > at least all accesses the later one does or if the store > was to readonly memory storing the same value. */ > alias_set_type set = get_alias_set (lhs); > - if (! vnresult > - || vnresult->set == set > - || alias_set_subset_of (set, vnresult->set)) > + if (vnresult > + && vnresult->set == set > + && set == get_alias_set (TREE_TYPE (lhs))) > { > if (dump_file && (dump_flags & TDF_DETAILS)) > { > Index: gcc/tree-ssa-dom.c > =================================================================== > --- gcc/tree-ssa-dom.c (revision 254135) > +++ gcc/tree-ssa-dom.c (working copy) > @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. > #include "gimplify.h" > #include "tree-cfgcleanup.h" > #include "dbgcnt.h" > +#include "alias.h" > > /* This file implements optimizations on the dominator tree. */ > > @@ -1953,7 +1954,8 @@ dom_opt_dom_walker::optimize_stmt (basic > gimple_set_vuse (new_stmt, gimple_vuse (stmt)); > cached_lhs = m_avail_exprs_stack->lookup_avail_expr (new_stmt, > false, > false); > - if (cached_lhs && operand_equal_p (rhs, cached_lhs, 0)) > + if (cached_lhs && operand_equal_p (rhs, cached_lhs, 0) > + && get_alias_set (lhs) == get_alias_set (TREE_TYPE (lhs))) > { > basic_block bb = gimple_bb (stmt); > unlink_stmt_vdef (stmt); > > FAILs at least gcc.dg/tree-ssa/ssa-pre-26.c And FAIL: gnat.dg/opt39.adb scan-tree-dump-times optimized "MEM" 1 (found 9 times) otherwise clean on x86_64-unknown-linux-gnu. > as said I'm not 100% convinced the more strict handling avoids all of the > cases. See http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_236.htm http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1409.htm Oh, and related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82224 |