The fix for PR14784 causes a miscompilation of gcc in SPEC CPU 2006 at -O2.  Some discussion of the problem is in PR14784; I am moving it to a separate bug report.
Created attachment 12523 [details]
The testcase.
The problematic piece of the .alias5 dump:

  p_27 = &fde_13->dw_fde_cfi;

  # VUSE <SMT.48_79>;
  D.1763_23 = fde_13->dw_fde_cfi;
  if (D.1763_23 != 0B) goto <L8>; else goto <L10>;

  # p_18 = PHI <p_30(9), p_27(8)>;
<L10>:;
  # cie_cfi_head_82 = V_MAY_DEF <cie_cfi_head_73>;
  *p_18 = xcfi_22;

The store to *p_18 and the load of fde_13->dw_fde_cfi obviously alias; however, their virtual operands are disjoint.  I am trying to find out why this happens.
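For reference, the source shape behind this dump is essentially a linked-list append through a pointer-to-pointer.  The following is only an illustrative reconstruction from the dump above, with made-up type and field names; it is not the attached testcase:

  struct cfi
  {
    struct cfi *next;
  };

  struct fde
  {
    struct cfi *cfi_list;
  };

  /* Append XCFI at the end of FDE's list.  The store through *p on the
     last line writes into memory reachable from fde->cfi_list, so it
     must not be treated as independent of the earlier load of
     fde->cfi_list (the D.1763_23 load in the dump).  */
  static void
  append_cfi (struct fde *fde, struct cfi *xcfi)
  {
    struct cfi **p;

    for (p = &fde->cfi_list; *p != 0; p = &(*p)->next)
      ;
    *p = xcfi;   /* corresponds to *p_18 = xcfi_22 */
  }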
access_can_touch_variable determines that fde_13->dw_fde_cfi cannot touch
cie_cfi_head; the list of virtual operands of the load thus becomes empty, and
we insert SMT.48 for it.

On *p, we cannot eliminate cie_cfi_head, and since the condition for insertion
of SMT is formulated as

  ...
  || none_added
  || (TREE_CODE (var) == SYMBOL_MEMORY_TAG
      && for_clobber
      && SMT_USED_ALONE (var)))

and for_clobber is only true on call operands, we do not insert SMT.  The lists
of virtual operands thus become disjoint.

Daniel, any idea how to fix this?  I do not quite understand the SMT_USED_ALONE
stuff.  The condition above looks suspicious to me, especially the test for
"for_clobber" -- why should we want to handle call virtual operands differently
from any others?  Obviously, removing the for_clobber test would fix this
problem, but I am not really sure this would be the right solution.
Subject: Re: Misscompilation of spec2006 gcc

> ------- Comment #3 from rakdver at gcc dot gnu dot org 2006-11-01 00:49 -------
> access_can_touch_variable determines that fde_13->dw_fde_cfi cannot touch
> cie_cfi_head; the list of virtual operands of the load thus becomes empty, and
> we insert SMT.48 for it.
>
> On *p, we cannot eliminate cie_cfi_head, and since the condition for insertion
> of SMT is formulated as
>
>   ...
>   || none_added
>   || (TREE_CODE (var) == SYMBOL_MEMORY_TAG
>       && for_clobber
>       && SMT_USED_ALONE (var)))
>
> and for_clobber is only true on call operands, we do not insert SMT.  The lists
> of virtual operands thus become disjoint.

We should not insert the SMT here.  We are never supposed to use a bare SMT
when it has aliases that we can use (i.e., something has not been pruned).  In
other words, we assume we can prune the same set of aliases regardless of where
the access occurs.  This *was* true before your patch.

> Daniel, any idea how to fix this?  I do not quite understand the SMT_USED_ALONE
> stuff.  The condition above looks suspicious to me, especially the test for
> "for_clobber" -- why should we want to handle call virtual operands differently
> from any others?

SMT_USED_ALONE is used to eliminate the need for SMT operands on calls when the
only place the SMT is used is to be def'd at call sites (i.e., it has no real
uses).  The real problem here is that we can't get away with pruning some but
not all accesses to the same variables in the current tag scheme.

> Obviously, removing the for_clobber test would fix this problem, but I am not
> really sure this would be the right solution.

It would not.  The test does not exist in isolation; you'd have to remove the
whole block for that |, and that wouldn't fix your problem.

I will work around this problem by teaching PTA about casts from non-pointers
to pointers, which will cause it to end up with a nonlocal variable in the set.
However, this is going to lose precision in this particular instance.  Until
our tagging system is based on load/store interference, and not on the
variables a name can access, we will always run into these issues.
Subject: Re: Misscompilation of spec2006 gcc

> > ------- Comment #3 from rakdver at gcc dot gnu dot org 2006-11-01 00:49 -------
> > access_can_touch_variable determines that fde_13->dw_fde_cfi cannot touch
> > cie_cfi_head; the list of virtual operands of the load thus becomes empty, and
> > we insert SMT.48 for it.
> >
> > On *p, we cannot eliminate cie_cfi_head, and since the condition for insertion
> > of SMT is formulated as
> >
> >   ...
> >   || none_added
> >   || (TREE_CODE (var) == SYMBOL_MEMORY_TAG
> >       && for_clobber
> >       && SMT_USED_ALONE (var)))
> >
> > and for_clobber is only true on call operands, we do not insert SMT.  The lists
> > of virtual operands thus become disjoint.
>
> We should not insert the SMT here.
> We are never supposed to use a bare SMT when it has aliases that we
> can use (i.e., something has not been pruned).

But fde_13->dw_fde_cfi does not have any other aliases that can be used, i.e.,
none_added is true.

> I will work around this problem by teaching PTA about casts from
> non-pointers to pointers, which will cause it to end up with a nonlocal
> variable in the set.

??? There is no cast from non-pointer to pointer in this testcase.
Subject: Re: Misscompilation of spec2006 gcc

> > > and for_clobber is only true on call operands, we do not insert SMT.  The
> > > lists of virtual operands thus become disjoint.
> >
> > We should not insert the SMT here.
> > We are never supposed to use a bare SMT when it has aliases that we
> > can use (i.e., something has not been pruned).
>
> But fde_13->dw_fde_cfi does not have any other aliases that can be used,
> i.e., none_added is true.

This is only true for one of the accesses, not both, or else the SMT would be
used in both cases. :)

> > I will work around this problem by teaching PTA about casts from
> > non-pointers to pointers, which will cause it to end up with a nonlocal
> > variable in the set.
>
> ??? There is no cast from non-pointer to pointer in this testcase.

Actually, there is.  That's why you end up with SMTs in the first place.

  # VUSE <fde_table_in_useD.1609_6>;
  fde_table_in_use.0D.1617_7 = fde_table_in_useD.1609;
  D.1618_8 = fde_table_in_use.0D.1617_7 * 24;
  D.1619_9 = (struct dw_fde_struct *) D.1618_8;

This causes D.1619_9 to point to anything.

  # VUSE <fde_tableD.1610_10>;
  fde_table.1D.1620_11 = fde_tableD.1610;
  D.1621_12 = D.1619_9 + fde_table.1D.1620_11;

This causes D.1621_12 (the &fde_table[fde_table_in_use - 1]) to point to
anything.  All of this is what causes SMTs to be used.
If I change the code from

  dw_fde_ref fde = &fde_table[fde_table_in_use - 1];

to

  dw_fde_node *fde = fde_table + fde_table_in_use - 1;

I get the same problem.
This is more reason why we need a POINTER_PLUS_EXPR.
Created attachment 12529 [details]
A run-time testcase

Here is a run-time testcase:

[hjl@gnu-25 yyy]$ /usr/gcc-bad/bin/gcc -O2 bad.c
[hjl@gnu-25 yyy]$ ./a.out
Aborted
[hjl@gnu-25 yyy]$ /usr/gcc-bad/bin/gcc -O bad.c
[hjl@gnu-25 yyy]$ ./a.out
[hjl@gnu-25 yyy]$ /usr/gcc-good/bin/gcc -O2 bad.c
[hjl@gnu-25 yyy]$ ./a.out
[hjl@gnu-25 yyy]$
Subject: Re: Misscompilation of spec2006 gcc

> > > I will work around this problem by teaching PTA about casts from
> > > non-pointers to pointers, which will cause it to end up with a nonlocal
> > > variable in the set.
> >
> > ??? There is no cast from non-pointer to pointer in this testcase.
>
> Actually, there is.
> That's why you end up with SMTs in the first place.
>
>   # VUSE <fde_table_in_useD.1609_6>;
>   fde_table_in_use.0D.1617_7 = fde_table_in_useD.1609;
>   D.1618_8 = fde_table_in_use.0D.1617_7 * 24;
>   D.1619_9 = (struct dw_fde_struct *) D.1618_8;

You mean that whenever there is pointer arithmetic other than adding constants,
we end up with points-to anything? :-(
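To make that distinction concrete, here is a minimal illustration of the two kinds of pointer arithmetic being discussed.  It is not one of the attached testcases, and the struct and variable names are made up:

  struct elt
  {
    int a, b, c;
  };

  struct elt *
  pick (struct elt *base, long n)
  {
    /* Offsetting by a constant keeps the points-to set tied to base.  */
    struct elt *p = base + 2;

    /* A variable offset is lowered into n * sizeof (struct elt) plus a
       cast back to the pointer type which, per the dump quoted above,
       makes the points-to analysis treat q as pointing to anything.  */
    struct elt *q = base + n;

    return n ? q : p;
  }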
Created attachment 12530 [details]
An updated run-time testcase

This is smaller.
Created attachment 12547 [details]
A testcase to show array reference is ok

GCC doesn't have a problem with array references.  That is, if I change it from

  extern dw_fde_node *fde_table;

to

  extern dw_fde_node fde_table[];

I get the correct result.
(In reply to comment #12)
> Created an attachment (id=12547) [edit]
> A testcase to show array reference is ok
>
> GCC doesn't have a problem with array references.  That is, if I change it
> from

Yes, because for arrays we keep the ARRAY_REF around instead of lowering it to
pointer arithmetic.

See PR 29708 for another testcase that goes wrong because of casts in the IR.
I checked GCC 4.3.  The same source code that is miscompiled in the gcc
benchmark from SPEC CPU 2006 is present there.  Most likely GCC 4.3 is also
miscompiled and now generates wrong unwind/debug info, if not wrong
instructions.
Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc

Zdenek, can you revert your patch until we fix this?  It might be a month or
two before I get back to it.

(Yeah, I know it sucks to have to do this, but)

On 6 Nov 2006 15:12:30 -0000, hjl at lucon dot org <gcc-bugzilla@gcc.gnu.org> wrote:
> ------- Comment #14 from hjl at lucon dot org 2006-11-06 15:12 -------
> I checked GCC 4.3.  The same source code that is miscompiled in the gcc
> benchmark from SPEC CPU 2006 is present there.  Most likely GCC 4.3 is also
> miscompiled and now generates wrong unwind/debug info, if not wrong
> instructions.
I think we should add the testcase when the patch is reverted to prevent it from happening again.
> Zdenek, can you revert your patch until we fix this?
> It might be a month or two before I get back to it.
>
> (Yeah, I know it sucks to have to do this, but)

I am not sure whether that would be helpful, since the problem does not seem to
be directly caused by the patch (it would be a bit more difficult to construct
a testcase for the problem with my patch reverted, of course).

I am playing with some ideas for how to fix this; unless I come up with
something soon, I will revert the patch (except for the testcase, which I would
like to remain in the testsuite).
(In reply to comment #17)
> > Zdenek, can you revert your patch until we fix this?
> > It might be a month or two before I get back to it.
> >
> > (Yeah, I know it sucks to have to do this, but)
>
> I am not sure whether that would be helpful, since the problem does not seem
> to be directly caused by the patch (it would be a bit more difficult to
> construct a testcase for the problem with my patch reverted, of course).
>
> I am playing with some ideas for how to fix this; unless I come up with
> something soon, I will revert the patch (except for the testcase, which I
> would like to remain in the testsuite).

I agree with your reasoning, but hiding the bug is better than nothing until a
proper fix can be made.  I hope you come up with something.  Like I said, I can
fix it in a month or two, but I know some people want to use Spec2K6 before
then.
Created attachment 12574 [details]
A patch

This reverts the patch which triggers the problem and adds a testcase.  I am
running SPEC CPU 2006 now.
> I am playing with some ideas for how to fix this; unless I come up with
> something soon, I will revert the patch (except for the testcase, which I
> would like to remain in the testsuite).

The best I was able to do is the following patch.  Virtual operand pruning
removes all the symbols that the SMTs have in common, which causes this PR.
The patch adds artificial "conflict" symbols to all pairs of aliasing SMTs to
avoid this.  Just looking at the dump of the testcase for this PR, it appears
quite expensive (there are a lot of new virtual operands); I will check what
the memory behavior is on larger testcases.

Index: tree-dump.c
===================================================================
*** tree-dump.c	(revision 118619)
--- tree-dump.c	(working copy)
*************** dequeue_and_dump (dump_info_p di)
*** 495,500 ****
--- 495,501 ----
      case SYMBOL_MEMORY_TAG:
      case NAME_MEMORY_TAG:
      case STRUCT_FIELD_TAG:
+     case CONFLICT_TAG:
        break;

      case VAR_DECL:
Index: tree-pretty-print.c
===================================================================
*** tree-pretty-print.c	(revision 118619)
--- tree-pretty-print.c	(working copy)
*************** dump_generic_node (pretty_printer *buffe
*** 849,854 ****
--- 849,855 ----
        break;

      case SYMBOL_MEMORY_TAG:
+     case CONFLICT_TAG:
      case NAME_MEMORY_TAG:
      case STRUCT_FIELD_TAG:
      case VAR_DECL:
Index: tree.c
===================================================================
*** tree.c	(revision 118619)
--- tree.c	(working copy)
*************** init_ttree (void)
*** 270,279 ****
--- 270,281 ----
    tree_contains_struct[STRUCT_FIELD_TAG][TS_DECL_MINIMAL] = 1;
    tree_contains_struct[NAME_MEMORY_TAG][TS_DECL_MINIMAL] = 1;
    tree_contains_struct[SYMBOL_MEMORY_TAG][TS_DECL_MINIMAL] = 1;
+   tree_contains_struct[CONFLICT_TAG][TS_DECL_MINIMAL] = 1;
    tree_contains_struct[STRUCT_FIELD_TAG][TS_MEMORY_TAG] = 1;
    tree_contains_struct[NAME_MEMORY_TAG][TS_MEMORY_TAG] = 1;
    tree_contains_struct[SYMBOL_MEMORY_TAG][TS_MEMORY_TAG] = 1;
+   tree_contains_struct[CONFLICT_TAG][TS_MEMORY_TAG] = 1;
    tree_contains_struct[STRUCT_FIELD_TAG][TS_STRUCT_FIELD_TAG] = 1;
*************** tree_code_size (enum tree_code code)
*** 336,341 ****
--- 338,344 ----
        return sizeof (struct tree_function_decl);
      case NAME_MEMORY_TAG:
      case SYMBOL_MEMORY_TAG:
+     case CONFLICT_TAG:
        return sizeof (struct tree_memory_tag);
      case STRUCT_FIELD_TAG:
        return sizeof (struct tree_struct_field_tag);
*************** tree_node_structure (tree t)
*** 2119,2124 ****
--- 2122,2128 ----
      case SYMBOL_MEMORY_TAG:
      case NAME_MEMORY_TAG:
      case STRUCT_FIELD_TAG:
+     case CONFLICT_TAG:
        return TS_MEMORY_TAG;
      default:
        return TS_DECL_NON_COMMON;
Index: tree.h
===================================================================
*** tree.h	(revision 118619)
--- tree.h	(working copy)
*************** extern const enum tree_code_class tree_c
*** 106,112 ****
  #define MTAG_P(CODE) \
    (TREE_CODE (CODE) == STRUCT_FIELD_TAG \
     || TREE_CODE (CODE) == NAME_MEMORY_TAG \
!    || TREE_CODE (CODE) == SYMBOL_MEMORY_TAG)

  /* Nonzero if DECL represents a VAR_DECL or FUNCTION_DECL.  */
--- 106,113 ----
  #define MTAG_P(CODE) \
    (TREE_CODE (CODE) == STRUCT_FIELD_TAG \
     || TREE_CODE (CODE) == NAME_MEMORY_TAG \
!    || TREE_CODE (CODE) == SYMBOL_MEMORY_TAG \
!    || TREE_CODE (CODE) == CONFLICT_TAG)

  /* Nonzero if DECL represents a VAR_DECL or FUNCTION_DECL.  */
Index: tree-ssa-alias.c
===================================================================
*** tree-ssa-alias.c	(revision 118619)
--- tree-ssa-alias.c	(working copy)
*************** init_alias_info (void)
*** 915,920 ****
--- 915,925 ----
  	      || TREE_CODE (var) == SYMBOL_MEMORY_TAG
  	      || !is_global_var (var))
  	    clear_call_clobbered (var);
+
+ 	  /* Mark all old conflict symbols for renaming, so that they go
+ 	     away.  */
+ 	  if (TREE_CODE (var) == CONFLICT_TAG)
+ 	    mark_sym_for_renaming (var);
  	}

    /* Clear flow-sensitive points-to information from each SSA name.  */
*************** compute_flow_sensitive_aliasing (struct
*** 1149,1154 ****
--- 1154,1265 ----
      }
  }

+ /* Element of the conflicts hashtable.  */
+
+ typedef struct
+ {
+   hashval_t hash;
+   tree smt1, smt2;
+ } smt_pair;
+
+ typedef smt_pair *smt_pair_p;
+
+ /* Return true if the smt pairs are equal.  */
+
+ static int
+ smt_pair_eq (const void *va, const void *vb)
+ {
+   const smt_pair_p a = (const smt_pair_p) va;
+   const smt_pair_p b = (const smt_pair_p) vb;
+   return (a->smt1 == b->smt1 && a->smt2 == b->smt2);
+ }
+
+ /* Hash for a smt pair.  */
+
+ static unsigned int
+ smt_pair_hash (const void *item)
+ {
+   return ((const smt_pair_p) item)->hash;
+ }
+
+ /* Adds conflict between SMT1 and SMT2 to the CONFLICTS table.  */
+
+ static void
+ add_conflict (htab_t conflicts, tree smt1, tree smt2)
+ {
+   unsigned key[2];
+   hashval_t hash;
+   void **slot;
+   smt_pair cmpelt;
+   smt_pair_p newelt;
+
+   if (DECL_UID (smt1) > DECL_UID (smt2))
+     {
+       tree tmp = smt1;
+       smt1 = smt2;
+       smt2 = tmp;
+     }
+
+   key[0] = DECL_UID (smt1);
+   key[1] = DECL_UID (smt2);
+   hash = iterative_hash (key, sizeof (key), 0);
+
+   cmpelt.hash = hash;
+   cmpelt.smt1 = smt1;
+   cmpelt.smt2 = smt2;
+
+   slot = htab_find_slot (conflicts, &cmpelt, INSERT);
+   if (*slot)
+     return;
+
+   newelt = XNEW (smt_pair);
+   *newelt = cmpelt;
+   *slot = newelt;
+ }
+
+ /* Create a new memory tag of type TYPE.
+    Does NOT push it into the current binding.  */
+
+ static tree
+ create_tag_raw (enum tree_code code, tree type, const char *prefix)
+ {
+   tree tmp_var;
+   tree new_type;
+
+   /* Make the type of the variable writable.  */
+   new_type = build_type_variant (type, 0, 0);
+   TYPE_ATTRIBUTES (new_type) = TYPE_ATTRIBUTES (type);
+
+   tmp_var = build_decl (code, create_tmp_var_name (prefix),
+ 			type);
+   /* Make the variable writable.  */
+   TREE_READONLY (tmp_var) = 0;
+
+   /* It doesn't start out global.  */
+   MTAG_GLOBAL (tmp_var) = 0;
+   TREE_STATIC (tmp_var) = 0;
+   TREE_USED (tmp_var) = 1;
+
+   return tmp_var;
+ }
+
+ /* Create a new conflict tag.  */
+
+ static tree
+ create_conflict_tag (void)
+ {
+   var_ann_t ann;
+   tree tag = create_tag_raw (CONFLICT_TAG, void_type_node, "CONFLICT");
+
+   DECL_CONTEXT (tag) = current_function_decl;
+   TREE_ADDRESSABLE (tag) = 1;
+
+   ann = get_var_ann (tag);
+   ann->symbol_mem_tag = NULL_TREE;
+   add_referenced_var (tag);
+
+   return tag;
+ }

  /* Compute type-based alias sets.  Traverse all the pointers and
     addressable variables found in setup_pointers_and_addressables.
*************** static void
*** 1163,1168 ****
--- 1274,1283 ----
  compute_flow_insensitive_aliasing (struct alias_info *ai)
  {
    size_t i;
+   htab_t conflicts;
+   htab_iterator hi;
+   smt_pair_p conflict;
+   tree confsym;

    /* Initialize counter for the total number of virtual operands that
       aliasing will introduce.  When AI->TOTAL_ALIAS_VOPS goes beyond the
*************** compute_flow_insensitive_aliasing (struc
*** 1254,1282 ****
    /* Since this analysis is based exclusively on symbols, it fails to
       handle cases where two pointers P and Q have different memory
       tags with conflicting alias set numbers but no aliased symbols in
!      common.
!
!      For example, suppose that we have two memory tags SMT.1 and SMT.2
!      such that
!
! 		may-aliases (SMT.1) = { a }
! 		may-aliases (SMT.2) = { b }
!
!      and the alias set number of SMT.1 conflicts with that of SMT.2.
!      Since they don't have symbols in common, loads and stores from
!      SMT.1 and SMT.2 will seem independent of each other, which will
!      lead to the optimizers making invalid transformations (see
!      testsuite/gcc.c-torture/execute/pr15262-[12].c).
!
!      To avoid this problem, we do a final traversal of AI->POINTERS
!      looking for pairs of pointers that have no aliased symbols in
!      common and yet have conflicting alias set numbers.  */
    for (i = 0; i < ai->num_pointers; i++)
      {
        size_t j;
        struct alias_map_d *p_map1 = ai->pointers[i];
        tree tag1 = var_ann (p_map1->var)->symbol_mem_tag;
-       bitmap may_aliases1 = p_map1->may_aliases;

        if (PTR_IS_REF_ALL (p_map1->var))
  	continue;
--- 1369,1387 ----
    /* Since this analysis is based exclusively on symbols, it fails to
       handle cases where two pointers P and Q have different memory
       tags with conflicting alias set numbers but no aliased symbols in
!      common.  Given that we prune the sets of virtual operands based
!      on the actual shape of the memory reference in tree-ssa-operands,
!      we have no easy way how to verify that the sets of symbols will
!      always intersect.  To avoid problems, we insert artificial
!      NONLOCAL_CONFLICT symbol in all pairs of SMTs that conflict.
!      If there are too many such pairs, we insert just one NONLOCAL_CONFLICT
!      symbol in each of them, which pessimizes results, but saves memory.  */
!   conflicts = htab_create (10, smt_pair_hash, smt_pair_eq, free);
    for (i = 0; i < ai->num_pointers; i++)
      {
        size_t j;
        struct alias_map_d *p_map1 = ai->pointers[i];
        tree tag1 = var_ann (p_map1->var)->symbol_mem_tag;

        if (PTR_IS_REF_ALL (p_map1->var))
  	continue;
*************** compute_flow_insensitive_aliasing (struc
*** 1285,1324 ****
  	{
  	  struct alias_map_d *p_map2 = ai->pointers[j];
  	  tree tag2 = var_ann (p_map2->var)->symbol_mem_tag;
- 	  bitmap may_aliases2 = p_map2->may_aliases;

! 	  if (PTR_IS_REF_ALL (p_map2->var))
  	    continue;

! 	  /* If the pointers may not point to each other, do nothing.  */
! 	  if (!may_alias_p (p_map1->var, p_map1->set, tag2, p_map2->set, true))
! 	    continue;
!
! 	  /* The two pointers may alias each other.  If they already have
! 	     symbols in common, do nothing.  */
! 	  if (bitmap_intersect_p (may_aliases1, may_aliases2))
! 	    continue;
!
! 	  if (!bitmap_empty_p (may_aliases2))
! 	    {
! 	      unsigned int k;
! 	      bitmap_iterator bi;
!
! 	      /* Add all the aliases for TAG2 into TAG1's alias set.
! 		 FIXME, update grouping heuristic counters.  */
! 	      EXECUTE_IF_SET_IN_BITMAP (may_aliases2, 0, k, bi)
! 		add_may_alias (tag1, referenced_var (k));
! 	      bitmap_ior_into (may_aliases1, may_aliases2);
! 	    }
! 	  else
! 	    {
! 	      /* Since TAG2 does not have any aliases of its own, add
! 		 TAG2 itself to the alias set of TAG1.  */
! 	      add_may_alias (tag1, tag2);
! 	      bitmap_set_bit (may_aliases1, DECL_UID (tag2));
! 	    }
  	}
      }

    if (dump_file)
      fprintf (dump_file, "\n%s: Total number of aliased vops: %ld\n",
--- 1390,1434 ----
  	{
  	  struct alias_map_d *p_map2 = ai->pointers[j];
  	  tree tag2 = var_ann (p_map2->var)->symbol_mem_tag;

! 	  if (PTR_IS_REF_ALL (p_map2->var) || tag1 == tag2)
  	    continue;

! 	  if (may_alias_p (p_map1->var, p_map1->set, tag2, p_map2->set, true))
! 	    add_conflict (conflicts, tag1, tag2);
! 	}
!     }
!
!   if (htab_elements (conflicts) < (unsigned) MAX_ALIASED_VOPS)
!     {
!       FOR_EACH_HTAB_ELEMENT (conflicts, conflict, smt_pair_p, hi)
! 	{
! 	  confsym = create_conflict_tag ();
! 	  add_may_alias (conflict->smt1, confsym);
! 	  add_may_alias (conflict->smt2, confsym);
! 	}
!     }
!   else
!     {
!       bitmap csmts = BITMAP_ALLOC (NULL);
!       bitmap_iterator bi;
!       unsigned uid;
!
!       confsym = create_conflict_tag ();
!       FOR_EACH_HTAB_ELEMENT (conflicts, conflict, smt_pair_p, hi)
! 	{
! 	  bitmap_set_bit (csmts, DECL_UID (conflict->smt1));
! 	  bitmap_set_bit (csmts, DECL_UID (conflict->smt2));
! 	}
!
!       EXECUTE_IF_SET_IN_BITMAP (csmts, 0, uid, bi)
! 	{
! 	  add_may_alias (referenced_var (uid), confsym);
  	}
+       BITMAP_FREE (csmts);
      }
+   htab_delete (conflicts);

    if (dump_file)
      fprintf (dump_file, "\n%s: Total number of aliased vops: %ld\n",
*************** is_escape_site (tree stmt)
*** 2173,2204 ****
    return NO_ESCAPE;
  }

- /* Create a new memory tag of type TYPE.
-    Does NOT push it into the current binding.  */
-
- static tree
- create_tag_raw (enum tree_code code, tree type, const char *prefix)
- {
-   tree tmp_var;
-   tree new_type;
-
-   /* Make the type of the variable writable.  */
-   new_type = build_type_variant (type, 0, 0);
-   TYPE_ATTRIBUTES (new_type) = TYPE_ATTRIBUTES (type);
-
-   tmp_var = build_decl (code, create_tmp_var_name (prefix),
- 			type);
-   /* Make the variable writable.  */
-   TREE_READONLY (tmp_var) = 0;
-
-   /* It doesn't start out global.  */
-   MTAG_GLOBAL (tmp_var) = 0;
-   TREE_STATIC (tmp_var) = 0;
-   TREE_USED (tmp_var) = 1;
-
-   return tmp_var;
- }
-
  /* Create a new memory tag of type TYPE.  If IS_TYPE_TAG is true, the tag
     is considered to represent all the pointers whose pointed-to types are
     in the same alias set class.  Otherwise, the tag represents a single
--- 2283,2288 ----
*************** create_memory_tag (tree type, bool is_ty
*** 2227,2233 ****
    return tag;
  }

-
  /* Create a name memory tag to represent a specific SSA_NAME pointer P_i.
     This is used if P_i has been found to point to a specific set of
     variables or to a non-aliased memory location like the address returned
--- 2311,2316 ----
Index: tree.def
===================================================================
*** tree.def	(revision 118619)
--- tree.def	(working copy)
*************** DEFTREECODE (RESULT_DECL, "result_decl",
*** 359,364 ****
--- 359,365 ----
  DEFTREECODE (STRUCT_FIELD_TAG, "struct_field_tag", tcc_declaration, 0)
  DEFTREECODE (NAME_MEMORY_TAG, "name_memory_tag", tcc_declaration, 0)
  DEFTREECODE (SYMBOL_MEMORY_TAG, "symbol_memory_tag", tcc_declaration, 0)
+ DEFTREECODE (CONFLICT_TAG, "conflict_tag", tcc_declaration, 0)

  /* A namespace declaration.  Namespaces appear in DECL_CONTEXT of other
     _DECLs, providing a hierarchy of names.  */
Index: tree-ssa-operands.c
===================================================================
*** tree-ssa-operands.c	(revision 118619)
--- tree-ssa-operands.c	(working copy)
*************** get_expr_operands (tree stmt, tree *expr
*** 1873,1878 ****
--- 1873,1879 ----
      case STRUCT_FIELD_TAG:
      case SYMBOL_MEMORY_TAG:
      case NAME_MEMORY_TAG:
+     case CONFLICT_TAG:
        add_stmt_operand (expr_p, s_ann, flags);
        return;
Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc

On 9 Nov 2006 11:16:12 -0000, rakdver at gcc dot gnu dot org <gcc-bugzilla@gcc.gnu.org> wrote:
> ------- Comment #20 from rakdver at gcc dot gnu dot org 2006-11-09 11:16 -------
> > I am playing with some ideas for how to fix this; unless I come up with
> > something soon, I will revert the patch (except for the testcase, which I
> > would like to remain in the testsuite).
>
> The best I was able to do is the following patch.  Virtual operand pruning
> removes all the symbols that the SMTs have in common, which causes this PR.
> The patch adds artificial "conflict" symbols to all pairs of aliasing SMTs to
> avoid this.

This is what I was trying to do originally with multiple NONLOCAL symbols
(SMTs are going to go away) in the other testcase.  It is just too expensive
in general.

One thing I'm going to try later is to partition all the load/store statements
and figure out how many symbols it takes to represent the results exactly
(i.e., one symbol for each set of statements that must interfere with each
other, where each statement can be in multiple partitions).

I.e., if you had load/store statements a, b, c, d, where

  a interferes with c and d but not b
  b interferes with d

you get the partitions:

  part1: {a, c, d}
  part2: {b, d}

We then just create two symbols and use those as the vdef/vuse symbols.

This scheme is N^2 in the worst case, but you can choose to unify partitions to
cut down the number of symbols.  Partitions that have no shared members can
also share symbols.

This would unify all of our points-to/access_can_touch_var/etc. results into
one nice framework that gets very good results and avoids the virtual operand
explosion, I think.

The real concern is that this is probably too expensive to compute 5 times per
function.  We'll see.
Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc

Daniel Berlin wrote on 11/09/06 10:05:

> One thing I'm going to try later is to partition all the load/store
> statements and figure out how many symbols it takes to represent the
> results exactly (i.e., one symbol for each set of statements that must
> interfere with each other, where each statement can be in multiple
> partitions).

This is what I'm doing in memory SSA.  More details later this week after I'm
done testing and such.  The difference is that the partitioning is embedded in
the actual SSA form and the partitioning heuristic can be changed independently
of the renamer.  This will let us play with a slider-style throttling for
precision/compile-time.
(In reply to comment #20)
> > I am playing with some ideas for how to fix this; unless I come up with
> > something soon, I will revert the patch (except for the testcase, which I
> > would like to remain in the testsuite).
>
> The best I was able to do is the following patch.  Virtual operand pruning
> removes all the symbols that the SMTs have in common, which causes this PR.
> The patch adds artificial "conflict" symbols to all pairs of aliasing SMTs to
> avoid this.  Just looking at the dump of the testcase for this PR, it appears
> quite expensive (there are a lot of new virtual operands); I will check what
> the memory behavior is on larger testcases.

It failed during bootstrap:

/net/gnu-13/export/gnu/src/gcc/gcc/gcc/tree-dfa.c: In function 'find_referenced_vars':
/net/gnu-13/export/gnu/src/gcc/gcc/gcc/tree-dfa.c:99: internal compiler error: in mark_operand_necessary, at tree-ssa-dce.c:261
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
make[5]: *** [tree-dfa.o] Error 1
Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc

On 11/9/06, Diego Novillo <dnovillo@redhat.com> wrote:
> Daniel Berlin wrote on 11/09/06 10:05:
>
> > One thing I'm going to try later is to partition all the load/store
> > statements and figure out how many symbols it takes to represent the
> > results exactly (i.e., one symbol for each set of statements that must
> > interfere with each other, where each statement can be in multiple
> > partitions).
>
> This is what I'm doing in memory SSA.  More details later this week after
> I'm done testing and such.  The difference is that the partitioning is
> embedded in the actual SSA form and the partitioning heuristic can be
> changed independently of the renamer.  This will let us play with a
> slider-style throttling for precision/compile-time.

Right, but the difference is that in the scheme I propose, you'd never have
overlapping live ranges of vuses/vdefs, and in mem-ssa you do.  I.e., we
wouldn't run into all the problems mem-ssa is going to bring in this regard.
Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc

Daniel Berlin wrote on 11/09/06 12:22:

> Right, but the difference is that in the scheme I propose, you'd never
> have overlapping live ranges of vuses/vdefs, and in mem-ssa you do.
> I.e., we wouldn't run into all the problems mem-ssa is going to bring in
> this regard.

No, that's not right.  Overlapping live ranges are not a problem until you hit
a PHI node.  That's what mem-ssa is currently having difficulties with.

We can use those static partitions at key points during SSA renaming.  Since
the partitions are completely unrelated to the renamer, we can experiment with
different partitioning schemes.  It's actually even possible to arrive at a
perfect partitioning scheme that doesn't introduce false positive dependencies.

More details to follow.
Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc

> > Right, but the difference is that in the scheme I propose, you'd never
> > have overlapping live ranges of vuses/vdefs, and in mem-ssa you do.
> > I.e., we wouldn't run into all the problems mem-ssa is going to bring in
> > this regard.
>
> No, that's not right.  Overlapping live ranges are not a problem until
> you hit a PHI node.  That's what mem-ssa is currently having
> difficulties with.

Well, in any case, Daniel's proposal has the advantage that it is much less
intrusive than mem-ssa -- it does not need to change SSA renaming at all,
probably needs much smaller changes to operand scanning, and does not need any
changes to the optimizations that assume vops are in FUD form (i.e., that the
live ranges of vops do not overlap).

If he could create (or help someone create) a working prototype in a reasonable
time (a few weeks?), I would very much like to see it compared with mem-ssa
before the mem-ssa branch is merged.
Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc

A detailed proposal:

So here is what I was thinking of.

When I say "symbols" below, I mean some VAR_DECL or structure that has a name
(like our memory tags do).  A symbol is *not* a real variable that occurred in
the user program.  When I say "variable", I mean a variable that occurred in
the user program.

The real problem with our alias system, in terms of precision and often in
terms of the number of useless vops, is that we are trying to use real,
existing variables to approximate the portions of the heap a statement
accesses.

When things access portions of the heap we can't see (nonlocal variables), we
fall down badly in terms of precision, because we can eliminate every single
local variable as an alias and then just have to say the statement accesses
"some nonlocal variable".  This causes precision problems because it means that
statements accessing nonlocal variables that we can *prove* don't interfere
still currently share a VUSE between them.  We also get useless vops whenever
points-to sets intersect between statements that interfere, because we end up
adding aliases for which we can then eliminate the members of the alias set at
individual accesses.  We also currently rely on operand-scan-time pruning,
which is very ugly.

There is a way to get the minimal number of vuses/vdefs necessary to represent
completely precise (in terms of the info we have) aliasing, while still being
able to degrade the precision gracefully in order to let the number of
vuses/vdefs go down.

The scheme I propose *never* has overlapping live ranges of the individual
symbols, even though the symbols may represent pieces of the same heap.  In
other words, you can rely on the fact that once an individual symbol has a new
version, there will never be a vuse of an old version of that symbol.

The current vdef/vuse scheme consists of creating memory tags to represent
portions of the heap.  When a memory tag has aliases, we use its alias list to
generate virtual operands.  When a memory tag does not have aliases, we
generate a virtual operand of the base symbol.

The basic idea in the new scheme is to never have a list of aliases for a
symbol representing portions of the heap.  The symbols representing portions of
the heap are themselves always the target of a vuse/vdef.  The set of aliases
they represent is immaterial (though we can keep a list if something wants it).
This enables us to have a smaller number of vops, and to have something else
generate the set of symbols in a precise manner, rather than have things like
the operand scanner try to post-process it.  The symbols are also attached to
the load/store statements, not to the variables.  The operand renamer only has
to add vuses/vdefs for all the symbols attached to a statement, and it is done.

In the simplest, dumb, non-precise version of this scheme, this means you only
have one symbol, called "MEM", and you generate vuses/vdefs linking every load
and store together.

In the most precise version of this scheme, you partition the load/store
conflicts between statements into symbols that represent statement
conflictingness.  In a completely naive, O(N^3) version, the following
algorithm will work and generate completely precise results (a C sketch
follows after the worked example below):

  Collect all loads and stores into a list (lslist)
  for each statement in lslist (x):
    for each statement in lslist (y):
      if x conflicts with y:
        if there is no partition for x, y, create a new one containing x and y.
        otherwise, for every partition y belongs to:
          if all members of this partition have a memory access that conflicts with x:
            add x to this partition
          otherwise:
            create a new partition containing all members of the partition
            except the ones x does not conflict with, and add x to it.

This is a very, very slow way to do it, but it should be clear (there are much,
much faster ways to do this).  Basically, a single load/store statement can
belong to multiple partitions.  All members of a given partition conflict with
each other.

Given the following set of memory access statements a, b, c, d, where

  a conflicts with b and c
  b conflicts with c and d
  c conflicts with a and b
  d conflicts with a and c

you will end up with 3 partitions:

  part1: {a, b, c}
  part2: {b, c, d}
  part3: {d, a, c}

Statement c will conflict with every member of partition 1 and thus get
partition 1, rather than a new partition.

You now create a symbol for each partition and, for each statement in the
partition, add the symbol to the statement's list.  Thus, in the above example
we get:

  statement a - symbols: MEM.PART1, MEM.PART3
  statement b - symbols: MEM.PART1, MEM.PART2
  statement c - symbols: MEM.PART1, MEM.PART2, MEM.PART3
  statement d - symbols: MEM.PART2, MEM.PART3

As mentioned before, the operand renamer simply adds a vdef/vuse for each
symbol in the statement's list.

Note that this is the minimal number of symbols necessary to precisely
represent the conflicting accesses.  If the number of partitions grows large
(and thus requires too many symbols), you can simply union any partitions you
like.  As long as the partitions contain *at least* the real conflicts for
those statements, all you would do is add false aliasing.

Note that while the symbols partitioning the heap are not necessarily disjoint
in terms of the parts of the heap they represent, the vuses/vdefs for an
individual symbol will never overlap, just like in our current representation.
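To make the naive partitioning above concrete, here is a rough C sketch.  It is not GCC code: the conflict matrix, the fixed-size partition arrays, and all the names are made up for the illustration, and it makes no attempt to deduplicate or minimize the resulting partitions (which, as noted above, may freely be merged anyway).

  #include <assert.h>
  #include <stdbool.h>
  #include <stddef.h>

  #define MAX_STMTS 64
  #define MAX_PARTS 256

  /* conflicts[i][j] says whether load/store statements i and j may touch
     the same memory; how it is computed does not matter here.  */
  static bool conflicts[MAX_STMTS][MAX_STMTS];

  /* part_has[p][s] says whether statement s belongs to partition p.  A
     statement may belong to several partitions.  */
  static bool part_has[MAX_PARTS][MAX_STMTS];
  static size_t n_parts;

  /* True if X conflicts with every member of partition P (except itself).  */
  static bool
  conflicts_with_all (size_t p, size_t x, size_t n_stmts)
  {
    for (size_t s = 0; s < n_stmts; s++)
      if (part_has[p][s] && s != x && !conflicts[x][s])
        return false;
    return true;
  }

  /* Naive partitioning: for every conflicting pair (x, y), either grow an
     existing partition of y with x, or split off a new partition made of
     the members of y's partition that x does conflict with, plus x.  */
  static void
  partition_statements (size_t n_stmts)
  {
    for (size_t x = 0; x < n_stmts; x++)
      for (size_t y = 0; y < x; y++)
        {
          if (!conflicts[x][y])
            continue;

          bool y_in_some_partition = false;
          size_t old_n_parts = n_parts;

          for (size_t p = 0; p < old_n_parts; p++)
            {
              if (!part_has[p][y])
                continue;
              y_in_some_partition = true;

              if (conflicts_with_all (p, x, n_stmts))
                part_has[p][x] = true;
              else
                {
                  /* Copy only the members x conflicts with, then add x.  */
                  assert (n_parts < MAX_PARTS);
                  for (size_t s = 0; s < n_stmts; s++)
                    part_has[n_parts][s] = part_has[p][s] && conflicts[x][s];
                  part_has[n_parts][x] = true;
                  n_parts++;
                }
            }

          if (!y_in_some_partition)
            {
              /* No partition contains y yet: start one with just x and y.  */
              assert (n_parts < MAX_PARTS);
              part_has[n_parts][x] = true;
              part_has[n_parts][y] = true;
              n_parts++;
            }
        }
  }

Each resulting partition would then get its own MEM.PARTn symbol, and a statement's vdef/vuse list would simply be the set of partitions it belongs to.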
(In reply to comment #26)
> I would very much like to see it compared with mem-ssa before the mem-ssa
> branch is merged.

Notice that the two approaches do not negate each other.  Dan's proposal is a
smarter partitioning than what the current alias grouping mechanism tries to
do.  We can actually have memory SSA on top of Dan's partitioning scheme.
Memory SSA will use that partitioning scheme when placing memory PHI nodes.

The two approaches are orthogonal.  Memory SSA simply adds a new degree of
factoring that gives you sparser UD chains.  It also gives you exactly one name
per store, without reducing precision.
Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc

> > I would very much like to see it compared with mem-ssa before the mem-ssa
> > branch is merged.
>
> Notice that the two approaches do not negate each other.  Dan's proposal is
> a smarter partitioning than what the current alias grouping mechanism tries
> to do.  We can actually have memory SSA on top of Dan's partitioning scheme.
> Memory SSA will use that partitioning scheme when placing memory PHI nodes.
>
> The two approaches are orthogonal.  Memory SSA simply adds a new degree of
> factoring that gives you sparser UD chains.  It also gives you exactly one
> name per store, without reducing precision.

Nevertheless, it is not obvious to me whether using mem-ssa over Daniel's
proposal would bring any significant gains, which I would like to have verified
before we introduce a million new bugs with mem-ssa (nothing personal, it
simply is too large and too intrusive a change not to bring any).
(In reply to comment #29)
> Nevertheless, it is not obvious to me whether using mem-ssa over Daniel's
> proposal would bring any significant gains, which I would like to have

Of course.  If you are interested in the compile-time benefits of a
partitioning scheme, you can actually try the one we already have by forcing
alias grouping more aggressively (--param max-aliased-vops).  The current
grouping is very dumb and will create tons of false positives.  Daniel's
approach will try to reduce false positives while bringing down the number of
virtual operators per memory statement.

Memory SSA brings down the number of virtual operators to exactly one per
statement.

> verified before we introduce a million new bugs with mem-ssa (nothing
> personal, it simply is too large and too intrusive a change not to bring
> any).

Intrusive?  Well, the only pass that was wired to the previous virtual operator
scheme was PRE.  DSE is also wired, but to a lesser extent.  No other
optimization had to be changed for mem-ssa.  It's obviously intrusive in the
renamer, but that's it.
Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc

> Memory SSA brings down the number of virtual operators to exactly one per
> statement.

However, it does so in a way that makes the traditional things that actually
want to do cool memory optimizations harder.  I'm still on the fence over
whether it's a good idea or not.

> > verified before we introduce a million new bugs with mem-ssa (nothing
> > personal, it simply is too large and too intrusive a change not to bring
> > any).
>
> Intrusive?  Well, the only pass that was wired to the previous virtual
> operator scheme was PRE.  DSE is also wired, but to a lesser extent.  No
> other optimization had to be changed for mem-ssa.  It's obviously intrusive
> in the renamer, but that's it.

Uh, LIM and store sinking are too.  Roughly all of our memory optimizations
are.

The basic problem is that in mem-ssa, vdefs and vuses don't accurately reflect
what symbols are being defined and used anymore.  They represent the factoring
of a use and a definition of a whole bunch of symbols.

Things like PRE and DSE break not so much because they are "wired to the
previous virtual operator scheme", but because they rely on the virtual use/def
chains accurately representing where a symbol representing a memory access
dies.  In mem-ssa, you have VDEFs of the same symbol all over the place.

The changes I have to make to PRE (and to the other things) to account for this
amount to rebuilding the non-mem-ssa-factored (i.e., the currently factored)
form out of the chains, by seeing what symbols they really affect.  This is
going to be expensive and, IMHO, is what almost all of our SSA memory
optimizations are going to have to do.

So while mem-ssa doesn't affect *precision*, it does affect how you can use the
chains in a very significant way.  For at least all the optimizations I see us
doing, it makes them more or less useless without doing things (like
re-expanding them) first.  Because this is true, I'm not sure it's a good idea
at all, which is why I'm still on the fence.
Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc

> In mem-ssa, you have VDEFs of the
> same symbol all over the place.
> ^^^^ version of a symbol
Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc

dberlin at dberlin dot org wrote on 11/09/06 16:28:

> Uh, LIM and store sinking are too.  Roughly all of our memory
> optimizations are.

They are?  Really?  Can you show me where exactly?

> The changes I have to make to PRE (and to the other things) to account for
> this amount to rebuilding the non-mem-ssa-factored (i.e., the currently
> factored) form out of the chains, by seeing what symbols they really affect.

OK, so how come you were so positive about the new interface?  I need to
understand what was the great difficulty you ran into that made you change your
mind.  I need to see a specific example.

See, the UD chains you get in mem-ssa are neither incomplete nor wrong.  The
symbols affected are right there in plain sight, so there is no loss of any
information.

> For at least all the optimizations I see us doing, it makes them more or
> less useless without doing things (like re-expanding them) first.  Because
> this is true, I'm not sure it's a good idea at all, which is why I'm still
> on the fence.

But you still haven't *shown* me where the hardness or slowness comes in.
Granted, the work is still unfinished, so we can't really do actual
measurements.  But I need to understand where the difficulties will be so that
I can accommodate them in the infrastructure.  It's obviously not my intent to
make things harder to use.
Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc

On 9 Nov 2006 21:48:25 -0000, dnovillo at redhat dot com <gcc-bugzilla@gcc.gnu.org> wrote:
> ------- Comment #33 from dnovillo at redhat dot com 2006-11-09 21:48 -------
>
> dberlin at dberlin dot org wrote on 11/09/06 16:28:
>
> > Uh, LIM and store sinking are too.  Roughly all of our memory
> > optimizations are.
>
> They are?  Really?  Can you show me where exactly?

They won't get incorrect results.  They will get less good results than they do
now.

Take LIM and store motion (sorry, I meant SM, not store sinking).
determine_max_movement relies on the VIRTUAL_KILL information to determine what
must be moved together.  Because things that would previously kill different
symbol versions will now kill the same symbol version (due to being factored),
it will add false dependencies unless someone changes it.

Take the following example:

  int e,f,g,h;
  int main(int argc)
  {
    int *a, *b, *d;
    int i;
    a = &e;
    if (argc)
      a = &f;
    b = &g;
    if (argc)
      b = &h;
    d = &e;
    if (argc)
      d = &h;
    for (i = 0; i < 50; i++)
      {
        *a = 1;
        *b = 2;
        *d = 3;
      }
  }

Previously, you would have:

  ...
  # e_22 = V_MAY_DEF <e_14>;
  # f_23 = V_MAY_DEF <f_15>;
  *a_1 = 1;

  # g_24 = V_MAY_DEF <g_16>;
  # h_25 = V_MAY_DEF <h_17>;
  *b_2 = 2;

  # e_26 = V_MAY_DEF <e_22>;
  # h_27 = V_MAY_DEF <h_25>;
  *d_3 = 3;

Note that *a and *b do not vdef any symbol that is the same.

mem-ssa gives you (as of today):

  # .MEM_16 = VDEF <.MEM_14, f_20>
  *a_1 = 1;

  # .MEM_17 = VDEF <.MEM_14, g_21>
  *b_2 = 2;

  # .MEM_18 = VDEF <.MEM_16, .MEM_17>
  *d_3 = 3;

Note that now *a and *b both vdef MEM_14.  SM is going to say these stores are
dependent on each other because they "kill" the same version of the same
variable, unless you teach it to look at the get_loads_and_stores bitmap.
Previously, it would not.

> > The changes I have to make to PRE (and to the other things) to account for
> > this amount to rebuilding the non-mem-ssa-factored (i.e., the currently
> > factored) form out of the chains, by seeing what symbols they really
> > affect.
>
> OK, so how come you were so positive about the new interface?

When have I been overabundantly positive?  I said I'd deal with it.  I'm
neither here nor there.  I've relied on your statements that you believe it
will make things significantly better without loss of precision.  We are going
to pay a cost in time for passes to make use of this information.  I believed
you were aware of this.

> I need to understand what was the great difficulty you ran into that made
> you change your mind.  I need to see a specific example.
>
> See, the UD chains you get in mem-ssa are neither incomplete nor wrong.

Nobody said they are wrong, but I would actually argue they are no longer
really the same as SSA in a way that matters, if you want to pick nits.  SSA
variables have not only the property that they are *assigned to* once, but the
property that they are *defined* by a single operation.  Our current virtual
operand scheme has this property.  Yours does not, because variables may be
defined multiple times, although they are still singly assigned.

You can argue this is not a requirement of SSA.  Honestly, it makes no
difference to me.  The upshot is that by losing this property, you make the
chains less useful than they could be.

> The symbols affected are right there in plain sight, so there is no
> loss of any information.

Errr, last time I looked, you had to call a function to get the *actual* list
of loads or stores affected by a statement.

Why does this matter?

All of our memory optimizations are trying to figure out three things:

1. Will two memory accesses return the same results (PRE, DSE)?
2. Do these two memory accesses have a dependency (PRE, SM, DSE, LIM)?
3. If I hoist this memory access to block X, will it still have the same value
   as it does in block Y (PRE, SM, LIM)?

Your stuff has no real effect on #2, but it makes #1 and #3 harder (not less
precise, mind you), at the cost of reducing the number of operands.  It does so
by factoring stores in a way that disables the ability to see where loads are
not *currently*, but *could*, validly be live.  In other words, it was
previously possible to simply determine whether two stores touched the same
piece of memory by looking at the versions.  This is no longer true.

PRE does #1 and #3 through value numbering memory and tracking the distinct
live ranges of virtual variables.  SM and LIM do #3 by simply using dependency
information to determine what needs to be moved together, and by grouping
operations that vdef the same variable together.  SM and LIM will thus still
get *correct* information in the above example, but they are going to get false
dependencies due to defs of the same mem version, unless something
disambiguates between those store dependencies by looking at the underlying
loads and stores involved.

Previously, you could value number memory (and thus answer question #1) simply
by making a list of all the virtual variable versions associated with it, and
the value numbers of the underlying accesses.  You could see whether those
value numbers would still *obviously* be valid when moved above or below a
certain store *just by seeing what virtual variable versions were defined by
that statement and whether one of them was a vuse version used by one of the
value numbers in your statements*.

It is no longer possible to get a precise answer this way when you factor
stores the way you do, because the name of the virtual variable defined by a
store is no longer a function of what symbols it defines.  Take the above case.
If we simply use virtual variable versions to value number memory, we will
believe that *a and *b are possible stores to the same piece of memory, even
though they are not.

In order to get the *correct* answer in the presence of PHI nodes, we currently
collect the variables the PHI nodes were joining together, and consider a def
of the PHI result to be a def of the versions joined by the PHI.  I.e., if you
had

  phi_result_3 = <a_1, b_2>

  # VUSE <phi_result_3>
  *foo = a

it would be considered a kill of any value numbers containing vuses of either
a_1 or b_2.

So how do I plan on recovering the information about where things can be moved
to?  Well, essentially, PRE wants a set of "live memory variable versions" for
a given statement, so it can see where things can be moved to.

I am going to start out at the top of the program and walk in dominator order,
generating a bitmap.  We initially assume the version of everything in the
load_store_value_bitmap is 0.  Every time we see a vdef, we call
get_loads_and_stores.  For each variable in the underlying load and store
bitmap, we remove the bit for the old version's id from
load_store_value_bitmap, and add a bit for the new version's id.  Every time we
see a vuse, we use the current load_store_value_bitmap as part of its value
number, and attach the bitmap to the current statement.  We attach the
load_store_value_bitmap to the top and bottom of each block.  Every time we see
a PHI node, we generate new versions for each of the loaded and stored
variables of the statements that the PHI node is joining.  We can translate a
value upwards through PHIs by looking at what bits in the bitmaps changed for
each edge.

So basically, I'm doing virtual SSA renaming and storing some of the reaching
information persistently.  This is how load PRE was done in the SSAPRE
infrastructure, and it works out okay.  But it really does mean that we aren't
using the UD chains anymore for loads and stores, because we can't.

> > For at least all the optimizations I see us doing, it makes them more or
> > less useless without doing things (like re-expanding them) first.  Because
> > this is true, I'm not sure it's a good idea at all, which is why I'm still
> > on the fence.
>
> But you still haven't *shown* me where the hardness or slowness comes in.

How not?  As I've told you before, and shown you before, it was previously the
case that we could value number memory directly and determine whether a value
was going to be legal in a given block.  mem-ssa breaks this.  It was
previously possible to accurately and quickly determine which stores to memory
conflict just by looking at versions.  This is no longer true.  This in turn
removes the ability to determine where new loads can be placed simply by
looking at VDEF RHS variables.  I've re-explained this above.

> Granted, the work is still unfinished, so we can't really do actual
> measurements.  But I need to understand where the difficulties will be so
> that I can accommodate them in the infrastructure.  It's obviously not my
> intent to make things harder to use.
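To visualize the walk just described, here is a schematic C sketch.  Everything in it is made up for the illustration: the statement representation, the bitmaps, and the version bookkeeping stand in for GCC structures and for get_loads_and_stores, and none of the names correspond to actual GCC interfaces.

  #include <assert.h>
  #include <stdbool.h>
  #include <string.h>

  #define MAX_VARS 128    /* memory variables tracked                */
  #define MAX_VERS 4096   /* total (variable, version) ids available */

  enum stmt_kind { STMT_VDEF, STMT_VUSE, STMT_PHI };

  struct stmt
  {
    enum stmt_kind kind;
    bool touches[MAX_VARS];      /* what the loads/stores bitmap would say */
    bool value_bitmap[MAX_VERS]; /* for VUSEs: the live versions recorded  */
  };

  static int cur_version[MAX_VARS]; /* current version id of each variable */
  static int next_version;          /* next unused version id              */
  static bool live[MAX_VERS];       /* the load_store_value_bitmap         */

  /* Give each variable a distinct initial version.  */
  static void
  init_versions (int n_vars)
  {
    for (int v = 0; v < n_vars; v++)
      {
        cur_version[v] = next_version++;
        live[cur_version[v]] = true;
      }
  }

  /* Kill the old version of VAR and make a fresh one live.  */
  static void
  bump_version (int var)
  {
    assert (next_version < MAX_VERS);
    live[cur_version[var]] = false;
    cur_version[var] = next_version++;
    live[cur_version[var]] = true;
  }

  /* Process one statement during the dominator-order walk.  */
  static void
  visit_stmt (struct stmt *s)
  {
    switch (s->kind)
      {
      case STMT_VDEF:
      case STMT_PHI:
        /* A store, or a PHI joining stores, gives every variable it
           touches a fresh version; the old version can never be seen
           by a later vuse.  */
        for (int v = 0; v < MAX_VARS; v++)
          if (s->touches[v])
            bump_version (v);
        break;

      case STMT_VUSE:
        /* A load snapshots the currently live versions; this snapshot
           then participates in the load's value number.  */
        memcpy (s->value_bitmap, live, sizeof live);
        break;
      }
  }

The snapshot attached at a block's top and bottom is then what would let PRE decide whether a value computed in one block is still valid at another, without walking the factored chains.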
Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc

> Take the above case.
> If we simply use virtual variable versions to value number memory, we
> will believe that *a and *b are possible stores to the same piece of
> memory, even though they are not.

In case it's not clear, this means that while previously you could determine
that you can move *b above *a simply by looking at the RHS, you can't anymore.

My guess:

Any memory optimization that wants to just eliminate redundant loads will be
just fine with mem-ssa.

Any memory optimization that wants to eliminate redundant stores will have to
be redesigned to not use the chains in order to get maximum precision.

Any memory optimization that wants to hoist either loads or stores will have to
be redesigned to not use the chains to determine where things can be moved to,
in order to get maximum precision.

At least, that's the way it appears to me.
(In reply to comment #19)
> Created an attachment (id=12574) [edit]
> A patch
>
> This reverts the patch which triggers the problem and adds a testcase.  I am
> running SPEC CPU 2006 now.

I am going to commit this patch for now (once it passes bootstrap & regtesting).
Subject: Bug 29680

Author: rakdver
Date: Mon Nov 13 12:37:29 2006
New Revision: 118754

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=118754

Log:
	PR tree-optimization/29680
	* tree-ssa-operands.c (access_can_touch_variable): Revert fix
	for PR 14784.

	* gcc.dg/alias-11.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/alias-11.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-operands.c
Fixed.