The fix for PR14784 causes a miscompilation of gcc in SPEC CPU 2006 at -O2.  Some discussion of the problem is in PR14784; I am moving it to a separate bug report.
Created attachment 12523 [details]
The testcase.
The problematic piece of the .alias5 dump:

  p_27 = &fde_13->dw_fde_cfi;

  # VUSE <SMT.48_79>;
  D.1763_23 = fde_13->dw_fde_cfi;
  if (D.1763_23 != 0B) goto <L8>; else goto <L10>;

  # p_18 = PHI <p_30(9), p_27(8)>;
<L10>:;
  # cie_cfi_head_82 = V_MAY_DEF <cie_cfi_head_73>;
  *p_18 = xcfi_22;

The store to *p_18 and the load of fde_13->dw_fde_cfi obviously alias; however, their virtual operands are disjoint.  I am trying to find out why this happens.
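For reference, the source shape behind this dump is essentially a linked-list append through a pointer-to-pointer.  The following is only an illustrative reconstruction from the dump above, with made-up type and field names; it is not the attached testcase:

  struct cfi
  {
    struct cfi *next;
  };

  struct fde
  {
    struct cfi *cfi_list;
  };

  /* Append XCFI at the end of FDE's list.  The store through *p on the
     last line writes into memory reachable from fde->cfi_list, so it
     must not be treated as independent of the earlier load of
     fde->cfi_list (the D.1763_23 load in the dump).  */
  static void
  append_cfi (struct fde *fde, struct cfi *xcfi)
  {
    struct cfi **p;

    for (p = &fde->cfi_list; *p != 0; p = &(*p)->next)
      ;
    *p = xcfi;   /* corresponds to *p_18 = xcfi_22 */
  }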
access_can_touch_variable determines that fde_13->dw_fde_cfi cannot touch
cie_cfi_head; the list of virtual operands of the load thus becomes empty, and
we insert SMT.48 for it.

On *p, we cannot eliminate cie_cfi_head, and since the condition for insertion
of SMT is formulated as

  ...
  || none_added
  || (TREE_CODE (var) == SYMBOL_MEMORY_TAG
      && for_clobber
      && SMT_USED_ALONE (var)))

and for_clobber is only true on call operands, we do not insert SMT.  The lists
of virtual operands thus become disjoint.

Daniel, any idea how to fix this?  I do not quite understand the SMT_USED_ALONE
stuff.  The condition above looks suspicious to me, especially the test for
"for_clobber" -- why should we want to handle call virtual operands differently
from any others?  Obviously, removing the for_clobber test would fix this
problem, but I am not really sure this would be the right solution.
Subject: Re: Misscompilation of spec2006 gcc

> ------- Comment #3 from rakdver at gcc dot gnu dot org 2006-11-01 00:49 -------
> access_can_touch_variable determines that fde_13->dw_fde_cfi cannot touch
> cie_cfi_head; the list of virtual operands of the load thus becomes empty, and
> we insert SMT.48 for it.
>
> On *p, we cannot eliminate cie_cfi_head, and since the condition for insertion
> of SMT is formulated as
>
>   ...
>   || none_added
>   || (TREE_CODE (var) == SYMBOL_MEMORY_TAG
>       && for_clobber
>       && SMT_USED_ALONE (var)))
>
> and for_clobber is only true on call operands, we do not insert SMT.  The lists
> of virtual operands thus become disjoint.

We should not insert the SMT here.  We are never supposed to use a bare SMT
when it has aliases that we can use (i.e., something has not been pruned).  In
other words, we assume we can prune the same set of aliases regardless of where
the access occurs.  This *was* true before your patch.

> Daniel, any idea how to fix this?  I do not quite understand the SMT_USED_ALONE
> stuff.  The condition above looks suspicious to me, especially the test for
> "for_clobber" -- why should we want to handle call virtual operands differently
> from any others?

SMT_USED_ALONE is used to eliminate the need for SMT operands on calls when the
only place the SMT is used is to be def'd at call sites (i.e., it has no real
uses).  The real problem here is that we can't get away with pruning some but
not all accesses to the same variables in the current tag scheme.

> Obviously, removing the for_clobber test would fix this problem, but I am not
> really sure this would be the right solution.

It would not.  The test does not exist in isolation; you'd have to remove the
whole block for that |, and that wouldn't fix your problem.

I will work around this problem by teaching PTA about casts from non-pointers
to pointers, which will cause it to end up with a nonlocal variable in the set.
However, this is going to lose precision in this particular instance.  Until
our tagging system is based on load/store interference, and not on the
variables a name can access, we will always run into these issues.
Subject: Re: Misscompilation of spec2006 gcc

> > ------- Comment #3 from rakdver at gcc dot gnu dot org 2006-11-01 00:49 -------
> > access_can_touch_variable determines that fde_13->dw_fde_cfi cannot touch
> > cie_cfi_head; the list of virtual operands of the load thus becomes empty, and
> > we insert SMT.48 for it.
> >
> > On *p, we cannot eliminate cie_cfi_head, and since the condition for insertion
> > of SMT is formulated as
> >
> >   ...
> >   || none_added
> >   || (TREE_CODE (var) == SYMBOL_MEMORY_TAG
> >       && for_clobber
> >       && SMT_USED_ALONE (var)))
> >
> > and for_clobber is only true on call operands, we do not insert SMT.  The lists
> > of virtual operands thus become disjoint.
>
> We should not insert the SMT here.
> We are never supposed to use a bare SMT when it has aliases that we
> can use (i.e., something has not been pruned).

But fde_13->dw_fde_cfi does not have any other aliases that can be used, i.e.,
none_added is true.

> I will work around this problem by teaching PTA about casts from
> non-pointers to pointers, which will cause it to end up with a nonlocal
> variable in the set.

??? There is no cast from non-pointer to pointer in this testcase.
Subject: Re: Misscompilation of spec2006 gcc

> > > and for_clobber is only true on call operands, we do not insert SMT.  The
> > > lists of virtual operands thus become disjoint.
> >
> > We should not insert the SMT here.
> > We are never supposed to use a bare SMT when it has aliases that we
> > can use (i.e., something has not been pruned).
>
> But fde_13->dw_fde_cfi does not have any other aliases that can be used,
> i.e., none_added is true.

This is only true for one of the accesses, not both, or else the SMT would be
used in both cases. :)

> > I will work around this problem by teaching PTA about casts from
> > non-pointers to pointers, which will cause it to end up with a nonlocal
> > variable in the set.
>
> ??? There is no cast from non-pointer to pointer in this testcase.

Actually, there is.  That's why you end up with SMTs in the first place.

  # VUSE <fde_table_in_useD.1609_6>;
  fde_table_in_use.0D.1617_7 = fde_table_in_useD.1609;
  D.1618_8 = fde_table_in_use.0D.1617_7 * 24;
  D.1619_9 = (struct dw_fde_struct *) D.1618_8;

This causes D.1619_9 to point to anything.

  # VUSE <fde_tableD.1610_10>;
  fde_table.1D.1620_11 = fde_tableD.1610;
  D.1621_12 = D.1619_9 + fde_table.1D.1620_11;

This causes D.1621_12 (the &fde_table[fde_table_in_use - 1]) to point to
anything.  All of this is what causes SMTs to be used.
If I change the code from

  dw_fde_ref fde = &fde_table[fde_table_in_use - 1];

to

  dw_fde_node *fde = fde_table + fde_table_in_use - 1;

I get the same problem.
This is more reason why we need a POINTER_PLUS_EXPR.
Created attachment 12529 [details]
A run-time testcase

Here is a run-time testcase:

[hjl@gnu-25 yyy]$ /usr/gcc-bad/bin/gcc -O2 bad.c
[hjl@gnu-25 yyy]$ ./a.out
Aborted
[hjl@gnu-25 yyy]$ /usr/gcc-bad/bin/gcc -O bad.c
[hjl@gnu-25 yyy]$ ./a.out
[hjl@gnu-25 yyy]$ /usr/gcc-good/bin/gcc -O2 bad.c
[hjl@gnu-25 yyy]$ ./a.out
[hjl@gnu-25 yyy]$
Subject: Re: Misscompilation of spec2006 gcc

> > > I will work around this problem by teaching PTA about casts from
> > > non-pointers to pointers, which will cause it to end up with a nonlocal
> > > variable in the set.
> >
> > ??? There is no cast from non-pointer to pointer in this testcase.
>
> Actually, there is.
> That's why you end up with SMTs in the first place.
>
>   # VUSE <fde_table_in_useD.1609_6>;
>   fde_table_in_use.0D.1617_7 = fde_table_in_useD.1609;
>   D.1618_8 = fde_table_in_use.0D.1617_7 * 24;
>   D.1619_9 = (struct dw_fde_struct *) D.1618_8;

You mean that whenever there is pointer arithmetic other than adding constants,
we end up with points-to anything? :-(
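To make that distinction concrete, here is a minimal illustration of the two kinds of pointer arithmetic being discussed.  It is not one of the attached testcases, and the struct and variable names are made up:

  struct elt
  {
    int a, b, c;
  };

  struct elt *
  pick (struct elt *base, long n)
  {
    /* Offsetting by a constant keeps the points-to set tied to base.  */
    struct elt *p = base + 2;

    /* A variable offset is lowered into n * sizeof (struct elt) plus a
       cast back to the pointer type which, per the dump quoted above,
       makes the points-to analysis treat q as pointing to anything.  */
    struct elt *q = base + n;

    return n ? q : p;
  }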
Created attachment 12530 [details]
An updated run-time testcase

This is smaller.
Created attachment 12547 [details]
A testcase to show array reference is ok

GCC doesn't have a problem with array references.  That is, if I change it from

  extern dw_fde_node *fde_table;

to

  extern dw_fde_node fde_table[];

I get the correct result.
(In reply to comment #12)
> Created an attachment (id=12547) [edit]
> A testcase to show array reference is ok
>
> GCC doesn't have a problem with array references.  That is, if I change it
> from

Yes, because for arrays we keep the ARRAY_REF around instead of lowering it to
pointer arithmetic.

See PR 29708 for another testcase that goes wrong because of casts in the IR.
I checked GCC 4.3.  The same source code that is miscompiled in the gcc
benchmark from SPEC CPU 2006 is present there.  Most likely GCC 4.3 is also
miscompiled and now generates wrong unwind/debug info, if not wrong
instructions.
Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc

Zdenek, can you revert your patch until we fix this?  It might be a month or
two before I get back to it.

(Yeah, I know it sucks to have to do this, but)

On 6 Nov 2006 15:12:30 -0000, hjl at lucon dot org <gcc-bugzilla@gcc.gnu.org> wrote:
> ------- Comment #14 from hjl at lucon dot org 2006-11-06 15:12 -------
> I checked GCC 4.3.  The same source code that is miscompiled in the gcc
> benchmark from SPEC CPU 2006 is present there.  Most likely GCC 4.3 is also
> miscompiled and now generates wrong unwind/debug info, if not wrong
> instructions.
I think we should add the testcase when the patch is reverted to prevent it from happening again.
> Zdenek, can you revert your patch until we fix this?
> It might be a month or two before I get back to it.
>
> (Yeah, I know it sucks to have to do this, but)

I am not sure whether that would be helpful, since the problem does not seem to
be directly caused by the patch (it would be a bit more difficult to construct
a testcase for the problem with my patch reverted, of course).

I am playing with some ideas for how to fix this; unless I come up with
something soon, I will revert the patch (except for the testcase, which I would
like to remain in the testsuite).
(In reply to comment #17)
> > Zdenek, can you revert your patch until we fix this?
> > It might be a month or two before I get back to it.
> >
> > (Yeah, I know it sucks to have to do this, but)
>
> I am not sure whether that would be helpful, since the problem does not seem
> to be directly caused by the patch (it would be a bit more difficult to
> construct a testcase for the problem with my patch reverted, of course).
>
> I am playing with some ideas for how to fix this; unless I come up with
> something soon, I will revert the patch (except for the testcase, which I
> would like to remain in the testsuite).

I agree with your reasoning, but hiding the bug is better than nothing until a
proper fix can be made.  I hope you come up with something.  Like I said, I can
fix it in a month or two, but I know some people want to use Spec2K6 before
then.
Created attachment 12574 [details]
A patch

This reverts the patch which triggers the problem and adds a testcase.  I am
running SPEC CPU 2006 now.
> I am playing with some ideas for how to fix this; unless I come up with
> something soon, I will revert the patch (except for the testcase, which I
> would like to remain in the testsuite).

The best I was able to do is the following patch.  Virtual operand pruning
removes all the symbols that the SMTs have in common, which causes this PR.
The patch adds artificial "conflict" symbols to all pairs of aliasing SMTs to
avoid this.  Just looking at the dump of the testcase for this PR, it appears
quite expensive (there are a lot of new virtual operands); I will check what
the memory behavior is on larger testcases.

Index: tree-dump.c
===================================================================
*** tree-dump.c	(revision 118619)
--- tree-dump.c	(working copy)
*************** dequeue_and_dump (dump_info_p di)
*** 495,500 ****
--- 495,501 ----
      case SYMBOL_MEMORY_TAG:
      case NAME_MEMORY_TAG:
      case STRUCT_FIELD_TAG:
+     case CONFLICT_TAG:
        break;

      case VAR_DECL:
Index: tree-pretty-print.c
===================================================================
*** tree-pretty-print.c	(revision 118619)
--- tree-pretty-print.c	(working copy)
*************** dump_generic_node (pretty_printer *buffe
*** 849,854 ****
--- 849,855 ----
        break;

      case SYMBOL_MEMORY_TAG:
+     case CONFLICT_TAG:
      case NAME_MEMORY_TAG:
      case STRUCT_FIELD_TAG:
      case VAR_DECL:
Index: tree.c
===================================================================
*** tree.c	(revision 118619)
--- tree.c	(working copy)
*************** init_ttree (void)
*** 270,279 ****
--- 270,281 ----
    tree_contains_struct[STRUCT_FIELD_TAG][TS_DECL_MINIMAL] = 1;
    tree_contains_struct[NAME_MEMORY_TAG][TS_DECL_MINIMAL] = 1;
    tree_contains_struct[SYMBOL_MEMORY_TAG][TS_DECL_MINIMAL] = 1;
+   tree_contains_struct[CONFLICT_TAG][TS_DECL_MINIMAL] = 1;
    tree_contains_struct[STRUCT_FIELD_TAG][TS_MEMORY_TAG] = 1;
    tree_contains_struct[NAME_MEMORY_TAG][TS_MEMORY_TAG] = 1;
    tree_contains_struct[SYMBOL_MEMORY_TAG][TS_MEMORY_TAG] = 1;
+   tree_contains_struct[CONFLICT_TAG][TS_MEMORY_TAG] = 1;
    tree_contains_struct[STRUCT_FIELD_TAG][TS_STRUCT_FIELD_TAG] = 1;
*************** tree_code_size (enum tree_code code)
*** 336,341 ****
--- 338,344 ----
        return sizeof (struct tree_function_decl);
      case NAME_MEMORY_TAG:
      case SYMBOL_MEMORY_TAG:
+     case CONFLICT_TAG:
        return sizeof (struct tree_memory_tag);
      case STRUCT_FIELD_TAG:
        return sizeof (struct tree_struct_field_tag);
*************** tree_node_structure (tree t)
*** 2119,2124 ****
--- 2122,2128 ----
      case SYMBOL_MEMORY_TAG:
      case NAME_MEMORY_TAG:
      case STRUCT_FIELD_TAG:
+     case CONFLICT_TAG:
        return TS_MEMORY_TAG;
      default:
        return TS_DECL_NON_COMMON;
Index: tree.h
===================================================================
*** tree.h	(revision 118619)
--- tree.h	(working copy)
*************** extern const enum tree_code_class tree_c
*** 106,112 ****
  #define MTAG_P(CODE) \
    (TREE_CODE (CODE) == STRUCT_FIELD_TAG \
     || TREE_CODE (CODE) == NAME_MEMORY_TAG \
!    || TREE_CODE (CODE) == SYMBOL_MEMORY_TAG)

  /* Nonzero if DECL represents a VAR_DECL or FUNCTION_DECL.  */
--- 106,113 ----
  #define MTAG_P(CODE) \
    (TREE_CODE (CODE) == STRUCT_FIELD_TAG \
     || TREE_CODE (CODE) == NAME_MEMORY_TAG \
!    || TREE_CODE (CODE) == SYMBOL_MEMORY_TAG \
!    || TREE_CODE (CODE) == CONFLICT_TAG)

  /* Nonzero if DECL represents a VAR_DECL or FUNCTION_DECL.  */
Index: tree-ssa-alias.c
===================================================================
*** tree-ssa-alias.c	(revision 118619)
--- tree-ssa-alias.c	(working copy)
*************** init_alias_info (void)
*** 915,920 ****
--- 915,925 ----
  	      || TREE_CODE (var) == SYMBOL_MEMORY_TAG
  	      || !is_global_var (var))
  	    clear_call_clobbered (var);
+
+ 	  /* Mark all old conflict symbols for renaming, so that they go
+ 	     away.  */
+ 	  if (TREE_CODE (var) == CONFLICT_TAG)
+ 	    mark_sym_for_renaming (var);
  	}

    /* Clear flow-sensitive points-to information from each SSA name.  */
*************** compute_flow_sensitive_aliasing (struct
*** 1149,1154 ****
--- 1154,1265 ----
      }
  }

+ /* Element of the conflicts hashtable.  */
+
+ typedef struct
+ {
+   hashval_t hash;
+   tree smt1, smt2;
+ } smt_pair;
+
+ typedef smt_pair *smt_pair_p;
+
+ /* Return true if the smt pairs are equal.  */
+
+ static int
+ smt_pair_eq (const void *va, const void *vb)
+ {
+   const smt_pair_p a = (const smt_pair_p) va;
+   const smt_pair_p b = (const smt_pair_p) vb;
+   return (a->smt1 == b->smt1 && a->smt2 == b->smt2);
+ }
+
+ /* Hash for a smt pair.  */
+
+ static unsigned int
+ smt_pair_hash (const void *item)
+ {
+   return ((const smt_pair_p) item)->hash;
+ }
+
+ /* Adds conflict between SMT1 and SMT2 to the CONFLICTS table.  */
+
+ static void
+ add_conflict (htab_t conflicts, tree smt1, tree smt2)
+ {
+   unsigned key[2];
+   hashval_t hash;
+   void **slot;
+   smt_pair cmpelt;
+   smt_pair_p newelt;
+
+   if (DECL_UID (smt1) > DECL_UID (smt2))
+     {
+       tree tmp = smt1;
+       smt1 = smt2;
+       smt2 = tmp;
+     }
+
+   key[0] = DECL_UID (smt1);
+   key[1] = DECL_UID (smt2);
+   hash = iterative_hash (key, sizeof (key), 0);
+
+   cmpelt.hash = hash;
+   cmpelt.smt1 = smt1;
+   cmpelt.smt2 = smt2;
+
+   slot = htab_find_slot (conflicts, &cmpelt, INSERT);
+   if (*slot)
+     return;
+
+   newelt = XNEW (smt_pair);
+   *newelt = cmpelt;
+   *slot = newelt;
+ }
+
+ /* Create a new memory tag of type TYPE.
+    Does NOT push it into the current binding.  */
+
+ static tree
+ create_tag_raw (enum tree_code code, tree type, const char *prefix)
+ {
+   tree tmp_var;
+   tree new_type;
+
+   /* Make the type of the variable writable.  */
+   new_type = build_type_variant (type, 0, 0);
+   TYPE_ATTRIBUTES (new_type) = TYPE_ATTRIBUTES (type);
+
+   tmp_var = build_decl (code, create_tmp_var_name (prefix),
+ 			type);
+   /* Make the variable writable.  */
+   TREE_READONLY (tmp_var) = 0;
+
+   /* It doesn't start out global.  */
+   MTAG_GLOBAL (tmp_var) = 0;
+   TREE_STATIC (tmp_var) = 0;
+   TREE_USED (tmp_var) = 1;
+
+   return tmp_var;
+ }
+
+ /* Create a new conflict tag.  */
+
+ static tree
+ create_conflict_tag (void)
+ {
+   var_ann_t ann;
+   tree tag = create_tag_raw (CONFLICT_TAG, void_type_node, "CONFLICT");
+
+   DECL_CONTEXT (tag) = current_function_decl;
+   TREE_ADDRESSABLE (tag) = 1;
+
+   ann = get_var_ann (tag);
+   ann->symbol_mem_tag = NULL_TREE;
+   add_referenced_var (tag);
+
+   return tag;
+ }

  /* Compute type-based alias sets.  Traverse all the pointers and
     addressable variables found in setup_pointers_and_addressables.
*************** static void
*** 1163,1168 ****
--- 1274,1283 ----
  compute_flow_insensitive_aliasing (struct alias_info *ai)
  {
    size_t i;
+   htab_t conflicts;
+   htab_iterator hi;
+   smt_pair_p conflict;
+   tree confsym;

    /* Initialize counter for the total number of virtual operands that
       aliasing will introduce.  When AI->TOTAL_ALIAS_VOPS goes beyond the
*************** compute_flow_insensitive_aliasing (struc
*** 1254,1282 ****
    /* Since this analysis is based exclusively on symbols, it fails to
       handle cases where two pointers P and Q have different memory
       tags with conflicting alias set numbers but no aliased symbols in
!      common.
!
!      For example, suppose that we have two memory tags SMT.1 and SMT.2
!      such that
!
! 		may-aliases (SMT.1) = { a }
! 		may-aliases (SMT.2) = { b }
!
!      and the alias set number of SMT.1 conflicts with that of SMT.2.
!      Since they don't have symbols in common, loads and stores from
!      SMT.1 and SMT.2 will seem independent of each other, which will
!      lead to the optimizers making invalid transformations (see
!      testsuite/gcc.c-torture/execute/pr15262-[12].c).
!
!      To avoid this problem, we do a final traversal of AI->POINTERS
!      looking for pairs of pointers that have no aliased symbols in
!      common and yet have conflicting alias set numbers.  */
    for (i = 0; i < ai->num_pointers; i++)
      {
        size_t j;
        struct alias_map_d *p_map1 = ai->pointers[i];
        tree tag1 = var_ann (p_map1->var)->symbol_mem_tag;
-       bitmap may_aliases1 = p_map1->may_aliases;

        if (PTR_IS_REF_ALL (p_map1->var))
  	continue;
--- 1369,1387 ----
    /* Since this analysis is based exclusively on symbols, it fails to
       handle cases where two pointers P and Q have different memory
       tags with conflicting alias set numbers but no aliased symbols in
!      common.  Given that we prune the sets of virtual operands based
!      on the actual shape of the memory reference in tree-ssa-operands,
!      we have no easy way how to verify that the sets of symbols will
!      always intersect.  To avoid problems, we insert artificial
!      NONLOCAL_CONFLICT symbol in all pairs of SMTs that conflict.
!      If there are too many such pairs, we insert just one NONLOCAL_CONFLICT
!      symbol in each of them, which pessimizes results, but saves memory.  */
!   conflicts = htab_create (10, smt_pair_hash, smt_pair_eq, free);
    for (i = 0; i < ai->num_pointers; i++)
      {
        size_t j;
        struct alias_map_d *p_map1 = ai->pointers[i];
        tree tag1 = var_ann (p_map1->var)->symbol_mem_tag;

        if (PTR_IS_REF_ALL (p_map1->var))
  	continue;
*************** compute_flow_insensitive_aliasing (struc
*** 1285,1324 ****
  	{
  	  struct alias_map_d *p_map2 = ai->pointers[j];
  	  tree tag2 = var_ann (p_map2->var)->symbol_mem_tag;
- 	  bitmap may_aliases2 = p_map2->may_aliases;

! 	  if (PTR_IS_REF_ALL (p_map2->var))
  	    continue;

! 	  /* If the pointers may not point to each other, do nothing.  */
! 	  if (!may_alias_p (p_map1->var, p_map1->set, tag2, p_map2->set, true))
! 	    continue;
!
! 	  /* The two pointers may alias each other.  If they already have
! 	     symbols in common, do nothing.  */
! 	  if (bitmap_intersect_p (may_aliases1, may_aliases2))
! 	    continue;
!
! 	  if (!bitmap_empty_p (may_aliases2))
! 	    {
! 	      unsigned int k;
! 	      bitmap_iterator bi;
!
! 	      /* Add all the aliases for TAG2 into TAG1's alias set.
! 		 FIXME, update grouping heuristic counters.  */
! 	      EXECUTE_IF_SET_IN_BITMAP (may_aliases2, 0, k, bi)
! 		add_may_alias (tag1, referenced_var (k));
! 	      bitmap_ior_into (may_aliases1, may_aliases2);
! 	    }
! 	  else
! 	    {
! 	      /* Since TAG2 does not have any aliases of its own, add
! 		 TAG2 itself to the alias set of TAG1.  */
! 	      add_may_alias (tag1, tag2);
! 	      bitmap_set_bit (may_aliases1, DECL_UID (tag2));
! 	    }
  	}
      }

    if (dump_file)
      fprintf (dump_file, "\n%s: Total number of aliased vops: %ld\n",
--- 1390,1434 ----
  	{
  	  struct alias_map_d *p_map2 = ai->pointers[j];
  	  tree tag2 = var_ann (p_map2->var)->symbol_mem_tag;

! 	  if (PTR_IS_REF_ALL (p_map2->var) || tag1 == tag2)
  	    continue;

! 	  if (may_alias_p (p_map1->var, p_map1->set, tag2, p_map2->set, true))
! 	    add_conflict (conflicts, tag1, tag2);
! 	}
!     }
!
!   if (htab_elements (conflicts) < (unsigned) MAX_ALIASED_VOPS)
!     {
!       FOR_EACH_HTAB_ELEMENT (conflicts, conflict, smt_pair_p, hi)
! 	{
! 	  confsym = create_conflict_tag ();
! 	  add_may_alias (conflict->smt1, confsym);
! 	  add_may_alias (conflict->smt2, confsym);
! 	}
!     }
!   else
!     {
!       bitmap csmts = BITMAP_ALLOC (NULL);
!       bitmap_iterator bi;
!       unsigned uid;
!
!       confsym = create_conflict_tag ();
!       FOR_EACH_HTAB_ELEMENT (conflicts, conflict, smt_pair_p, hi)
! 	{
! 	  bitmap_set_bit (csmts, DECL_UID (conflict->smt1));
! 	  bitmap_set_bit (csmts, DECL_UID (conflict->smt2));
! 	}
!
!       EXECUTE_IF_SET_IN_BITMAP (csmts, 0, uid, bi)
! 	{
! 	  add_may_alias (referenced_var (uid), confsym);
  	}
+       BITMAP_FREE (csmts);
      }
+   htab_delete (conflicts);

    if (dump_file)
      fprintf (dump_file, "\n%s: Total number of aliased vops: %ld\n",
*************** is_escape_site (tree stmt)
*** 2173,2204 ****
    return NO_ESCAPE;
  }

- /* Create a new memory tag of type TYPE.
-    Does NOT push it into the current binding.  */
-
- static tree
- create_tag_raw (enum tree_code code, tree type, const char *prefix)
- {
-   tree tmp_var;
-   tree new_type;
-
-   /* Make the type of the variable writable.  */
-   new_type = build_type_variant (type, 0, 0);
-   TYPE_ATTRIBUTES (new_type) = TYPE_ATTRIBUTES (type);
-
-   tmp_var = build_decl (code, create_tmp_var_name (prefix),
- 			type);
-   /* Make the variable writable.  */
-   TREE_READONLY (tmp_var) = 0;
-
-   /* It doesn't start out global.  */
-   MTAG_GLOBAL (tmp_var) = 0;
-   TREE_STATIC (tmp_var) = 0;
-   TREE_USED (tmp_var) = 1;
-
-   return tmp_var;
- }
-
  /* Create a new memory tag of type TYPE.  If IS_TYPE_TAG is true, the tag
     is considered to represent all the pointers whose pointed-to types are
     in the same alias set class.  Otherwise, the tag represents a single
--- 2283,2288 ----
*************** create_memory_tag (tree type, bool is_ty
*** 2227,2233 ****
    return tag;
  }

-
  /* Create a name memory tag to represent a specific SSA_NAME pointer P_i.
     This is used if P_i has been found to point to a specific set of
     variables or to a non-aliased memory location like the address returned
--- 2311,2316 ----
Index: tree.def
===================================================================
*** tree.def	(revision 118619)
--- tree.def	(working copy)
*************** DEFTREECODE (RESULT_DECL, "result_decl",
*** 359,364 ****
--- 359,365 ----
  DEFTREECODE (STRUCT_FIELD_TAG, "struct_field_tag", tcc_declaration, 0)
  DEFTREECODE (NAME_MEMORY_TAG, "name_memory_tag", tcc_declaration, 0)
  DEFTREECODE (SYMBOL_MEMORY_TAG, "symbol_memory_tag", tcc_declaration, 0)
+ DEFTREECODE (CONFLICT_TAG, "conflict_tag", tcc_declaration, 0)

  /* A namespace declaration.  Namespaces appear in DECL_CONTEXT of other
     _DECLs, providing a hierarchy of names.  */
Index: tree-ssa-operands.c
===================================================================
*** tree-ssa-operands.c	(revision 118619)
--- tree-ssa-operands.c	(working copy)
*************** get_expr_operands (tree stmt, tree *expr
*** 1873,1878 ****
--- 1873,1879 ----
      case STRUCT_FIELD_TAG:
      case SYMBOL_MEMORY_TAG:
      case NAME_MEMORY_TAG:
+     case CONFLICT_TAG:
        add_stmt_operand (expr_p, s_ann, flags);
        return;
Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc

On 9 Nov 2006 11:16:12 -0000, rakdver at gcc dot gnu dot org <gcc-bugzilla@gcc.gnu.org> wrote:
> ------- Comment #20 from rakdver at gcc dot gnu dot org 2006-11-09 11:16 -------
> > I am playing with some ideas for how to fix this; unless I come up with
> > something soon, I will revert the patch (except for the testcase, which I
> > would like to remain in the testsuite).
>
> The best I was able to do is the following patch.  Virtual operand pruning
> removes all the symbols that the SMTs have in common, which causes this PR.
> The patch adds artificial "conflict" symbols to all pairs of aliasing SMTs to
> avoid this.

This is what I was trying to do originally with multiple NONLOCAL symbols
(SMTs are going to go away) in the other testcase.  It is just too expensive
in general.

One thing I'm going to try later is to partition all the load/store statements
and figure out how many symbols it takes to represent the results exactly
(i.e., one symbol for each set of statements that must interfere with each
other, where each statement can be in multiple partitions).

I.e., if you had load/store statements a, b, c, d, where

  a interferes with c and d but not b
  b interferes with d

you get the partitions:

  part1: {a, c, d}
  part2: {b, d}

We then just create two symbols and use those as the vdef/vuse symbols.

This scheme is N^2 in the worst case, but you can choose to unify partitions to
cut down the number of symbols.  Partitions that have no shared members can
also share symbols.

This would unify all of our points-to/access_can_touch_var/etc. results into
one nice framework that gets very good results and avoids the virtual operand
explosion, I think.

The real concern is that this is probably too expensive to compute 5 times per
function.  We'll see.
Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc

Daniel Berlin wrote on 11/09/06 10:05:

> One thing I'm going to try later is to partition all the load/store
> statements and figure out how many symbols it takes to represent the
> results exactly (i.e., one symbol for each set of statements that must
> interfere with each other, where each statement can be in multiple
> partitions).

This is what I'm doing in memory SSA.  More details later this week after I'm
done testing and such.  The difference is that the partitioning is embedded in
the actual SSA form and the partitioning heuristic can be changed independently
of the renamer.  This will let us play with a slider-style throttling for
precision/compile-time.
(In reply to comment #20)
> > I am playing with some ideas for how to fix this; unless I come up with
> > something soon, I will revert the patch (except for the testcase, which I
> > would like to remain in the testsuite).
>
> The best I was able to do is the following patch.  Virtual operand pruning
> removes all the symbols that the SMTs have in common, which causes this PR.
> The patch adds artificial "conflict" symbols to all pairs of aliasing SMTs to
> avoid this.  Just looking at the dump of the testcase for this PR, it appears
> quite expensive (there are a lot of new virtual operands); I will check what
> the memory behavior is on larger testcases.

It failed during bootstrap:

/net/gnu-13/export/gnu/src/gcc/gcc/gcc/tree-dfa.c: In function 'find_referenced_vars':
/net/gnu-13/export/gnu/src/gcc/gcc/gcc/tree-dfa.c:99: internal compiler error: in mark_operand_necessary, at tree-ssa-dce.c:261
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
make[5]: *** [tree-dfa.o] Error 1
Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc

On 11/9/06, Diego Novillo <dnovillo@redhat.com> wrote:
> Daniel Berlin wrote on 11/09/06 10:05:
>
> > One thing I'm going to try later is to partition all the load/store
> > statements and figure out how many symbols it takes to represent the
> > results exactly (i.e., one symbol for each set of statements that must
> > interfere with each other, where each statement can be in multiple
> > partitions).
>
> This is what I'm doing in memory SSA.  More details later this week after
> I'm done testing and such.  The difference is that the partitioning is
> embedded in the actual SSA form and the partitioning heuristic can be
> changed independently of the renamer.  This will let us play with a
> slider-style throttling for precision/compile-time.

Right, but the difference is that in the scheme I propose, you'd never have
overlapping live ranges of vuses/vdefs, and in mem-ssa you do.  I.e., we
wouldn't run into all the problems mem-ssa is going to bring in this regard.
Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc

Daniel Berlin wrote on 11/09/06 12:22:

> Right, but the difference is that in the scheme I propose, you'd never
> have overlapping live ranges of vuses/vdefs, and in mem-ssa you do.
> I.e., we wouldn't run into all the problems mem-ssa is going to bring in
> this regard.

No, that's not right.  Overlapping live ranges are not a problem until you hit
a PHI node.  That's what mem-ssa is currently having difficulties with.

We can use those static partitions at key points during SSA renaming.  Since
the partitions are completely unrelated to the renamer, we can experiment with
different partitioning schemes.  It's actually even possible to arrive at a
perfect partitioning scheme that doesn't introduce false positive dependencies.

More details to follow.
Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc

> > Right, but the difference is that in the scheme I propose, you'd never
> > have overlapping live ranges of vuses/vdefs, and in mem-ssa you do.
> > I.e., we wouldn't run into all the problems mem-ssa is going to bring in
> > this regard.
>
> No, that's not right.  Overlapping live ranges are not a problem until
> you hit a PHI node.  That's what mem-ssa is currently having
> difficulties with.

Well, in any case, Daniel's proposal has the advantage that it is much less
intrusive than mem-ssa -- it does not need to change SSA renaming at all,
probably needs much smaller changes to operand scanning, and does not need any
changes to the optimizations that assume vops are in FUD form (i.e., that the
live ranges of vops do not overlap).

If he could create (or help someone create) a working prototype in a reasonable
time (a few weeks?), I would very much like to see it compared with mem-ssa
before the mem-ssa branch is merged.
Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc

A detailed proposal:

So here is what I was thinking of.

When I say "symbols" below, I mean some VAR_DECL or structure that has a name
(like our memory tags do).  A symbol is *not* a real variable that occurred in
the user program.  When I say "variable", I mean a variable that occurred in
the user program.

The real problem with our alias system, in terms of precision and often in
terms of the number of useless vops, is that we are trying to use real,
existing variables to approximate the portions of the heap a statement
accesses.

When things access portions of the heap we can't see (nonlocal variables), we
fall down badly in terms of precision, because we can eliminate every single
local variable as an alias and then just have to say the statement accesses
"some nonlocal variable".  This causes precision problems because it means that
statements accessing nonlocal variables that we can *prove* don't interfere
still currently share a VUSE between them.  We also get useless vops whenever
points-to sets intersect between statements that interfere, because we end up
adding aliases for which we can then eliminate the members of the alias set at
individual accesses.  We also currently rely on operand-scan-time pruning,
which is very ugly.

There is a way to get the minimal number of vuses/vdefs necessary to represent
completely precise (in terms of the info we have) aliasing, while still being
able to degrade the precision gracefully in order to let the number of
vuses/vdefs go down.

The scheme I propose *never* has overlapping live ranges of the individual
symbols, even though the symbols may represent pieces of the same heap.  In
other words, you can rely on the fact that once an individual symbol has a new
version, there will never be a vuse of an old version of that symbol.

The current vdef/vuse scheme consists of creating memory tags to represent
portions of the heap.  When a memory tag has aliases, we use its alias list to
generate virtual operands.  When a memory tag does not have aliases, we
generate a virtual operand of the base symbol.

The basic idea in the new scheme is to never have a list of aliases for a
symbol representing portions of the heap.  The symbols representing portions of
the heap are themselves always the target of a vuse/vdef.  The set of aliases
they represent is immaterial (though we can keep a list if something wants it).
This enables us to have a smaller number of vops, and to have something else
generate the set of symbols in a precise manner, rather than have things like
the operand scanner try to post-process it.  The symbols are also attached to
the load/store statements, not to the variables.  The operand renamer only has
to add vuses/vdefs for all the symbols attached to a statement, and it is done.

In the simplest, dumb, non-precise version of this scheme, this means you only
have one symbol, called "MEM", and you generate vuses/vdefs linking every load
and store together.

In the most precise version of this scheme, you partition the load/store
conflicts between statements into symbols that represent statement
conflictingness.  In a completely naive, O(N^3) version, the following
algorithm will work and generate completely precise results (a C sketch
follows after the worked example below):

  Collect all loads and stores into a list (lslist)
  for each statement in lslist (x):
    for each statement in lslist (y):
      if x conflicts with y:
        if there is no partition for x, y, create a new one containing x and y.
        otherwise, for every partition y belongs to:
          if all members of this partition have a memory access that conflicts with x:
            add x to this partition
          otherwise:
            create a new partition containing all members of the partition
            except the ones x does not conflict with, and add x to it.

This is a very, very slow way to do it, but it should be clear (there are much,
much faster ways to do this).  Basically, a single load/store statement can
belong to multiple partitions.  All members of a given partition conflict with
each other.

Given the following set of memory access statements a, b, c, d, where

  a conflicts with b and c
  b conflicts with c and d
  c conflicts with a and b
  d conflicts with a and c

you will end up with 3 partitions:

  part1: {a, b, c}
  part2: {b, c, d}
  part3: {d, a, c}

Statement c will conflict with every member of partition 1 and thus get
partition 1, rather than a new partition.

You now create a symbol for each partition and, for each statement in the
partition, add the symbol to the statement's list.  Thus, in the above example
we get:

  statement a - symbols: MEM.PART1, MEM.PART3
  statement b - symbols: MEM.PART1, MEM.PART2
  statement c - symbols: MEM.PART1, MEM.PART2, MEM.PART3
  statement d - symbols: MEM.PART2, MEM.PART3

As mentioned before, the operand renamer simply adds a vdef/vuse for each
symbol in the statement's list.

Note that this is the minimal number of symbols necessary to precisely
represent the conflicting accesses.  If the number of partitions grows large
(and thus requires too many symbols), you can simply union any partitions you
like.  As long as the partitions contain *at least* the real conflicts for
those statements, all you would do is add false aliasing.

Note that while the symbols partitioning the heap are not necessarily disjoint
in terms of the parts of the heap they represent, the vuses/vdefs for an
individual symbol will never overlap, just like in our current representation.
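To make the naive partitioning above concrete, here is a rough C sketch.  It is not GCC code: the conflict matrix, the fixed-size partition arrays, and all the names are made up for the illustration, and it makes no attempt to deduplicate or minimize the resulting partitions (which, as noted above, may freely be merged anyway).

  #include <assert.h>
  #include <stdbool.h>
  #include <stddef.h>

  #define MAX_STMTS 64
  #define MAX_PARTS 256

  /* conflicts[i][j] says whether load/store statements i and j may touch
     the same memory; how it is computed does not matter here.  */
  static bool conflicts[MAX_STMTS][MAX_STMTS];

  /* part_has[p][s] says whether statement s belongs to partition p.  A
     statement may belong to several partitions.  */
  static bool part_has[MAX_PARTS][MAX_STMTS];
  static size_t n_parts;

  /* True if X conflicts with every member of partition P (except itself).  */
  static bool
  conflicts_with_all (size_t p, size_t x, size_t n_stmts)
  {
    for (size_t s = 0; s < n_stmts; s++)
      if (part_has[p][s] && s != x && !conflicts[x][s])
        return false;
    return true;
  }

  /* Naive partitioning: for every conflicting pair (x, y), either grow an
     existing partition of y with x, or split off a new partition made of
     the members of y's partition that x does conflict with, plus x.  */
  static void
  partition_statements (size_t n_stmts)
  {
    for (size_t x = 0; x < n_stmts; x++)
      for (size_t y = 0; y < x; y++)
        {
          if (!conflicts[x][y])
            continue;

          bool y_in_some_partition = false;
          size_t old_n_parts = n_parts;

          for (size_t p = 0; p < old_n_parts; p++)
            {
              if (!part_has[p][y])
                continue;
              y_in_some_partition = true;

              if (conflicts_with_all (p, x, n_stmts))
                part_has[p][x] = true;
              else
                {
                  /* Copy only the members x conflicts with, then add x.  */
                  assert (n_parts < MAX_PARTS);
                  for (size_t s = 0; s < n_stmts; s++)
                    part_has[n_parts][s] = part_has[p][s] && conflicts[x][s];
                  part_has[n_parts][x] = true;
                  n_parts++;
                }
            }

          if (!y_in_some_partition)
            {
              /* No partition contains y yet: start one with just x and y.  */
              assert (n_parts < MAX_PARTS);
              part_has[n_parts][x] = true;
              part_has[n_parts][y] = true;
              n_parts++;
            }
        }
  }

Each resulting partition would then get its own MEM.PARTn symbol, and a statement's vdef/vuse list would simply be the set of partitions it belongs to.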
(In reply to comment #26)
> I would very much like to see it compared with mem-ssa before the mem-ssa
> branch is merged.

Notice that the two approaches do not negate each other.  Dan's proposal is a
smarter partitioning than what the current alias grouping mechanism tries to
do.  We can actually have memory SSA on top of Dan's partitioning scheme.
Memory SSA will use that partitioning scheme when placing memory PHI nodes.

The two approaches are orthogonal.  Memory SSA simply adds a new degree of
factoring that gives you sparser UD chains.  It also gives you exactly one name
per store, without reducing precision.
Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc

> > I would very much like to see it compared with mem-ssa before the mem-ssa
> > branch is merged.
>
> Notice that the two approaches do not negate each other.  Dan's proposal is
> a smarter partitioning than what the current alias grouping mechanism tries
> to do.  We can actually have memory SSA on top of Dan's partitioning scheme.
> Memory SSA will use that partitioning scheme when placing memory PHI nodes.
>
> The two approaches are orthogonal.  Memory SSA simply adds a new degree of
> factoring that gives you sparser UD chains.  It also gives you exactly one
> name per store, without reducing precision.

Nevertheless, it is not obvious to me whether using mem-ssa over Daniel's
proposal would bring any significant gains, which I would like to have verified
before we introduce a million new bugs with mem-ssa (nothing personal, it
simply is too large and too intrusive a change not to bring any).
(In reply to comment #29)
> Nevertheless, it is not obvious to me whether using mem-ssa over Daniel's
> proposal would bring any significant gains, which I would like to have

Of course.  If you are interested in the compile-time benefits of a
partitioning scheme, you can actually try the one we already have by forcing
alias grouping more aggressively (--param max-aliased-vops).  The current
grouping is very dumb and will create tons of false positives.  Daniel's
approach will try to reduce false positives while bringing down the number of
virtual operators per memory statement.

Memory SSA brings down the number of virtual operators to exactly one per
statement.

> verified before we introduce a million new bugs with mem-ssa (nothing
> personal, it simply is too large and too intrusive a change not to bring
> any).

Intrusive?  Well, the only pass that was wired to the previous virtual operator
scheme was PRE.  DSE is also wired, but to a lesser extent.  No other
optimization had to be changed for mem-ssa.  It's obviously intrusive in the
renamer, but that's it.
Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc

> Memory SSA brings down the number of virtual operators to exactly one per
> statement.

However, it does so in a way that makes the traditional things that actually
want to do cool memory optimizations harder.  I'm still on the fence over
whether it's a good idea or not.

> > verified before we introduce a million new bugs with mem-ssa (nothing
> > personal, it simply is too large and too intrusive a change not to bring
> > any).
>
> Intrusive?  Well, the only pass that was wired to the previous virtual
> operator scheme was PRE.  DSE is also wired, but to a lesser extent.  No
> other optimization had to be changed for mem-ssa.  It's obviously intrusive
> in the renamer, but that's it.

Uh, LIM and store sinking are too.  Roughly all of our memory optimizations
are.

The basic problem is that in mem-ssa, vdefs and vuses don't accurately reflect
what symbols are being defined and used anymore.  They represent the factoring
of a use and a definition of a whole bunch of symbols.

Things like PRE and DSE break not so much because they are "wired to the
previous virtual operator scheme", but because they rely on the virtual use/def
chains accurately representing where a symbol representing a memory access
dies.  In mem-ssa, you have VDEFs of the same symbol all over the place.

The changes I have to make to PRE (and to the other things) to account for this
amount to rebuilding the non-mem-ssa-factored (i.e., the currently factored)
form out of the chains, by seeing what symbols they really affect.  This is
going to be expensive and, IMHO, is what almost all of our SSA memory
optimizations are going to have to do.

So while mem-ssa doesn't affect *precision*, it does affect how you can use the
chains in a very significant way.  For at least all the optimizations I see us
doing, it makes them more or less useless without doing things (like
re-expanding them) first.  Because this is true, I'm not sure it's a good idea
at all, which is why I'm still on the fence.
Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc

> In mem-ssa, you have VDEFs of the
> same symbol all over the place.
> ^^^^ version of a symbol
Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc

dberlin at dberlin dot org wrote on 11/09/06 16:28:

> Uh, LIM and store sinking are too.  Roughly all of our memory
> optimizations are.

They are?  Really?  Can you show me where exactly?

> The changes I have to make to PRE (and to the other things) to account for
> this amount to rebuilding the non-mem-ssa-factored (i.e., the currently
> factored) form out of the chains, by seeing what symbols they really affect.

OK, so how come you were so positive about the new interface?  I need to
understand what was the great difficulty you ran into that made you change your
mind.  I need to see a specific example.

See, the UD chains you get in mem-ssa are neither incomplete nor wrong.  The
symbols affected are right there in plain sight, so there is no loss of any
information.

> For at least all the optimizations I see us doing, it makes them more or
> less useless without doing things (like re-expanding them) first.  Because
> this is true, I'm not sure it's a good idea at all, which is why I'm still
> on the fence.

But you still haven't *shown* me where the hardness or slowness comes in.
Granted, the work is still unfinished, so we can't really do actual
measurements.  But I need to understand where the difficulties will be so that
I can accommodate them in the infrastructure.  It's obviously not my intent to
make things harder to use.
Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc

On 9 Nov 2006 21:48:25 -0000, dnovillo at redhat dot com <gcc-bugzilla@gcc.gnu.org> wrote:
> ------- Comment #33 from dnovillo at redhat dot com 2006-11-09 21:48 -------
>
> dberlin at dberlin dot org wrote on 11/09/06 16:28:
>
> > Uh, LIM and store sinking are too.  Roughly all of our memory
> > optimizations are.
>
> They are?  Really?  Can you show me where exactly?

They won't get incorrect results.  They will get less good results than they do
now.

Take LIM and store motion (sorry, I meant SM, not store sinking).
determine_max_movement relies on the VIRTUAL_KILL information to determine what
must be moved together.  Because things that would previously kill different
symbol versions will now kill the same symbol version (due to being factored),
it will add false dependencies unless someone changes it.

Take the following example:

  int e,f,g,h;
  int main(int argc)
  {
    int *a, *b, *d;
    int i;
    a = &e;
    if (argc)
      a = &f;
    b = &g;
    if (argc)
      b = &h;
    d = &e;
    if (argc)
      d = &h;
    for (i = 0; i < 50; i++)
      {
        *a = 1;
        *b = 2;
        *d = 3;
      }
  }

Previously, you would have:

  ...
  # e_22 = V_MAY_DEF <e_14>;
  # f_23 = V_MAY_DEF <f_15>;
  *a_1 = 1;

  # g_24 = V_MAY_DEF <g_16>;
  # h_25 = V_MAY_DEF <h_17>;
  *b_2 = 2;

  # e_26 = V_MAY_DEF <e_22>;
  # h_27 = V_MAY_DEF <h_25>;
  *d_3 = 3;

Note that *a and *b do not vdef any symbol that is the same.

mem-ssa gives you (as of today):

  # .MEM_16 = VDEF <.MEM_14, f_20>
  *a_1 = 1;

  # .MEM_17 = VDEF <.MEM_14, g_21>
  *b_2 = 2;

  # .MEM_18 = VDEF <.MEM_16, .MEM_17>
  *d_3 = 3;

Note that now *a and *b both vdef MEM_14.  SM is going to say these stores are
dependent on each other because they "kill" the same version of the same
variable, unless you teach it to look at the get_loads_and_stores bitmap.
Previously, it would not.

> > The changes I have to make to PRE (and to the other things) to account for
> > this amount to rebuilding the non-mem-ssa-factored (i.e., the currently
> > factored) form out of the chains, by seeing what symbols they really
> > affect.
>
> OK, so how come you were so positive about the new interface?

When have I been overabundantly positive?  I said I'd deal with it.  I'm
neither here nor there.  I've relied on your statements that you believe it
will make things significantly better without loss of precision.  We are going
to pay a cost in time for passes to make use of this information.  I believed
you were aware of this.

> I need to understand what was the great difficulty you ran into that made
> you change your mind.  I need to see a specific example.
>
> See, the UD chains you get in mem-ssa are neither incomplete nor wrong.

Nobody said they are wrong, but I would actually argue they are no longer
really the same as SSA in a way that matters, if you want to pick nits.  SSA
variables have not only the property that they are *assigned to* once, but the
property that they are *defined* by a single operation.  Our current virtual
operand scheme has this property.  Yours does not, because variables may be
defined multiple times, although they are still singly assigned.

You can argue this is not a requirement of SSA.  Honestly, it makes no
difference to me.  The upshot is that by losing this property, you make the
chains less useful than they could be.

> The symbols affected are right there in plain sight, so there is no
> loss of any information.

Errr, last time I looked, you had to call a function to get the *actual* list
of loads or stores affected by a statement.

Why does this matter?

All of our memory optimizations are trying to figure out three things:

1. Will two memory accesses return the same results (PRE, DSE)?
2. Do these two memory accesses have a dependency (PRE, SM, DSE, LIM)?
3. If I hoist this memory access to block X, will it still have the same value
   as it does in block Y (PRE, SM, LIM)?

Your stuff has no real effect on #2, but it makes #1 and #3 harder (not less
precise, mind you), at the cost of reducing the number of operands.  It does so
by factoring stores in a way that disables the ability to see where loads are
not *currently*, but *could*, validly be live.  In other words, it was
previously possible to simply determine whether two stores touched the same
piece of memory by looking at the versions.  This is no longer true.

PRE does #1 and #3 through value numbering memory and tracking the distinct
live ranges of virtual variables.  SM and LIM do #3 by simply using dependency
information to determine what needs to be moved together, and by grouping
operations that vdef the same variable together.  SM and LIM will thus still
get *correct* information in the above example, but they are going to get false
dependencies due to defs of the same mem version, unless something
disambiguates between those store dependencies by looking at the underlying
loads and stores involved.

Previously, you could value number memory (and thus answer question #1) simply
by making a list of all the virtual variable versions associated with it, and
the value numbers of the underlying accesses.  You could see whether those
value numbers would still *obviously* be valid when moved above or below a
certain store *just by seeing what virtual variable versions were defined by
that statement and whether one of them was a vuse version used by one of the
value numbers in your statements*.

It is no longer possible to get a precise answer this way when you factor
stores the way you do, because the name of the virtual variable defined by a
store is no longer a function of what symbols it defines.  Take the above case.
If we simply use virtual variable versions to value number memory, we will
believe that *a and *b are possible stores to the same piece of memory, even
though they are not.

In order to get the *correct* answer in the presence of PHI nodes, we currently
collect the variables the PHI nodes were joining together, and consider a def
of the PHI result to be a def of the versions joined by the PHI.  I.e., if you
had

  phi_result_3 = <a_1, b_2>

  # VUSE <phi_result_3>
  *foo = a

it would be considered a kill of any value numbers containing vuses of either
a_1 or b_2.

So how do I plan on recovering the information about where things can be moved
to?  Well, essentially, PRE wants a set of "live memory variable versions" for
a given statement, so it can see where things can be moved to.

I am going to start out at the top of the program and walk in dominator order,
generating a bitmap.  We initially assume the version of everything in the
load_store_value_bitmap is 0.  Every time we see a vdef, we call
get_loads_and_stores.  For each variable in the underlying load and store
bitmap, we remove the bit for the old version's id from
load_store_value_bitmap, and add a bit for the new version's id.  Every time we
see a vuse, we use the current load_store_value_bitmap as part of its value
number, and attach the bitmap to the current statement.  We attach the
load_store_value_bitmap to the top and bottom of each block.  Every time we see
a PHI node, we generate new versions for each of the loaded and stored
variables of the statements that the PHI node is joining.  We can translate a
value upwards through PHIs by looking at what bits in the bitmaps changed for
each edge.

So basically, I'm doing virtual SSA renaming and storing some of the reaching
information persistently.  This is how load PRE was done in the SSAPRE
infrastructure, and it works out okay.  But it really does mean that we aren't
using the UD chains anymore for loads and stores, because we can't.

> > For at least all the optimizations I see us doing, it makes them more or
> > less useless without doing things (like re-expanding them) first.  Because
> > this is true, I'm not sure it's a good idea at all, which is why I'm still
> > on the fence.
>
> But you still haven't *shown* me where the hardness or slowness comes in.

How not?  As I've told you before, and shown you before, it was previously the
case that we could value number memory directly and determine whether a value
was going to be legal in a given block.  mem-ssa breaks this.  It was
previously possible to accurately and quickly determine which stores to memory
conflict just by looking at versions.  This is no longer true.  This in turn
removes the ability to determine where new loads can be placed simply by
looking at VDEF RHS variables.  I've re-explained this above.

> Granted, the work is still unfinished, so we can't really do actual
> measurements.  But I need to understand where the difficulties will be so
> that I can accommodate them in the infrastructure.  It's obviously not my
> intent to make things harder to use.
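To visualize the walk just described, here is a schematic C sketch.  Everything in it is made up for the illustration: the statement representation, the bitmaps, and the version bookkeeping stand in for GCC structures and for get_loads_and_stores, and none of the names correspond to actual GCC interfaces.

  #include <assert.h>
  #include <stdbool.h>
  #include <string.h>

  #define MAX_VARS 128    /* memory variables tracked                */
  #define MAX_VERS 4096   /* total (variable, version) ids available */

  enum stmt_kind { STMT_VDEF, STMT_VUSE, STMT_PHI };

  struct stmt
  {
    enum stmt_kind kind;
    bool touches[MAX_VARS];      /* what the loads/stores bitmap would say */
    bool value_bitmap[MAX_VERS]; /* for VUSEs: the live versions recorded  */
  };

  static int cur_version[MAX_VARS]; /* current version id of each variable */
  static int next_version;          /* next unused version id              */
  static bool live[MAX_VERS];       /* the load_store_value_bitmap         */

  /* Give each variable a distinct initial version.  */
  static void
  init_versions (int n_vars)
  {
    for (int v = 0; v < n_vars; v++)
      {
        cur_version[v] = next_version++;
        live[cur_version[v]] = true;
      }
  }

  /* Kill the old version of VAR and make a fresh one live.  */
  static void
  bump_version (int var)
  {
    assert (next_version < MAX_VERS);
    live[cur_version[var]] = false;
    cur_version[var] = next_version++;
    live[cur_version[var]] = true;
  }

  /* Process one statement during the dominator-order walk.  */
  static void
  visit_stmt (struct stmt *s)
  {
    switch (s->kind)
      {
      case STMT_VDEF:
      case STMT_PHI:
        /* A store, or a PHI joining stores, gives every variable it
           touches a fresh version; the old version can never be seen
           by a later vuse.  */
        for (int v = 0; v < MAX_VARS; v++)
          if (s->touches[v])
            bump_version (v);
        break;

      case STMT_VUSE:
        /* A load snapshots the currently live versions; this snapshot
           then participates in the load's value number.  */
        memcpy (s->value_bitmap, live, sizeof live);
        break;
      }
  }

The snapshot attached at a block's top and bottom is then what would let PRE decide whether a value computed in one block is still valid at another, without walking the factored chains.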
Subject: Re: [4.3 Regression] Misscompilation of spec2006 gcc

> Take the above case.
> If we simply use virtual variable versions to value number memory, we
> will believe that *a and *b are possible stores to the same piece of
> memory, even though they are not.

In case it's not clear, this means that while previously you could determine
that you can move *b above *a simply by looking at the RHS, you can't anymore.

My guess:

Any memory optimization that wants to just eliminate redundant loads will be
just fine with mem-ssa.

Any memory optimization that wants to eliminate redundant stores will have to
be redesigned to not use the chains in order to get maximum precision.

Any memory optimization that wants to hoist either loads or stores will have to
be redesigned to not use the chains to determine where things can be moved to,
in order to get maximum precision.

At least, that's the way it appears to me.
(In reply to comment #19)
> Created an attachment (id=12574) [edit]
> A patch
>
> This reverts the patch which triggers the problem and adds a testcase.  I am
> running SPEC CPU 2006 now.

I am going to commit this patch for now (once it passes bootstrap & regtesting).
Subject: Bug 29680

Author: rakdver
Date: Mon Nov 13 12:37:29 2006
New Revision: 118754

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=118754

Log:
	PR tree-optimization/29680
	* tree-ssa-operands.c (access_can_touch_variable): Revert fix
	for PR 14784.

	* gcc.dg/alias-11.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/alias-11.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-operands.c
Fixed.