Bug 87843 - [9 Regression] SPEC miscompilation of 403.gcc and 502.gcc_r benchmarks
Summary: [9 Regression] SPEC miscompilation of 403.gcc and 502.gcc_r benchmarks
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: ipa (show other bugs)
Version: 9.0
: P3 normal
Target Milestone: 9.0
Assignee: Jan Hubicka
URL:
Keywords: wrong-code
Depends on:
Blocks: spec
  Show dependency treegraph
 
Reported: 2018-11-01 10:09 UTC by Martin Liška
Modified: 2018-11-15 13:03 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work: 8.2.0
Known to fail: 9.0
Last reconfirmed: 2018-11-01 00:00:00


Attachments
reproducer (46.53 KB, text/plain)
2018-11-02 11:56 UTC, Jan Hubicka
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Liška 2018-11-01 10:09:30 UTC
Started with the same revision as PR87830, I see segfault of the benchmark.
Comment 1 Jan Hubicka 2018-11-01 11:21:07 UTC
Would be possible to analyze this a bit?  The patch does have effect on optimizers because we produce a lot fewer MEM_REFs on type mismatches. Of course this should not trigger wrong code but it also may be some bug in benchmark or so.
Comment 2 Martin Liška 2018-11-01 11:32:26 UTC
(In reply to Jan Hubicka from comment #1)
> Would be possible to analyze this a bit?

I would leave it to you. Note that the same happens for SPEC2006 403.gcc benchmark:

$ gdb --args /home/marxin/Programming/cpu2006/benchspec/CPU2006/403.gcc/run/run_peak_ref_amd64-m64-mine.0001/gcc_peak.amd64-m64-mine /home/marxin/Programming/cpu2006/benchspec/CPU2006/403.gcc/data/ref/input/scilab.in

$ Breakpoint 1, remove_useless_values () at cselib.c:394
394	    abort ();
(gdb) bt
#0  remove_useless_values () at cselib.c:394
#1  cselib_process_insn (insn=0x1200a40) at cselib.c:1377
#2  0x000000000046fd48 in reload_cse_regs_1 (first=<optimized out>) at reload1.c:8172
#3  0x00000000004700db in reload_cse_regs (first=0xd928c0) at reload1.c:8186
#4  0x000000000044a3ec in rest_of_compilation (decl=0x954000) at toplev.c:3254
#5  0x00000000005edd56 in c_expand_body.part.1.lto_priv.1473 (fndecl=fndecl@entry=0x954000, nested_p=nested_p@entry=0, can_defer_p=can_defer_p@entry=1) at c-decl.c:7119
#6  0x00000000005ee710 in c_expand_body (can_defer_p=1, nested_p=0, fndecl=0x954000) at c-decl.c:7024
#7  finish_function (nested=0, can_defer_p=1) at c-decl.c:6986
#8  0x00000000006072c0 in yyparse_1 () at c-parse.c:2186
#9  0x0000000000402fec in yyparse () at c-lex.c:164
#10 compile_file () at toplev.c:2126
#11 do_compile () at toplev.c:5221
#12 toplev_main (argv=<optimized out>, argc=<optimized out>) at toplev.c:5255
#13 main (argc=<optimized out>, argv=<optimized out>) at main.c:35


The patch does have effect on
> optimizers because we produce a lot fewer MEM_REFs on type mismatches. Of
> course this should not trigger wrong code but it also may be some bug in
> benchmark or so.

I doubt that.
Comment 3 Martin Liška 2018-11-01 11:35:07 UTC
For 403.gcc one only needs: -O2 -g -flto=8
Comment 4 Jan Hubicka 2018-11-01 14:09:23 UTC
It indeed looks interesting. 
- Crash indeed goes away if one disables type merging.
- Fat LTO objects are OK so it does not seem to be free lang data confusing early opts.
- remove_useless_values looks identical in both binaries.

So it indeed seems to be consequence of having fewer MEM_REFs in the stmt stream.  I will try to localize it better.

Honza
Comment 5 Jan Hubicka 2018-11-01 15:22:13 UTC
I think this is caused by misoptimizing
     void **x;
     void *info ATTRIBUTE_UNUSED;
{
  cselib_val *v = (cselib_val *)*x;
  struct elt_loc_list **p = &v->locs;
  int had_locs = v->locs != 0;

  while (*p)
    {
      if (references_value_p ((*p)->loc, 1))
        unchain_one_elt_loc_list (p);
      else
        p = &(*p)->next;
    }

  if (had_locs && v->locs == 0)
    {
      n_useless_values++;
      values_became_useless = 1;
    }
  return 1;
}

where fre1 after type cleaning concludes that v->locs is unchanged while it is unchanged in:

static void
unchain_one_elt_loc_list (pl)
     struct elt_loc_list **pl;
{
  struct elt_loc_list *l = *pl;

  *pl = l->next;
  l->next = empty_elt_loc_lists;
  empty_elt_loc_lists = l;
}

So it seems that in some cases alias analysis gets lost in pointer dereferences
while we are still in early optimization. 

Perhaps we want to produce alias sets earlier?

Honza
Comment 6 Richard Biener 2018-11-02 09:07:21 UTC
If we have less MEM_REFs then we probably strip them because we think they
reference equal types.

I think I already told you that given that MEM_REFs use pointer types
to carry alignment info _those_ may not become incomplete!  But I didn't
expect that to cause wrong-code but missed optimizations.
Comment 7 Jan Hubicka 2018-11-02 11:28:14 UTC
> If we have less MEM_REFs then we probably strip them because we think they
> reference equal types.
> 
> I think I already told you that given that MEM_REFs use pointer types
> to carry alignment info _those_ may not become incomplete!  But I didn't
> expect that to cause wrong-code but missed optimizations.

We do not make them incomplete.  The problem actually seems to be in
early optimization where we optimize out the if conditional above.
Not sure why -ffat-lto-objects worked in this context.
Comment 8 Richard Biener 2018-11-02 11:42:21 UTC
(In reply to Jan Hubicka from comment #7)
> > If we have less MEM_REFs then we probably strip them because we think they
> > reference equal types.
> > 
> > I think I already told you that given that MEM_REFs use pointer types
> > to carry alignment info _those_ may not become incomplete!  But I didn't
> > expect that to cause wrong-code but missed optimizations.
> 
> We do not make them incomplete.  The problem actually seems to be in
> early optimization where we optimize out the if conditional above.
> Not sure why -ffat-lto-objects worked in this context.

So can you attach preprocessed source for the affected file?  And name the
affected function? (dump is stripped too early)
Comment 9 Jan Hubicka 2018-11-02 11:56:32 UTC
Created attachment 44946 [details]
reproducer

I am attaching the preprocessed file and will be away till 2pm.
What seems to be wrong is that we optimize out decrease of n_useless_values in discard_useless_locs.
fre1 already differs:

 discard_useless_locs (void * * x, void * info)
 {
   struct elt_loc_list * l;
@@ -8886,9 +8895,6 @@
   struct rtx_def * _4;
   int _5;
   struct elt_loc_list * _7;
-  struct elt_loc_list * _8;
-  int n_useless_values.142_9;
-  int _10;
   struct elt_loc_list * _25;
   struct elt_loc_list * empty_elt_loc_lists.98_26;
 
@@ -8947,27 +8953,6 @@
 
   <bb 7> :
   # DEBUG BEGIN_STMT
-  if (_1 != 0B)
-    goto <bb 8>; [INV]
-  else
-    goto <bb 10>; [INV]
-
-  <bb 8> :
-  _8 = v_16->locs;
-  if (_8 == 0B)
-    goto <bb 9>; [INV]
-  else
-    goto <bb 10>; [INV]
-
-  <bb 9> :
-  # DEBUG BEGIN_STMT
-  n_useless_values.142_9 = n_useless_values;
-  _10 = n_useless_values.142_9 + 1;
-  n_useless_values = _10;
-  # DEBUG BEGIN_STMT
-  values_became_useless = 1;
-
-  <bb 10> :
   # DEBUG BEGIN_STMT
Comment 10 Richard Biener 2018-11-02 12:49:42 UTC
I can only see that v->locs might be affected by fld because the type of the
FIELD_DECL changes but the (alias) type of *p_11 remains the same.  Thus
we have get_alias_set (ptr-to-incomplete) and get_alias_set (ptr-to-complete)
not agreeing.  But of course they have to.

I guess we can make a two-unit LTO testcase like the following - but it
doesn't miscompile since we end up substituting the MEM_REF base type
for the store in foo() somehow so even with more fiddling I always get

  *p_5 = &a;
  py ={v} &y;
  _1 ={v} py;
  MEM[(struct Y *)_1].p = &b;
^^^ will not use the alias set of .p

  _2 = *p_5;




struct X;
struct Y { struct X *p; };

void foo (struct Y *p, struct X *v)
{
  p->p = v;
}

---

struct X { int i; };
struct Y { struct X *p; };

void foo (struct Y *, struct X *);
struct X ** volatile px;
struct X a, b;
int main()
{
  struct Y y;
  px = &y.p;
  struct X **p = px; 
  *p = &a;
  foo (&y, &b);
  if (*p != &b)
    __builtin_abort ();
  return 0;
}
Comment 11 Richard Biener 2018-11-02 12:54:01 UTC
That said - we used to give all pointer types the same alias-set but you somehow convinced yourself that not doing that is safe.  Even when considering pointer-to-complete and pointer-to-incomplete types.  Do you remember any details?
Comment 12 Richard Biener 2018-11-02 13:03:06 UTC
OK, so in GCC 8 at least pointer-to-incomplete type gets the alias set of void * and that conflicts with any other pointer.  So that works.

Not sure what breaks here now...
Comment 13 Richard Biener 2018-11-02 13:17:28 UTC
So the alias machinery disambiguates them at

static bool
indirect_refs_may_alias_p (tree ref1 ATTRIBUTE_UNUSED, tree base1,
                           poly_int64 offset1, poly_int64 max_size1,
...
  /* Do type-based disambiguation.  */
  if (base1_alias_set != base2_alias_set
      && !alias_sets_conflict_p (base1_alias_set, base2_alias_set))
    return false;

where base2_alias_set == ref2_alias_set from *p_11 and
ref1_alias_set == ref2_alias_set but base1_alias_set == 22 (from v_16->locs).

And somehow the alias-set for *v_16 doesn't have v_16->locs as child
(well, it probably has the pointer-to-complete one as child since we built
the alias-set for *v_16 _before_ adjusting the FIELD_DECLs type!?)
Comment 14 Richard Biener 2018-11-02 13:27:24 UTC
The following does _not_ fix it (but an assert that the alias-set is -1 does trigger).  We probably have to adjust all types the record parent is embedded
into as well for which there's no easy way.

Well.  Not compute any alias-sets before free-lang-data .... -Wstrict-aliasing computes it for example, so does folding, for example in make_bit_field_ref
(in fact that seems to be the only caller...).

diff --git a/gcc/tree.c b/gcc/tree.c
index 069d62d51be..47cbbaab9b5 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -5515,7 +5515,10 @@ free_lang_data_in_decl (tree decl, struct free_lang_data_d *fld)
     }
   else if (TREE_CODE (decl) == FIELD_DECL)
     {
+      tree orig = TREE_TYPE (decl);
       TREE_TYPE (decl) = fld_simplified_type (TREE_TYPE (decl), fld);
+      if (TREE_TYPE (decl) != orig)
+       TYPE_ALIAS_SET (DECL_CONTEXT (decl)) = -1;
       DECL_INITIAL (decl) = NULL_TREE;
     }
   else if (TREE_CODE (decl) == TRANSLATION_UNIT_DECL
Comment 15 Richard Biener 2018-11-02 13:34:14 UTC
But the following fixes it:

diff --git a/gcc/alias.c b/gcc/alias.c
index 7963ece291a..4c88c0980d3 100644
--- a/gcc/alias.c
+++ b/gcc/alias.c
@@ -1235,14 +1235,14 @@ record_component_aliases (tree type)
               Accesses to it conflicts with accesses to any other pointer
               type.  */
            tree t = TREE_TYPE (field);
-           if (in_lto_p)
+           if (1)
              {
                /* VECTOR_TYPE and ARRAY_TYPE share the alias set with their
                   element type and that type has to be normalized to void *,
                   too, in the case it is a pointer. */
                while (!canonical_type_used_p (t) && !POINTER_TYPE_P (t))
                  {
-                   gcc_checking_assert (TYPE_STRUCTURAL_EQUALITY_P (t));
+                   gcc_checking_assert (!in_lto_p || TYPE_STRUCTURAL_EQUALITY_P (t));
                    t = TREE_TYPE (t);
                  }
                if (POINTER_TYPE_P (t))
Comment 16 Martin Liška 2018-11-15 12:24:36 UTC
I don't see the miscompilation any longer, may I close it?
Comment 17 Jan Hubicka 2018-11-15 12:31:11 UTC
> I don't see the miscompilation any longer, may I close it?
Yes, it was fixed by 
                                                                                
        * tree.c (fld_type_variant): Copy canonical type.                       
        (fld_incomplete_type_of): Check that canonical types looks sane;        
        copy canonical type.                                                    
        (verify_type): Accept when incomplete type has complete canonical type.
Comment 18 Richard Biener 2018-11-15 13:03:25 UTC
Fixed.