103246 – [12 Regression] 416.gamess miscompare with -O2 -g -flto=auto since r12-5223-gecdf414bd89e6ba251f6b3f494407139b4dbae0e

Bug 103246 - [12 Regression] 416.gamess miscompare with -O2 -g -flto=auto since r12-5223-gecdf414bd89e6ba251f6b3f494407139b4dbae0e

Summary: [12 Regression] 416.gamess miscompare with -O2 -g -flto=auto since r12-5223-g...

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	ipa (show other bugs)
Version:	12.0

Importance:	P1 normal
Target Milestone:	12.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:	lto, wrong-code

Depends on:
Blocks:	spec
	Show dependency tree / graph

Reported:	2021-11-15 10:22 UTC by Martin Liška
Modified:	2021-11-18 19:10 UTC (History)
CC List:	4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:	12.0
Last reconfirmed:	2021-11-15 00:00:00

Attachments
Bad optimized dump (128.10 KB, text/plain) 2021-11-15 10:39 UTC, Martin Liška	Details
Good optimized dump (128.36 KB, text/plain) 2021-11-15 10:40 UTC, Martin Liška	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Martin Liška 2021-11-15 10:22:31 UTC

Since the revision --size=train miscompares, I was able to track that to 2 LTO object files that are needed for it to happen:

ecpder.fppized.o
ecplib.fppized.o

Comment 1 Martin Liška 2021-11-15 10:39:12 UTC

One can reproduce it in build dir:

$ cp /cpu2006/benchspec/CPU2006/416.gamess/data/train/input/* .
$ timeout 10 ./gamess < h2ocu2+.energy.config
...
   3  2     -270.604580685     0.000920175   0.040597819   0.071110441       68854147       9774

-270.6 is a good value.

while

   3  2     -276.344499030    -0.114881130   0.489865944   0.157922540       69406341       3920

is a bad value.

Comment 2 Martin Liška 2021-11-15 10:39:35 UTC

Created attachment 51794 [details]
Bad optimized dump

Comment 3 Martin Liška 2021-11-15 10:40:08 UTC

Created attachment 51795 [details]
Good optimized dump

Comment 4 Martin Liška 2021-11-15 10:40:38 UTC

As seen the change triggers one new ISRA clone:

diff -u gamess.ltrans0.ltrans.250t.optimized.good gamess.ltrans0.ltrans.250t.optimized.bad | head
--- gamess.ltrans0.ltrans.250t.optimized.good	2021-11-15 11:32:03.388858701 +0100
+++ gamess.ltrans0.ltrans.250t.optimized.bad	2021-11-15 11:33:06.468521265 +0100
@@ -1,5 +1,5 @@
 
-;; Function dco (dco_, funcdef_no=0, decl_uid=4939, cgraph_uid=8, symbol_order=140)
+;; Function dco.isra (dco_.isra.0, funcdef_no=1, decl_uid=5003, cgraph_uid=28, symbol_order=288)
 
 Removing basic block 9
 Removing basic block 10
@@ -7,168 +7,159 @@

Comment 5 Jan Hubicka 2021-11-15 11:13:04 UTC

Looking into optimized dump, I wonder why IPA-SRA is rpelacing some of parameters passed by reference but not all of them:

-__attribute__((fn spec (". w w w w w w w w w w w w w ")))
-double dco (int & restrict l, int & restrict m, int & restrict kx, int & restrict ky, int & restrict kz, int & restrict lp, int & restrict mp, double[15625] * restrict fpqr, double[0:] * restrict zlm, int[0:] * restrict lmf, int[0:] * restrict lmx, int[0:] * restrict lmy, int[0:] * restrict lmz)
+double dco.isra (int ISRA.1, int ISRA.1, int & restrict kx, int & restrict ky, int & restrict kz, int ISRA.2, int ISRA.3, double[15625] * restrict fpqr, double[0:] * restrict zlm, int[0:] * restrict lmf, int[0:] * restrict lmx, int[0:] * restrict lmy, int[0:] * restrict lmz)


I would expect kx,ky and kz to be also replaced.

-void formdr.constprop (double[0:] * restrict g, double[0:] * restrict g2, double[0:] * restrict xin, double[0:] * restrict yin, double[0:] * restrict zin, int & restrict mini, int & restrict maxi, int & restrict minj, int & restrict maxj, logical(kind=4) & restrict norm)
+void formdr.constprop (double[0:] * restrict g, double[0:] * restrict g2, double[0:] * restrict xin, double[0:] * restrict yin, double[0:] * restrict zin, int & restrict minj, int & restrict maxj)

similarly here we drop some parameters by we do not seem to do the replacement for rest of them.

Comment 6 Martin Jambor 2021-11-16 16:09:08 UTC

Fun fact, for me the benchmark passes (test) verification when built with:
  -O2 -g -flto=16 -fno-ipa-sra

but fails when built with:
  -O2 -g -flto=16 -fno-ipa-sra -fdump-ipa-all

Consistently so.  On revision r12-5309-g0002a8a1997c7b.

I thought I was going mad when attempting to somehow isolate what was
going on.  But I do not really know what to do with this piece of
knowledge.

Comment 7 Martin Jambor 2021-11-16 17:09:11 UTC

I have to leave the office for today.  The problematic option seems to be -fdump-ipa-inline.  With it, 28 out of 124 partitions are different in their ccp2 dumps but so far I only found UID differences.

Comment 8 Martin Liška 2021-11-16 19:24:00 UTC

Yes, it's really as you described. Started with r11-5061-g85ebbabd85e03bdc.

Comment 9 Martin Liška 2021-11-16 19:34:16 UTC

(In reply to Martin Liška from comment #8)
> Yes, it's really as you described. Started with r11-5061-g85ebbabd85e03bdc.

There are quite some -Wlto-type-mismatch warnings, so it may be related to that?

Comment 10 Martin Liška 2021-11-16 20:53:44 UTC

I isolated small reproducer:

$ cat 1.f
      SUBROUTINE DENDD1(B,L2)
      IF(CITYPIPSI.EQ.1) CALL VADD(A1DB1DA,1,L2)
      END

$ cat 2.f
      LOGICAL BSFDD
      L2 = 2
      CALL DENDD1(X0X0,L2)
      IF(BSFDD) THEN 
        CALL VCLR(X01NXYZ2*L2)
      CALL HFD(BSFDD)
      END IF
      END

$ cat cmd
FLAGS="-O2 -flto=16 -flto-partition=1to1 -fPIC"

gfortran-11 *.f $FLAGS -c 
gfortran-11 *.o $FLAGS -o gamess1 -shared $OBJ
gfortran-11 *.o $FLAGS -fdump-ipa-inline -o gamess2 -shared $OBJ
objdump -d gamess1 > gamess1.txt
objdump -d gamess2 > gamess2.txt
diff -u gamess1 gamess2

if test $? = 1; then
  exit 0
else
  exit 1
fi

$ ./cmd && diff gamess1.txt gamess2.txt
...
@@ -76,91 +76,88 @@
     10d8:	31 c0                	xor    %eax,%eax
     10da:	48 83 c4 18          	add    $0x18,%rsp
     10de:	c3                   	ret    
-    10df:	66 0f ef c0          	pxor   %xmm0,%xmm0
-    10e3:	48 8d 7c 24 0c       	lea    0xc(%rsp),%rdi
-    10e8:	f3 0f 2a 44 24 04    	cvtsi2ssl 0x4(%rsp),%xmm0
-    10ee:	f3 0f 59 05 26 0f 00 	mulss  0xf26(%rip),%xmm0        # 201c <options.0.0+0x1c>
-    10f5:	00 
-    10f6:	f3 0f 11 44 24 0c    	movss  %xmm0,0xc(%rsp)
-    10fc:	e8 2f ff ff ff       	call   1030 <vclr_@plt>
-    1101:	48 89 e7             	mov    %rsp,%rdi
...

but this got fixed on master with r12-5223-gecdf414bd89e6ba2.

I'll try if it happens with the current master. Hope it helps.

Comment 11 Jan Hubicka 2021-11-16 22:42:03 UTC

Thanks for reducing the testcase.
I found the reason for build difference.  There is misplaced code clearing to_info_lto. 

hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build4/gcc$ cat ~/fix
diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index a70575bc807..1241e567266 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -5123,6 +5150,7 @@ ipa_merge_modref_summary_after_inlining (cgraph_edge *edge)
            fprintf (dump_file, "Removed mod-ref summary for %s\n",
                     to->dump_name ());
          summaries_lto->remove (to);
+         to_info_lto = NULL;
        }
       else if (to_info_lto && dump_file)
        {
@@ -5130,7 +5158,6 @@ ipa_merge_modref_summary_after_inlining (cgraph_edge *edge)
            fprintf (dump_file, "Updated mod-ref summary for %s\n",
                     to->dump_name ());
          to_info_lto->dump (dump_file);
-         to_info_lto = NULL;
        }
       if (callee_info_lto)
        summaries_lto->remove (edge->callee);

Comment 12 GCC Commits 2021-11-17 00:44:33 UTC

The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:8c693978dd64b16637577ebf50c760053d7d2165

commit r12-5328-g8c693978dd64b16637577ebf50c760053d7d2165
Author: Jan Hubicka <jh@suse.cz>
Date:   Wed Nov 17 01:43:57 2021 +0100

    Fix clearing of to_info_lto in ipa_merge_modref_summary_after_inlining
    
    This patch fixes bug that caused some optimizations to be dropped with
    -fdump-ipa-inline.
    
    gcc/ChangeLog:
    
    2021-11-17  Jan Hubicka  <hubicka@ucw.cz>
    
            PR ipa/103246
            * ipa-modref.c (ipa_merge_modref_summary_after_inlining): Fix clearing
            of to_info_lto

Comment 13 Martin Liška 2021-11-17 09:11:47 UTC

(In reply to Jan Hubicka from comment #11)
> Thanks for reducing the testcase.
> I found the reason for build difference.  There is misplaced code clearing
> to_info_lto. 

Thanks! Great you found it so quickly.

Comment 14 hubicka 2021-11-17 09:38:34 UTC

> Thanks! Great you found it so quickly.
It is bit stupid code since everything is duplicated twice (for LTO and
non-LTO). I have to refactor it: we could have common base of the two
summaries and handle most of things in unified way. The problem is that
gengtype does not like it, so it will probably need manual GGC
annotations.

It stil leaves us with a wrong code, right?

Comment 15 Jan Hubicka 2021-11-17 12:27:58 UTC

Yep, I still see difference in the output with and without SRA

Comment 16 Jan Hubicka 2021-11-17 13:03:05 UTC

Building with LTO only ecpder.fppized.o and ecplib.fppized.o I get the failure going away with --dbg-cnt=ipa_cp_values:10 and :9 works.

Different ipa-cp decision is
+***dbgcnt: upper limit 10 reached for ipa_cp_values.***
+ - Creating a specialized node of formdr/4 for all known contexts.
+    replacing param #5 int & restrict with const &ecpidx.iamin
+    replacing param #6 int & restrict with const &ecpidx.iamax
+    Removed a reference from formii.constprop/284 to ecpidx/11.
+      ...and replaced it with LOAD one.
+    Removed a reference from ecp2d/1 to ecpidx/11.
+    Removed a reference from ecp1d/0 to ecpidx/11.
+    replacing param #9 logical(kind=4) & restrict with const &ecpidx.norm - forcing load reference

In modref the propagation goes same way and we update signature as follows:
+Updating summary for formdr.constprop/285 from:
+  loads:
+    Limits: 32 bases, 16 refs
+    Every base
+  stores:
+    Limits: 32 bases, 16 refs
+    Every base
+  Side effects
+  Nondeterministic
+  parm 0 flags: no_direct_clobber no_direct_escape no_indirect_escape
+  parm 1 flags: no_direct_clobber no_direct_escape no_indirect_escape
+  parm 2 flags: no_direct_escape no_indirect_escape
+  parm 3 flags: no_direct_escape no_indirect_escape
+  parm 4 flags: no_direct_escape no_indirect_escape
+  parm 5 flags: no_direct_clobber no_indirect_clobber no_direct_escape no_indirect_escape no_indirect_read
+  parm 6 flags: no_direct_clobber no_indirect_clobber no_direct_escape no_indirect_escape no_indirect_read
+  parm 7 flags: no_direct_clobber no_indirect_clobber no_direct_escape no_indirect_escape no_indirect_read
+  parm 8 flags: no_direct_clobber no_indirect_clobber no_direct_escape no_indirect_escape no_indirect_read
+  parm 9 flags: no_direct_clobber no_indirect_clobber no_direct_escape no_indirect_escape no_indirect_read
+to:
+  loads:
+    Limits: 32 bases, 16 refs
+    Every base
+  stores:
+    Limits: 32 bases, 16 refs
+    Every base
+  Side effects
+  Nondeterministic
+  parm 0 flags: no_direct_clobber no_direct_escape no_indirect_escape
+  parm 1 flags: no_direct_clobber no_direct_escape no_indirect_escape
+  parm 2 flags: no_direct_escape no_indirect_escape
+  parm 3 flags: no_direct_escape no_indirect_escape
+  parm 4 flags: no_direct_escape no_indirect_escape
+  parm 5 flags: no_direct_clobber no_indirect_clobber no_direct_escape no_indirect_escape no_indirect_read
+  parm 6 flags: no_direct_clobber no_indirect_clobber no_direct_escape no_indirect_escape no_indirect_read

this changes partitioning decisions which makes quite a lot of fuzz in latr dump files.

Comment 17 Jan Hubicka 2021-11-17 13:15:07 UTC

and fun fact is that it really depends how program is paritioned. 
-flto-partition=max keeps problem reproducing while -flto-partition=one makes it go away and -flto-partition=one --disable-tree-modref2 brings it back.

Comment 18 Jan Hubicka 2021-11-17 13:31:11 UTC

This seems to fix it for this particular partitioning at least ;)
The problem was in summary update producing ill formed access range (which doesn't know base but knows offset from it) and that confused streamer.

I will test it with additional sanity check that accesses are sane.

diff --git a/gcc/ipa-modref-tree.h b/gcc/ipa-modref-tree.h
index 1bf2aa8460e..c29dda56fc1 100644
--- a/gcc/ipa-modref-tree.h
+++ b/gcc/ipa-modref-tree.h
@@ -677,6 +677,8 @@ struct GTY((user)) modref_tree
                    access_node->parm_index = (*map)[access_node->parm_index];
                  else
                    access_node->parm_index = MODREF_UNKNOWN_PARM;
+                 if (access_node->parm_index == MODREF_UNKNOWN_PARM)
+                   access_node->parm_offset_known = false;
                }
          }
       }

Comment 19 Jan Hubicka 2021-11-17 18:58:13 UTC

I was too optimistic - the patch does not help.

The problem is related to modref.  With lto-partition=max the problem triggers while lto-partition=one hides it becuase late tree modref recomputes summaries and after that things works as expected.  It may be wrong summary produced by the ipa-modref and streamed to the ltrans or some TBAA related bug. Disabling TBAA checks from the summary use makes the testcase to pass and there are some TBAA violation warnings when linking. I am isolating the problematic disambiguation.

Comment 20 Jan Hubicka 2021-11-17 19:36:38 UTC

linking with --dbg-cnt=ipa_cp_values:10 -flto-partition=one --disable-tree-modref2 --dbg-cnt=ipa_mod_ref:836 -fdump-tree-all-details -fdump-ipa-all-details is first that produces different results. Diff to 835 is:

diff -u 835/gamess.ltrans0.ltrans.113t.dse2 836/gamess.ltrans0.ltrans.113t.dse2
--- 835/gamess.ltrans0.ltrans.113t.dse2 2021-11-17 20:11:34.894715053 +0100
+++ 836/gamess.ltrans0.ltrans.113t.dse2 2021-11-17 20:12:40.387968675 +0100
@@ -14780,7 +14780,6 @@
 ipa-modref: call to dco.isra/287 does not use ref: kz alias sets: 4->4
 ipa-modref: call stmt dc_1560 = dco.isra (l_1973, m_1975, &kx, &ky, &kz, la_1974, mu_1976, _513, _510, _507, _504, _501, _498);
 ipa-modref: call to dco.isra/287 does not use ref: jndx alias sets: 4->4
-***dbgcnt: upper limit 835 reached for ipa_mod_ref.***
 ipa-modref: call stmt dc_1560 = dco.isra (l_1973, m_1975, &kx, &ky, &kz, la_1974, mu_1976, _513, _510, _507, _504, _501, _498);
 ipa-modref: call to dco.isra/287 does not use ref: kz alias sets: 4->4
   Deleted trivially dead stmt: _1548 = (long int) _1547;
@@ -14799,6 +14798,13 @@
 
   Deleted trivially dead stmt: _1770 = indx_1668 + 1;
 
+***dbgcnt: upper limit 836 reached for ipa_mod_ref.***
+ipa-modref: call stmt dc_1710 = dco.isra (0, 0, &D.8524, &D.8523, &D.8522, la_1977, mu_1978, _513, _510, _507, _504, _501, _498);
+ipa-modref: call to dco.isra/287 does not use ref: D.8522 alias sets: 4->4
+  Deleted dead store: D.8522 = _1709;
+
+  Deleted trivially dead stmt: _1709 = kz_1696 + kzp_1699;
+

At WPA time we first compute summary for original function and then update for new signature:

Updating summary for dco.isra/287 from:
  loads:
    Limits: 32 bases, 16 refs
      Base 0:int (alias set 4)
        Ref 0:int (alias set 4)
          access: Parm 0 param offset:0 offset:0 size:32 max_size:32
          access: Parm 1 param offset:0 offset:0 size:32 max_size:32
          access: Parm 5 param offset:0 offset:0 size:32 max_size:32
          access: Parm 6 param offset:0 offset:0 size:32 max_size:32
          access: Parm 2 param offset:0 offset:0 size:32 max_size:32
          access: Parm 3 param offset:0 offset:0 size:32 max_size:32
          access: Parm 4 param offset:0 offset:0 size:32 max_size:32
      Base 1:int[0:] (alias set 4)
        Ref 0:int (alias set 4)
          access: Parm 9 param offset:0 offset:0 size:32 max_size:-1
          access: Parm 10 param offset:0 offset:0 size:32 max_size:-1
          access: Parm 11 param offset:0 offset:0 size:32 max_size:-1
          access: Parm 12 param offset:0 offset:0 size:32 max_size:-1
      Base 2:double[0:] (alias set 5)
        Ref 0:double (alias set 5)
          access: Parm 8 param offset:0 offset:0 size:64 max_size:-1
      Base 3:double[15625] (alias set 5)
        Ref 0:double (alias set 5)
          access: Parm 7 param offset:0 offset:0 size:64 max_size:-1
  stores:
    Limits: 32 bases, 16 refs
  parm 0 flags: not_returned_directly not_returned_indirectly no_indirect_read
  parm 1 flags: not_returned_directly not_returned_indirectly no_indirect_read
  parm 2 flags: not_returned_directly not_returned_indirectly no_indirect_read
  parm 3 flags: not_returned_directly not_returned_indirectly no_indirect_read
  parm 4 flags: not_returned_directly not_returned_indirectly no_indirect_read
  parm 5 flags: not_returned_directly not_returned_indirectly no_indirect_read
  parm 6 flags: not_returned_directly not_returned_indirectly no_indirect_read
  parm 7 flags: not_returned_directly no_indirect_read
  parm 8 flags: not_returned_directly no_indirect_read
  parm 9 flags: not_returned_directly not_returned_indirectly no_indirect_read
  parm 10 flags: not_returned_directly not_returned_indirectly no_indirect_read
  parm 11 flags: not_returned_directly not_returned_indirectly no_indirect_read
  parm 12 flags: not_returned_directly not_returned_indirectly no_indirect_read
to:
  loads:
    Limits: 32 bases, 16 refs
      Base 0:int (alias set 4)
        Ref 0:int (alias set 4)
          access:
          access:
          access:
          access:
          access: Parm 2 param offset:0 offset:0 size:32 max_size:32
          access: Parm 3 param offset:0 offset:0 size:32 max_size:32
          access: Parm 4 param offset:0 offset:0 size:32 max_size:32
      Base 1:int[0:] (alias set 4)
        Ref 0:int (alias set 4)
          access: Parm 9 param offset:0 offset:0 size:32 max_size:-1
          access: Parm 10 param offset:0 offset:0 size:32 max_size:-1
          access: Parm 11 param offset:0 offset:0 size:32 max_size:-1
          access: Parm 12 param offset:0 offset:0 size:32 max_size:-1
      Base 2:double[0:] (alias set 5)
        Ref 0:double (alias set 5)
          access: Parm 8 param offset:0 offset:0 size:64 max_size:-1
      Base 3:double[15625] (alias set 5)
        Ref 0:double (alias set 5)
          access: Parm 7 param offset:0 offset:0 size:64 max_size:-1
  stores:
    Limits: 32 bases, 16 refs
  parm 2 flags: not_returned_directly not_returned_indirectly no_indirect_read
  parm 3 flags: not_returned_directly not_returned_indirectly no_indirect_read
  parm 4 flags: not_returned_directly not_returned_indirectly no_indirect_read
  parm 7 flags: not_returned_directly no_indirect_read
  parm 8 flags: not_returned_directly no_indirect_read
  parm 9 flags: not_returned_directly not_returned_indirectly no_indirect_read
  parm 10 flags: not_returned_directly not_returned_indirectly no_indirect_read
  parm 11 flags: not_returned_directly not_returned_indirectly no_indirect_read
  parm 12 flags: not_returned_directly not_returned_indirectly no_indirect_read

This seems to correclty represent that some params was scalarised.  Only one oddity is that the list is not optimized and kept in redundant form but it should be harmless.

At stream in time we see:
Read modref for dco.isra/287
  loads:
    Limits: 32 bases, 16 refs
      Base 0: alias set 4
        Ref 0: alias set 4
          access: Parm 9 param offset:0 offset:0 size:32 max_size:-1
          access: Parm 10 param offset:0 offset:0 size:32 max_size:-1
          access: Parm 11 param offset:0 offset:0 size:32 max_size:-1
          access: Parm 12 param offset:0 offset:0 size:32 max_size:-1
      Base 1: alias set 3
        Ref 0: alias set 3
          access: Parm 8 param offset:0 offset:0 size:64 max_size:-1
          access: Parm 7 param offset:0 offset:0 size:64 max_size:-1
  stores:
    Limits: 32 bases, 16 refs
  Try dse
  parm 2 flags: not_returned_directly not_returned_indirectly no_indirect_read
  parm 3 flags: not_returned_directly not_returned_indirectly no_indirect_read
  parm 4 flags: not_returned_directly not_returned_indirectly no_indirect_read
  parm 7 flags: not_returned_directly no_indirect_read
  parm 8 flags: not_returned_directly no_indirect_read
  parm 9 flags: not_returned_directly not_returned_indirectly no_indirect_read
  parm 10 flags: not_returned_directly not_returned_indirectly no_indirect_read
  parm 11 flags: not_returned_directly not_returned_indirectly no_indirect_read
  parm 12 flags: not_returned_directly not_returned_indirectly no_indirect_read

So types are translated to alias sets however the stream in is wrong since there was accesses without known params and they are gone.  So it is streaming bug (and was present already in the initial modref version)

Comment 21 GCC Commits 2021-11-17 21:06:08 UTC

The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:425369bf3068a9f840d1c2f04a4d4c38e924d4dc

commit r12-5351-g425369bf3068a9f840d1c2f04a4d4c38e924d4dc
Author: Jan Hubicka <jh@suse.cz>
Date:   Wed Nov 17 22:04:26 2021 +0100

    Fix modref summary streaming
    
    Fixes bug in streaming in modref access tree that now cause a failure
    of gamess benchmark.  The bug is quite old (present in GCC11 release) but it
    needs quite interesting series of events to manifest. In particular
     1) At lto time ISRA turns some parameters passed by reference to scalar
     2) At lto time modref computes summaries for old parameters and then updates
        them but does so quite stupidly believing that the load from parameters
        are now unkonwn loads (rather than optimized out).
        This renders summary not very useful since it thinks every memory aliasing
        int is now accssed (as opposed as parameter dereference)
     3) At stream in we notice too early that summary is useless, set every_access
        flag and drop the list.  However while reading rest of the summary we
        overwrite the flag back to 0 which makes us to lose part of summary.
     4) right selection of partitions needs to be done to avoid late modref from
        recalculating and thus fixing the summary.
    
    This patch fixes the stream in bug, however we also should fix updating of
    summaries.
    
    gcc/ChangeLog:
    
    2021-11-17  Jan Hubicka  <hubicka@ucw.cz>
    
            PR ipa/103246
            * ipa-modref.c (read_modref_records): Fix streaminig in of every_access
            flag.

Comment 22 Jan Hubicka 2021-11-18 08:27:38 UTC

Seems fixed on LNT now too https://lnt.opensuse.org/db_default/v4/SPEC/21494

Comment 23 Jan Hubicka 2021-11-18 08:28:44 UTC

Meant to close the bug too.. Fixed now.

Comment 24 Martin Liška 2021-11-18 12:22:12 UTC

(In reply to CVS Commits from comment #12)
> The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:
> 
> https://gcc.gnu.org/g:8c693978dd64b16637577ebf50c760053d7d2165
> 
> commit r12-5328-g8c693978dd64b16637577ebf50c760053d7d2165
> Author: Jan Hubicka <jh@suse.cz>
> Date:   Wed Nov 17 01:43:57 2021 +0100
> 
>     Fix clearing of to_info_lto in ipa_merge_modref_summary_after_inlining
>     
>     This patch fixes bug that caused some optimizations to be dropped with
>     -fdump-ipa-inline.
>     
>     gcc/ChangeLog:
>     
>     2021-11-17  Jan Hubicka  <hubicka@ucw.cz>
>     
>             PR ipa/103246
>             * ipa-modref.c (ipa_merge_modref_summary_after_inlining): Fix
> clearing
>             of to_info_lto

Can you please backport this to GCC 11?

Comment 25 GCC Commits 2021-11-18 19:10:18 UTC

The releases/gcc-11 branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:9d3f1435a348bece9e11787df982bd465db74ed8

commit r11-9248-g9d3f1435a348bece9e11787df982bd465db74ed8
Author: Jan Hubicka <jh@suse.cz>
Date:   Wed Nov 17 22:04:26 2021 +0100

    Fix modref summary streaming
    
    Fixes bug in streaming in modref access tree that now cause a failure
    of gamess benchmark.  The bug is quite old (present in GCC11 release) but it
    needs quite interesting series of events to manifest. In particular
     1) At lto time ISRA turns some parameters passed by reference to scalar
     2) At lto time modref computes summaries for old parameters and then updates
        them but does so quite stupidly believing that the load from parameters
        are now unkonwn loads (rather than optimized out).
        This renders summary not very useful since it thinks every memory aliasing
        int is now accssed (as opposed as parameter dereference)
     3) At stream in we notice too early that summary is useless, set every_access
        flag and drop the list.  However while reading rest of the summary we
        overwrite the flag back to 0 which makes us to lose part of summary.
     4) right selection of partitions needs to be done to avoid late modref from
        recalculating and thus fixing the summary.
    
    This patch fixes the stream in bug, however we also should fix updating of
    summaries.
    
    gcc/ChangeLog:
    
    2021-11-17  Jan Hubicka  <hubicka@ucw.cz>
    
            PR ipa/103246
            * ipa-modref.c (read_modref_records): Fix streaminig in of every_access
            flag.
    
    (cherry picked from commit 425369bf3068a9f840d1c2f04a4d4c38e924d4dc)