Bug 113907 - [12/13 regression] ICU miscompiled on x86 since r14-5109-ga291237b628f41
Summary: [12/13 regression] ICU miscompiled on x86 since r14-5109-ga291237b628f41
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: ipa (show other bugs)
Version: 14.0
: P2 normal
Target Milestone: 14.3
Assignee: Jan Hubicka
URL:
Keywords: wrong-code
: 106921 113665 114290 (view as bug list)
Depends on:
Blocks: icf
  Show dependency treegraph
 
Reported: 2024-02-13 12:49 UTC by Sam James
Modified: 2024-08-01 09:38 UTC (History)
12 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2024-02-15 00:00:00


Attachments
udataswp.ii.xz (94.96 KB, application/x-xz)
2024-02-13 14:17 UTC, Sam James
Details
udataswp.cpp.262r.expand (good) (71.67 KB, text/plain)
2024-02-13 14:18 UTC, Sam James
Details
udataswp.cpp.262r.expand (bad) (71.22 KB, text/plain)
2024-02-13 14:19 UTC, Sam James
Details
udataswp.o (good, r14-5108-g751fc7bcdcdf25) (78.24 KB, application/x-object)
2024-02-13 14:22 UTC, Sam James
Details
udataswp.o (bad, r14-5109-ga291237b628f41) (78.20 KB, application/x-object)
2024-02-13 14:23 UTC, Sam James
Details
gcc14-pr113907.patch (1.26 KB, patch)
2024-02-14 15:56 UTC, Jakub Jelinek
Details | Diff
Compare value ranges in jump functions (980 bytes, patch)
2024-03-14 16:20 UTC, Jan Hubicka
Details | Diff
Patch comparing jump functions (3.64 KB, patch)
2024-03-20 18:13 UTC, Martin Jambor
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Sam James 2024-02-13 12:49:02 UTC
Originally reported downstream in Gentoo at https://bugs.gentoo.org/924153.

ICU seems miscompiled on x86 (-m32) since r14-5109-ga291237b628f41. My C++ is ~non-existent so I can't reduce it. I appreciate this isn't a great level of detail.

It crashes during the build when running a just-built executable:
```
i686-pc-linux-gnu-g++ -D_REENTRANT -DU_DEBUG=1 -DU_HAVE_ELF_H=1 -DU_HAVE_STRTOD_L=1 -DU_HAVE_XLOCALE_H=0 -DU_DISABLE_RENAMING=1 -I/var/tmp/portage/dev-libs/icu-74.2/work/icu/source/common -I/var/tmp/portage/dev-libs/icu-74.2/work/icu/source/tools/icupkg/../toolutil -DU_ATTRIBUTE_DEPRECATED= -O2 -march=i686 -pipe -pipe -frecord-gcc-switches -fno-diagnostics-color -fmessage-length=0 -std=c++14 -W -Wall -pedantic -Wpointer-arith -Wwrite-strings -Wno-long-long -c -o icupkg.o /var/tmp/portage/dev-libs/icu-74.2/work/icu/source/tools/icupkg/icupkg.cpp
[..]
i686-pc-linux-gnu-g++ -O2 -march=i686 -pipe -pipe -frecord-gcc-switches -fno-diagnostics-color -fmessage-length=0 -std=c++14 -W -Wall -pedantic -Wpointer-arith -Wwrite-strings -Wno-long-long   -Wl,-O1 -Wl,--as-needed -Wl,--defsym=__gentoo_check_ldflags__=0   -o ../../bin/icupkg icupkg.o -L../../lib -licutu -L../../lib -licui18n -L../../lib -licuuc -L../../stubdata -licudata -lpthread -lm  
[...]
Unpacking /var/tmp/portage/dev-libs/icu-74.2/work/icu/source/data/in/icudt74l.dat and generating out/tmp/icudata.lst (list of data files)
LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg -d ./out/build/icudt74l --list -x \* /var/tmp/portage/dev-libs/icu-74.2/work/icu/source/data/in/icudt74l.dat -o out/tmp/icudata.lst
make[1]: *** [Makefile:272: out/tmp/icudata.lst] Segmentation fault
make[1]: *** Waiting for unfinished jobs....
make[2]: Leaving directory '/var/tmp/portage/dev-libs/icu-74.2/work/icu/source-abi_x86_32.x86/data'
make[1]: Leaving directory '/var/tmp/portage/dev-libs/icu-74.2/work/icu/source-abi_x86_32.x86/data'
make: *** [Makefile:153: all-recursive] Error 2
 * ERROR: dev-libs/icu-74.2::gentoo failed (compile phase):
 *   emake failed
```

Valgrind says:
```
==26485== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==26485== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==26485== Command: ../bin/icupkg -d ./out/build/icudt74l --list -x * /var/tmp/portage/dev-libs/icu-74.1/work/icu/source/data/in/icudt74l.dat -o out/tmp/icudata.lst
==26485==
==26485== Invalid write of size 1
==26485==    at 0x52A0110: memcpy (string_fortified.h:29)
==26485==    by 0x52A0110: uprv_copyArray64 (udataswp.cpp:172)
==26485==    by 0x52A0110: uprv_copyArray16(UDataSwapper const*, void const*, int, void*, UErrorCode*) (udataswp.cpp:160)
==26485==    by 0x52A0630: udata_swapDataHeader (udataswp.cpp:342)
==26485==    by 0x48694F1: icu::Package::readPackage(char const*) (package.cpp:483)
==26485==    by 0x10987F: main (icupkg.cpp:335)
==26485==  Address 0x54f2458 is 0 bytes after a block of size 201,216 alloc'd
==26485==    at 0x4842E4D: operator new(unsigned int) (vg_replace_malloc.c:476)
==26485==    by 0x10936B: main (icupkg.cpp:285)
```

Manual reproduction steps:
```
./configure CC="gcc -m32" CXX="g++ -m32" CFLAGS="-O2 -ggdb3 -march=i686" CXXFLAGS="-O2 -ggdb3 -march=i686" --disable-renaming --disable-layoutex --disable-samples
make -j$(nproc)
```

There's some suspicious code there wrt reinterpret_cast but -fno-strict-aliasing doesn't help. It's fine with -O1, but not -O2.
Comment 1 Sam James 2024-02-13 12:55:55 UTC
I will do the usual bisection of objects and also narrow it down with pragmas. I won't be able to get much further than that though, I suspect.

-fsanitize=address,undefined builds fine (i.e. it doesn't even segfault).
Comment 2 Sam James 2024-02-13 13:00:22 UTC
Program received signal SIGSEGV, Segmentation fault.
0xf770e5c0 in memcpy (__dest=<optimized out>, __src=<optimized out>, __len=<optimized out>) at /usr/include/bits/string_fortified.h:29
29        return __builtin___memcpy_chk (__dest, __src, __len,
(gdb) bt
#0  0xf770e5c0 in memcpy (__dest=<optimized out>, __src=<optimized out>, __len=<optimized out>) at /usr/include/bits/string_fortified.h:29
#1  uprv_copyArray64 (ds=<optimized out>, pErrorCode=<optimized out>, inData=<optimized out>, length=<optimized out>, outData=0xf7f21094) at udataswp.cpp:172
#2  uprv_copyArray16 (ds=<optimized out>, inData=<optimized out>, length=<optimized out>, outData=<optimized out>, pErrorCode=<optimized out>) at udataswp.cpp:160
#3  0xf770eac5 in udata_swapDataHeader (ds=<optimized out>, inData=<optimized out>, length=<optimized out>, outData=<optimized out>, pErrorCode=<optimized out>) at udataswp.cpp:342
#4  0xf7f65cb2 in icu::Package::readPackage (this=this@entry=0xf7f21010, filename=filename@entry=0xffffcc82 "./in/icudt74l.dat") at package.cpp:483
#5  0x56556850 in main (argc=2, argv=0xffffca14) at icupkg.cpp:335
(gdb)
Comment 3 Jakub Jelinek 2024-02-13 13:14:55 UTC
(In reply to Sam James from comment #1)
> I will do the usual bisection of objects and also narrow it down with
> pragmas. I won't be able to get much further than that though, I suspect.

That will be definitely useful + preprocessed source for the corresponding TU.
LTO isn't involved and hopefully it won't change IL everywhere in the TU.
Comment 4 Sam James 2024-02-13 14:17:56 UTC
Created attachment 57409 [details]
udataswp.ii.xz

It goes wrong in common/udataswp.cpp's uprv_copyArray16 and uprv_copyArray64.

(Seemingly both of them need to be pragma'd but I'll check again later as it's a bit weird.)

The first differing pass is 262r.expand.

```
[...]
-(insn # # # (set (reg:CC 17 flags)
-        (compare:CC (reg:SI 133)
-            (const_int 8 [0x8]))) "udataswp.cpp":173:9#
-     (nil))
-
-(jump_insn # # # (set (pc)
-        (if_then_else (ltu (reg:CC 17 flags)
-                (const_int 0 [0]))
-            (label_ref #)
-            (pc))) "udataswp.cpp":173:9#
-     (int_list:REG_BR_PROB 644245100 (nil)))
[...]
@@ -6876,15 +6862,11 @@ Merged 2 and 3 without moving.
 Edge 7->9 redirected to 10
 Forwarding edge 8->9 to 10 failed.
 Deleted label in block 9.
-Redirecting jump 51 from 24 to 25.
-Edge 19->21 redirected to 22
-Merging block 21 into block 20...
-Merged blocks 20 and 21.
-Merged 20 and 21 without moving.
-Merging block 24 into block 23...
-Merged blocks 23 and 24.
-Merged 23 and 24 without moving.
-Removing jump 128.
+Redirecting jump 51 from 22 to 23.
+Merging block 22 into block 21...
+Merged blocks 21 and 22.
+Merged 21 and 22 without moving.
+Removing jump 123.
[...]
```
Comment 5 Sam James 2024-02-13 14:18:57 UTC
Created attachment 57410 [details]
udataswp.cpp.262r.expand (good)
Comment 6 Sam James 2024-02-13 14:19:30 UTC
Created attachment 57411 [details]
udataswp.cpp.262r.expand (bad)
Comment 7 Sam James 2024-02-13 14:22:34 UTC
Created attachment 57412 [details]
udataswp.o (good, r14-5108-g751fc7bcdcdf25)
Comment 8 Sam James 2024-02-13 14:23:05 UTC
Created attachment 57413 [details]
udataswp.o (bad, r14-5109-ga291237b628f41)
Comment 9 Sam James 2024-02-13 14:24:58 UTC
(In reply to Sam James from comment #4)
> Created attachment 57409 [details]
> udataswp.ii.xz
> 
> It goes wrong in common/udataswp.cpp's uprv_copyArray16 and uprv_copyArray64.
> 

ah, uprv_copyArray64 -> uprv_copyArray16 as the memcpy impl
Comment 10 Jakub Jelinek 2024-02-13 14:49:45 UTC
g++ command line for that TU?
-O2 -march=i686 -std=c++14 -m32
?
Comment 11 Andrew Pinski 2024-02-13 14:55:50 UTC
Does -fwrapv help?

If so does -fsanitize=undefined say anything?
Comment 12 Sam James 2024-02-13 14:56:13 UTC
-O2 -march=i686 -std=c++11 -m32 -fPIC
Comment 13 Sam James 2024-02-13 14:58:08 UTC
(In reply to Andrew Pinski from comment #11)
> Does -fwrapv help?

no (and ubsan suppresses the crash, everything works fine w/ no ubsan output)
Comment 14 Jakub Jelinek 2024-02-13 15:11:33 UTC
There are significant differences in the ranges starting with evrp.
Even optimized has:
--- pr113907.ii.261t.optimized_ 2024-02-13 09:52:13.090609512 -0500
+++ pr113907.ii.261t.optimized  2024-02-13 09:53:35.691479080 -0500
@@ -576,7 +576,7 @@ int32_t uprv_copyArray16 (const struct U
     goto <bb 10>; [67.00%]
 
   <bb 9> [local count: 43795362]:
-  # RANGE [irange] unsigned int [2, 2147483647] MASK 0x7ffffff8 VALUE 0x0
+  # RANGE [irange] unsigned int [8, 2147483647] MASK 0x7ffffff8 VALUE 0x0
   length.37_21 = (unsigned int) length_14(D);
   memcpy (outData_15(D), inData_13(D), length.37_21);
 
@@ -676,14 +676,14 @@ int32_t uprv_copyArray64 (const struct U
     goto <bb 10>; [67.00%]
 
   <bb 9> [local count: 43795362]:
-  # RANGE [irange] unsigned int [1, 2147483647] MASK 0x7ffffff8 VALUE 0x0
+  # RANGE [irange] unsigned int [8, 2147483647] MASK 0x7ffffff8 VALUE 0x0
   length.37_21 = (unsigned int) length_14(D);
   memcpy (outData_15(D), inData_13(D), length.37_21);
 
   <bb 10> [local count: 88917857]:
 
   <bb 11> [local count: 1073741824]:
-  # RANGE [irange] int32_t [0, +INF] MASK 0x7fffffff VALUE 0x0
+  # RANGE [irange] int32_t [0, 0][8, +INF] MASK 0x7fffffff VALUE 0x0
   # _6 = PHI <0(7), length_14(D)(10)>
   return _6;
 
@@ -776,14 +776,14 @@ int32_t uprv_copyArray32 (const struct U
     goto <bb 10>; [67.00%]
 
   <bb 9> [local count: 43795362]:
-  # RANGE [irange] unsigned int [1, 2147483647] MASK 0x7ffffff8 VALUE 0x0
+  # RANGE [irange] unsigned int [8, 2147483647] MASK 0x7ffffff8 VALUE 0x0
   length.37_21 = (unsigned int) length_14(D);
   memcpy (outData_15(D), inData_13(D), length.37_21);
 
   <bb 10> [local count: 88917857]:
 
   <bb 11> [local count: 1073741824]:
-  # RANGE [irange] int32_t [0, +INF] MASK 0x7fffffff VALUE 0x0
+  # RANGE [irange] int32_t [0, 0][4, +INF] MASK 0x7fffffff VALUE 0x0
   # _6 = PHI <0(7), length_14(D)(10)>
   return _6;
 
So it certainly doesn't surprise me some length < 8 check is optimized away given the above range info.  The question is if it is correct and what values the length actually get at runtime if you e.g. compile with -O0.
Comment 15 Andrew Pinski 2024-02-13 15:17:50 UTC
Note the range part looks correct when taking the mask into account so I am suspecting the mask is messed up. Let me see if I could spot it later today.
Comment 16 Sam James 2024-02-13 15:19:02 UTC
(In reply to Jakub Jelinek from comment #14
> So it certainly doesn't surprise me some length < 8 check is optimized away
> given the above range info.  The question is if it is correct and what
> values the length actually get at runtime if you e.g. compile with -O0.

(gdb) b uprv_copyArray16
Breakpoint 1 at 0xf770e71b: file udataswp.cpp, line 147.
(gdb) b uprv_copyArray64
Breakpoint 2 at 0xf770e7ba: file udataswp.cpp, line 164.
(gdb) r

Breakpoint 1, uprv_copyArray16 (ds=0x5655e4f0, inData=0xf32a4010, length=2, outData=0xf7f24094, pErrorCode=0xffffc47c) at udataswp.cpp:147
147         if(pErrorCode==nullptr || U_FAILURE(*pErrorCode)) {
(gdb) c
Continuing.

Breakpoint 1, uprv_copyArray16 (ds=0x5655e4f0, inData=0xf32a4014, length=4, outData=0xf7f24098, pErrorCode=0xffffc47c) at udataswp.cpp:147
147         if(pErrorCode==nullptr || U_FAILURE(*pErrorCode)) {

(gdb) c
Continuing.
[Inferior 1 (process 2861598) exited normally]
(gdb)

Oops.
Comment 17 Jakub Jelinek 2024-02-13 15:36:21 UTC
One set of range changes in evrp is more precise ranges in return values of uprv_swapArray{16,32,64}, those contain early return 0 if
length<0 || (length&1)!=0
or
length<0 || (length&3)!=0
or
length<0 || (length&7)!=0
so changing those return ranges from [0, +INF] to [0, 0][4, +INF] (for the second, 2 or 8 instead of 4 for the first/last) seems completely reasonable.
And uprv_copyArray{16,32,64} has something very similar,
length<0 || (length&{1,3,7})!=0 early exit and length > 0 guarded memcpy.
So, even there it is completely reasonable to change
-  # RANGE [irange] unsigned int [1, 2147483647] MASK 0x7ffffff8 VALUE 0x0
+  # RANGE [irange] unsigned int [8, 2147483647] MASK 0x7ffffff8 VALUE 0x0
   length.37_10 = (unsigned int) length_25(D);
   memcpy (outData_26(D), inData_24(D), length.37_10);
for the copy and
-  # RANGE [irange] int32_t [0, +INF] MASK 0x7fffffff VALUE 0x0
+  # RANGE [irange] int32_t [0, 0][8, +INF] MASK 0x7fffffff VALUE 0x0
   # _12 = PHI <0(4), 0(9), length_25(D)(12)>
   return _12;
for the return, but naturally copyData16 uses 2 instead of 8 and copyData32 4 instead of 8.

What seems wrong is that the 8 in the ranges somehow leaks into copyData16 during IPA.
IPA-ICF?
Comment 18 Andrew Pinski 2024-02-13 15:40:00 UTC
(In reply to Jakub Jelinek from comment #17)
> What seems wrong is that the 8 in the ranges somehow leaks into copyData16
> during IPA.
> IPA-ICF?

If IPA-ICF, then PR 113665 is similar in nature.
Comment 19 Jakub Jelinek 2024-02-13 15:46:37 UTC
Diffing the -fdump-{tree,ipa}-all-alias dumps from r14-5108 and r14-5109, 085i.fnsummary still looks good (the 2/4/8 in the ranges corresponds to whether it is in 16/32/64 suffixed/infixed function).
But the inline dump already contains presumably bogus
@@ -2426,7 +2426,7 @@ int32_t uprv_copyArray16 (const struct U
     goto <bb 12>; [67.00%]
 
   <bb 11> [local count: 43795362]:
-  # RANGE [irange] unsigned int [1, 2147483647] MASK 0x7ffffff8 VALUE 0x0
+  # RANGE [irange] unsigned int [8, 2147483647] MASK 0x7ffffff8 VALUE 0x0
   length.37_21 = (unsigned int) length_14(D);
   # USE = anything 
   # CLB = anything 
@@ -2437,7 +2437,7 @@ int32_t uprv_copyArray16 (const struct U
   _7 = _25;
 
   <bb 13> [local count: 1073741824]:
-  # RANGE [irange] int32_t [0, +INF] MASK 0x7fffffff VALUE 0x0
+  # RANGE [irange] int32_t [0, 0][2, +INF] MASK 0x7fffffff VALUE 0x0
   # _6 = PHI <0(4), _7(12)>
   return _6;
 
and
@@ -2660,7 +2660,7 @@ int32_t uprv_copyArray32 (const struct U
     goto <bb 12>; [67.00%]
 
   <bb 11> [local count: 43795362]:
-  # RANGE [irange] unsigned int [1, 2147483647] MASK 0x7ffffff8 VALUE 0x0
+  # RANGE [irange] unsigned int [8, 2147483647] MASK 0x7ffffff8 VALUE 0x0
   length.37_21 = (unsigned int) length_14(D);
   # USE = anything 
   # CLB = anything 
@@ -2671,7 +2671,7 @@ int32_t uprv_copyArray32 (const struct U
   _7 = _25;
 
   <bb 13> [local count: 1073741824]:
-  # RANGE [irange] int32_t [0, +INF] MASK 0x7fffffff VALUE 0x0
+  # RANGE [irange] int32_t [0, 0][4, +INF] MASK 0x7fffffff VALUE 0x0
   # _6 = PHI <0(4), _7(12)>
   return _6;
hunks, the return in those cases looks still right, but the length before memcpy looks wrong.
Comment 20 Jakub Jelinek 2024-02-13 15:58:38 UTC
Ah, I think it is IPA-ICF.
What happens is that fnsplit splits the uprv_copyArray{16,32,64} functions, where the
inline part does what is actually different among the functions, i.e. the
tests which among other do the early out for
length<0 || (length&1)!=0
or
length<0 || (length&3)!=0
or
length<0 || (length&7)!=0
among other things, and then the *.part.0 parts which are exactly the same for the 3 functions IL wise, except for the unfortunately differences in the ranges.
So, essentially just
  <bb 2> [local count: 132713219]:
  _2 = length_1(D) != 0;
  _5 = inData_3(D) != outData_4(D);
  _6 = _2 & _5;
  if (_6 != 0)
    goto <bb 3>; [33.00%]
  else
    goto <bb 4>; [67.00%]

  <bb 3> [local count: 43795362]:
  # RANGE [irange] unsigned int [N, 2147483647] MASK 0x7ffffffe VALUE 0x0
  length.43_7 = (unsigned int) length_1(D);
  # USE = anything
  # CLB = anything
  memcpy (outData_4(D), inData_3(D), length.43_7);

  <bb 4> [local count: 132713219]:

  <bb 5> [local count: 132713219]:
  # RANGE [irange] int [0, 0][N, +INF] MASK 0x7fffffff VALUE 0x0
  # _8 = PHI <length_1(D)(4)>
  return _8;
where N is 2, 4 or 8.
Now IPA-ICF comes, doesn't care about ranges, merges all 3 into one and picks there
the most unfortunate case, the case from the copyData64 part with N 8.
Later on we inline this back into the functions, the function splitting was useless.

So, the question is what to do in IPA-ICF.  Should we punt on range differences, reset all ranges or try to merge the ranges from all the functions merged together?
I think the last would be best.
Comment 21 Andrew Pinski 2024-02-13 16:00:29 UTC
(In reply to Jakub Jelinek from comment #20)
> Ah, I think it is IPA-ICF.
> What happens is that fnsplit splits the uprv_copyArray{16,32,64} functions,
> where the
> inline part does what is actually different among the functions,
...
> Now IPA-ICF comes, doesn't care about ranges, merges all 3 into one and
> picks there
> the most unfortunate case, the case from the copyData64 part with N 8.
> Later on we inline this back into the functions, the function splitting was
> useless.

Yep that is the same descriptions as what is happening in PR 113665.
Comment 22 Andrew Pinski 2024-02-13 16:01:15 UTC
(In reply to Andrew Pinski from comment #21)
> Yep that is the same descriptions as what is happening in PR 113665.

Specifically https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113665#c5 .
Comment 23 Jakub Jelinek 2024-02-13 16:07:29 UTC
(In reply to Andrew Pinski from comment #22)
> (In reply to Andrew Pinski from comment #21)
> > Yep that is the same descriptions as what is happening in PR 113665.
> 
> Specifically https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113665#c5 .

Indeed.  But this bug is a P1 while the other one P2.
Comment 24 Jakub Jelinek 2024-02-13 16:30:26 UTC
static inline int
foo (int len, void *indata, void *outdata)
{
  if (len < 0 || (len & 7) != 0)
    return 0;
  if (len != 0 && indata != outdata)
    __builtin_memcpy (outdata, indata, len);
  return len;
}

static inline int
bar (int len, void *indata, void *outdata)
{
  if (len < 0 || (len & 1) != 0)
    return 0;
  if (len != 0 && indata != outdata)
    __builtin_memcpy (outdata, indata, len);
  return len;
}

int (*volatile p1) (int, void *, void *) = foo;
int (*volatile p2) (int, void *, void *) = bar;

__attribute__((noipa)) int
baz (int len, void *indata, void *outdata)
{
  if ((len & 6) != 0)
    bar (len, indata, outdata);
  else
    foo (len, indata, outdata);
}

int
main ()
{
  char buf[13] = "abcdefghijkl";
  p2 (6, buf, buf + 6);
  if (__builtin_strcmp (buf, "abcdefabcdef"))
    __builtin_abort ();
}

Reproduces the wrong range in bar, but still doesn't crash nor abort.  So I probably need some different size of the copying.
Comment 25 Jakub Jelinek 2024-02-14 09:16:27 UTC
So, to sum up what has been discussed on IRC, LTO streaming doesn't seem to stream SSA_NAME_INFO, only the final IPA phases of say IPA-VRP can set SSA_NAME_INFO.
Thus, this bug is most likely solely about -fno-lto behavior of IPA-ICF.

Supposedly there we can change the SSA_NAME_INFO upon final decision to merge two functions.  I'd say it is in sem_function::merge above if (redirect_callers).
And guard with !flag_wpa (or for flag_wpa assert all SSA_NAME_INFO is NULL).

Except that by the time sem_function::merge is called, m_checker with its mapping between SSA_NAME_VERSION is unfortunately gone, so we'd need to preserve it somewhere.
Comment 26 Richard Biener 2024-02-14 09:25:08 UTC
(In reply to Jakub Jelinek from comment #25)
> So, to sum up what has been discussed on IRC, LTO streaming doesn't seem to
> stream SSA_NAME_INFO, only the final IPA phases of say IPA-VRP can set
> SSA_NAME_INFO.
> Thus, this bug is most likely solely about -fno-lto behavior of IPA-ICF.
> 
> Supposedly there we can change the SSA_NAME_INFO upon final decision to
> merge two functions.  I'd say it is in sem_function::merge above if
> (redirect_callers).
> And guard with !flag_wpa (or for flag_wpa assert all SSA_NAME_INFO is NULL).
> 
> Except that by the time sem_function::merge is called, m_checker with its
> mapping between SSA_NAME_VERSION is unfortunately gone, so we'd need to
> preserve it somewhere.

Or go and throw away all ranges from the final merge target?
Comment 27 Jakub Jelinek 2024-02-14 09:52:01 UTC
So:
--- gcc/ipa-icf.cc.jj	2024-02-10 11:25:09.645478952 +0100
+++ gcc/ipa-icf.cc	2024-02-14 10:44:27.906216458 +0100
@@ -1244,6 +1244,29 @@ sem_function::merge (sem_item *alias_ite
   else
     create_alias = true;
 
+  unsigned i;
+  tree name;
+  FOR_EACH_SSA_NAME (i, name, original->get_fun ())
+    {
+      /* We need to either merge or reset SSA_NAME_*_INFO.
+	 For merging we don't preserve the mapping between
+	 original and alias SSA_NAMEs from successful equals
+	 calls.  */
+      if (POINTER_TYPE_P (TREE_TYPE (name)))
+        {
+	  if (SSA_NAME_PTR_INFO (name))
+	    {
+	      gcc_assert (!flag_wpa);
+	      SSA_NAME_PTR_INFO (name) = NULL;
+	    }
+        }
+      else if (SSA_NAME_RANGE_INFO (name))
+	{
+	  gcc_assert (!flag_wpa);
+	  SSA_NAME_RANGE_INFO (name) = NULL;
+	}
+    }
+
   if (redirect_callers)
     {
       int nredirected = redirect_all_callers (alias, local_original);
then?  For the merging, I guess we'd need to move one of the 2 vec<int> vectors from m_checker to the sem_function instead of throwing it away in sem_function::equals
if returning true.
Comment 28 rguenther@suse.de 2024-02-14 10:35:32 UTC
On Wed, 14 Feb 2024, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907
> 
> --- Comment #27 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> So:
> --- gcc/ipa-icf.cc.jj   2024-02-10 11:25:09.645478952 +0100
> +++ gcc/ipa-icf.cc      2024-02-14 10:44:27.906216458 +0100
> @@ -1244,6 +1244,29 @@ sem_function::merge (sem_item *alias_ite
>    else
>      create_alias = true;
> 
> +  unsigned i;
> +  tree name;
> +  FOR_EACH_SSA_NAME (i, name, original->get_fun ())
> +    {
> +      /* We need to either merge or reset SSA_NAME_*_INFO.
> +        For merging we don't preserve the mapping between
> +        original and alias SSA_NAMEs from successful equals
> +        calls.  */
> +      if (POINTER_TYPE_P (TREE_TYPE (name)))
> +        {
> +         if (SSA_NAME_PTR_INFO (name))
> +           {
> +             gcc_assert (!flag_wpa);
> +             SSA_NAME_PTR_INFO (name) = NULL;
> +           }
> +        }
> +      else if (SSA_NAME_RANGE_INFO (name))
> +       {
> +         gcc_assert (!flag_wpa);
> +         SSA_NAME_RANGE_INFO (name) = NULL;
> +       }
> +    }
> +
>    if (redirect_callers)
>      {
>        int nredirected = redirect_all_callers (alias, local_original);
> then?

Yeah, though can't we do this in the caller and thus only once for the
target when we merge more than 2 nodes?

  For the merging, I guess we'd need to move one of the 2 vec<int> vectors
> from m_checker to the sem_function instead of throwing it away in
> sem_function::equals
> if returning true.

It might require quite some amount of memory though when N is big,
I'm not sure it's worth it?
Comment 29 Jan Hubicka 2024-02-14 15:52:42 UTC
Safest fix is to make equals_p to reject merging functions with different value ranges assigned to corresponding SSA names.  I would hope that, since early opts are still mostly local, that does not lead to very large degradation. This is lame of course.

If we go for smarter merging, we need to also handle ipa-prop jump functions.  In that case I think equals_p needs to check if value range sin SSA_NAMES and jump functions differs and if so, keep that noted so the merging code can do corresponding update.  I will check how hard it is to implement this.  (Equality handling is Martin Liska's code, but if I recall right, each equivalence class has a leader, and we can keep track if there are some differences WRT that leader, but I do not recall how subdivision of equivalence classes is handled).
Comment 30 Jakub Jelinek 2024-02-14 15:56:40 UTC
Created attachment 57426 [details]
gcc14-pr113907.patch

I've managed to come up with a small runtime testcase.
Now with a patch which does the resetting of SSA_NAME_{PTR,RANGE}_INFO for !flag_wpa in ICF merged functions (in the hope that we regenerate it during vrp1).
Comment 31 Jan Hubicka 2024-02-15 14:25:44 UTC
Having a testcase is great. I was just playing with crafting one.
I am still concerned about value ranges in ipa-prop's jump functions.
Let me see if I can modify the testcase to also trigger problem with value ranges in ipa-prop jump functions.

Not streaming value ranges is an omission on my side (I mistakely assumed we do stream them).  We ought to stream them, since otherwise we will lose propagated return value ranges in partitioned programs, which is pity.
Comment 32 Jakub Jelinek 2024-02-15 14:36:32 UTC
(In reply to Jan Hubicka from comment #31)
> Having a testcase is great. I was just playing with crafting one.
> I am still concerned about value ranges in ipa-prop's jump functions.

Maybe my imagination is too limited, but if the ipa-prop's jump functions are computed before ICF is performed, then if you call function a which makes some assumptions about its arguments and SSA_NAMEs used within the function and results in some return range
and another one which makes different assumptions and results in a different range,
then even if the two functions are merged and either the range info is thrown away
or unioned, I think if some functions call one or the other functions then still the original range assumptions still apply, depending on which one actually called.

> Let me see if I can modify the testcase to also trigger problem with value
> ranges in ipa-prop jump functions.
> 
> Not streaming value ranges is an omission on my side (I mistakely assumed we
> do stream them).  We ought to stream them, since otherwise we will lose
> propagated return value ranges in partitioned programs, which is pity.

Not sure if it is a good idea to start streaming them now in stage4, but sure, if we
stream them (and I think we should mostly have code to be able to stream that because we can stream the ranges in the jump functions, just unsure about points-to stuff),
then I still think best approach would be to merge regardless of range info, but in that case preferrably union instead of throw away.  The only question is where to do the merging for the LTO case (where to stream the union of the ranges and where to read it in and update for the SSA_NAMEs).
Comment 33 Jakub Jelinek 2024-02-15 14:36:55 UTC
Anyway, should I work on the union variant or do you want to take this over?
Comment 34 rguenther@suse.de 2024-02-15 14:43:28 UTC
On Thu, 15 Feb 2024, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907
> 
> --- Comment #32 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> (In reply to Jan Hubicka from comment #31)
[...]
> > Not streaming value ranges is an omission on my side (I mistakely assumed we
> > do stream them).  We ought to stream them, since otherwise we will lose
> > propagated return value ranges in partitioned programs, which is pity.
> 
> Not sure if it is a good idea to start streaming them now in stage4, but sure,

Please don't do that now ;)

(we've not had ranges early originally, and even nonzero bits were not
computed)

But yes, IPA CP jump functions with value-ranges might be a problem.
I think it does instantiate them as local ranges, does it?  We
have streaming support for ranges, we just don't stream the auxiliary
data for all SSA names (like also not points-to info).
Comment 35 rguenther@suse.de 2024-02-15 14:45:19 UTC
On Thu, 15 Feb 2024, rguenther at suse dot de wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907
> 
> --- Comment #34 from rguenther at suse dot de <rguenther at suse dot de> ---
> On Thu, 15 Feb 2024, jakub at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907
> > 
> > --- Comment #32 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> > (In reply to Jan Hubicka from comment #31)
> [...]
> > > Not streaming value ranges is an omission on my side (I mistakely assumed we
> > > do stream them).  We ought to stream them, since otherwise we will lose
> > > propagated return value ranges in partitioned programs, which is pity.
> > 
> > Not sure if it is a good idea to start streaming them now in stage4, but sure,
> 
> Please don't do that now ;)
> 
> (we've not had ranges early originally, and even nonzero bits were not
> computed)
> 
> But yes, IPA CP jump functions with value-ranges might be a problem.
> I think it does instantiate them as local ranges, does it?  We
> have streaming support for ranges, we just don't stream the auxiliary
> data for all SSA names (like also not points-to info).

Also remember we like to have a fix that's easily backportable, and
that's probably going to be resetting the info.  We can do something
more fancy for GCC 15
Comment 36 Jan Hubicka 2024-02-15 14:55:22 UTC
> > Having a testcase is great. I was just playing with crafting one.
> > I am still concerned about value ranges in ipa-prop's jump functions.
> 
> Maybe my imagination is too limited, but if the ipa-prop's jump functions are
> computed before ICF is performed, then if you call function a which makes some
> assumptions about its arguments and SSA_NAMEs used within the function and
> results in some return range
> and another one which makes different assumptions and results in a different
> range,
> then even if the two functions are merged and either the range info is thrown
> away
> or unioned, I think if some functions call one or the other functions then
> still the original range assumptions still apply, depending on which one
> actually called.

I think the tescase should have functions A1 and A2 calling function B.
We need to arrange A1 to ICF into A2 and make A2 to have narrower value
range of parameter of B visible to ipa-prop. Since ICF happens first,
ipa-prop will not see A1's value range and specialize B for A2's range.
Which sould make it possible to trigger wrong code.
> 
> > Let me see if I can modify the testcase to also trigger problem with value
> > ranges in ipa-prop jump functions.
> > 
> > Not streaming value ranges is an omission on my side (I mistakely assumed we
> > do stream them).  We ought to stream them, since otherwise we will lose
> > propagated return value ranges in partitioned programs, which is pity.
> 
> Not sure if it is a good idea to start streaming them now in stage4, but sure,
> if we
> stream them (and I think we should mostly have code to be able to stream that

It probably makes sense to postpone VR stremaing for stage1.
(Even thouh it is not hard to do.)  I will add that to my TODO.

> because we can stream the ranges in the jump functions, just unsure about
> points-to stuff),
> then I still think best approach would be to merge regardless of range info,
> but in that case preferrably union instead of throw away.  The only question is
> where to do the merging for the LTO case (where to stream the union of the
> ranges and where to read it in and update for the SSA_NAMEs).

With LTO ICF streams in all function bodies for comparsion to WPA and
keeps the winning one, so it is about extending the comparator to keep
track of difference of value ranges for all compared pairs.

There is code to merge profiles, so at that place one can also do other
kind of metadata merging.  ICF needs a lot of love on this - value
range merging is just one example.  We also want to merge TBAA (so we
can merge different template instatiations) and other stuff.

At the moment we handle all other metadat conservatively (i.e. do not
attempt to merge, just refuse merging if it differs) so value ranges are
first that are handled aggressively with your patch.

I think it is fine, since most value range will be recomputed in late
optimization - the only exceptions are the return value ranges for now.

I will try to work on making this more systematic next stage1.
Comment 37 Jan Hubicka 2024-02-15 14:57:01 UTC
> Also remember we like to have a fix that's easily backportable, and
> that's probably going to be resetting the info.  We can do something
> more fancy for GCC 15

Rejecting to merge function with different vlaue ranges is also easy,
but so is reseting.  I only need to check the ipa-prop.
Jakub, also I can take over the patch - and sorry for beging slow. I had
visitor doing some research last two weeks and I am trying to catch up
now.
Comment 38 Jakub Jelinek 2024-02-15 15:02:05 UTC
(In reply to Jan Hubicka from comment #36)
> > > Having a testcase is great. I was just playing with crafting one.
> > > I am still concerned about value ranges in ipa-prop's jump functions.
> > 
> > Maybe my imagination is too limited, but if the ipa-prop's jump functions are
> > computed before ICF is performed, then if you call function a which makes some
> > assumptions about its arguments and SSA_NAMEs used within the function and
> > results in some return range
> > and another one which makes different assumptions and results in a different
> > range,
> > then even if the two functions are merged and either the range info is thrown
> > away
> > or unioned, I think if some functions call one or the other functions then
> > still the original range assumptions still apply, depending on which one
> > actually called.
> 
> I think the tescase should have functions A1 and A2 calling function B.
> We need to arrange A1 to ICF into A2 and make A2 to have narrower value
> range of parameter of B visible to ipa-prop. Since ICF happens first,
> ipa-prop will not see A1's value range and specialize B for A2's range.
> Which sould make it possible to trigger wrong code.

If you manage to get wrong ranges in such case, then reusing part of the above testcase could help make it exploitable, with -minline-all-stringops (at least in some tunings?) memcpy emits if (len < 8) rep movsb; else { if (dst & 1) movsb; if (dst & 2) movsb; etc. } and so if incorrect value range results in the len < 8 check to be optimized away, the rest misbehaves if destination is aligned to 8 with misalignment 1 and length is smaller than 8.
Comment 39 Jan Hubicka 2024-02-16 14:40:50 UTC
This testcase
#include <stdio.h>
int data[100];

__attribute__((noinline))
int bar (int d, unsigned int d2)
{
  if (d2 > 10)
    printf ("Bingo\n");
  return d + d2;
}

int
test2 (unsigned int i)
{
  if (i > 10)
    __builtin_unreachable ();
  if (__builtin_expect (data[i] != 0, 1))
    return data[i];
  printf ("%i\n",i);
  for (int j = 0; j < 100; j++)
    data[i] += bar (data[j], i+17);
  return data[i];
}
int
test (unsigned int i)
{
  if (i > 100)
    __builtin_unreachable ();
  if (__builtin_expect (data[i] != 0, 1))
    return data[i];
  printf ("%i\n",i);
  for (int j = 0; j < 100; j++)
    data[i] += bar (data[j], i+17);
  return data[i];
}
int
main ()
{
  test (1);
  test (2);
  test (3);
  test2 (4);
  test2 (100);
  return 0;
}

gets me most of what I want to reproduce ipa-prop problem. Functions test and test2 are split with different value ranges visible in the fnsplit dump.  However curiously enough ipa-prop analysis seems to ignore the value ranges and does not attach them to the jump function, which is odd...
Comment 40 Jakub Jelinek 2024-02-16 15:10:14 UTC
(In reply to Jan Hubicka from comment #39)
> This testcase
> #include <stdio.h>
> int data[100];
> 
> __attribute__((noinline))
> int bar (int d, unsigned int d2)
> {
>   if (d2 > 10)
>     printf ("Bingo\n");
>   return d + d2;
> }

So, while d2 should have [17, 27] range when called from test2 and [17, 117] from test,
I don't see anything that would limit ranges of data[?], it can be anything (say some global constructor modifying the data array).  So bar has to return VARYING.
Comment 41 Jan Hubicka 2024-02-16 15:52:05 UTC
OK, the reason why this does not work is that ranger ignores earlier value ranges on everything but default defs and phis.

// This is where the ranger picks up global info to seed initial
// requests.  It is a slightly restricted version of
// get_range_global() above.
//
// The reason for the difference is that we can always pick the
// default definition of an SSA with no adverse effects, but for other
// SSAs, if we pick things up to early, we may prematurely eliminate
// builtin_unreachables.
//
// Without this restriction, the test in g++.dg/tree-ssa/pr61034.C has
// all of its unreachable calls removed too early.
//
// See discussion here:
// https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571709.html

void
gimple_range_global (vrange &r, tree name, struct function *fun)
{
  tree type = TREE_TYPE (name);
  gcc_checking_assert (TREE_CODE (name) == SSA_NAME);

  if (SSA_NAME_IS_DEFAULT_DEF (name) || (fun && fun->after_inlining)
      || is_a<gphi *> (SSA_NAME_DEF_STMT (name)))
    { 
      get_range_global (r, name, fun);
      return;
    }
  r.set_varying (type);
}


This makes ipa-prop to ignore earlier known value range and mask the bug.  However adding PHI makes the problem to reproduce:
#include <stdio.h>
#include <stdlib.h>
int data[100];
int c;

static __attribute__((noinline))
int bar (int d, unsigned int d2)
{
  if (d2 > 30)
          c++;
  return d + d2;
}
static int
test2 (unsigned int i)
{
  if (i > 100)
    __builtin_unreachable ();
  if (__builtin_expect (data[i] != 0, 1))
    return data[i];
  for (int j = 0; j < 100; j++)
    data[i] += bar (data[j], i&1 ? i+17 : i + 16);
  return data[i];
}

static int
test (unsigned int i)
{
  if (i > 10)
    __builtin_unreachable ();
  if (__builtin_expect (data[i] != 0, 1))
    return data[i];
  for (int j = 0; j < 100; j++)
    data[i] += bar (data[j], i&1 ? i+17 : i + 16);
  return data[i];
}
int
main ()
{
  int ret = test (1) + test (2) + test (3) + test2 (4) + test2 (30);
  if (!c)
          abort ();
  return ret;
}

This fails with trunk, gcc12 and gcc13 and also with Jakub's patch.
Comment 42 Jakub Jelinek 2024-02-16 15:55:01 UTC
So guess we need to union the ranges and for earlier gccs also the zero bits stuff upon ICF, right?
Comment 43 Jan Hubicka 2024-02-16 16:08:36 UTC
> // See discussion here:
> // https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571709.html
Discussion says:

"Once legacy evrp is removed, this won't be an issue, as ranges in the IL 
will tell the truth.  However, this will mean that we will no longer 
remove the first __builtin_unreachable combo.  But ISTM, that would be 
correct behavior ??."

So perhaps, we could remove that special case for default def and phi?
It is an odd thing and we clearly lose info here.
The problem is that value of parameter i itself does not have global
value range [0...10] so  I need to compute new SSA name to get it
preserved through ipa-split.  Maybe ipa-split can be extended to fire up
ranger and try to get better value ranges on function parameters from
the split function header.  Not sure if that is worth the effort though.


If we go with merging functions with different ranges, we indeed need to
update ranges and bits both in SSA_NAME infos and in ipa-prop's jump
functions.  At the time sem_fuction::merge calls ipa_merge_profiles we
do have both function bodies in memory and thus we can do the job.

If we just drop the info instead of merging, we do limited harm on
SSA_NAME infos since they mostly will be recomputed again.  For ipa-prop
this may cause more interesting precision loss.  

So perhaps for backportability we may want to just limit ICF to
functions wth same ranges in SSA_NAME infos.  Let me cook up a patch and
see if there is significant loss in merged functions. I think it should
be quite small given that ranges seem to only diverge through ipa-split
in very specific cases for now. (and given how much time I spent on
producing the testcase)
Comment 44 Jakub Jelinek 2024-02-16 16:27:26 UTC
(In reply to Jan Hubicka from comment #43)
> > // See discussion here:
> > // https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571709.html
> Discussion says:
> 
> "Once legacy evrp is removed, this won't be an issue, as ranges in the IL 
> will tell the truth.  However, this will mean that we will no longer 
> remove the first __builtin_unreachable combo.  But ISTM, that would be 
> correct behavior ??."
...

But that doesn't cause these problems, just perhaps losing some info, right?
If so, trying to change that feels like stage1 material to me.
Comment 45 Jan Hubicka 2024-02-16 16:45:46 UTC
> > "Once legacy evrp is removed, this won't be an issue, as ranges in the IL 
> > will tell the truth.  However, this will mean that we will no longer 
> > remove the first __builtin_unreachable combo.  But ISTM, that would be 
> > correct behavior ??."
> ...
> 
> But that doesn't cause these problems, just perhaps losing some info, right?
> If so, trying to change that feels like stage1 material to me.

Yep. It would be nice to not forget about that - it kept me confused for
over an hour :)
Comment 46 Andrew Macleod 2024-02-16 17:01:54 UTC
(In reply to Jan Hubicka from comment #43)
> > // See discussion here:
> > // https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571709.html
> Discussion says:
> 
> "Once legacy evrp is removed, this won't be an issue, as ranges in the IL 
> will tell the truth.  However, this will mean that we will no longer 
> remove the first __builtin_unreachable combo.  But ISTM, that would be 
> correct behavior ??."
> 
> So perhaps, we could remove that special case for default def and phi?
> It is an odd thing and we clearly lose info here.
> 

legacy VRP has been removed now.  So in theory we are free to do as we want.. but I don't remember the specific details.

So do you just want to always use get_range_global() ?  and not do the check?

I can try changing it to just get_global  and see what happens.
Comment 47 Andrew Macleod 2024-02-16 20:42:52 UTC
(In reply to Andrew Macleod from comment #46)
> (In reply to Jan Hubicka from comment #43)
> > > // See discussion here:
> > > // https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571709.html
> > Discussion says:
> > 
> > "Once legacy evrp is removed, this won't be an issue, as ranges in the IL 
> > will tell the truth.  However, this will mean that we will no longer 
> > remove the first __builtin_unreachable combo.  But ISTM, that would be 
> > correct behavior ??."
> > 
> > So perhaps, we could remove that special case for default def and phi?
> > It is an odd thing and we clearly lose info here.
> > 
> 
> legacy VRP has been removed now.  So in theory we are free to do as we
> want.. but I don't remember the specific details.
> 
> So do you just want to always use get_range_global() ?  and not do the check?
> 
> I can try changing it to just get_global  and see what happens.

FWIW,
diff --git a/gcc/value-query.cc b/gcc/value-query.cc
index 040c843c566..0ab10bc5a46 100644
--- a/gcc/value-query.cc
+++ b/gcc/value-query.cc
@@ -353,16 +353,9 @@ get_range_global (vrange &r, tree name, struct function *fun = cfun)
 void
 gimple_range_global (vrange &r, tree name, struct function *fun)
 {
-  tree type = TREE_TYPE (name);
   gcc_checking_assert (TREE_CODE (name) == SSA_NAME);
 
-  if (SSA_NAME_IS_DEFAULT_DEF (name) || (fun && fun->after_inlining)
-      || is_a<gphi *> (SSA_NAME_DEF_STMT (name)))
-    {
-      get_range_global (r, name, fun);
-      return;
-    }
-  r.set_varying (type);
+  get_range_global (r, name, fun);
 }

bootstraps and runs the testsuites clean on x86-64 now...
Comment 48 Sam James 2024-02-22 09:22:03 UTC
(In reply to Jan Hubicka from comment #37)
> [...]
> Jakub, also I can take over the patch - and sorry for beging slow. I had
> visitor doing some research last two weeks and I am trying to catch up
> now.

I'll assign to you, thanks!
Comment 49 Jakub Jelinek 2024-03-08 15:34:02 UTC
Honza, any further progress on this?
Comment 50 Jeffrey A. Law 2024-03-09 17:10:47 UTC
*** Bug 113665 has been marked as a duplicate of this bug. ***
Comment 51 Andrew Pinski 2024-03-09 21:04:54 UTC
*** Bug 114290 has been marked as a duplicate of this bug. ***
Comment 52 Andrew Pinski 2024-03-09 21:06:45 UTC
The testcase from PR 114290 shows this has been an issue since GCC 9 (r9-7460-g9f2cfe108f75de). r9-7460-g9f2cfe108f75de was the revision which copied the range information for the splitted out function too ...
Comment 53 Andrew Pinski 2024-03-09 21:11:14 UTC
*** Bug 106921 has been marked as a duplicate of this bug. ***
Comment 54 Jakub Jelinek 2024-03-11 11:21:23 UTC
Slightly adjusted #c41 testcase, which indeed still fails with my patch at -O2:

int d[100], c;

static __attribute__((noinline))
int foo (int x, unsigned int y)
{
  if (y > 30)
    ++c;
  return x + y;
}

static int
bar (unsigned int x)
{
  if (x > 100)
    __builtin_unreachable ();
  if (__builtin_expect (d[x] != 0, 1))
    return d[x];
  for (int j = 0; j < 100; j++)
    d[x] += foo (d[j], x & 1 ? x + 17 : x + 16);
  return d[x];
}

static int
baz (unsigned int x)
{
  if (x > 10)
    __builtin_unreachable ();
  if (__builtin_expect (d[x] != 0, 1))
    return d[x];
  for (int j = 0; j < 100; j++)
    d[x] += foo (d[j], x & 1 ? x + 17 : x + 16);
  return d[x];
}

int
main ()
{
  int r = baz (1) + baz (2) + baz (3) + bar (4) + bar (30);
  if (c != 100)
    __builtin_abort ();
  return r;
}

I still think we want both my patch and some fix for the jump functions, not just the latter.
Anyway, can we in the spot my patch changed just walk all source->node->callees cgraph_edges, for each of them find the corresponding cgraph_edge in the alias and for each walk all the jump_functions recorded and union their m_vr?
Or is that something that can't be done in LTO for some reason?
Comment 55 Jan Hubicka 2024-03-13 14:07:48 UTC
> Anyway, can we in the spot my patch changed just walk all source->node->callees > cgraph_edges, for each of them find the corresponding cgraph_edge in the alias > and for each walk all the jump_functions recorded and union their m_vr?
> Or is that something that can't be done in LTO for some reason?

That was my fist idea too, but the problem is that icf has (very limited) support for matching function which differ by order of the basic blocks: it computes hash of every basic block and orders them by their hash prior comparing. This seems half-finished since i.e. order of edges in PHIs has to match exactly.

Callee lists are officially randomly ordered, but practically they follows the order of basic blocks (as they are built this way).  However since BB orders can differ, just walking both callee sequences and comparing pairwise does not work. This also makes merging the information harder, since we no longer have the BB map at the time decide to merge.

It is however not hard to match the jump function while walking gimple bodies and comparing statements, which is backportable and localized. I am still waiting for my statistics to converge and will send it soon.
Comment 56 Jakub Jelinek 2024-03-13 14:14:33 UTC
(In reply to Jan Hubicka from comment #55)
> It is however not hard to match the jump function while walking gimple
> bodies and comparing statements, which is backportable and localized. I am
> still waiting for my statistics to converge and will send it soon.

So, we can punt on differences there (that is desirable for backporting and maybe GCC 14 too), or we could at that point populate an int vector, which maps the callee
vector indexes to indexes in the callee vector in the other candidate function.
If unsuccessful, we just free the vector, if successful, we first walk all the callees and union stuff in there using that vector.
Comment 57 Jan Hubicka 2024-03-13 15:21:37 UTC
> So, we can punt on differences there (that is desirable for backporting and
> maybe GCC 14 too), or we could at that point populate an int vector, which maps
Yep, that is what I do.
I had bug in that so I am re-running (forgot to check that callers and
callee argument count matches and this cuases ICE during LLVM LTO link).
It seems these extra checks makes no difference in practice. 
During bootstrap there are no pairs of functions during bootstrap where
we new checks punt on value range difference or jump function
difference that would be merged otherwise.

Most common case where we could merge but we don't are those triggered
by TBAA.
> the callee
> vector indexes to indexes in the callee vector in the other candidate function.
> If unsuccessful, we just free the vector, if successful, we first walk all the
> callees and union stuff in there using that vector.
This is the plan for metadata merging. A small complication here is that
ICF works by comparing bodies to a leader of equivalence class but this
leader is not necessarilly the surviving function body.  So if we
compared A to L (leader) and B to L and then decided replace A by B, we
need to be able to combine the permutations so we know how to map call
sites in A to ones in B.  The same is true about SSA names and basic
blocks.  I have patch for that for next stage1.
Comment 58 Jan Hubicka 2024-03-14 16:20:17 UTC
Created attachment 57702 [details]
Compare value ranges in jump functions

This patch implements the jump function compare, however it is not good enough.  Here is another wrong code:

jh@ryzen3:~/gcc/build/stage1-gcc> cat a.c
#include <stddef.h>
#include <stdlib.h>

__attribute__((used)) int val,val2 = 1;

struct foo {int a;};

struct foo **ptr;

__attribute__ ((noipa))
int
test2 (void *a)
{ 
  ptr = (struct foo **)a;
}
int test3 (void *a);

int
test(void)
{ 
  struct foo *fp;
  test2 ((void *)&fp);
  fp = NULL;
  (*ptr)++;
  test3 ((void *)&fp);
}

int testb (void);

int
main()
{ 
  for (int i = 0; i < val2; i++)
  if (val)
    testb ();
  else
    test();
}
jh@ryzen3:~/gcc/build/stage1-gcc> cat b.c
#include <stdlib.h>
struct bar {int a;};
struct foo {int a;};
struct barp {struct bar *f; struct bar *g;};
extern struct foo **ptr;
int test2 (void *);
int test3 (void *);
int
testb(void)
{
  struct bar *fp;
  test2 ((void *)&fp);
  fp = NULL;
  (*ptr)++;
  test3 ((void *)&fp);
}
jh@ryzen3:~/gcc/build/stage1-gcc> cat c.c
#include <stdlib.h>
__attribute__ ((noinline))
int
test3 (void *a)
{
  if (!*(void **)a)
          abort ();
  return 0;
}
jh@ryzen3:~/gcc/build/stage1-gcc> ./xgcc -B ./ -O3 a.c b.c -flto -c ; ./xgcc -B ./ -O3 c.c -flto -fno-strict-aliasing -c ; ./xgcc  -B ./ b.o a.o c.o ; ./a.out
Aborted (core dumped)
jh@ryzen3:~/gcc/build/stage1-gcc> ./xgcc -B ./ -O3 a.c b.c -flto -c ; ./xgcc -B ./ -O3 c.c -flto -fno-strict-aliasing -c ; ./xgcc  -B ./ b.o a.o c.o --disable-ipa-icf ; ./a.out 
lto1: note: disable pass ipa-icf for functions in the range of [0, 4294967295]
lto1: note: disable pass ipa-icf for functions in the range of [0, 4294967295]
Comment 59 Jan Hubicka 2024-03-14 16:39:27 UTC
just to explain what happens in the testcase.  There is test and testb. They are almost same:

int
testb(void)
{
  struct bar *fp;
  test2 ((void *)&fp);
  fp = NULL;
  (*ptr)++;
  test3 ((void *)&fp);
}
the difference is in the alias set of FP. In one case it aliases with the (*ptr)++ while in other it does not.  This makes one function to have jump function specifying aggregate value of 0 for *fp, while other does not.

Now with LTO both struct bar and foo becomes compatible for TBAA, so the functions gets merged and the winning variant has the jump function specifying aggregate 0, which is wrong in the context code is invoked.
Comment 60 GCC Commits 2024-03-14 16:49:43 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:7580e39452b65ab5fb5a06f3f1ad7d59720269b5

commit r14-9476-g7580e39452b65ab5fb5a06f3f1ad7d59720269b5
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Mar 14 17:48:30 2024 +0100

    icf: Reset SSA_NAME_{PTR,RANGE}_INFO in successfully merged functions [PR113907]
    
    AFAIK we have no code in LTO streaming to stream out or in
    SSA_NAME_{RANGE,PTR}_INFO, so LTO effectively throws it all away
    and let vrp1 and alias analysis after IPA recompute that.  There is
    just one spot, for IPA VRP and IPA bit CCP we save/restore ranges
    and set SSA_NAME_{PTR,RANGE}_INFO e.g. on parameters depending on what
    we saved and propagated, but that is after streaming in bodies for the
    post IPA optimizations.
    
    Now, without LTO SSA_NAME_{RANGE,PTR}_INFO is already computed from
    earlier in many cases (er.g. evrp and early alias analysis but other spots
    too), but IPA ICF is ignoring the ranges and points-to details when
    comparing the bodies.  I think ignoring that is just fine, that is
    effectively what we do for LTO where we throw that information away
    before the analysis, and not ignoring it could lead to fewer ICF merging
    possibilities.
    
    So, the following patch instead verifies that for LTO SSA_NAME_{PTR,RANGE}_INFO
    just isn't there on SSA_NAMEs in functions into which other functions have
    been ICFed, and for non-LTO throws that information away (which matches the
    LTO behavior).
    
    Another possibility would be to remember the SSA_NAME <-> SSA_NAME mapping
    vector (just one of the 2) on successful sem_function::equals on the
    sem_function which is not the chosen leader (e.g. how SSA_NAMEs in the
    leader map to SSA_NAMEs in the other function) and use that vector
    to union the ranges in sem_function::merge.  I can implement that for
    comparison, but wanted to post this first if there is an agreement on
    doing that or if Honza thinks we should take SSA_NAME_{RANGE,PTR}_INFO
    into account.  I think we can compare SSA_NAME_RANGE_INFO, but have
    no idea how to try to compare points to info.  And I think it will result
    in less effective ICF for non-LTO vs. LTO unnecessarily.
    
    2024-03-12  Jakub Jelinek  <jakub@redhat.com>
    
            PR middle-end/113907
            * ipa-icf.cc (sem_item_optimizer::merge_classes): Reset
            SSA_NAME_RANGE_INFO and SSA_NAME_PTR_INFO on successfully ICF merged
            functions.
    
            * gcc.dg/pr113907-1.c: New test.
Comment 61 Jakub Jelinek 2024-03-14 16:53:49 UTC
The originally reported bug should be fixed for GCC 14 but the others like #c54 and #c58 are not.
Comment 62 GCC Commits 2024-03-15 23:29:38 UTC
The releases/gcc-13 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:9f255e4baac68fc3568820cdca9412f67ff07940

commit r13-8451-g9f255e4baac68fc3568820cdca9412f67ff07940
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Mar 14 17:48:30 2024 +0100

    icf: Reset SSA_NAME_{PTR,RANGE}_INFO in successfully merged functions [PR113907]
    
    AFAIK we have no code in LTO streaming to stream out or in
    SSA_NAME_{RANGE,PTR}_INFO, so LTO effectively throws it all away
    and let vrp1 and alias analysis after IPA recompute that.  There is
    just one spot, for IPA VRP and IPA bit CCP we save/restore ranges
    and set SSA_NAME_{PTR,RANGE}_INFO e.g. on parameters depending on what
    we saved and propagated, but that is after streaming in bodies for the
    post IPA optimizations.
    
    Now, without LTO SSA_NAME_{RANGE,PTR}_INFO is already computed from
    earlier in many cases (er.g. evrp and early alias analysis but other spots
    too), but IPA ICF is ignoring the ranges and points-to details when
    comparing the bodies.  I think ignoring that is just fine, that is
    effectively what we do for LTO where we throw that information away
    before the analysis, and not ignoring it could lead to fewer ICF merging
    possibilities.
    
    So, the following patch instead verifies that for LTO SSA_NAME_{PTR,RANGE}_INFO
    just isn't there on SSA_NAMEs in functions into which other functions have
    been ICFed, and for non-LTO throws that information away (which matches the
    LTO behavior).
    
    Another possibility would be to remember the SSA_NAME <-> SSA_NAME mapping
    vector (just one of the 2) on successful sem_function::equals on the
    sem_function which is not the chosen leader (e.g. how SSA_NAMEs in the
    leader map to SSA_NAMEs in the other function) and use that vector
    to union the ranges in sem_function::merge.  I can implement that for
    comparison, but wanted to post this first if there is an agreement on
    doing that or if Honza thinks we should take SSA_NAME_{RANGE,PTR}_INFO
    into account.  I think we can compare SSA_NAME_RANGE_INFO, but have
    no idea how to try to compare points to info.  And I think it will result
    in less effective ICF for non-LTO vs. LTO unnecessarily.
    
    2024-03-12  Jakub Jelinek  <jakub@redhat.com>
    
            PR middle-end/113907
            * ipa-icf.cc (sem_item_optimizer::merge_classes): Reset
            SSA_NAME_RANGE_INFO and SSA_NAME_PTR_INFO on successfully ICF merged
            functions.
    
            * gcc.dg/pr113907-1.c: New test.
    
    (cherry picked from commit 7580e39452b65ab5fb5a06f3f1ad7d59720269b5)
Comment 63 Jakub Jelinek 2024-03-19 17:09:14 UTC
(In reply to Jan Hubicka from comment #58)
> Created attachment 57702 [details]
> Compare value ranges in jump functions

s/functoins/functions/

Are you going to apply this patch, even if it just helps partially with some tests and not others?
Comment 64 Jan Hubicka 2024-03-19 18:41:27 UTC
> Are you going to apply this patch, even if it just helps partially with some
> tests and not others?
I think we should fix this completely, since it is source of very
suprising bugs.  I discussed it with Martin (since he is maintaining the
jump functions) and he will add comparator for them, so we can plug this
bug completely.  If we have operator= on functions, we can use it at the
spot I am comparing value ranges. 

There is also a need to compare loop structures.   Seems I have testcase
now too (triggering peeling on miscomputed upper bound on iteration
count). Will attach it after cleaning up.
Comment 65 Martin Jambor 2024-03-20 09:05:41 UTC
I hope to have some jump-function comparison functions ready for testing later today.
Comment 66 Martin Jambor 2024-03-20 18:13:32 UTC
Created attachment 57750 [details]
Patch comparing jump functions

I'm testing this patch.  (Not sure how to best check that it does not inadvertently pessimize ICF too much, except for ICF testcases.)
Comment 67 Jakub Jelinek 2024-03-20 18:33:07 UTC
(In reply to Martin Jambor from comment #66)
> Created attachment 57750 [details]
> Patch comparing jump functions
> 
> I'm testing this patch.  (Not sure how to best check that it does not
> inadvertently pessimize ICF too much, except for ICF testcases.)

Bet modify the patch slightly (for testing only), so instead of those 3
return_false_with_msg it would just set some flag somewhere that the pair is not ICF optimizable, then after the early return false in sem_function::merge check that flag
and log into some /tmp/ file using appending if the ICF merge would be done and the flag wasn't set, or if the ICF merge wouldn't be done (and in that case return false too), bootstrap/regtest with such patch, plus build a few other packages (firefox, libreoffice) and then get statistics from the log file on what percentage of ICF folding it now prevents.
Comment 68 Jakub Jelinek 2024-03-21 15:29:39 UTC
(In reply to Jakub Jelinek from comment #67)
> (In reply to Martin Jambor from comment #66)
> > Created attachment 57750 [details]
> > Patch comparing jump functions
> > 
> > I'm testing this patch.  (Not sure how to best check that it does not
> > inadvertently pessimize ICF too much, except for ICF testcases.)
> 
> Bet modify the patch slightly (for testing only), so instead of those 3
> return_false_with_msg it would just set some flag somewhere that the pair is
> not ICF optimizable, then after the early return false in
> sem_function::merge check that flag
> and log into some /tmp/ file using appending if the ICF merge would be done
> and the flag wasn't set, or if the ICF merge wouldn't be done (and in that
> case return false too), bootstrap/regtest with such patch, plus build a few
> other packages (firefox, libreoffice) and then get statistics from the log
> file on what percentage of ICF folding it now prevents.

To correct myself, basically return a tristate from the ICF comparisons, instead of returning boolean this is the same vs. not return this is the same including jump_functions vs. this is the same excluding jump_functions vs. this is not the same,
put the first category into the same class but somehow connect two different classes as weakly connected if they were the same before but aren't anymore.  And then when trying to merge stuff first merge stuff in the same class + log that, and then check if one could merge between the connected classes but fail before actually merging it and log that too.
Comment 69 GCC Commits 2024-03-28 12:25:10 UTC
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:0923fe2d4808c16b72c1d1bfe28220dd326d8b76

commit r14-9705-g0923fe2d4808c16b72c1d1bfe28220dd326d8b76
Author: Jan Hubicka <jh@suse.cz>
Date:   Thu Mar 28 13:24:54 2024 +0100

    Hash operands of PHI in ipa-icf
    
    This patch fixes cache colision on function whose body differs only by constants
    at PHI operands.  As for
    
    if (test)
      a = cst1;
    else
      a = cst2;
    
    gcc/ChangeLog:
    
            PR middle-end/113907
            * ipa-icf.cc (sem_function::init): Hash PHI operands
            (sem_function::compare_phi_node): Add argument about preserving order
Comment 70 Jan Hubicka 2024-04-02 08:44:01 UTC
Hello,
over easter I did some analysis of the cases where ICF is now disabled
due to jump function miscompare.  Most common case (seen also on GCC) is
the situation where function is originally static inline and in some
units its callee is known which enables us to propagate return value
range.  It happens on wide int implementaiton but it is not that
importnat.

Other case is when function has address taken of local label and
ICF two functions taking address of local label and passing it to a
calle (maybe we should not but C standard says nothing).

I tought there is also case where jump function has other function local
stuff, but that does not happen since we avoid putting them to
jumptables (to avoid need to stream function blocal BLOCKS).

Last case is that the function takes as parameter an address of static
symbol that can be merged.  We currently don't do that since we miss any
logic tracking whether value address is eventually used to compare.  So

I think for release branchs and trunk Martin's patch is fine. For next
stage1 we will need to work on using icf's compare_op in the ipa-prop
code and also add merging of value ranges.

Honza
Comment 71 Martin Jambor 2024-04-04 21:17:05 UTC
I have sent the patch to the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6le5s25kl.fsf@virgil.suse.cz/T/#u
Comment 72 Sam James 2024-04-08 09:33:04 UTC
It's up to RMs of course but FWIW, the critical part of this for me is fixed now and it could be a P2 now if desired.
Comment 73 Richard Biener 2024-04-08 09:38:33 UTC
Agreed.
Comment 74 GCC Commits 2024-04-08 16:56:57 UTC
The master branch has been updated by Martin Jambor <jamborm@gcc.gnu.org>:

https://gcc.gnu.org/g:1162861439fd3c4b30fc3ccd49462e47e876f04a

commit r14-9840-g1162861439fd3c4b30fc3ccd49462e47e876f04a
Author: Martin Jambor <mjambor@suse.cz>
Date:   Mon Apr 8 18:53:23 2024 +0200

    ipa: Compare jump functions in ICF (PR 113907)
    
    In PR 113907 comment #58, Honza found a case where ICF thinks bodies
    of functions are equivalent but becaise of difference in aliases in a
    memory access, different aggregate jump functions are associated with
    supposedly equivalent call statements.  This patch adds a way to
    compare jump functions and plugs it into ICF to avoid the issue.
    
    gcc/ChangeLog:
    
    2024-03-20  Martin Jambor  <mjambor@suse.cz>
    
            PR ipa/113907
            * ipa-prop.h (class ipa_vr): Declare new overload of a member function
            equal_p.
            (ipa_jump_functions_equivalent_p): Declare.
            * ipa-prop.cc (ipa_vr::equal_p): New function.
            (ipa_agg_pass_through_jf_equivalent_p): Likewise.
            (ipa_agg_jump_functions_equivalent_p): Likewise.
            (ipa_jump_functions_equivalent_p): Likewise.
            * ipa-cp.h (values_equal_for_ipcp_p): Declare.
            * ipa-cp.cc (values_equal_for_ipcp_p): Make function public.
            * ipa-icf-gimple.cc: Include alloc-pool.h, symbol-summary.h, sreal.h,
            ipa-cp.h and ipa-prop.h.
            (func_checker::compare_gimple_call): Comapre jump functions.
    
    gcc/testsuite/ChangeLog:
    
    2024-03-20  Martin Jambor  <mjambor@suse.cz>
    
            PR ipa/113907
            * gcc.dg/lto/pr113907_0.c: New.
            * gcc.dg/lto/pr113907_1.c: Likewise.
            * gcc.dg/lto/pr113907_2.c: Likewise.
Comment 75 Martin Jambor 2024-04-08 17:04:10 UTC
The above fixes the testcase from comment #58.  I am not sure if any other testcases discussed here remain unresolved.  I am also not sure to what extent we want to that patch of mine, I guess I'll re-visit the idea in a few weeks.
Comment 76 Jan Hubicka 2024-04-09 09:47:59 UTC
There is still problem with loop bounds.  I am testing patch on that and
then we should be (finally) finally safe.
Comment 77 Richard Biener 2024-05-07 07:45:13 UTC
GCC 14.1 is being released, retargeting bugs to GCC 14.2.
Comment 78 GCC Commits 2024-05-14 15:06:59 UTC
The releases/gcc-13 branch has been updated by Martin Jambor <jamborm@gcc.gnu.org>:

https://gcc.gnu.org/g:1db45e83021a8a87f41e22053910fcce6e8e2c2c

commit r13-8774-g1db45e83021a8a87f41e22053910fcce6e8e2c2c
Author: Martin Jambor <mjambor@suse.cz>
Date:   Tue May 14 17:01:21 2024 +0200

    ipa: Compare jump functions in ICF (PR 113907)
    
    This is a manual backport of r14-9840-g1162861439fd3c from master.
    Manual because the bits and value range representation in jump
    functions have changes during the gcc 14 development cycle.
    
    In PR 113907 comment #58, Honza found a case where ICF thinks bodies
    of functions are equivalent but becaise of difference in aliases in a
    memory access, different aggregate jump functions are associated with
    supposedly equivalent call statements.  This patch adds a way to
    compare jump functions and plugs it into ICF to avoid the issue.
    
    gcc/ChangeLog:
    
    2024-05-14  Martin Jambor  <mjambor@suse.cz>
    
            PR ipa/113907
            * ipa-prop.h (ipa_jump_functions_equivalent_p): Declare.
            (values_equal_for_ipcp_p): Likewise.
            * ipa-prop.cc (ipa_agg_pass_through_jf_equivalent_p): New function.
            (ipa_agg_jump_functions_equivalent_p): Likewise.
            (ipa_jump_functions_equivalent_p): Likewise.
            * ipa-cp.cc (values_equal_for_ipcp_p): Make function public.
            * ipa-icf-gimple.cc: Include alloc-pool.h, symbol-summary.h, sreal.h,
            ipa-cp.h and ipa-prop.h.
            (func_checker::compare_gimple_call): Comapre jump functions.
    
    gcc/testsuite/ChangeLog:
    
    2024-05-10  Martin Jambor  <mjambor@suse.cz>
    
            PR ipa/113907
            * gcc.dg/lto/pr113907_0.c: New.
            * gcc.dg/lto/pr113907_1.c: Likewise.
            * gcc.dg/lto/pr113907_2.c: Likewise.
Comment 79 GCC Commits 2024-05-28 13:45:51 UTC
The releases/gcc-12 branch has been updated by Martin Jambor <jamborm@gcc.gnu.org>:

https://gcc.gnu.org/g:72f6b7ec3915f0b5b3517dffa19e3b34c8af687d

commit r12-10475-g72f6b7ec3915f0b5b3517dffa19e3b34c8af687d
Author: Martin Jambor <mjambor@suse.cz>
Date:   Tue May 28 13:33:02 2024 +0200

    ipa: Compare jump functions in ICF (PR 113907)
    
    This is a manual backport of r14-9840-g1162861439fd3c from master.
    Manual because the bits and value range representation in jump
    functions have changes during the gcc 14 development cycle.
    
    In PR 113907 comment #58, Honza found a case where ICF thinks bodies
    of functions are equivalent but becaise of difference in aliases in a
    memory access, different aggregate jump functions are associated with
    supposedly equivalent call statements.  This patch adds a way to
    compare jump functions and plugs it into ICF to avoid the issue.
    
    gcc/ChangeLog:
    
    2024-05-14  Martin Jambor  <mjambor@suse.cz>
    
            PR ipa/113907
            * ipa-prop.h (ipa_jump_functions_equivalent_p): Declare.
            (values_equal_for_ipcp_p): Likewise.
            * ipa-prop.cc (ipa_agg_pass_through_jf_equivalent_p): New function.
            (ipa_agg_jump_functions_equivalent_p): Likewise.
            (ipa_jump_functions_equivalent_p): Likewise.
            * ipa-cp.cc (values_equal_for_ipcp_p): Make function public.
            * ipa-icf-gimple.cc: Include alloc-pool.h, symbol-summary.h, sreal.h,
            ipa-cp.h and ipa-prop.h.
            (func_checker::compare_gimple_call): Comapre jump functions.
    
    gcc/testsuite/ChangeLog:
    
    2024-05-10  Martin Jambor  <mjambor@suse.cz>
    
            PR ipa/113907
            * gcc.dg/lto/pr113907_0.c: New.
            * gcc.dg/lto/pr113907_1.c: Likewise.
            * gcc.dg/lto/pr113907_2.c: Likewise.
    
    (cherry picked from commit 1db45e83021a8a87f41e22053910fcce6e8e2c2c)
Comment 80 GCC Commits 2024-06-11 10:37:34 UTC
The releases/gcc-12 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:81c300bf6836505ef1df1c4430972863c732fc14

commit r12-10516-g81c300bf6836505ef1df1c4430972863c732fc14
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Mar 14 17:48:30 2024 +0100

    icf: Reset SSA_NAME_{PTR,RANGE}_INFO in successfully merged functions [PR113907]
    
    AFAIK we have no code in LTO streaming to stream out or in
    SSA_NAME_{RANGE,PTR}_INFO, so LTO effectively throws it all away
    and let vrp1 and alias analysis after IPA recompute that.  There is
    just one spot, for IPA VRP and IPA bit CCP we save/restore ranges
    and set SSA_NAME_{PTR,RANGE}_INFO e.g. on parameters depending on what
    we saved and propagated, but that is after streaming in bodies for the
    post IPA optimizations.
    
    Now, without LTO SSA_NAME_{RANGE,PTR}_INFO is already computed from
    earlier in many cases (er.g. evrp and early alias analysis but other spots
    too), but IPA ICF is ignoring the ranges and points-to details when
    comparing the bodies.  I think ignoring that is just fine, that is
    effectively what we do for LTO where we throw that information away
    before the analysis, and not ignoring it could lead to fewer ICF merging
    possibilities.
    
    So, the following patch instead verifies that for LTO SSA_NAME_{PTR,RANGE}_INFO
    just isn't there on SSA_NAMEs in functions into which other functions have
    been ICFed, and for non-LTO throws that information away (which matches the
    LTO behavior).
    
    Another possibility would be to remember the SSA_NAME <-> SSA_NAME mapping
    vector (just one of the 2) on successful sem_function::equals on the
    sem_function which is not the chosen leader (e.g. how SSA_NAMEs in the
    leader map to SSA_NAMEs in the other function) and use that vector
    to union the ranges in sem_function::merge.  I can implement that for
    comparison, but wanted to post this first if there is an agreement on
    doing that or if Honza thinks we should take SSA_NAME_{RANGE,PTR}_INFO
    into account.  I think we can compare SSA_NAME_RANGE_INFO, but have
    no idea how to try to compare points to info.  And I think it will result
    in less effective ICF for non-LTO vs. LTO unnecessarily.
    
    2024-03-12  Jakub Jelinek  <jakub@redhat.com>
    
            PR middle-end/113907
            * ipa-icf.cc (sem_item_optimizer::merge_classes): Reset
            SSA_NAME_RANGE_INFO and SSA_NAME_PTR_INFO on successfully ICF merged
            functions.
    
            * gcc.dg/pr113907-1.c: New test.
    
    (cherry picked from commit 7580e39452b65ab5fb5a06f3f1ad7d59720269b5)
Comment 81 GCC Commits 2024-06-20 13:22:43 UTC
The releases/gcc-11 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:8a917760aa47f7fb307a4abf6e17dd5b52a5b25f

commit r11-11499-g8a917760aa47f7fb307a4abf6e17dd5b52a5b25f
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Mar 14 17:48:30 2024 +0100

    icf: Reset SSA_NAME_{PTR,RANGE}_INFO in successfully merged functions [PR113907]
    
    AFAIK we have no code in LTO streaming to stream out or in
    SSA_NAME_{RANGE,PTR}_INFO, so LTO effectively throws it all away
    and let vrp1 and alias analysis after IPA recompute that.  There is
    just one spot, for IPA VRP and IPA bit CCP we save/restore ranges
    and set SSA_NAME_{PTR,RANGE}_INFO e.g. on parameters depending on what
    we saved and propagated, but that is after streaming in bodies for the
    post IPA optimizations.
    
    Now, without LTO SSA_NAME_{RANGE,PTR}_INFO is already computed from
    earlier in many cases (er.g. evrp and early alias analysis but other spots
    too), but IPA ICF is ignoring the ranges and points-to details when
    comparing the bodies.  I think ignoring that is just fine, that is
    effectively what we do for LTO where we throw that information away
    before the analysis, and not ignoring it could lead to fewer ICF merging
    possibilities.
    
    So, the following patch instead verifies that for LTO SSA_NAME_{PTR,RANGE}_INFO
    just isn't there on SSA_NAMEs in functions into which other functions have
    been ICFed, and for non-LTO throws that information away (which matches the
    LTO behavior).
    
    Another possibility would be to remember the SSA_NAME <-> SSA_NAME mapping
    vector (just one of the 2) on successful sem_function::equals on the
    sem_function which is not the chosen leader (e.g. how SSA_NAMEs in the
    leader map to SSA_NAMEs in the other function) and use that vector
    to union the ranges in sem_function::merge.  I can implement that for
    comparison, but wanted to post this first if there is an agreement on
    doing that or if Honza thinks we should take SSA_NAME_{RANGE,PTR}_INFO
    into account.  I think we can compare SSA_NAME_RANGE_INFO, but have
    no idea how to try to compare points to info.  And I think it will result
    in less effective ICF for non-LTO vs. LTO unnecessarily.
    
    2024-03-12  Jakub Jelinek  <jakub@redhat.com>
    
            PR middle-end/113907
            * ipa-icf.c (sem_item_optimizer::merge_classes): Reset
            SSA_NAME_RANGE_INFO and SSA_NAME_PTR_INFO on successfully ICF merged
            functions.
    
            * gcc.dg/pr113907-1.c: New test.
    
    (cherry picked from commit 7580e39452b65ab5fb5a06f3f1ad7d59720269b5)
Comment 82 Jan Hubicka 2024-07-22 17:05:32 UTC
All wrong code issues i know of are now fixed on 14/15
Comment 83 Jakub Jelinek 2024-08-01 09:38:34 UTC
GCC 14.2 is being released, retargeting bugs to GCC 14.3.