Bug 113136 - [14 regression] ICE when building Perl
Summary: [14 regression] ICE when building Perl
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 14.0
: P1 normal
Target Milestone: 14.0
Assignee: Tamar Christina
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-12-25 14:15 UTC by Sam James
Modified: 2024-01-12 15:35 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2023-12-26 00:00:00


Attachments
toke.i.xz (154.31 KB, application/x-xz)
2023-12-25 14:15 UTC, Sam James
Details
reduced.i (253 bytes, text/plain)
2023-12-25 15:08 UTC, Sam James
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Sam James 2023-12-25 14:15:27 UTC
I'll file bugs for each of these I hit, and then even if they're dupes, we get more test cases out of it, so...

Hit when building perl-5.38.

native=znver2:
```
x86_64-pc-linux-gnu-gcc -c -DPERL_CORE -O3 -pipe -march=native -fdiagnostics-color=always -Waddress -Warray-bounds -Wfree-nonheap-object -Wint-to-pointer-cast -Wmain -Wnonnull -Wodr -Wreturn-type -Wsizeof-pointer-memaccess -Wstrict-aliasing -Wstring-compare -Wuninitialized -Wvarargs -fno-strict-aliasing -DNO_PERL_RAND_SEED -fwrapv -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -std=c99 -O3 -pipe -march=native -fdiagnostics-color=always -Waddress -Warray-bounds -Wfree-nonheap-object -Wint-to-pointer-cast -Wmain -Wnonnull -Wodr -Wreturn-type -Wsizeof-pointer-memaccess -Wstrict-aliasing -Wstring-compare -Wuninitialized -Wvarargs -fno-strict-aliasing -Wall -Werror=pointer-arith -Werror=vla -Wextra -Wno-long-long -Wno-declaration-after-statement -Wc++-compat -Wwrite-strings -fPIC toke.c
toke.c: In function ‘yyl_try’:
toke.c:8984:1: error: definition in block 413 does not dominate use in block 420
 8984 | yyl_try(pTHX_ char *s)
      | ^~~~~~~
for SSA_NAME: vect__256.7602_2087 in statement:
vect__256.7602_2086 = PHI <vect__256.7602_2087(420)>
PHI argument
vect__256.7602_2087
for PHI node
vect__256.7602_2086 = PHI <vect__256.7602_2087(420)>
during GIMPLE pass: vect
toke.c:8984:1: internal compiler error: verify_ssa failed
0x55bcb952ea88 verify_ssa(bool, bool)
        /usr/src/debug/sys-devel/gcc-14.0.0.9999/gcc-14.0.0.9999/gcc/tree-ssa.cc:1203
0x55bcb9130a9d execute_function_todo
        /usr/src/debug/sys-devel/gcc-14.0.0.9999/gcc-14.0.0.9999/gcc/passes.cc:2095
0x55bcb9130f30 execute_todo
        /usr/src/debug/sys-devel/gcc-14.0.0.9999/gcc-14.0.0.9999/gcc/passes.cc:2142
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://bugs.gentoo.org/> for instructions.
make: *** [makefile:260: toke.o] Error 1
```

```
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/14/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-14.0.0.9999/work/gcc-14.0.0.9999/configure --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/14 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/14/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/14 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/14/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/14/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/14/include/g++-v14 --disable-silent-rules --disable-dependency-tracking --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/14/python --enable-languages=c,c++,fortran --enable-obsolete --enable-secureplt --disable-werror --with-system-zlib --enable-nls --without-included-gettext --disable-libunwind-exceptions --enable-checking=yes,extra,rtl,df --with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo 14.0.0 p, commit 2250dc0cc8c46999ad1c1d79b9aeed9056459bb6' --with-gcc-major-version-only --enable-libstdcxx-time --enable-lto --disable-libstdcxx-pch --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --enable-multilib --with-multilib-list=m32,m64 --disable-fixed-point --enable-targets=all --enable-libgomp --disable-libssp --disable-libada --disable-cet --disable-systemtap --disable-valgrind-annotations --disable-vtable-verify --disable-libvtv --with-zstd --without-isl --enable-default-pie --enable-host-pie --disable-host-bind-now --enable-default-ssp
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.0.0 20231225 (experimental) 0beeddd6b1b1cb41104c4df925323e8fc0437ba8 (Gentoo 14.0.0 p, commit 2250dc0cc8c46999ad1c1d79b9aeed9056459bb6)
```

'gcc -c ./toke.i -O3 -march=znver2' is enough to reproduce.
Comment 1 Sam James 2023-12-25 14:15:58 UTC
Created attachment 56933 [details]
toke.i.xz
Comment 2 Sam James 2023-12-25 15:08:10 UTC
Created attachment 56935 [details]
reduced.i

Reduced.
Comment 3 Tamar Christina 2023-12-26 14:11:34 UTC
This shouldn't have vectorized as outer-loop vectorization isn't supported. In addition switch statements are not supported, but I guess the nested if confused the code doing the analysis.

Will fix when back at work tomorrow.
Comment 4 Tamar Christina 2023-12-26 14:27:38 UTC
Thanks for the report and testcases.
Comment 5 Tamar Christina 2023-12-27 12:04:22 UTC
hmm I seem unable to reproduce this one..

tnfchris@x86_64 ~/gcc-peak> ./install/bin/gcc  -O3 pr113136.c -c -march=znver2
pr113136.c: In function ‘yyl_try_d’:
pr113136.c:16:14: warning: comparison of distinct pointer types lacks a cast [-Wcompare-distinct-pointer-types]
   16 |     while (s < yyl_try_d)
      |              ^
tnfchris@x86_64 ~/gcc-peak> ./install/bin/gcc  -O3 ~/toke.i  -c -march=znver2
tnfchris@x86_64 ~/gcc-peak> cat pr113136.c
enum { XSTATE } S_incline();
char PL_in_eval, PL_parser_6;
int PL_parser_1, PL_parser_0, PL_parser_7;
short PL_parser_8;
_Bool yyl_eol_needs_semicolon() {
  if (PL_parser_6 || PL_in_eval && PL_parser_7 && PL_parser_8)
    if (PL_parser_0 && PL_parser_1)
      return 0;
}
void yyl_try_d(char *s) {
retry:
  switch (*s) {
  case '\n':
    yyl_eol_needs_semicolon(&s);
    s++;
    while (s < yyl_try_d)
      if (*s++)
        S_incline();
    goto retry;
  }
}

tnfchris@x86_64 ~/gcc-peak> ./install/bin/gcc -v
Using built-in specs.
COLLECT_GCC=./install/bin/gcc
COLLECT_LTO_WRAPPER=/home/tnfchris/gcc-peak/install/libexec/gcc/x86_64-pc-linux-gnu/14.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /home/tnfchris/gcc-dsg/configure --prefix=/home/tnfchris/gcc-peak/install --enable-checking=release : (reconfigured) /home/tnfchris/gcc-dsg/configure --prefix=/home/tnfchris/gcc-peak/install --enable-checking=release --enable-languages=c,c++,fortran,lto,objc --no-create --no-recursion
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.0.0 20231227 (experimental) (GCC)
Comment 6 Sam James 2023-12-27 13:37:12 UTC
Can you try with --enable-checking=yes,rtl,extra? I can reproduce this on godbolt too: https://godbolt.org/z/56r9ejMn5.
Comment 7 Tamar Christina 2023-12-27 13:44:12 UTC
(In reply to Sam James from comment #6)
> Can you try with --enable-checking=yes,rtl,extra? I can reproduce this on
> godbolt too: https://godbolt.org/z/56r9ejMn5.

trying with that.. from the dump on godbolt it looks like the live value was materialized in the wrong exit.
Comment 8 Tamar Christina 2023-12-27 14:52:42 UTC
Thanks, was able to reproduce with `--enable-checking=yes,rtl,extra`.

The issue seems to be that the value is unused, and we were relying on DSE removing such statement. but with --enable-checking=yes,rtl,extra the extra verification is done before we can remove the dead statements on these inverted loops.

That's why it doesn't fail without and produces the right code.

So it looks like this is the same bug as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113137 and the fix for that should fix this.  It looks like for these inverted loops, even though none of the values are used, I have to maintain the virtual phis.

I'll keep the two separate for now..
Comment 9 Tamar Christina 2023-12-29 21:06:23 UTC
Patch submitted
Comment 10 GCC Commits 2024-01-12 15:32:39 UTC
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:411de96dbf2bdafc7a90ebbfc63e68afd6388d29

commit r14-7195-g411de96dbf2bdafc7a90ebbfc63e68afd6388d29
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Fri Jan 12 15:25:34 2024 +0000

    middle-end: maintain LCSSA form when peeled vector iterations have virtual operands
    
    This patch fixes several interconnected issues.
    
    1. When picking an exit we wanted to check for niter_desc.may_be_zero not true.
       i.e. we want to pick an exit which we know will iterate at least once.
       However niter_desc.may_be_zero is not a boolean.  It is a tree that encodes
       a boolean value.  !niter_desc.may_be_zero is just checking if we have some
       information, not what the information is.  This leads us to pick a more
       difficult to vectorize exit more often than we should.
    
    2. Because we had this bug, we used to pick an alternative exit much more ofthen
       which showed one issue, when the loop accesses memory and we "invert it" we
       would corrupt the VUSE chain.  This is because on an peeled vector iteration
       every exit restarts the loop (i.e. they're all early) BUT since we may have
       performed a store, the vUSE would need to be updated.  This version maintains
       virtual PHIs correctly in these cases.   Note that we can't simply remove all
       of them and recreate them because we need the PHI nodes still in the right
       order for if skip_vector.
    
    3. Since we're moving the stores to a safe location I don't think we actually
       need to analyze whether the store is in range of the memref,  because if we
       ever get there, we know that the loads must be in range, and if the loads are
       in range and we get to the store we know the early breaks were not taken and
       so the scalar loop would have done the VF stores too.
    
    4. Instead of searching for where to move stores to, they should always be in
       exit belonging to the latch.  We can only ever delay stores and even if we
       pick a different exit than the latch one as the main one, effects still
       happen in program order when vectorized.  If we don't move the stores to the
       latch exit but instead to whever we pick as the "main" exit then we can
       perform incorrect memory accesses (luckily these are trapped by verify_ssa).
    
    5. We only used to analyze loads inside the same BB as an early break, and also
       we'd never analyze the ones inside the block where we'd be moving memory
       references to.  This is obviously bogus and to fix it this patch splits apart
       the two constraints.  We first validate that all load memory references are
       in bounds and only after that do we perform the alias checks for the writes.
       This makes the code simpler to understand and more trivially correct.
    
    gcc/ChangeLog:
    
            PR tree-optimization/113137
            PR tree-optimization/113136
            PR tree-optimization/113172
            PR tree-optimization/113178
            * tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
            Maintain PHIs on inverted loops.
            (vect_do_peeling): Maintain virtual PHIs on inverted loops.
            * tree-vect-loop.cc (vec_init_loop_exit_info): Pick exit closes to
            latch.
            (vect_create_loop_vinfo): Record all conds instead of only alt ones.
    
    gcc/testsuite/ChangeLog:
    
            PR tree-optimization/113137
            PR tree-optimization/113136
            PR tree-optimization/113172
            PR tree-optimization/113178
            * g++.dg/vect/vect-early-break_4-pr113137.cc: New test.
            * g++.dg/vect/vect-early-break_5-pr113137.cc: New test.
            * gcc.dg/vect/vect-early-break_95-pr113137.c: New test.
            * gcc.dg/vect/vect-early-break_96-pr113136.c: New test.
            * gcc.dg/vect/vect-early-break_97-pr113172.c: New test.
Comment 11 Tamar Christina 2024-01-12 15:35:16 UTC
Fixed. Thanks for the report and let me know if there's something still broken.