Bug 113137 - [14 regression] Failed bootstrap with -O3 -march=znver2
Summary: [14 regression] Failed bootstrap with -O3 -march=znver2
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 14.0
: P1 normal
Target Milestone: 14.0
Assignee: Tamar Christina
URL:
Keywords:
: 113135 113139 113146 (view as bug list)
Depends on:
Blocks:
 
Reported: 2023-12-25 16:16 UTC by Sam James
Modified: 2024-01-13 00:05 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2023-12-26 00:00:00


Attachments
c-cppbuiltin.ii.xz (402.83 KB, application/x-xz)
2023-12-25 16:16 UTC, Sam James
Details
reduced.ii (219 bytes, text/plain)
2023-12-25 16:43 UTC, Sam James
Details
maintain-lcssa-peeled.patch (4.69 KB, patch)
2023-12-29 12:16 UTC, Tamar Christina
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Sam James 2023-12-25 16:16:28 UTC
Created attachment 56936 [details]
c-cppbuiltin.ii.xz

```
# /tmp/gcc/bin/g++ -c c-cppbuiltin.ii -O3 -march=znver2
/var/tmp/portage/sys-devel/gcc-14.0.0_pre20231224/work/gcc-14-20231224/gcc/c-family/c-cppbuiltin.cc: In function 'void builtin_define_type_minmax(const char*, const char*, tree)':
/var/tmp/portage/sys-devel/gcc-14.0.0_pre20231224/work/gcc-14-20231224/gcc/c-family/c-cppbuiltin.cc:2043:1: error: PHI node with wrong VUSE on edge from BB 12
 2043 | builtin_define_type_minmax (const char *min_macro, const char *max_macro,
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~
.MEM_218 = PHI <.MEM_59(12)>
expected .MEM_191
/var/tmp/portage/sys-devel/gcc-14.0.0_pre20231224/work/gcc-14-20231224/gcc/c-family/c-cppbuiltin.cc:2043:1: error: multiple virtual PHI nodes in BB 32
.MEM_99 = PHI <.MEM_191(14)>
.MEM_217 = PHI <.MEM_59(14)>
/var/tmp/portage/sys-devel/gcc-14.0.0_pre20231224/work/gcc-14-20231224/gcc/c-family/c-cppbuiltin.cc:2043:1: error: PHI node with wrong VUSE on edge from BB 32
.MEM_103 = PHI <.MEM_217(32), .MEM_218(37)>
expected .MEM_99
during GIMPLE pass: vect
/var/tmp/portage/sys-devel/gcc-14.0.0_pre20231224/work/gcc-14-20231224/gcc/c-family/c-cppbuiltin.cc:2043:1: internal compiler error: verify_ssa failed
Please submit a full bug report, with preprocessed source (by using -freport-bug).
See <https://gcc.gnu.org/bugs/> for instructions.
```

```
Using built-in specs.
COLLECT_GCC=/tmp/gcc/bin/g++
COLLECT_LTO_WRAPPER=/tmp/gcc/libexec/gcc/x86_64-pc-linux-gnu/14.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /home/sam/git/gcc/configure --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --disable-analyzer --disable-cet --disable-default-pie --disable-default-ssp --disable-gcov --disable-libada --disable-libatomic --disable-libgomp --disable-libitm --disable-libquadmath --disable-libsanitizer --disable-libssp --disable-libstdcxx-pch --disable-libvtv --disable-lto --disable-multilib --disable-nls --disable-objc-gc --disable-systemtap --disable-werror --enable-languages=c,c++ --prefix=/tmp/gcc --with-checking=yes,extra,rtl --without-libatomic --without-libbacktrace --without-isl --without-zstd --with-system-zlib --enable-lto --with-build-config=bootstrap-lto
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 14.0.0 20231225 (experimental) (GCC)
```
Comment 1 Sam James 2023-12-25 16:43:00 UTC
Created attachment 56937 [details]
reduced.ii
Comment 2 Tamar Christina 2023-12-26 14:05:54 UTC
*** Bug 113146 has been marked as a duplicate of this bug. ***
Comment 3 Tamar Christina 2023-12-26 14:06:14 UTC
*** Bug 113139 has been marked as a duplicate of this bug. ***
Comment 4 Tamar Christina 2023-12-26 14:06:52 UTC
*** Bug 113135 has been marked as a duplicate of this bug. ***
Comment 5 Tamar Christina 2023-12-26 14:23:11 UTC
Simpler reproducer:

int b;
void a() __attribute__((__noreturn__));
void c() {
  char *buf;
  int bufsz = 64;
  while (b) {
    !bufsz ? a(), 0 : *buf++ = bufsz--;
    b -= 4;
  }
}

The loop has an inverted control flow that accesses memory.
The testcase doesn't seem to have existing cases for these, but due to the inverted nature the loop's main exit is before the latch exit which we now consider the early exit.

because we only analyze early exits we miss that there's a memory reference that needs to be moved, and because it wasn't moved the vUSEs don't line up.

will fix tomorrow when back at work.
Comment 6 Tamar Christina 2023-12-26 14:27:11 UTC
Thanks for the report and testcases.
Comment 7 Tamar Christina 2023-12-27 12:08:48 UTC
Have update the memory analysis part to support inverted loops. now working on wiring through virtual phis during peeling.

Aside form this missing part CFG looks correct. will  have a final patch in a bit.
Comment 8 David Binderman 2023-12-27 21:13:30 UTC
I see this one also in a build of package LFSC.

$ ~/gcc/results/bin/gcc -c -w -O3 bug992.cc
foundBugs $ ~/gcc/results/bin/gcc -c -w -O3 -march=znver2 bug992.cc 
/home/dcb38/rpmbuild/BUILD/LFSC-bbc1798864dbc328092356d4c01f02ddc39ea6bd/src/code.cpp: In function ‘Expr* read_code()’:
/home/dcb38/rpmbuild/BUILD/LFSC-bbc1798864dbc328092356d4c01f02ddc39ea6bd/src/code.cpp:112:7: error: PHI node with wrong VUSE on edge from BB 114
  112 | Expr *read_code()
      |       ^~~~~~~~~
.MEM_824 = PHI <.MEM_1387(114)>
expected .MEM_1042
Comment 9 Tamar Christina 2023-12-28 14:27:08 UTC
Ok, have a working patch but it's a bit ugly, working on cleaning it up.
Comment 10 Tamar Christina 2023-12-28 18:08:10 UTC
Ok, so this bug is simply fixed by:

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index f51ae3e719e..e7a5917bc4c 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -976,7 +976,8 @@ vec_init_loop_exit_info (class loop *loop)
       if (number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
          && !chrec_contains_undetermined (niter_desc.niter))
        {
-         if (!niter_desc.may_be_zero || !candidate)
+         tree may_be_zero = niter_desc.may_be_zero;
+         if ((may_be_zero && integer_zerop (may_be_zero)) || !candidate)
            candidate = exit;
        }
     }

because niter_desc.may_be_zero is not a boolean but instead a tree that encodes a boolean.

Due to this we were forcing much more complicated loops than required.  However we *should* be able to handle these complicated loops since we don't know when they'll occur.. so I'll post a companion patch to fix those too.
Comment 11 Tamar Christina 2023-12-29 12:16:31 UTC
Created attachment 56963 [details]
maintain-lcssa-peeled.patch

patch undergoing testing for both this and PR113136
Comment 12 Tamar Christina 2023-12-29 13:42:46 UTC
ok, x86_64 bootstrap and regtest with -O3 and --enable-checking=yes,rtl,extra now passes.

aarch64 hit a small issue in libgcc that I'm not sure I should be allowing or not. will investigate and either fix or disable and post patches.
Comment 13 Tamar Christina 2023-12-29 21:06:13 UTC
Patch submitted
Comment 14 David Binderman 2024-01-12 09:00:20 UTC
(In reply to Tamar Christina from comment #13)
> Patch submitted

Two weeks have elapsed and the patch doesn't seem to appear in git.

Is it perhaps stuck somewhere ?
Comment 15 Tamar Christina 2024-01-12 11:09:21 UTC
(In reply to David Binderman from comment #14)
> (In reply to Tamar Christina from comment #13)
> > Patch submitted
> 
> Two weeks have elapsed and the patch doesn't seem to appear in git.
> 
> Is it perhaps stuck somewhere ?

maintainers were on holiday till this week.  Everything's been approved now and making the final changes maintainers wanted and will regtest on various architectures.

I expect to commit the patches sometime today.  Sorry for the delay.
Comment 16 GCC Commits 2024-01-12 15:32:39 UTC
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:411de96dbf2bdafc7a90ebbfc63e68afd6388d29

commit r14-7195-g411de96dbf2bdafc7a90ebbfc63e68afd6388d29
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Fri Jan 12 15:25:34 2024 +0000

    middle-end: maintain LCSSA form when peeled vector iterations have virtual operands
    
    This patch fixes several interconnected issues.
    
    1. When picking an exit we wanted to check for niter_desc.may_be_zero not true.
       i.e. we want to pick an exit which we know will iterate at least once.
       However niter_desc.may_be_zero is not a boolean.  It is a tree that encodes
       a boolean value.  !niter_desc.may_be_zero is just checking if we have some
       information, not what the information is.  This leads us to pick a more
       difficult to vectorize exit more often than we should.
    
    2. Because we had this bug, we used to pick an alternative exit much more ofthen
       which showed one issue, when the loop accesses memory and we "invert it" we
       would corrupt the VUSE chain.  This is because on an peeled vector iteration
       every exit restarts the loop (i.e. they're all early) BUT since we may have
       performed a store, the vUSE would need to be updated.  This version maintains
       virtual PHIs correctly in these cases.   Note that we can't simply remove all
       of them and recreate them because we need the PHI nodes still in the right
       order for if skip_vector.
    
    3. Since we're moving the stores to a safe location I don't think we actually
       need to analyze whether the store is in range of the memref,  because if we
       ever get there, we know that the loads must be in range, and if the loads are
       in range and we get to the store we know the early breaks were not taken and
       so the scalar loop would have done the VF stores too.
    
    4. Instead of searching for where to move stores to, they should always be in
       exit belonging to the latch.  We can only ever delay stores and even if we
       pick a different exit than the latch one as the main one, effects still
       happen in program order when vectorized.  If we don't move the stores to the
       latch exit but instead to whever we pick as the "main" exit then we can
       perform incorrect memory accesses (luckily these are trapped by verify_ssa).
    
    5. We only used to analyze loads inside the same BB as an early break, and also
       we'd never analyze the ones inside the block where we'd be moving memory
       references to.  This is obviously bogus and to fix it this patch splits apart
       the two constraints.  We first validate that all load memory references are
       in bounds and only after that do we perform the alias checks for the writes.
       This makes the code simpler to understand and more trivially correct.
    
    gcc/ChangeLog:
    
            PR tree-optimization/113137
            PR tree-optimization/113136
            PR tree-optimization/113172
            PR tree-optimization/113178
            * tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
            Maintain PHIs on inverted loops.
            (vect_do_peeling): Maintain virtual PHIs on inverted loops.
            * tree-vect-loop.cc (vec_init_loop_exit_info): Pick exit closes to
            latch.
            (vect_create_loop_vinfo): Record all conds instead of only alt ones.
    
    gcc/testsuite/ChangeLog:
    
            PR tree-optimization/113137
            PR tree-optimization/113136
            PR tree-optimization/113172
            PR tree-optimization/113178
            * g++.dg/vect/vect-early-break_4-pr113137.cc: New test.
            * g++.dg/vect/vect-early-break_5-pr113137.cc: New test.
            * gcc.dg/vect/vect-early-break_95-pr113137.c: New test.
            * gcc.dg/vect/vect-early-break_96-pr113136.c: New test.
            * gcc.dg/vect/vect-early-break_97-pr113172.c: New test.
Comment 17 Tamar Christina 2024-01-12 15:35:48 UTC
Fixed. Thanks for the report and let me know if there's something still broken.