With commit r15-1579-g792f97b44ffc5e6a967292b3747fd835e99396e7 "Add a late-combine pass [PR106594]", I see a regression in GCN target testing (tested with '-march=gfx908') for both C and C++:

    @@ -191300,7 +191300,7 @@ PASS: c-c++-common/torture/builtin-arith-overflow-6.c -O0 (test for excess er
     PASS: c-c++-common/torture/builtin-arith-overflow-6.c -O0 execution test
     UNSUPPORTED: c-c++-common/torture/builtin-arith-overflow-6.c -O1
     PASS: c-c++-common/torture/builtin-arith-overflow-6.c -O2 (test for excess errors)
    [-PASS:-]{+FAIL:+} c-c++-common/torture/builtin-arith-overflow-6.c -O2 execution test
     UNSUPPORTED: c-c++-common/torture/builtin-arith-overflow-6.c -O3 -g
     UNSUPPORTED: c-c++-common/torture/builtin-arith-overflow-6.c -Os

    spawn -ignore SIGHUP [...]/build-gcc/gcc/gcn-run ./builtin-arith-overflow-6.exe
    GCN Kernel Aborted
    Kernel aborted
    FAIL: c-c++-common/torture/builtin-arith-overflow-6.c -O2 execution test

With '-fno-late-combine-instructions', it's back to PASS.

The diff between good ('-fno-late-combine-instructions') and bad ('-flate-combine-instructions') for 'builtin-arith-overflow-6.s', as well as for the '-fdump-rtl-all' dumps, is big, so I'm not able to directly pinpoint one specific issue. However, I do observe a number of instances like the following (good vs. bad):

    [...]
     s_mov_b64 exec, -1
    [...]
    - s_mov_b32 s12, 0
    - v_writelane_b32 v0, s12, 0
     s_mov_b64 exec, 1
    + v_mov_b32 v0, 0
     flat_store_dword v[18:19], v0
    [...]

Might that "move across 'exec'" be in error?
It was writing 0 to s12 (a scalar register) and then moving that zero to lane zero of v0 (a vector register). Now it's writing the 0 directly to v0, with all lanes but lane zero masked off by 'exec'. These should be identical (unless s12 was also live afterwards). The problem must be elsewhere.
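To make that equivalence argument concrete, here is a toy Python model of the two sequences. It is only a sketch and not a GCN emulator: it assumes a 64-lane wavefront, that v_writelane_b32 writes exactly one lane regardless of 'exec', and that v_mov_b32 writes only the lanes enabled in 'exec'. The function names `good_sequence` and `bad_sequence` are made up for illustration.

```python
# Toy model of one 64-lane vector register (assumed wavefront size).
LANES = 64


def v_writelane_b32(vreg, value, lane):
    """Model: write `value` into a single lane, ignoring the exec mask."""
    out = list(vreg)
    out[lane] = value
    return out


def v_mov_b32(vreg, value, exec_mask):
    """Model: write `value` into every lane whose bit is set in exec_mask."""
    return [value if (exec_mask >> i) & 1 else old
            for i, old in enumerate(vreg)]


def good_sequence(v0):
    """Sequence seen with -fno-late-combine-instructions."""
    s12 = 0                             # s_mov_b32 s12, 0
    return v_writelane_b32(v0, s12, 0)  # v_writelane_b32 v0, s12, 0


def bad_sequence(v0):
    """Sequence seen with -flate-combine-instructions."""
    exec_mask = 1                       # s_mov_b64 exec, 1
    return v_mov_b32(v0, 0, exec_mask)  # v_mov_b32 v0, 0


if __name__ == "__main__":
    initial = list(range(100, 100 + LANES))
    print(good_sequence(initial) == bad_sequence(initial))
```

Under these assumptions both sequences leave lanes 1..63 of v0 untouched and set lane 0 to zero; the only difference the model cannot show is the value left in s12.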
I suppose for issues like this, it would be useful to have a debug counter to bisect on. I'll post a patch for doing that today, but I'm afraid I'll be relying on someone with gcn access to actually do the bisection.
I've now pushed a debug counter for late_combine. Sorry to ask, but could you bisect on N in -fdbg-cnt=late_combine:N to see which transformation is causing the problem?
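For whoever runs the bisection, the search over N could be sketched as below. This is only an outline under stated assumptions: the counter limit N is taken to mean "allow the first N late_combine transformations", the test is assumed to pass with N = 0 and fail with a sufficiently large N, and `fails` is a hypothetical callback that would rebuild the test with the given flag and run it (for example under gcn-run). Only the '-fdbg-cnt=late_combine:N' spelling comes from this thread.

```python
def dbg_cnt_flag(n):
    """Build the compiler option for a given counter limit."""
    return f"-fdbg-cnt=late_combine:{n}"


def first_bad_count(fails, upper):
    """Return the smallest N in [1, upper] for which fails(N) is true.

    Assumes fails(0) is false, fails(upper) is true, and that once the
    test starts failing it keeps failing for every larger N.
    `fails` is a hypothetical callback supplied by the caller.
    """
    lo, hi = 0, upper  # invariant: fails(lo) is False, fails(hi) is True
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Stand-in predicate for demonstration only: pretend that the 37th
    # transformation is the one that breaks the test.
    culprit = first_bad_count(lambda n: n >= 37, 1000)
    print(culprit, dbg_cnt_flag(culprit))
```

Comparing the RTL dumps for N-1 and N, where N is the returned value, should then isolate the single transformation that makes the difference.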
Created attachment 58513
A patch for a bug seen on arm*-*-*

Also, could you check whether the attached patch makes any difference? It fixes a problem seen on arm*-*-*, and I notice GCN also defines cannot_copy_insn_p.