struct a { unsigned b : 7; int c; int d; short e; } p, *q = &p; int f, g, h, i, r, s; static short j[8][1][6] = {}; char k[7]; short l, m; int *n; int **o = &n; void t() { for (; f;) ; } static struct a u(int x) { struct a a = {4, 8, 5, 4}; for (; i <= 6; i++) { struct a v = {}; for (; l; l++) h = 0; for (; h >= 0; h--) { j[i]; struct a *w = &p; s = 0; for (; s < 3; s++) { r ^= x; m = j[i][g][h] == (k[g] = g); *w = v; } r = 2; for (; r; r--) *o = &r; } } t(); return a; } int main() { *q = u(636); if (p.b != 4) __builtin_abort (); } The reduced example runs fine if compiled with mainline (currently 53fd7544aff) whereas it fails if compiled with GCC 11 (currently f6306457ee3). The example runs fine with GCC 11, too, if commit d1d01a66012a93cc8cb7dafbe1b5ec453ec96b59 is cherry picked. Can we backport this one?
I thought r12-145 was a missed-optimization fix and such generally undesirable to backport, plus it introduced PR100492 and PR101009 regressions.
Yeah, that wasn't a fix in any sense so the issue is likely still latent. Note there were some store-motion fixes that are not yet backported. Btw, what options do you see the run-fail with? Does it work fine with GCC 10? I fear the failure needs more analysis to pin-point the root cause first.
The problem shows up for option -O1 (options -O{0,2,3} are fine) and GCC 10 and 11 (mainline and GCC 9 are fine).
OK, can you try to pin-point the wrong transform? If GCC 10 is affected then my store-motion comment can be ignored.
Yes, I'm already looking into this.
I've reproduced it with -O1 -march=z14 and it started with r10-7093-g5dc1390b41db5c1765e25fd21dad1a930a015aac So, I think it is much more likely some RA issue or RTL optimization issue and the r12-145 change just makes it latent.
I had a look at the optimized tree output which looks good to me. However, I see that split2 transforms (insn 218 222 114 15 (set (reg/v:TI 10 %r10 [orig:87 a ] [87]) (reg/v:TI 18 %f4 [orig:87 a ] [87])) 1466 {movti} (nil)) into (insn 234 222 235 15 (set (reg:DI 10 %r10 [ a ]) (reg:DI 18 %f4)) 1467 {*movdi_64} (nil)) (insn 235 234 114 15 (set (reg:DI 11 %r11 [orig:87 a+8 ] [87]) (unspec:DI [ (reg:V2DI 18 %f4) (const_int 1 [0x1]) ] UNSPEC_VEC_EXTRACT)) 495 {*vec_extractv2di} (nil)) which might be wrong. If I swap r10 by r11 via diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index 7faf775fbf2..0319934062a 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -1747,8 +1747,8 @@ (set (match_dup 3) (unspec:DI [(match_dup 5) (const_int 1)] UNSPEC_VEC_EXTRACT))] { - operands[2] = operand_subword (operands[0], 0, 0, TImode); - operands[3] = operand_subword (operands[0], 1, 0, TImode); + operands[2] = operand_subword (operands[0], 1, 0, TImode); + operands[3] = operand_subword (operands[0], 0, 0, TImode); operands[4] = gen_rtx_REG (DImode, REGNO (operands[1])); operands[5] = gen_rtx_REG (V2DImode, REGNO (operands[1])); }) then the compiled program just runs fine. However, I'm not sure whether this fixes the problem or just the symptoms. I will come back to this tomorrow.
Pass split2 transforms (insn 218 222 114 15 (set (reg/v:TI 10 %r10 [orig:87 a ] [87]) (reg/v:TI 18 %f4 [orig:87 a ] [87])) 1466 {movti} (nil)) into (insn 234 222 235 15 (set (reg:DI 10 %r10 [ a ]) (reg:DI 18 %f4)) 1467 {*movdi_64} (nil)) (insn 235 234 114 15 (set (reg:DI 11 %r11 [orig:87 a+8 ] [87]) (unspec:DI [ (reg:V2DI 18 %f4) (const_int 1 [0x1]) ] UNSPEC_VEC_EXTRACT)) 495 {*vec_extractv2di} (nil)) which is then transformed by cprop_hardreg into (insn 234 222 235 14 (set (reg:DI 10 %r10 [ a ]) (reg:DI 11 %r11 [18])) 1467 {*movdi_64} (expr_list:REG_DEAD (reg:DI 11 %r11 [18]) (nil))) (insn 235 234 114 14 (set (reg:DI 11 %r11 [orig:87 a+8 ] [87]) (unspec:DI [ (reg:V2DI 18 %f4) (const_int 1 [0x1]) ] UNSPEC_VEC_EXTRACT)) 495 {*vec_extractv2di} (expr_list:REG_DEAD (reg:V2DI 18 %f4) (nil))) where in insn 234 register f4 is substituted by r11 which is wrong. This can also be observed in the final assembler output: vlvgp %v4,%r10,%r11 l %r2,12(%r1) ahi %r2,-1 st %r2,12(%r1) cijhe %r2,0,.L13 lgr %r10,%r11 // (*) vlgvg %r11,%v4,1 Registers r10 and r11 are moved into v4. The inverse move from v4 into r10 and r11 is broken since cprop_hardreg wrongly substitutes f4 by r11. Thus the expected output for (*) is: lgdr %r10,%f4
Not really fixed on trunk, we just don't have a testcase there.
In regcprop we call find_oldest_value_reg which itself calls maybe_mode_change (TImode, TImode, DImode, 10, 18) where we have regno += subreg_regno_offset (regno, orig_mode, offset, new_mode); The call is made where offset equals 8 which is wrong since we are interested in the high part which is contained in r10 and not r11. The following patch fixes this: diff --git a/gcc/regcprop.c b/gcc/regcprop.c index d2a01130fe1..0e1ac12458a 100644 --- a/gcc/regcprop.c +++ b/gcc/regcprop.c @@ -414,9 +414,14 @@ maybe_mode_change (machine_mode orig_mode, machine_mode copy_mode, copy_nregs, &bytes_per_reg)) return NULL_RTX; poly_uint64 copy_offset = bytes_per_reg * (copy_nregs - use_nregs); - poly_uint64 offset - = subreg_size_lowpart_offset (GET_MODE_SIZE (new_mode) + copy_offset, - GET_MODE_SIZE (orig_mode)); + poly_uint64 offset = +#if WORDS_BIG_ENDIAN + subreg_size_highpart_offset +#else + subreg_size_lowpart_offset +#endif + (GET_MODE_SIZE (new_mode) + copy_offset, + GET_MODE_SIZE (orig_mode)); regno += subreg_regno_offset (regno, orig_mode, offset, new_mode); if (targetm.hard_regno_mode_ok (regno, new_mode)) return gen_raw_REG (new_mode, regno); With the patch (insn 234 222 235 14 (set (reg:DI 10 %r10 [ a ]) (reg:DI 18 %f4)) 1376 {*movdi_64} (nil)) is first modified into a noop (insn 234 222 235 14 (set (reg:DI 10 %r10 [ a ]) (reg:DI 10 %r10 [18])) 1376 {*movdi_64} (nil)) and then deleted within regcprop.
*** Bug 104034 has been marked as a duplicate of this bug. ***
A patch was submitted here: https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581172.html
(In reply to Andrew Pinski from comment #12) > A patch was submitted here: > https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581172.html Another patch was submitted here too: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588407.html
Hmm, PR 98694 seems related too.
And PR 100342.
I wonder if makes sense to create either a RTL testcase which fails on s390 still (or did in GCC 10)?
The master branch has been updated by Andreas Krebbel <krebbel@gcc.gnu.org>: https://gcc.gnu.org/g:b9ebf6c330e24e886e7ce148e8c680c3e06c24dc commit r12-6960-gb9ebf6c330e24e886e7ce148e8c680c3e06c24dc Author: Andreas Krebbel <krebbel@linux.ibm.com> Date: Tue Feb 1 13:33:55 2022 +0100 PR101260 regcprop: Add mode change check for copy reg When propagating a multi-word register into an access with a smaller mode the can_change_mode backend hook is already consulted for the original register. This however is also required for the intermediate copy in copy_regno which might use a different register class. gcc/ChangeLog: PR rtl-optimization/101260 * regcprop.cc (maybe_mode_change): Invoke mode_change_ok also for copy_regno. gcc/testsuite/ChangeLog: PR rtl-optimization/101260 * gcc.target/s390/pr101260.c: New testcase.
Fixed for 12 and mainline.
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
GCC 10 branch is being closed.