Bug 101260 - [11 Regression] regcprop: Determine subreg offset depending on endianness
Summary: [11 Regression] regcprop: Determine subreg offset depending on endianness
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 11.0
: P2 normal
Target Milestone: 11.5
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
: 104034 (view as bug list)
Depends on:
Blocks:
 
Reported: 2021-06-29 15:35 UTC by Stefan Schulze Frielinghaus
Modified: 2023-07-07 10:40 UTC (History)
4 users (show)

See Also:
Host:
Target: s390*-*-*
Build:
Known to work:
Known to fail: 10.3.1, 11.1.1
Last reconfirmed: 2021-06-30 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Stefan Schulze Frielinghaus 2021-06-29 15:35:32 UTC
struct a {
  unsigned b : 7;
  int c;
  int d;
  short e;
} p, *q = &p;
int f, g, h, i, r, s;
static short j[8][1][6] = {};
char k[7];
short l, m;
int *n;
int **o = &n;
void t() {
  for (; f;)
    ;
}
static struct a u(int x) {
  struct a a = {4, 8, 5, 4};
  for (; i <= 6; i++) {
    struct a v = {};
    for (; l; l++)
      h = 0;
    for (; h >= 0; h--) {
      j[i];
      struct a *w = &p;
      s = 0;
      for (; s < 3; s++) {
        r ^= x;
        m = j[i][g][h] == (k[g] = g);
        *w = v;
      }
      r = 2;
      for (; r; r--)
        *o = &r;
    }
  }
  t();
  return a;
}
int main() {
  *q = u(636);
  if (p.b != 4)
    __builtin_abort ();
}

The reduced example runs fine if compiled with mainline (currently 53fd7544aff) whereas it fails if compiled with GCC 11 (currently f6306457ee3). The example runs fine with GCC 11, too, if commit d1d01a66012a93cc8cb7dafbe1b5ec453ec96b59 is cherry picked. Can we backport this one?
Comment 1 Jakub Jelinek 2021-06-29 23:08:32 UTC
I thought r12-145 was a missed-optimization fix and such generally undesirable to backport, plus it introduced PR100492 and PR101009 regressions.
Comment 2 Richard Biener 2021-06-30 06:33:40 UTC
Yeah, that wasn't a fix in any sense so the issue is likely still latent.  Note there were some store-motion fixes that are not yet backported.

Btw, what options do you see the run-fail with?  Does it work fine with GCC 10?

I fear the failure needs more analysis to pin-point the root cause first.
Comment 3 Stefan Schulze Frielinghaus 2021-06-30 07:41:18 UTC
The problem shows up for option -O1 (options -O{0,2,3} are fine) and GCC 10 and 11 (mainline and GCC 9 are fine).
Comment 4 Richard Biener 2021-06-30 07:58:40 UTC
OK, can you try to pin-point the wrong transform?  If GCC 10 is affected then my store-motion comment can be ignored.
Comment 5 Stefan Schulze Frielinghaus 2021-06-30 14:08:58 UTC
Yes, I'm already looking into this.
Comment 6 Jakub Jelinek 2021-06-30 17:23:35 UTC
I've reproduced it with -O1 -march=z14 and it started with
r10-7093-g5dc1390b41db5c1765e25fd21dad1a930a015aac
So, I think it is much more likely some RA issue or RTL optimization issue and the r12-145 change just makes it latent.
Comment 7 Stefan Schulze Frielinghaus 2021-06-30 19:04:34 UTC
I had a look at the optimized tree output which looks good to me. However, I see that split2 transforms

(insn 218 222 114 15 (set (reg/v:TI 10 %r10 [orig:87 a ] [87])
        (reg/v:TI 18 %f4 [orig:87 a ] [87])) 1466 {movti}
     (nil))

into

(insn 234 222 235 15 (set (reg:DI 10 %r10 [ a ])
        (reg:DI 18 %f4)) 1467 {*movdi_64}
     (nil))
(insn 235 234 114 15 (set (reg:DI 11 %r11 [orig:87 a+8 ] [87])
        (unspec:DI [
                (reg:V2DI 18 %f4)
                (const_int 1 [0x1])
            ] UNSPEC_VEC_EXTRACT)) 495 {*vec_extractv2di}
     (nil))

which might be wrong. If I swap r10 by r11 via

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 7faf775fbf2..0319934062a 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -1747,8 +1747,8 @@
    (set (match_dup 3) (unspec:DI [(match_dup 5) (const_int 1)]
                                 UNSPEC_VEC_EXTRACT))]
 {
-  operands[2] = operand_subword (operands[0], 0, 0, TImode);
-  operands[3] = operand_subword (operands[0], 1, 0, TImode);
+  operands[2] = operand_subword (operands[0], 1, 0, TImode);
+  operands[3] = operand_subword (operands[0], 0, 0, TImode);
   operands[4] = gen_rtx_REG (DImode, REGNO (operands[1]));
   operands[5] = gen_rtx_REG (V2DImode, REGNO (operands[1]));
 })

then the compiled program just runs fine. However, I'm not sure whether this fixes the problem or just the symptoms. I will come back to this tomorrow.
Comment 8 Stefan Schulze Frielinghaus 2021-07-01 16:58:57 UTC
Pass split2 transforms

(insn 218 222 114 15 (set (reg/v:TI 10 %r10 [orig:87 a ] [87])
        (reg/v:TI 18 %f4 [orig:87 a ] [87])) 1466 {movti}
     (nil))

into

(insn 234 222 235 15 (set (reg:DI 10 %r10 [ a ])
        (reg:DI 18 %f4)) 1467 {*movdi_64}
     (nil))
(insn 235 234 114 15 (set (reg:DI 11 %r11 [orig:87 a+8 ] [87])
        (unspec:DI [
                (reg:V2DI 18 %f4)
                (const_int 1 [0x1])
            ] UNSPEC_VEC_EXTRACT)) 495 {*vec_extractv2di}
     (nil))

which is then transformed by cprop_hardreg into

(insn 234 222 235 14 (set (reg:DI 10 %r10 [ a ])
        (reg:DI 11 %r11 [18])) 1467 {*movdi_64}
     (expr_list:REG_DEAD (reg:DI 11 %r11 [18])                                                                                                                                                        (nil)))
(insn 235 234 114 14 (set (reg:DI 11 %r11 [orig:87 a+8 ] [87])
        (unspec:DI [
                (reg:V2DI 18 %f4)
                (const_int 1 [0x1])
            ] UNSPEC_VEC_EXTRACT)) 495 {*vec_extractv2di}
     (expr_list:REG_DEAD (reg:V2DI 18 %f4)
        (nil)))

where in insn 234 register f4 is substituted by r11 which is wrong. This can also be observed in the final assembler output:

vlvgp   %v4,%r10,%r11
l       %r2,12(%r1)
ahi     %r2,-1
st      %r2,12(%r1)
cijhe   %r2,0,.L13
lgr     %r10,%r11   // (*)
vlgvg   %r11,%v4,1

Registers r10 and r11 are moved into v4. The inverse move from v4 into r10 and r11 is broken since cprop_hardreg wrongly substitutes f4 by r11. Thus the expected output for (*) is:

lgdr    %r10,%f4
Comment 9 Richard Biener 2021-07-19 12:02:43 UTC
Not really fixed on trunk, we just don't have a testcase there.
Comment 10 Stefan Schulze Frielinghaus 2021-08-06 13:59:45 UTC
In regcprop we call find_oldest_value_reg which itself calls

  maybe_mode_change (TImode, TImode, DImode, 10, 18)

where we have

  regno += subreg_regno_offset (regno, orig_mode, offset, new_mode);

The call is made where offset equals 8 which is wrong since we are interested in the high part which is contained in r10 and not r11.  The following patch fixes this:

diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index d2a01130fe1..0e1ac12458a 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -414,9 +414,14 @@ maybe_mode_change (machine_mode orig_mode, machine_mode copy_mode,
                            copy_nregs, &bytes_per_reg))
        return NULL_RTX;
       poly_uint64 copy_offset = bytes_per_reg * (copy_nregs - use_nregs);
-      poly_uint64 offset
-       = subreg_size_lowpart_offset (GET_MODE_SIZE (new_mode) + copy_offset,
-                                     GET_MODE_SIZE (orig_mode));
+      poly_uint64 offset =
+#if WORDS_BIG_ENDIAN
+       subreg_size_highpart_offset
+#else
+       subreg_size_lowpart_offset
+#endif
+                                   (GET_MODE_SIZE (new_mode) + copy_offset,
+                                    GET_MODE_SIZE (orig_mode));
       regno += subreg_regno_offset (regno, orig_mode, offset, new_mode);
       if (targetm.hard_regno_mode_ok (regno, new_mode))
        return gen_raw_REG (new_mode, regno);

With the patch

(insn 234 222 235 14 (set (reg:DI 10 %r10 [ a ])
        (reg:DI 18 %f4)) 1376 {*movdi_64}
     (nil))

is first modified into a noop

(insn 234 222 235 14 (set (reg:DI 10 %r10 [ a ])
        (reg:DI 10 %r10 [18])) 1376 {*movdi_64}
     (nil))

and then deleted within regcprop.
Comment 11 Andrew Pinski 2022-01-14 19:24:48 UTC
*** Bug 104034 has been marked as a duplicate of this bug. ***
Comment 12 Andrew Pinski 2022-01-14 19:26:17 UTC
A patch was submitted here:
https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581172.html
Comment 13 Andrew Pinski 2022-01-14 19:27:43 UTC
(In reply to Andrew Pinski from comment #12)
> A patch was submitted here:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581172.html

Another patch was submitted here too:
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588407.html
Comment 14 Andrew Pinski 2022-01-14 19:32:09 UTC
Hmm, PR 98694 seems related too.
Comment 15 Andrew Pinski 2022-01-14 19:33:06 UTC
And PR 100342.
Comment 16 Andrew Pinski 2022-01-14 19:37:29 UTC
I wonder if makes sense to create either a RTL testcase which fails on s390 still (or did in GCC 10)?
Comment 17 CVS Commits 2022-02-01 12:35:15 UTC
The master branch has been updated by Andreas Krebbel <krebbel@gcc.gnu.org>:

https://gcc.gnu.org/g:b9ebf6c330e24e886e7ce148e8c680c3e06c24dc

commit r12-6960-gb9ebf6c330e24e886e7ce148e8c680c3e06c24dc
Author: Andreas Krebbel <krebbel@linux.ibm.com>
Date:   Tue Feb 1 13:33:55 2022 +0100

    PR101260 regcprop: Add mode change check for copy reg
    
    When propagating a multi-word register into an access with a smaller
    mode the can_change_mode backend hook is already consulted for the
    original register.  This however is also required for the intermediate
    copy in copy_regno which might use a different register class.
    
    gcc/ChangeLog:
    
            PR rtl-optimization/101260
            * regcprop.cc (maybe_mode_change): Invoke mode_change_ok also for
            copy_regno.
    
    gcc/testsuite/ChangeLog:
    
            PR rtl-optimization/101260
            * gcc.target/s390/pr101260.c: New testcase.
Comment 18 Stefan Schulze Frielinghaus 2022-06-09 08:53:15 UTC
Fixed for 12 and mainline.
Comment 19 Jakub Jelinek 2022-06-28 10:45:34 UTC
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
Comment 20 Richard Biener 2023-07-07 10:40:18 UTC
GCC 10 branch is being closed.