The following testcase manually reduced from git's diff.c is miscompiled on s390x-linux with -O2 -march=zEC12 -mtune=z13 starting with r8-1288-ge701e0b9e8c74e5e93764556a447ea334e4e9389 . git itself started to be miscompiled with r10-4651-gafeb887562af17ea235fbec650ff6d16c412682a (r10-4650 fails to build), but that was just (apparently quite significant) change in inlining behavior (Martin L., was that intended?). struct T { int b; int c; unsigned short d; unsigned e:1, f:1, g:1, h:2, i:1, j:1; signed int k:2; }; struct S { struct T s; char c[64]; } buf[2]; __attribute__ ((noinline, noclone, noipa)) void * baz (void) { static int cnt; return (void *) &buf[cnt++]; } static inline __attribute__ ((always_inline)) struct T * bar (const char *a) { struct T *s; s = baz (); s->b = 1; s->k = -1; return s; } __attribute__ ((noinline, noclone, noipa)) void foo (const char *x, struct T **y) { struct T *l = bar (x); struct T *m = bar (x); y[0] = l; y[1] = m; } int main () { struct T *r[2]; foo ("foo", r); if (r[0]->e || r[0]->f || r[0]->g || r[0]->h || r[0]->i || r[0]->j || r[0]->k != -1) __builtin_abort (); if (r[1]->e || r[1]->f || r[1]->g || r[1]->h || r[1]->i || r[1]->j || r[1]->k != -1) __builtin_abort (); return 0; } The first s->k = -1; bitfield store is always done through: (insn 14 13 15 2 (parallel [ (set (mem/j:HI (plus:DI (reg/v/f:DI 61 [ s ]) (const_int 10 [0xa])) [0 +0 S2 A16]) (ior:HI (mem/j:HI (plus:DI (reg/v/f:DI 61 [ s ]) (const_int 10 [0xa])) [0 +0 S2 A16]) (const_int 384 [0x180]))) (clobber (reg:CC 33 %cc)) ]) "diff2.i":29 1733 {*iorhi3_zarch} (expr_list:REG_UNUSED (reg:CC 33 %cc) (nil))) but the second one changed like: (insn 22 21 23 2 (parallel [ (set (mem/j:HI (plus:DI (reg/v/f:DI 60 [ s ]) (const_int 10 [0xa])) [0 +0 S2 A16]) (ior:HI (mem/j:HI (plus:DI (reg/v/f:DI 60 [ s ]) (const_int 10 [0xa])) [0 +0 S2 A16]) - (const_int 384 [0x180]))) + (const_int -128 [0xffffffffffffff80]))) (clobber (reg:CC 33 %cc)) ]) "diff2.i":29 1733 {*iorhi3_zarch} (expr_list:REG_UNUSED (reg:CC 33 %cc) (nil))) which means it sets to all ones various other bitfields.
I'd say the bug is in the way how we try to lower the zero_extract store: Trying 19, 13, 21 -> 22: Failed to match this instruction: (set (zero_extract:DI (mem/j:HI (plus:DI (reg/v/f:DI 60 [ s ]) (const_int 10 [0xa])) [0 +0 S2 A16]) (const_int 2 [0x2]) (const_int 7 [0x7])) (const_int -1 [0xffffffffffffffff])) Successfully matched this instruction: (set (reg:HI 67) (const_int -128 [0xffffffffffffff80])) Successfully matched this instruction: (set (mem/j:HI (plus:DI (reg/v/f:DI 60 [ s ]) (const_int 10 [0xa])) [0 +0 S2 A16]) (ior:HI (mem/j:HI (plus:DI (reg/v/f:DI 60 [ s ]) (const_int 10 [0xa])) [0 +0 S2 A16]) (reg:HI 67))) I'd say storing a -1 into a zero_extract is fine, at least IMHO we shouldn't rely on it being zero extended.
Created attachment 47900 [details] gcc10-pr93908.patch Untested fix.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:73dc4ae47418aef2eb470b8f71cef57dce37349e commit r10-6844-g73dc4ae47418aef2eb470b8f71cef57dce37349e Author: Jakub Jelinek <jakub@redhat.com> Date: Tue Feb 25 13:56:47 2020 +0100 combine: Fix find_split_point handling of constant store into ZERO_EXTRACT [PR93908] git is miscompiled on s390x-linux with -O2 -march=zEC12 -mtune=z13. I've managed to reduce it into the following testcase. The problem is that during combine we see the s->k = -1; bitfield store and change the SET_SRC from a pseudo into a constant: (set (zero_extract:DI (mem/j:HI (plus:DI (reg/v/f:DI 60 [ s ]) (const_int 10 [0xa])) [0 +0 S2 A16]) (const_int 2 [0x2]) (const_int 7 [0x7])) (const_int -1 [0xffffffffffffffff])) This on s390x with the above option isn't recognized as valid instruction, so find_split_point decides to handle it as IOR or IOR/AND. src is -1, mask is 3 and pos is 7. src != mask (this is also incorrect, we want to set all (both) bits in the bitfield), so we go for IOR/AND, but instead of trying mem = (mem & ~0x180) | ((-1 << 7) & 0x180) we actually try mem = (mem & ~0x180) | (-1 << 7) and that is further simplified into: mem = mem | (-1 << 7) aka mem = mem | 0xff80 which doesn't set just the 2-bit bitfield, but also many other bitfields that shouldn't be touched. We really should do: mem = mem | 0x180 instead. The problem is that we assume that no bits but those low len (2 here) will be set in the SET_SRC, but there is nothing that can prevent that, we just should ignore the other bits. The following patch fixes it by masking src with mask, this way already the src == mask test will DTRT, and as the code for or_mask uses gen_int_mode, if the most significant bit is set after shifting it left by pos, it will be properly sign-extended. 2020-02-25 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/93908 * combine.c (find_split_point): For store into ZERO_EXTRACT, and src with mask. * gcc.c-torture/execute/pr93908.c: New test.
The releases/gcc-9 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:eeb31391b7f223e6ca8cbd4452b99b55f4afdb1c commit r9-8276-geeb31391b7f223e6ca8cbd4452b99b55f4afdb1c Author: Jakub Jelinek <jakub@redhat.com> Date: Tue Feb 25 13:56:47 2020 +0100 combine: Fix find_split_point handling of constant store into ZERO_EXTRACT [PR93908] git is miscompiled on s390x-linux with -O2 -march=zEC12 -mtune=z13. I've managed to reduce it into the following testcase. The problem is that during combine we see the s->k = -1; bitfield store and change the SET_SRC from a pseudo into a constant: (set (zero_extract:DI (mem/j:HI (plus:DI (reg/v/f:DI 60 [ s ]) (const_int 10 [0xa])) [0 +0 S2 A16]) (const_int 2 [0x2]) (const_int 7 [0x7])) (const_int -1 [0xffffffffffffffff])) This on s390x with the above option isn't recognized as valid instruction, so find_split_point decides to handle it as IOR or IOR/AND. src is -1, mask is 3 and pos is 7. src != mask (this is also incorrect, we want to set all (both) bits in the bitfield), so we go for IOR/AND, but instead of trying mem = (mem & ~0x180) | ((-1 << 7) & 0x180) we actually try mem = (mem & ~0x180) | (-1 << 7) and that is further simplified into: mem = mem | (-1 << 7) aka mem = mem | 0xff80 which doesn't set just the 2-bit bitfield, but also many other bitfields that shouldn't be touched. We really should do: mem = mem | 0x180 instead. The problem is that we assume that no bits but those low len (2 here) will be set in the SET_SRC, but there is nothing that can prevent that, we just should ignore the other bits. The following patch fixes it by masking src with mask, this way already the src == mask test will DTRT, and as the code for or_mask uses gen_int_mode, if the most significant bit is set after shifting it left by pos, it will be properly sign-extended. 2020-02-25 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/93908 * combine.c (find_split_point): For store into ZERO_EXTRACT, and src with mask. * gcc.c-torture/execute/pr93908.c: New test.
The releases/gcc-8 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:d9814ee5972b48eb67a7af171957d1a285453f46 commit r8-10063-gd9814ee5972b48eb67a7af171957d1a285453f46 Author: Jakub Jelinek <jakub@redhat.com> Date: Tue Feb 25 13:56:47 2020 +0100 combine: Fix find_split_point handling of constant store into ZERO_EXTRACT [PR93908] git is miscompiled on s390x-linux with -O2 -march=zEC12 -mtune=z13. I've managed to reduce it into the following testcase. The problem is that during combine we see the s->k = -1; bitfield store and change the SET_SRC from a pseudo into a constant: (set (zero_extract:DI (mem/j:HI (plus:DI (reg/v/f:DI 60 [ s ]) (const_int 10 [0xa])) [0 +0 S2 A16]) (const_int 2 [0x2]) (const_int 7 [0x7])) (const_int -1 [0xffffffffffffffff])) This on s390x with the above option isn't recognized as valid instruction, so find_split_point decides to handle it as IOR or IOR/AND. src is -1, mask is 3 and pos is 7. src != mask (this is also incorrect, we want to set all (both) bits in the bitfield), so we go for IOR/AND, but instead of trying mem = (mem & ~0x180) | ((-1 << 7) & 0x180) we actually try mem = (mem & ~0x180) | (-1 << 7) and that is further simplified into: mem = mem | (-1 << 7) aka mem = mem | 0xff80 which doesn't set just the 2-bit bitfield, but also many other bitfields that shouldn't be touched. We really should do: mem = mem | 0x180 instead. The problem is that we assume that no bits but those low len (2 here) will be set in the SET_SRC, but there is nothing that can prevent that, we just should ignore the other bits. The following patch fixes it by masking src with mask, this way already the src == mask test will DTRT, and as the code for or_mask uses gen_int_mode, if the most significant bit is set after shifting it left by pos, it will be properly sign-extended. 2020-02-25 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/93908 * combine.c (find_split_point): For store into ZERO_EXTRACT, and src with mask. * gcc.c-torture/execute/pr93908.c: New test.
Fixed.