Bug 93908 - [8/9/10 Regression] git miscompilation on s390x-linux with -O2 -march=zEC12 -mtune=z13 starting with r8-1288
Summary: [8/9/10 Regression] git miscompilation on s390x-linux with -O2 -march=zEC12 -...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 9.2.1
: P2 normal
Target Milestone: 8.4
Assignee: Jakub Jelinek
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2020-02-24 16:46 UTC by Jakub Jelinek
Modified: 2020-02-25 13:10 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2020-02-24 00:00:00


Attachments
gcc10-pr93908.patch (969 bytes, patch)
2020-02-24 17:42 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jakub Jelinek 2020-02-24 16:46:41 UTC
The following testcase manually reduced from git's diff.c is miscompiled on s390x-linux with -O2 -march=zEC12 -mtune=z13 starting with
r8-1288-ge701e0b9e8c74e5e93764556a447ea334e4e9389 .  git itself started to be miscompiled with r10-4651-gafeb887562af17ea235fbec650ff6d16c412682a (r10-4650 fails to build), but that was just (apparently quite significant) change in inlining behavior (Martin L., was that intended?).

struct T
{
  int b;
  int c;
  unsigned short d;
  unsigned e:1, f:1, g:1, h:2, i:1, j:1;
  signed int k:2;
};

struct S
{
  struct T s;
  char c[64];
} buf[2];

__attribute__ ((noinline, noclone, noipa)) void *
baz (void)
{
  static int cnt;
  return (void *) &buf[cnt++];
}

static inline __attribute__ ((always_inline)) struct T *
bar (const char *a)
{
  struct T *s;
  s = baz ();
  s->b = 1;
  s->k = -1;
  return s;
}

__attribute__ ((noinline, noclone, noipa)) void
foo (const char *x, struct T **y)
{
  struct T *l = bar (x);
  struct T *m = bar (x);
  y[0] = l;
  y[1] = m;
}

int
main ()
{
  struct T *r[2];
  foo ("foo", r);
  if (r[0]->e || r[0]->f || r[0]->g || r[0]->h || r[0]->i || r[0]->j || r[0]->k != -1)
    __builtin_abort ();
  if (r[1]->e || r[1]->f || r[1]->g || r[1]->h || r[1]->i || r[1]->j || r[1]->k != -1)
    __builtin_abort ();
  return 0;
}

The first s->k = -1; bitfield store is always done through:
 (insn 14 13 15 2 (parallel [
             (set (mem/j:HI (plus:DI (reg/v/f:DI 61 [ s ])
                         (const_int 10 [0xa])) [0 +0 S2 A16])
                 (ior:HI (mem/j:HI (plus:DI (reg/v/f:DI 61 [ s ])
                             (const_int 10 [0xa])) [0 +0 S2 A16])
                     (const_int 384 [0x180])))
             (clobber (reg:CC 33 %cc))
         ]) "diff2.i":29 1733 {*iorhi3_zarch}
      (expr_list:REG_UNUSED (reg:CC 33 %cc)
         (nil)))
but the second one changed like:
 (insn 22 21 23 2 (parallel [
             (set (mem/j:HI (plus:DI (reg/v/f:DI 60 [ s ])
                         (const_int 10 [0xa])) [0 +0 S2 A16])
                 (ior:HI (mem/j:HI (plus:DI (reg/v/f:DI 60 [ s ])
                             (const_int 10 [0xa])) [0 +0 S2 A16])
-                    (const_int 384 [0x180])))
+                    (const_int -128 [0xffffffffffffff80])))
             (clobber (reg:CC 33 %cc))
         ]) "diff2.i":29 1733 {*iorhi3_zarch}
      (expr_list:REG_UNUSED (reg:CC 33 %cc)
         (nil)))
which means it sets to all ones various other bitfields.
Comment 1 Jakub Jelinek 2020-02-24 16:53:45 UTC
I'd say the bug is in the way how we try to lower the zero_extract store:
Trying 19, 13, 21 -> 22:
Failed to match this instruction:
(set (zero_extract:DI (mem/j:HI (plus:DI (reg/v/f:DI 60 [ s ])
                (const_int 10 [0xa])) [0 +0 S2 A16])
        (const_int 2 [0x2])
        (const_int 7 [0x7]))
    (const_int -1 [0xffffffffffffffff]))
Successfully matched this instruction:
(set (reg:HI 67)
    (const_int -128 [0xffffffffffffff80]))
Successfully matched this instruction:
(set (mem/j:HI (plus:DI (reg/v/f:DI 60 [ s ])
            (const_int 10 [0xa])) [0 +0 S2 A16])
    (ior:HI (mem/j:HI (plus:DI (reg/v/f:DI 60 [ s ])
                (const_int 10 [0xa])) [0 +0 S2 A16])
        (reg:HI 67)))
I'd say storing a -1 into a zero_extract is fine, at least IMHO we shouldn't rely on it being zero extended.
Comment 2 Jakub Jelinek 2020-02-24 17:42:09 UTC
Created attachment 47900 [details]
gcc10-pr93908.patch

Untested fix.
Comment 3 GCC Commits 2020-02-25 12:58:48 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:73dc4ae47418aef2eb470b8f71cef57dce37349e

commit r10-6844-g73dc4ae47418aef2eb470b8f71cef57dce37349e
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Feb 25 13:56:47 2020 +0100

    combine: Fix find_split_point handling of constant store into ZERO_EXTRACT [PR93908]
    
    git is miscompiled on s390x-linux with -O2 -march=zEC12 -mtune=z13.
    I've managed to reduce it into the following testcase.  The problem is that
    during combine we see the s->k = -1; bitfield store and change the SET_SRC
    from a pseudo into a constant:
    (set (zero_extract:DI (mem/j:HI (plus:DI (reg/v/f:DI 60 [ s ])
                    (const_int 10 [0xa])) [0 +0 S2 A16])
            (const_int 2 [0x2])
            (const_int 7 [0x7]))
        (const_int -1 [0xffffffffffffffff]))
    This on s390x with the above option isn't recognized as valid instruction,
    so find_split_point decides to handle it as IOR or IOR/AND.
    src is -1, mask is 3 and pos is 7.
    src != mask (this is also incorrect, we want to set all (both) bits in the
    bitfield), so we go for IOR/AND, but instead of trying
    mem = (mem & ~0x180) | ((-1 << 7) & 0x180)
    we actually try
    mem = (mem & ~0x180) | (-1 << 7)
    and that is further simplified into:
    mem = mem | (-1 << 7)
    aka
    mem = mem | 0xff80
    which doesn't set just the 2-bit bitfield, but also many other bitfields
    that shouldn't be touched.
    We really should do:
    mem = mem | 0x180
    instead.
    The problem is that we assume that no bits but those low len (2 here) will
    be set in the SET_SRC, but there is nothing that can prevent that, we just
    should ignore the other bits.
    
    The following patch fixes it by masking src with mask, this way already
    the src == mask test will DTRT, and as the code for or_mask uses
    gen_int_mode, if the most significant bit is set after shifting it left by
    pos, it will be properly sign-extended.
    
    2020-02-25  Jakub Jelinek  <jakub@redhat.com>
    
    	PR rtl-optimization/93908
    	* combine.c (find_split_point): For store into ZERO_EXTRACT, and src
    	with mask.
    
    	* gcc.c-torture/execute/pr93908.c: New test.
Comment 4 GCC Commits 2020-02-25 13:05:01 UTC
The releases/gcc-9 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:eeb31391b7f223e6ca8cbd4452b99b55f4afdb1c

commit r9-8276-geeb31391b7f223e6ca8cbd4452b99b55f4afdb1c
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Feb 25 13:56:47 2020 +0100

    combine: Fix find_split_point handling of constant store into ZERO_EXTRACT [PR93908]
    
    git is miscompiled on s390x-linux with -O2 -march=zEC12 -mtune=z13.
    I've managed to reduce it into the following testcase.  The problem is that
    during combine we see the s->k = -1; bitfield store and change the SET_SRC
    from a pseudo into a constant:
    (set (zero_extract:DI (mem/j:HI (plus:DI (reg/v/f:DI 60 [ s ])
                    (const_int 10 [0xa])) [0 +0 S2 A16])
            (const_int 2 [0x2])
            (const_int 7 [0x7]))
        (const_int -1 [0xffffffffffffffff]))
    This on s390x with the above option isn't recognized as valid instruction,
    so find_split_point decides to handle it as IOR or IOR/AND.
    src is -1, mask is 3 and pos is 7.
    src != mask (this is also incorrect, we want to set all (both) bits in the
    bitfield), so we go for IOR/AND, but instead of trying
    mem = (mem & ~0x180) | ((-1 << 7) & 0x180)
    we actually try
    mem = (mem & ~0x180) | (-1 << 7)
    and that is further simplified into:
    mem = mem | (-1 << 7)
    aka
    mem = mem | 0xff80
    which doesn't set just the 2-bit bitfield, but also many other bitfields
    that shouldn't be touched.
    We really should do:
    mem = mem | 0x180
    instead.
    The problem is that we assume that no bits but those low len (2 here) will
    be set in the SET_SRC, but there is nothing that can prevent that, we just
    should ignore the other bits.
    
    The following patch fixes it by masking src with mask, this way already
    the src == mask test will DTRT, and as the code for or_mask uses
    gen_int_mode, if the most significant bit is set after shifting it left by
    pos, it will be properly sign-extended.
    
    2020-02-25  Jakub Jelinek  <jakub@redhat.com>
    
    	PR rtl-optimization/93908
    	* combine.c (find_split_point): For store into ZERO_EXTRACT, and src
    	with mask.
    
    	* gcc.c-torture/execute/pr93908.c: New test.
Comment 5 GCC Commits 2020-02-25 13:09:28 UTC
The releases/gcc-8 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:d9814ee5972b48eb67a7af171957d1a285453f46

commit r8-10063-gd9814ee5972b48eb67a7af171957d1a285453f46
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Feb 25 13:56:47 2020 +0100

    combine: Fix find_split_point handling of constant store into ZERO_EXTRACT [PR93908]
    
    git is miscompiled on s390x-linux with -O2 -march=zEC12 -mtune=z13.
    I've managed to reduce it into the following testcase.  The problem is that
    during combine we see the s->k = -1; bitfield store and change the SET_SRC
    from a pseudo into a constant:
    (set (zero_extract:DI (mem/j:HI (plus:DI (reg/v/f:DI 60 [ s ])
                    (const_int 10 [0xa])) [0 +0 S2 A16])
            (const_int 2 [0x2])
            (const_int 7 [0x7]))
        (const_int -1 [0xffffffffffffffff]))
    This on s390x with the above option isn't recognized as valid instruction,
    so find_split_point decides to handle it as IOR or IOR/AND.
    src is -1, mask is 3 and pos is 7.
    src != mask (this is also incorrect, we want to set all (both) bits in the
    bitfield), so we go for IOR/AND, but instead of trying
    mem = (mem & ~0x180) | ((-1 << 7) & 0x180)
    we actually try
    mem = (mem & ~0x180) | (-1 << 7)
    and that is further simplified into:
    mem = mem | (-1 << 7)
    aka
    mem = mem | 0xff80
    which doesn't set just the 2-bit bitfield, but also many other bitfields
    that shouldn't be touched.
    We really should do:
    mem = mem | 0x180
    instead.
    The problem is that we assume that no bits but those low len (2 here) will
    be set in the SET_SRC, but there is nothing that can prevent that, we just
    should ignore the other bits.
    
    The following patch fixes it by masking src with mask, this way already
    the src == mask test will DTRT, and as the code for or_mask uses
    gen_int_mode, if the most significant bit is set after shifting it left by
    pos, it will be properly sign-extended.
    
    2020-02-25  Jakub Jelinek  <jakub@redhat.com>
    
    	PR rtl-optimization/93908
    	* combine.c (find_split_point): For store into ZERO_EXTRACT, and src
    	with mask.
    
    	* gcc.c-torture/execute/pr93908.c: New test.
Comment 6 Jakub Jelinek 2020-02-25 13:10:29 UTC
Fixed.