Bug 97053 - [10/11 Regression] an O2, O3 codegen bug
Summary: [10/11 Regression] an O2, O3 codegen bug
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 10.2.0
: P2 normal
Target Milestone: 10.3
Assignee: Jakub Jelinek
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2020-09-15 02:50 UTC by Chris Uzdavinis
Modified: 2020-09-18 09:58 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2020-09-15 00:00:00


Attachments
preprocessed code in case something changes after submitting (33.03 KB, application/x-gzip)
2020-09-15 02:50 UTC, Chris Uzdavinis
Details
gcc11-pr97053.patch (3.41 KB, patch)
2020-09-15 14:58 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Chris Uzdavinis 2020-09-15 02:50:21 UTC
Created attachment 49217 [details]
preprocessed code in case something changes after submitting

"Wrong" output codegen bug.  Very sensitive, almost any code change prevents the problem, including O2 or less, as well as running under ubsan or asan, reordering variables, and in some cases changing the values involved (such as setting p=0 instead of 1)

Copying a buffer of 8 bytes of data (null terminated) results in a 6-bytes (of data) null terminated "string", losing 2 bytes from the end.

A variable "len" holds the strlen() of this string, reports 6, but some code affected by optimizer seems to think it is a larger value.

Expected output:

  runtime len=8
  orig=ABCDXXXX
  copy=ABCDXXXX

actual output:

  runtime len=6
  orig=ABCDXXXX
  copy=ABCDXX


Even though above the actual output shows len=6 when printing "len", a separate line of code is not present in the assembly (presumably optimized away under a confused compile-time eval.)

    if (len < 8)
        printf("len < 8\n"); // but len == 6


Problem visible in 10.1 and 10.2 at -O2 and -O3, but not -O1 or less, nor in in 9.x or earlier.  

CODE: 

#include <cstdio>
#include <cstring>

struct Data
{
    short n;
    char name[8 + 1];
    int p;
    char s;
    int b;
};

int main(int argc, char** argv)
{
    int p = 1;  
    char orig[9]{"XXXXXXXX"};
    std::memcpy(orig, "ABCD", 4);

    Data data{};
    data.n = 5u;
  
    // this should copy ABCDXXXX but loses final 2 XX's
    std::memcpy(data.name, orig, 8); 
    
    data.s = 'X';
    data.b = p;

    const size_t len = strlen(data.name);
    if (len < 8) 
        printf("len < 8\n"); // does not print

    std::printf("runtime len=%lu\n", len); // this, however, prints 6
    std::printf("orig=%s\ncopy=%s\n", orig, data.name);
}


Live on godbolt:  https://godbolt.org/z/v6xqh8
Comment 1 Richard Biener 2020-09-15 06:21:07 UTC
Confirmed.  It is store-mergings wrong-doing:

> diff -u t.ii.194t.widening_mul t.ii.195t.store-merging
--- t.ii.194t.widening_mul      2020-09-15 08:15:39.824282586 +0200
+++ t.ii.195t.store-merging     2020-09-15 08:15:39.824282586 +0200
@@ -1,6 +1,10 @@
 
 ;; Function main (main, funcdef_no=26, decl_uid=3207, cgraph_uid=27, symbol_order=26) (executed once)
 
+Coalescing successful!
+Merged into 1 stores
+New sequence of 2 stores to replace old one of 3 stores
+Merging successful!
 main (int argc, char * * argv)
 {
   const size_t len;
@@ -11,12 +15,11 @@
   <bb 2> [local count: 1073741824]:
   orig = "XXXXXXXX";
   memcpy (&orig, "ABCD", 4);
-  MEM <char[12]> [(struct Data *)&data + 8B] = {};
   data.n = 5;
   _7 = MEM <long unsigned int> [(char * {ref-all})&orig];
   MEM <long unsigned int> [(char * {ref-all})&data + 2B] = _7;
-  data.s = 88;
-  data.b = 1;
+  MEM <unsigned long> [(struct Data *)&data + 8B] = 0;
+  MEM <unsigned long> [(void *)&data + 16B] = 4294967384;
   len_11 = strlen (&data.name);
   printf ("runtime len=%lu\n", len_11);
   printf ("orig=%s\ncopy=%s\n", &orig, &data.name);

we merge data + 8 = {} "across" data + 2 = _7 but fail to see they alias.
Comment 2 Eric Botcazou 2020-09-15 07:08:47 UTC
The statement generated by DSE

MEM <char[12]> [(struct Data *)&data + 8B] = {};

looks nonsensical and I guess store-merging is not prepared for it.
Comment 3 Jakub Jelinek 2020-09-15 08:21:01 UTC
Started with my r10-179-g3afd514bca6ea572e614b5289c4429ace693311b
I'll have a look.
Comment 4 Jakub Jelinek 2020-09-15 09:08:57 UTC
Reduced testcase:

struct S { short a; char b[9]; int c; char d; int e; };

__attribute__((noipa)) void
foo (char *x, char *y)
{
  if (__builtin_strcmp (x, "ABCDXXXX") != 0
      || __builtin_strcmp (y, "ABCDXXXX") != 0)
    __builtin_abort ();
}

int
main ()
{
  char a[9] = "XXXXXXXX";
  struct S b = {};
  __builtin_memcpy (a, "ABCD", 4);
  b.a = 5;
  __builtin_memcpy (b.b, a, 8); 
  b.d = 'X';
  b.e = 1;
  foo (a, b.b);
  return 0;
}
Comment 5 Jakub Jelinek 2020-09-15 09:53:39 UTC
Better testcase that won't cease to test the bug even if FRE or some other pass gets smarter and optimizes the a = "XXXXXXXX"; __builtin_memcpy (&a, "ABCD", 4);
into a = "ABCDXXXX":

struct S { short a; char b[9]; int c; char d; int e; };

__attribute__((noipa)) void
foo (struct S *x)
{
  if (__builtin_strcmp (x->b, "ABCDXXXX") != 0)
    __builtin_abort ();
}

__attribute__((noipa)) void
bar (char *x)
{
  struct S b = {};
  b.a = 5;
  __builtin_memcpy (b.b, x, 8); 
  b.d = 'X';
  b.e = 1;
  foo (&b);
}

int
main ()
{
  bar ("ABCDXXXX");
  return 0;
}

(In reply to Eric Botcazou from comment #2)
> The statement generated by DSE
> 
> MEM <char[12]> [(struct Data *)&data + 8B] = {};
> 
> looks nonsensical and I guess store-merging is not prepared for it.

There is nothing non-sensical nor wrong on it, it is standard GIMPLE IL, equivalent in behavior to __builtin_memset (&data + 8B, 0, 12);
Comment 6 Jakub Jelinek 2020-09-15 10:55:21 UTC
I think the problem is that check_no_overlap only looks at later m_store_info stores that would overlap and cause problems.
But in this case, the problem is that we'd need to check earlier m_store_info store too.
We have in the stmt order
bitpos bitsize kind order
64 96 INTEGER_CST 0
0 16 INTEGER_CST 1
16 64 MEM_REF 2
128 8 INTEGER_CST 3
160 32 INTEGER_CST 4
After sort_by_bitpos this is:
0 16 INTEGER_CST 1
16 64 MEM_REF 2
64 96 INTEGER_CST 0
128 8 INTEGER_CST 3
160 32 INTEGER_CST 4
The first store in this order doesn't overlap anything and is followed by adjacent store of incompatible kind, so is kept as is.
Then the MEM_REF store is kept as is too, because again it overlaps with the next one, but is of a different kind.
The bug is when deciding if the 3rd store could be mergeable with the 4th store.
They are overlapping and of the same kind that can be handled for overlapping stores, but they shouldn't be merged, because there was a non-mergeable overlapping store in between them in the sort_by_order order (the MEM_REF one).
Comment 7 Jakub Jelinek 2020-09-15 13:23:07 UTC
Another testcase for -O2 -fno-tree-dse that shows more issues, that it affects not just the overlapping case handling, but also can affect also the adjacent cases:
struct __attribute__((packed, may_alias)) S { long long s; };
struct __attribute__((packed, may_alias)) T { short t; };

__attribute__((noipa)) void
test (char *p, char *q, int s)
{
  if ((s & 1) == 0)
    {
      if (*(short __attribute__((may_alias)) *) &p[sizeof (short)]
	  != *(short __attribute__((may_alias)) *) &q[sizeof (short)]
	  || (((struct S __attribute__((may_alias)) *) &p[1])->s
	      != ((struct S __attribute__((may_alias)) *) &q[1])->s)
	  || (*(short __attribute__((may_alias)) *) &p[2 * sizeof (short)]
	      != *(short __attribute__((may_alias)) *) &q[2 * sizeof (short)]))
	__builtin_abort ();
    }
  else
    {
      if (*(short __attribute__((may_alias)) *) &p[sizeof (short)]
	  != *(short __attribute__((may_alias)) *) &q[sizeof (short)]
	  || (((struct S __attribute__((may_alias)) *) &p[1])->s
	      != ((struct S __attribute__((may_alias)) *) &q[1])->s)
	  || (((struct T __attribute__((may_alias)) *) &p[2 * sizeof (short) - 1])->t
	      != ((struct T __attribute__((may_alias)) *) &q[2 * sizeof (short) - 1])->t)
	  || p[3 * sizeof (short) - 2] != q[3 * sizeof (short) - 2])
	__builtin_abort ();
    }
}

__attribute__((noipa)) void
foo (long long *p, char *q, char *r, char *s)
{
  char a[64] __attribute__((aligned (__alignof (short))));
  *(short __attribute__((may_alias)) *) &a[sizeof (short)] = 1;
  ((struct S __attribute__((may_alias)) *) &a[1])->s = p[0];
  *(short __attribute__((may_alias)) *) &a[2 * sizeof (short)] = 2;
  *(short __attribute__((may_alias)) *) &q[sizeof (short)] = 1;
  ((struct S __attribute__((may_alias)) *) &r[1])->s = p[0];
  *(short __attribute__((may_alias)) *) &s[2 * sizeof (short)] = 2;
  test (a, q, 0);
}

__attribute__((noipa)) void
bar (long long *p, char *q, char *r, char *s, char *t)
{
  char a[64] __attribute__((aligned (__alignof (short))));
  *(short __attribute__((may_alias)) *) &a[sizeof (short)] = 1;
  ((struct S __attribute__((may_alias)) *) &a[1])->s = p[0];
  ((struct T __attribute__((may_alias)) *) &a[2 * sizeof (short) - 1])->t = 2;
  a[3 * sizeof (short) - 2] = 3;
  *(short __attribute__((may_alias)) *) &q[sizeof (short)] = 1;
  ((struct S __attribute__((may_alias)) *) &r[1])->s = p[0];
  ((struct T __attribute__((may_alias)) *) &s[2 * sizeof (short) - 1])->t = 2;
  t[3 * sizeof (short) - 2] = 3;
  test (a, q, 1);
}

__attribute__((noipa)) void
baz (long long *p, char *q, char *r, char *s)
{
  char a[64] __attribute__((aligned (__alignof (short))));
  *(short __attribute__((may_alias)) *) &a[2 * sizeof (short)] = 2;
  ((struct S __attribute__((may_alias)) *) &a[1])->s = p[0];
  *(short __attribute__((may_alias)) *) &a[sizeof (short)] = 1;
  *(short __attribute__((may_alias)) *) &q[2 * sizeof (short)] = 2;
  ((struct S __attribute__((may_alias)) *) &r[1])->s = p[0];
  *(short __attribute__((may_alias)) *) &s[sizeof (short)] = 1;
  test (a, q, 2);
}

__attribute__((noipa)) void
qux (long long *p, char *q, char *r, char *s, char *t)
{
  char a[64] __attribute__((aligned (__alignof (short))));
  *(short __attribute__((may_alias)) *) &a[2 * sizeof (short) - 1] = 2;
  ((struct S __attribute__((may_alias)) *) &a[1])->s = p[0];
  a[3 * sizeof (short) - 2] = 3;
  *(short __attribute__((may_alias)) *) &a[sizeof (short)] = 1;
  ((struct T __attribute__((may_alias)) *) &q[2 * sizeof (short) - 1])->t = 2;
  ((struct S __attribute__((may_alias)) *) &r[1])->s = p[0];
  s[3 * sizeof (short) - 2] = 3;
  ((struct T __attribute__((may_alias)) *) &t[sizeof (short)])->t = 1;
  test (a, q, 3);
}

int
main ()
{
  char a[64];
  long long p = -1LL;
  foo (&p, &a[0], &a[0], &a[0]);
  bar (&p, &a[0], &a[0], &a[0], &a[0]);
  baz (&p, &a[0], &a[0], &a[0]);
  qux (&p, &a[0], &a[0], &a[0], &a[0]);
  return 0;
}
Comment 8 Jakub Jelinek 2020-09-15 14:58:53 UTC
Created attachment 49221 [details]
gcc11-pr97053.patch

Untested fix.
Comment 9 GCC Commits 2020-09-16 07:45:07 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:bd909071ac04e94f4b6f0baab64d0687ec55681d

commit r11-3219-gbd909071ac04e94f4b6f0baab64d0687ec55681d
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed Sep 16 09:42:33 2020 +0200

    store-merging: Consider also overlapping stores earlier in the by bitpos sorting [PR97053]
    
    As the testcases show, if we have something like:
      MEM <char[12]> [&b + 8B] = {};
      MEM[(short *) &b] = 5;
      _5 = *x_4(D);
      MEM <long long unsigned int> [&b + 2B] = _5;
      MEM[(char *)&b + 16B] = 88;
      MEM[(int *)&b + 20B] = 1;
    then in sort_by_bitpos the stores are almost like in the given order,
    except the first store is after the = _5; store.
    We can't coalesce the = 5; store with = _5;, because the latter is MEM_REF,
    while the former INTEGER_CST, and we can't coalesce the = _5 store with
    the = {} store because the former is MEM_REF, the latter INTEGER_CST.
    But we happily coalesce the remaining 3 stores, which is wrong, because the
    = _5; store overlaps those and is in between them in the program order.
    We already have code to deal with similar cases in check_no_overlap, but we
    deal only with the following stores in sort_by_bitpos order, not the earlier
    ones.
    
    The following patch checks also the earlier ones.  In coalesce_immediate_stores
    it computes the first one that needs to be checked (all the ones whose
    bitpos + bitsize is smaller or equal to merged_store->start don't need to be
    checked and don't need to be checked even for any following attempts because
    of the sort_by_bitpos sorting) and the end of that (that is the first store
    in the merged_store).
    
    2020-09-16  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/97053
            * gimple-ssa-store-merging.c (check_no_overlap): Add FIRST_ORDER,
            START, FIRST_EARLIER and LAST_EARLIER arguments.  Return false if
            any stores between FIRST_EARLIER inclusive and LAST_EARLIER exclusive
            has order in between FIRST_ORDER and LAST_ORDER and overlaps the to
            be merged store.
            (imm_store_chain_info::try_coalesce_bswap): Add FIRST_EARLIER argument.
            Adjust check_no_overlap caller.
            (imm_store_chain_info::coalesce_immediate_stores): Add first_earlier
            and last_earlier variables, adjust them during iterations.  Adjust
            check_no_overlap callers, call check_no_overlap even when extending
            overlapping stores by extra INTEGER_CST stores.
    
            * gcc.dg/store_merging_31.c: New test.
            * gcc.dg/store_merging_32.c: New test.
Comment 10 GCC Commits 2020-09-16 07:50:36 UTC
The releases/gcc-10 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:7e97e7470e74b0d9a68000938a359a7049774d77

commit r10-8769-g7e97e7470e74b0d9a68000938a359a7049774d77
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed Sep 16 09:42:33 2020 +0200

    store-merging: Consider also overlapping stores earlier in the by bitpos sorting [PR97053]
    
    As the testcases show, if we have something like:
      MEM <char[12]> [&b + 8B] = {};
      MEM[(short *) &b] = 5;
      _5 = *x_4(D);
      MEM <long long unsigned int> [&b + 2B] = _5;
      MEM[(char *)&b + 16B] = 88;
      MEM[(int *)&b + 20B] = 1;
    then in sort_by_bitpos the stores are almost like in the given order,
    except the first store is after the = _5; store.
    We can't coalesce the = 5; store with = _5;, because the latter is MEM_REF,
    while the former INTEGER_CST, and we can't coalesce the = _5 store with
    the = {} store because the former is MEM_REF, the latter INTEGER_CST.
    But we happily coalesce the remaining 3 stores, which is wrong, because the
    = _5; store overlaps those and is in between them in the program order.
    We already have code to deal with similar cases in check_no_overlap, but we
    deal only with the following stores in sort_by_bitpos order, not the earlier
    ones.
    
    The following patch checks also the earlier ones.  In coalesce_immediate_stores
    it computes the first one that needs to be checked (all the ones whose
    bitpos + bitsize is smaller or equal to merged_store->start don't need to be
    checked and don't need to be checked even for any following attempts because
    of the sort_by_bitpos sorting) and the end of that (that is the first store
    in the merged_store).
    
    2020-09-16  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/97053
            * gimple-ssa-store-merging.c (check_no_overlap): Add FIRST_ORDER,
            START, FIRST_EARLIER and LAST_EARLIER arguments.  Return false if
            any stores between FIRST_EARLIER inclusive and LAST_EARLIER exclusive
            has order in between FIRST_ORDER and LAST_ORDER and overlaps the to
            be merged store.
            (imm_store_chain_info::try_coalesce_bswap): Add FIRST_EARLIER argument.
            Adjust check_no_overlap caller.
            (imm_store_chain_info::coalesce_immediate_stores): Add first_earlier
            and last_earlier variables, adjust them during iterations.  Adjust
            check_no_overlap callers, call check_no_overlap even when extending
            overlapping stores by extra INTEGER_CST stores.
    
            * gcc.dg/store_merging_31.c: New test.
            * gcc.dg/store_merging_32.c: New test.
    
    (cherry picked from commit bd909071ac04e94f4b6f0baab64d0687ec55681d)
Comment 11 Jakub Jelinek 2020-09-16 07:55:07 UTC
Should be fixed now for 10.3+ and 11.1+.
Comment 12 GCC Commits 2020-09-18 09:52:03 UTC
The releases/gcc-9 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:a24744c1ed89e255f3db5b3981519f538d231886

commit r9-8918-ga24744c1ed89e255f3db5b3981519f538d231886
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed Sep 16 09:42:33 2020 +0200

    store-merging: Consider also overlapping stores earlier in the by bitpos sorting [PR97053]
    
    As the testcases show, if we have something like:
      MEM <char[12]> [&b + 8B] = {};
      MEM[(short *) &b] = 5;
      _5 = *x_4(D);
      MEM <long long unsigned int> [&b + 2B] = _5;
      MEM[(char *)&b + 16B] = 88;
      MEM[(int *)&b + 20B] = 1;
    then in sort_by_bitpos the stores are almost like in the given order,
    except the first store is after the = _5; store.
    We can't coalesce the = 5; store with = _5;, because the latter is MEM_REF,
    while the former INTEGER_CST, and we can't coalesce the = _5 store with
    the = {} store because the former is MEM_REF, the latter INTEGER_CST.
    But we happily coalesce the remaining 3 stores, which is wrong, because the
    = _5; store overlaps those and is in between them in the program order.
    We already have code to deal with similar cases in check_no_overlap, but we
    deal only with the following stores in sort_by_bitpos order, not the earlier
    ones.
    
    The following patch checks also the earlier ones.  In coalesce_immediate_stores
    it computes the first one that needs to be checked (all the ones whose
    bitpos + bitsize is smaller or equal to merged_store->start don't need to be
    checked and don't need to be checked even for any following attempts because
    of the sort_by_bitpos sorting) and the end of that (that is the first store
    in the merged_store).
    
    2020-09-16  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/97053
            * gimple-ssa-store-merging.c (check_no_overlap): Add FIRST_ORDER,
            START, FIRST_EARLIER and LAST_EARLIER arguments.  Return false if
            any stores between FIRST_EARLIER inclusive and LAST_EARLIER exclusive
            has order in between FIRST_ORDER and LAST_ORDER and overlaps the to
            be merged store.
            (imm_store_chain_info::try_coalesce_bswap): Add FIRST_EARLIER argument.
            Adjust check_no_overlap caller.
            (imm_store_chain_info::coalesce_immediate_stores): Add first_earlier
            and last_earlier variables, adjust them during iterations.  Adjust
            check_no_overlap callers, call check_no_overlap even when extending
            overlapping stores by extra INTEGER_CST stores.
    
            * gcc.dg/store_merging_31.c: New test.
            * gcc.dg/store_merging_32.c: New test.
    
    (cherry picked from commit bd909071ac04e94f4b6f0baab64d0687ec55681d)
Comment 13 GCC Commits 2020-09-18 09:58:26 UTC
The releases/gcc-8 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:87ae45cdbd7b70a4c92d5137552228ed9ad9e9e7

commit r8-10518-g87ae45cdbd7b70a4c92d5137552228ed9ad9e9e7
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed Sep 16 09:42:33 2020 +0200

    store-merging: Consider also overlapping stores earlier in the by bitpos sorting [PR97053]
    
    As the testcases show, if we have something like:
      MEM <char[12]> [&b + 8B] = {};
      MEM[(short *) &b] = 5;
      _5 = *x_4(D);
      MEM <long long unsigned int> [&b + 2B] = _5;
      MEM[(char *)&b + 16B] = 88;
      MEM[(int *)&b + 20B] = 1;
    then in sort_by_bitpos the stores are almost like in the given order,
    except the first store is after the = _5; store.
    We can't coalesce the = 5; store with = _5;, because the latter is MEM_REF,
    while the former INTEGER_CST, and we can't coalesce the = _5 store with
    the = {} store because the former is MEM_REF, the latter INTEGER_CST.
    But we happily coalesce the remaining 3 stores, which is wrong, because the
    = _5; store overlaps those and is in between them in the program order.
    We already have code to deal with similar cases in check_no_overlap, but we
    deal only with the following stores in sort_by_bitpos order, not the earlier
    ones.
    
    The following patch checks also the earlier ones.  In coalesce_immediate_stores
    it computes the first one that needs to be checked (all the ones whose
    bitpos + bitsize is smaller or equal to merged_store->start don't need to be
    checked and don't need to be checked even for any following attempts because
    of the sort_by_bitpos sorting) and the end of that (that is the first store
    in the merged_store).
    
    2020-09-16  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/97053
            * gimple-ssa-store-merging.c (check_no_overlap): Add FIRST_ORDER,
            START, FIRST_EARLIER and LAST_EARLIER arguments.  Return false if
            any stores between FIRST_EARLIER inclusive and LAST_EARLIER exclusive
            has order in between FIRST_ORDER and LAST_ORDER and overlaps the to
            be merged store.
            (imm_store_chain_info::try_coalesce_bswap): Add FIRST_EARLIER argument.
            Adjust check_no_overlap caller.
            (imm_store_chain_info::coalesce_immediate_stores): Add first_earlier
            and last_earlier variables, adjust them during iterations.  Adjust
            check_no_overlap callers, call check_no_overlap even when extending
            overlapping stores by extra INTEGER_CST stores.
    
            * gcc.dg/store_merging_31.c: New test.
            * gcc.dg/store_merging_32.c: New test.
    
    (cherry picked from commit bd909071ac04e94f4b6f0baab64d0687ec55681d)