Created attachment 49217 [details] preprocessed code in case something changes after submitting "Wrong" output codegen bug. Very sensitive, almost any code change prevents the problem, including O2 or less, as well as running under ubsan or asan, reordering variables, and in some cases changing the values involved (such as setting p=0 instead of 1) Copying a buffer of 8 bytes of data (null terminated) results in a 6-bytes (of data) null terminated "string", losing 2 bytes from the end. A variable "len" holds the strlen() of this string, reports 6, but some code affected by optimizer seems to think it is a larger value. Expected output: runtime len=8 orig=ABCDXXXX copy=ABCDXXXX actual output: runtime len=6 orig=ABCDXXXX copy=ABCDXX Even though above the actual output shows len=6 when printing "len", a separate line of code is not present in the assembly (presumably optimized away under a confused compile-time eval.) if (len < 8) printf("len < 8\n"); // but len == 6 Problem visible in 10.1 and 10.2 at -O2 and -O3, but not -O1 or less, nor in in 9.x or earlier. CODE: #include <cstdio> #include <cstring> struct Data { short n; char name[8 + 1]; int p; char s; int b; }; int main(int argc, char** argv) { int p = 1; char orig[9]{"XXXXXXXX"}; std::memcpy(orig, "ABCD", 4); Data data{}; data.n = 5u; // this should copy ABCDXXXX but loses final 2 XX's std::memcpy(data.name, orig, 8); data.s = 'X'; data.b = p; const size_t len = strlen(data.name); if (len < 8) printf("len < 8\n"); // does not print std::printf("runtime len=%lu\n", len); // this, however, prints 6 std::printf("orig=%s\ncopy=%s\n", orig, data.name); } Live on godbolt: https://godbolt.org/z/v6xqh8
Confirmed. It is store-mergings wrong-doing: > diff -u t.ii.194t.widening_mul t.ii.195t.store-merging --- t.ii.194t.widening_mul 2020-09-15 08:15:39.824282586 +0200 +++ t.ii.195t.store-merging 2020-09-15 08:15:39.824282586 +0200 @@ -1,6 +1,10 @@ ;; Function main (main, funcdef_no=26, decl_uid=3207, cgraph_uid=27, symbol_order=26) (executed once) +Coalescing successful! +Merged into 1 stores +New sequence of 2 stores to replace old one of 3 stores +Merging successful! main (int argc, char * * argv) { const size_t len; @@ -11,12 +15,11 @@ <bb 2> [local count: 1073741824]: orig = "XXXXXXXX"; memcpy (&orig, "ABCD", 4); - MEM <char[12]> [(struct Data *)&data + 8B] = {}; data.n = 5; _7 = MEM <long unsigned int> [(char * {ref-all})&orig]; MEM <long unsigned int> [(char * {ref-all})&data + 2B] = _7; - data.s = 88; - data.b = 1; + MEM <unsigned long> [(struct Data *)&data + 8B] = 0; + MEM <unsigned long> [(void *)&data + 16B] = 4294967384; len_11 = strlen (&data.name); printf ("runtime len=%lu\n", len_11); printf ("orig=%s\ncopy=%s\n", &orig, &data.name); we merge data + 8 = {} "across" data + 2 = _7 but fail to see they alias.
The statement generated by DSE MEM <char[12]> [(struct Data *)&data + 8B] = {}; looks nonsensical and I guess store-merging is not prepared for it.
Started with my r10-179-g3afd514bca6ea572e614b5289c4429ace693311b I'll have a look.
Reduced testcase: struct S { short a; char b[9]; int c; char d; int e; }; __attribute__((noipa)) void foo (char *x, char *y) { if (__builtin_strcmp (x, "ABCDXXXX") != 0 || __builtin_strcmp (y, "ABCDXXXX") != 0) __builtin_abort (); } int main () { char a[9] = "XXXXXXXX"; struct S b = {}; __builtin_memcpy (a, "ABCD", 4); b.a = 5; __builtin_memcpy (b.b, a, 8); b.d = 'X'; b.e = 1; foo (a, b.b); return 0; }
Better testcase that won't cease to test the bug even if FRE or some other pass gets smarter and optimizes the a = "XXXXXXXX"; __builtin_memcpy (&a, "ABCD", 4); into a = "ABCDXXXX": struct S { short a; char b[9]; int c; char d; int e; }; __attribute__((noipa)) void foo (struct S *x) { if (__builtin_strcmp (x->b, "ABCDXXXX") != 0) __builtin_abort (); } __attribute__((noipa)) void bar (char *x) { struct S b = {}; b.a = 5; __builtin_memcpy (b.b, x, 8); b.d = 'X'; b.e = 1; foo (&b); } int main () { bar ("ABCDXXXX"); return 0; } (In reply to Eric Botcazou from comment #2) > The statement generated by DSE > > MEM <char[12]> [(struct Data *)&data + 8B] = {}; > > looks nonsensical and I guess store-merging is not prepared for it. There is nothing non-sensical nor wrong on it, it is standard GIMPLE IL, equivalent in behavior to __builtin_memset (&data + 8B, 0, 12);
I think the problem is that check_no_overlap only looks at later m_store_info stores that would overlap and cause problems. But in this case, the problem is that we'd need to check earlier m_store_info store too. We have in the stmt order bitpos bitsize kind order 64 96 INTEGER_CST 0 0 16 INTEGER_CST 1 16 64 MEM_REF 2 128 8 INTEGER_CST 3 160 32 INTEGER_CST 4 After sort_by_bitpos this is: 0 16 INTEGER_CST 1 16 64 MEM_REF 2 64 96 INTEGER_CST 0 128 8 INTEGER_CST 3 160 32 INTEGER_CST 4 The first store in this order doesn't overlap anything and is followed by adjacent store of incompatible kind, so is kept as is. Then the MEM_REF store is kept as is too, because again it overlaps with the next one, but is of a different kind. The bug is when deciding if the 3rd store could be mergeable with the 4th store. They are overlapping and of the same kind that can be handled for overlapping stores, but they shouldn't be merged, because there was a non-mergeable overlapping store in between them in the sort_by_order order (the MEM_REF one).
Another testcase for -O2 -fno-tree-dse that shows more issues, that it affects not just the overlapping case handling, but also can affect also the adjacent cases: struct __attribute__((packed, may_alias)) S { long long s; }; struct __attribute__((packed, may_alias)) T { short t; }; __attribute__((noipa)) void test (char *p, char *q, int s) { if ((s & 1) == 0) { if (*(short __attribute__((may_alias)) *) &p[sizeof (short)] != *(short __attribute__((may_alias)) *) &q[sizeof (short)] || (((struct S __attribute__((may_alias)) *) &p[1])->s != ((struct S __attribute__((may_alias)) *) &q[1])->s) || (*(short __attribute__((may_alias)) *) &p[2 * sizeof (short)] != *(short __attribute__((may_alias)) *) &q[2 * sizeof (short)])) __builtin_abort (); } else { if (*(short __attribute__((may_alias)) *) &p[sizeof (short)] != *(short __attribute__((may_alias)) *) &q[sizeof (short)] || (((struct S __attribute__((may_alias)) *) &p[1])->s != ((struct S __attribute__((may_alias)) *) &q[1])->s) || (((struct T __attribute__((may_alias)) *) &p[2 * sizeof (short) - 1])->t != ((struct T __attribute__((may_alias)) *) &q[2 * sizeof (short) - 1])->t) || p[3 * sizeof (short) - 2] != q[3 * sizeof (short) - 2]) __builtin_abort (); } } __attribute__((noipa)) void foo (long long *p, char *q, char *r, char *s) { char a[64] __attribute__((aligned (__alignof (short)))); *(short __attribute__((may_alias)) *) &a[sizeof (short)] = 1; ((struct S __attribute__((may_alias)) *) &a[1])->s = p[0]; *(short __attribute__((may_alias)) *) &a[2 * sizeof (short)] = 2; *(short __attribute__((may_alias)) *) &q[sizeof (short)] = 1; ((struct S __attribute__((may_alias)) *) &r[1])->s = p[0]; *(short __attribute__((may_alias)) *) &s[2 * sizeof (short)] = 2; test (a, q, 0); } __attribute__((noipa)) void bar (long long *p, char *q, char *r, char *s, char *t) { char a[64] __attribute__((aligned (__alignof (short)))); *(short __attribute__((may_alias)) *) &a[sizeof (short)] = 1; ((struct S __attribute__((may_alias)) *) &a[1])->s = p[0]; ((struct T __attribute__((may_alias)) *) &a[2 * sizeof (short) - 1])->t = 2; a[3 * sizeof (short) - 2] = 3; *(short __attribute__((may_alias)) *) &q[sizeof (short)] = 1; ((struct S __attribute__((may_alias)) *) &r[1])->s = p[0]; ((struct T __attribute__((may_alias)) *) &s[2 * sizeof (short) - 1])->t = 2; t[3 * sizeof (short) - 2] = 3; test (a, q, 1); } __attribute__((noipa)) void baz (long long *p, char *q, char *r, char *s) { char a[64] __attribute__((aligned (__alignof (short)))); *(short __attribute__((may_alias)) *) &a[2 * sizeof (short)] = 2; ((struct S __attribute__((may_alias)) *) &a[1])->s = p[0]; *(short __attribute__((may_alias)) *) &a[sizeof (short)] = 1; *(short __attribute__((may_alias)) *) &q[2 * sizeof (short)] = 2; ((struct S __attribute__((may_alias)) *) &r[1])->s = p[0]; *(short __attribute__((may_alias)) *) &s[sizeof (short)] = 1; test (a, q, 2); } __attribute__((noipa)) void qux (long long *p, char *q, char *r, char *s, char *t) { char a[64] __attribute__((aligned (__alignof (short)))); *(short __attribute__((may_alias)) *) &a[2 * sizeof (short) - 1] = 2; ((struct S __attribute__((may_alias)) *) &a[1])->s = p[0]; a[3 * sizeof (short) - 2] = 3; *(short __attribute__((may_alias)) *) &a[sizeof (short)] = 1; ((struct T __attribute__((may_alias)) *) &q[2 * sizeof (short) - 1])->t = 2; ((struct S __attribute__((may_alias)) *) &r[1])->s = p[0]; s[3 * sizeof (short) - 2] = 3; ((struct T __attribute__((may_alias)) *) &t[sizeof (short)])->t = 1; test (a, q, 3); } int main () { char a[64]; long long p = -1LL; foo (&p, &a[0], &a[0], &a[0]); bar (&p, &a[0], &a[0], &a[0], &a[0]); baz (&p, &a[0], &a[0], &a[0]); qux (&p, &a[0], &a[0], &a[0], &a[0]); return 0; }
Created attachment 49221 [details] gcc11-pr97053.patch Untested fix.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:bd909071ac04e94f4b6f0baab64d0687ec55681d commit r11-3219-gbd909071ac04e94f4b6f0baab64d0687ec55681d Author: Jakub Jelinek <jakub@redhat.com> Date: Wed Sep 16 09:42:33 2020 +0200 store-merging: Consider also overlapping stores earlier in the by bitpos sorting [PR97053] As the testcases show, if we have something like: MEM <char[12]> [&b + 8B] = {}; MEM[(short *) &b] = 5; _5 = *x_4(D); MEM <long long unsigned int> [&b + 2B] = _5; MEM[(char *)&b + 16B] = 88; MEM[(int *)&b + 20B] = 1; then in sort_by_bitpos the stores are almost like in the given order, except the first store is after the = _5; store. We can't coalesce the = 5; store with = _5;, because the latter is MEM_REF, while the former INTEGER_CST, and we can't coalesce the = _5 store with the = {} store because the former is MEM_REF, the latter INTEGER_CST. But we happily coalesce the remaining 3 stores, which is wrong, because the = _5; store overlaps those and is in between them in the program order. We already have code to deal with similar cases in check_no_overlap, but we deal only with the following stores in sort_by_bitpos order, not the earlier ones. The following patch checks also the earlier ones. In coalesce_immediate_stores it computes the first one that needs to be checked (all the ones whose bitpos + bitsize is smaller or equal to merged_store->start don't need to be checked and don't need to be checked even for any following attempts because of the sort_by_bitpos sorting) and the end of that (that is the first store in the merged_store). 2020-09-16 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/97053 * gimple-ssa-store-merging.c (check_no_overlap): Add FIRST_ORDER, START, FIRST_EARLIER and LAST_EARLIER arguments. Return false if any stores between FIRST_EARLIER inclusive and LAST_EARLIER exclusive has order in between FIRST_ORDER and LAST_ORDER and overlaps the to be merged store. (imm_store_chain_info::try_coalesce_bswap): Add FIRST_EARLIER argument. Adjust check_no_overlap caller. (imm_store_chain_info::coalesce_immediate_stores): Add first_earlier and last_earlier variables, adjust them during iterations. Adjust check_no_overlap callers, call check_no_overlap even when extending overlapping stores by extra INTEGER_CST stores. * gcc.dg/store_merging_31.c: New test. * gcc.dg/store_merging_32.c: New test.
The releases/gcc-10 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:7e97e7470e74b0d9a68000938a359a7049774d77 commit r10-8769-g7e97e7470e74b0d9a68000938a359a7049774d77 Author: Jakub Jelinek <jakub@redhat.com> Date: Wed Sep 16 09:42:33 2020 +0200 store-merging: Consider also overlapping stores earlier in the by bitpos sorting [PR97053] As the testcases show, if we have something like: MEM <char[12]> [&b + 8B] = {}; MEM[(short *) &b] = 5; _5 = *x_4(D); MEM <long long unsigned int> [&b + 2B] = _5; MEM[(char *)&b + 16B] = 88; MEM[(int *)&b + 20B] = 1; then in sort_by_bitpos the stores are almost like in the given order, except the first store is after the = _5; store. We can't coalesce the = 5; store with = _5;, because the latter is MEM_REF, while the former INTEGER_CST, and we can't coalesce the = _5 store with the = {} store because the former is MEM_REF, the latter INTEGER_CST. But we happily coalesce the remaining 3 stores, which is wrong, because the = _5; store overlaps those and is in between them in the program order. We already have code to deal with similar cases in check_no_overlap, but we deal only with the following stores in sort_by_bitpos order, not the earlier ones. The following patch checks also the earlier ones. In coalesce_immediate_stores it computes the first one that needs to be checked (all the ones whose bitpos + bitsize is smaller or equal to merged_store->start don't need to be checked and don't need to be checked even for any following attempts because of the sort_by_bitpos sorting) and the end of that (that is the first store in the merged_store). 2020-09-16 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/97053 * gimple-ssa-store-merging.c (check_no_overlap): Add FIRST_ORDER, START, FIRST_EARLIER and LAST_EARLIER arguments. Return false if any stores between FIRST_EARLIER inclusive and LAST_EARLIER exclusive has order in between FIRST_ORDER and LAST_ORDER and overlaps the to be merged store. (imm_store_chain_info::try_coalesce_bswap): Add FIRST_EARLIER argument. Adjust check_no_overlap caller. (imm_store_chain_info::coalesce_immediate_stores): Add first_earlier and last_earlier variables, adjust them during iterations. Adjust check_no_overlap callers, call check_no_overlap even when extending overlapping stores by extra INTEGER_CST stores. * gcc.dg/store_merging_31.c: New test. * gcc.dg/store_merging_32.c: New test. (cherry picked from commit bd909071ac04e94f4b6f0baab64d0687ec55681d)
Should be fixed now for 10.3+ and 11.1+.
The releases/gcc-9 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:a24744c1ed89e255f3db5b3981519f538d231886 commit r9-8918-ga24744c1ed89e255f3db5b3981519f538d231886 Author: Jakub Jelinek <jakub@redhat.com> Date: Wed Sep 16 09:42:33 2020 +0200 store-merging: Consider also overlapping stores earlier in the by bitpos sorting [PR97053] As the testcases show, if we have something like: MEM <char[12]> [&b + 8B] = {}; MEM[(short *) &b] = 5; _5 = *x_4(D); MEM <long long unsigned int> [&b + 2B] = _5; MEM[(char *)&b + 16B] = 88; MEM[(int *)&b + 20B] = 1; then in sort_by_bitpos the stores are almost like in the given order, except the first store is after the = _5; store. We can't coalesce the = 5; store with = _5;, because the latter is MEM_REF, while the former INTEGER_CST, and we can't coalesce the = _5 store with the = {} store because the former is MEM_REF, the latter INTEGER_CST. But we happily coalesce the remaining 3 stores, which is wrong, because the = _5; store overlaps those and is in between them in the program order. We already have code to deal with similar cases in check_no_overlap, but we deal only with the following stores in sort_by_bitpos order, not the earlier ones. The following patch checks also the earlier ones. In coalesce_immediate_stores it computes the first one that needs to be checked (all the ones whose bitpos + bitsize is smaller or equal to merged_store->start don't need to be checked and don't need to be checked even for any following attempts because of the sort_by_bitpos sorting) and the end of that (that is the first store in the merged_store). 2020-09-16 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/97053 * gimple-ssa-store-merging.c (check_no_overlap): Add FIRST_ORDER, START, FIRST_EARLIER and LAST_EARLIER arguments. Return false if any stores between FIRST_EARLIER inclusive and LAST_EARLIER exclusive has order in between FIRST_ORDER and LAST_ORDER and overlaps the to be merged store. (imm_store_chain_info::try_coalesce_bswap): Add FIRST_EARLIER argument. Adjust check_no_overlap caller. (imm_store_chain_info::coalesce_immediate_stores): Add first_earlier and last_earlier variables, adjust them during iterations. Adjust check_no_overlap callers, call check_no_overlap even when extending overlapping stores by extra INTEGER_CST stores. * gcc.dg/store_merging_31.c: New test. * gcc.dg/store_merging_32.c: New test. (cherry picked from commit bd909071ac04e94f4b6f0baab64d0687ec55681d)
The releases/gcc-8 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:87ae45cdbd7b70a4c92d5137552228ed9ad9e9e7 commit r8-10518-g87ae45cdbd7b70a4c92d5137552228ed9ad9e9e7 Author: Jakub Jelinek <jakub@redhat.com> Date: Wed Sep 16 09:42:33 2020 +0200 store-merging: Consider also overlapping stores earlier in the by bitpos sorting [PR97053] As the testcases show, if we have something like: MEM <char[12]> [&b + 8B] = {}; MEM[(short *) &b] = 5; _5 = *x_4(D); MEM <long long unsigned int> [&b + 2B] = _5; MEM[(char *)&b + 16B] = 88; MEM[(int *)&b + 20B] = 1; then in sort_by_bitpos the stores are almost like in the given order, except the first store is after the = _5; store. We can't coalesce the = 5; store with = _5;, because the latter is MEM_REF, while the former INTEGER_CST, and we can't coalesce the = _5 store with the = {} store because the former is MEM_REF, the latter INTEGER_CST. But we happily coalesce the remaining 3 stores, which is wrong, because the = _5; store overlaps those and is in between them in the program order. We already have code to deal with similar cases in check_no_overlap, but we deal only with the following stores in sort_by_bitpos order, not the earlier ones. The following patch checks also the earlier ones. In coalesce_immediate_stores it computes the first one that needs to be checked (all the ones whose bitpos + bitsize is smaller or equal to merged_store->start don't need to be checked and don't need to be checked even for any following attempts because of the sort_by_bitpos sorting) and the end of that (that is the first store in the merged_store). 2020-09-16 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/97053 * gimple-ssa-store-merging.c (check_no_overlap): Add FIRST_ORDER, START, FIRST_EARLIER and LAST_EARLIER arguments. Return false if any stores between FIRST_EARLIER inclusive and LAST_EARLIER exclusive has order in between FIRST_ORDER and LAST_ORDER and overlaps the to be merged store. (imm_store_chain_info::try_coalesce_bswap): Add FIRST_EARLIER argument. Adjust check_no_overlap caller. (imm_store_chain_info::coalesce_immediate_stores): Add first_earlier and last_earlier variables, adjust them during iterations. Adjust check_no_overlap callers, call check_no_overlap even when extending overlapping stores by extra INTEGER_CST stores. * gcc.dg/store_merging_31.c: New test. * gcc.dg/store_merging_32.c: New test. (cherry picked from commit bd909071ac04e94f4b6f0baab64d0687ec55681d)