[Bug target/97400] New: [10/11 Regression] SVE: wrong code since r10-3906-g96eb7d7a64

acoplan at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Tue Oct 13 11:10:31 GMT 2020


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97400

            Bug ID: 97400
           Summary: [10/11 Regression] SVE: wrong code since
                    r10-3906-g96eb7d7a64
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: acoplan at gcc dot gnu.org
  Target Milestone: ---

Created attachment 49364
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49364&action=edit
assembly generated at r10-3906

AArch64 GCC miscompiles the following testcase:

int a[256];
int c, f;
int *d[10];
int main(void)
{
  for (; f < 256; f++) {
    a[f] = c = 9;
    for (; c >= 0; c--) {
      d[c] = 0;
    }
  }
  return a[255];
}

with -O3 -march=armv8.2-a+sve since
r10-3906-g96eb7d7a642085f651e9940f0ee75568d7c4441d7. The program should exit
with status code 9 but instead exits with status code 0.

The program produces the correct result at
r10-3681-g3faf75d458529592007436a0972f44e14ebf46f6. At the revisions between
these two, GCC ICEs on this input, so the bad commit lies somewhere in that
range.

To reproduce the issue on revisions as old as r10-3906, you need to add
-fno-common to the command line (-fno-common became the default in GCC 10).

Examining the broken assembly code, it appears that the scalar epilogue for the
inner loop tramples backwards through d into the end of a:

.L8:
        add     x1, x4, 1032          // x1 = &d[0]
        sub     w5, w0, #1            // w5 = -1
        sub     w13, w0, #2
        sub     w12, w0, #3
        sub     w11, w0, #4
        sub     w9, w0, #5
        str     xzr, [x1, w0, sxtw 3]
        sub     w8, w0, #6
        str     xzr, [x1, w5, sxtw 3] // 8-byte store at &d[-1]: incorrectly zeroes a[254] and a[255]
[...]

The layout of .bss here is:

f (4 bytes) | padding (4 bytes) | a (1024 bytes) | d (80 bytes) | c (4 bytes)

I've attached the broken assembly code generated by GCC at
r10-3906-g96eb7d7a642085f651e9940f0ee75568d7c4441d7.

