[Bug tree-optimization/106533] New: loop distribution not distributing inner loop (to memcpy) when perfect loop nest

Fri Aug 5 07:22:36 GMT 2022

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106533

            Bug ID: 106533
           Summary: loop distribution not distributing inner loop (to
                    memcpy) when perfect loop nest
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vineetg at rivosinc dot com
  Target Milestone: ---

Created attachment 53415
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53415&action=edit
test case

When tinkering with a slightly modified version of the STREAM benchmark [1], I
observed that loop distribution does not distribute a nested copy loop into "0
loops and 1 libcall (memcpy)".

The test was built with -O2 using mainline GCC as of June 14, 2022 (commit
6abe341558ab).

The actual test is attached, but the loops look like the following. Loop 7 (the
copy) is distributed to memcpy in the general case, but not when the benchmark
is built with #define COPYONLY, which elides loops 8, 9 and 10 from compilation
and so leaves loop 3 as a perfect nest containing only loop 7.

-->8---
    for (j=0; j<10000000; j++) {                            // 1
        a[j] = 1.0;
        b[j] = 2.0;
        c[j] = 0.0;
    }

    for (j = 0; j < 10000000; j++)                          // 2
        a[j] = 2.0E0 * a[j];

    for (k=0; k<10; k++)                                    // 3
    {
        for (j=0; j<10000000; j++) c[j] = a[j];             // 7
#ifndef COPYONLY
        for (j=0; j<10000000; j++) b[j] = scalar*c[j];      // 8
        for (j=0; j<10000000; j++) c[j] = a[j]+b[j];        // 9
        for (j=0; j<10000000; j++) a[j] = b[j]+scalar*c[j]; // 10
#endif
    }

    for (k=1; k<10; k++)
        for (j=0; j<4; j++)                                 // 6
            avgtime[j] = avgtime[j] + times[j][k];
            ..

    for (j=0; j<4; j++)                                     // 5
        avgtime[j] = avgtime[j]/(double)(NTIMES-1);
            ..

-->8---