This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug fortran/78456] New: [6/7 Regression] 171.swim loops not interchanged, vectorized perf loss on aarch64


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78456

            Bug ID: 78456
           Summary: [6/7 Regression] 171.swim loops not interchanged,
                    vectorized perf loss on aarch64
           Product: gcc
           Version: 6.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: chris_s_jones at yahoo dot com
  Target Milestone: ---

Created attachment 40102
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40102&action=edit
test case

While debugging a perf regression in 171.swim after moving to gfortran 6.2.0, I
noticed that a nested loop in MAIN is not being interchanged, which results in
sub-optimal vectorization.  A simplified test case is attached, with an excerpt
shown here:

      DO 3500 I = 1, MNMIN
       DO 4500 J = 1, MNMIN
         FOO = FOO + ABS(X0(I,J))
         BAR = BAR + ABS(X1(I,J))
         BAZ = BAZ + ABS(X2(I,J))
 4500  CONTINUE
       X1(I,I) = X1(I,I) 
     1  * ( MOD (I, 100) /100.)
 3500 CONTINUE

In 4.8.2, the compiler generates the sequence:
 230:   4cdf7e47        ld1     {v7.2d}, [x18], #16
 234:   4cdf7fd0        ld1     {v16.2d}, [x30], #16
 238:   4cdf7c34        ld1     {v20.2d}, [x1], #16
 23c:   4ee0f8f5        fabs    v21.2d, v7.2d
 240:   4ee0fa16        fabs    v22.2d, v16.2d
 244:   4ee0fa97        fabs    v23.2d, v20.2d
 248:   4e75d400        fadd    v0.2d, v0.2d, v21.2d
 24c:   4e76d421        fadd    v1.2d, v1.2d, v22.2d
 250:   4e77d442        fadd    v2.2d, v2.2d, v23.2d

In 6.2.0 and on the trunk, the values of each vector are assembled from
multiple locations, because without the loop interchange the loop does not
access adjacent values:
 2c8:   fc606834        ldr     d20, [x1,x0]
 2cc:   52800050        mov     w16, #0x2                       // #2
 2d0:   d294dc0e        mov     x14, #0xa6e0                    // #42720
 2d4:   6b14021f        cmp     w16, w20
 2d8:   fc606bd6        ldr     d22, [x30,x0]
 2dc:   fc6068f7        ldr     d23, [x7,x0]
 2e0:   8b0d0000        add     x0, x0, x13
 2e4:   fd69b835        ldr     d21, [x1,#21360]
 2e8:   6e0806b0        mov     v16.d[0], v21.d[0]
 2ec:   6e180690        mov     v16.d[1], v20.d[0]
 2f0:   4ee0fa19        fabs    v25.2d, v16.2d
 2f4:   fd69bbd8        ldr     d24, [x30,#21360]
 2f8:   6e080706        mov     v6.d[0], v24.d[0]
 2fc:   6e1806c6        mov     v6.d[1], v22.d[0]
 300:   4ee0f8db        fabs    v27.2d, v6.2d
 304:   fd69b8fd        ldr     d29, [x7,#21360]
 308:   6e0807a7        mov     v7.d[0], v29.d[0]
 30c:   6e1806e7        mov     v7.d[1], v23.d[0]
 310:   4ee0f8fe        fabs    v30.2d, v7.2d
 314:   4e79d75a        fadd    v26.2d, v26.2d, v25.2d
 318:   4e7bd79c        fadd    v28.2d, v28.2d, v27.2d
 31c:   4e7ed7ff        fadd    v31.2d, v31.2d, v30.2d

Flags used: -O3 -march=armv8-a+crypto -mcpu=cortex-a57+crypto -ffast-math
-funroll-loops -fvect-cost-model=unlimited -floop-interchange -g -c -o sink.o
sink.f

I understand -floop-interchange is now an alias for -floop-nest-optimize, but I
am wondering why this case wasn't interchanged.  The perf difference seems
significant for this case.  Manually swapping the loop indices in the source
causes the better code sequence to be generated.
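
For reference, the manual swap could look roughly like the stand-alone sketch
below.  It is an assumption about the shape of the change, not the exact edit
made to the attached test case: the program name SWAPIJ, the size N, the
DOUBLE PRECISION declarations and the fill values are all hypothetical, chosen
only so that the program compiles on its own.  The point is that the inner
loop now runs over I, the first subscript, which is the contiguous one in
Fortran's column-major layout.

```fortran
      PROGRAM SWAPIJ
C     Hypothetical stand-alone variant of the excerpt.  The loop
C     headers are swapped so that the inner loop walks the first
C     (contiguous) subscript.  N, the element type and the fill
C     values are made up for illustration.
      INTEGER N
      PARAMETER (N = 8)
      DOUBLE PRECISION X0(N,N), X1(N,N), X2(N,N)
      DOUBLE PRECISION FOO, BAR, BAZ
      INTEGER I, J
C     Fill the arrays with arbitrary values.
      DO 20 J = 1, N
        DO 10 I = 1, N
          X0(I,J) = DBLE(I - J)
          X1(I,J) = DBLE(I + J)
          X2(I,J) = -1.0D0
   10   CONTINUE
   20 CONTINUE
      FOO = 0.0D0
      BAR = 0.0D0
      BAZ = 0.0D0
C     Interchanged nest: J is now the outer index, I the inner.
      DO 3500 J = 1, N
        DO 4500 I = 1, N
          FOO = FOO + ABS(X0(I,J))
          BAR = BAR + ABS(X1(I,J))
          BAZ = BAZ + ABS(X2(I,J))
 4500   CONTINUE
C       Diagonal update, now indexed by the outer variable J.
        X1(J,J) = X1(J,J)
     1    * ( MOD (J, 100) /100.)
 3500 CONTINUE
      PRINT *, FOO, BAR, BAZ
      END
```

Each diagonal element is still read by the reduction before it is scaled, so
the three sums cover the same values as in the original loop order; only the
order of accumulation changes.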

Behaves similarly for gfortran 6.2.0 and trunk, built using:
configure 'CFLAGS_FOR_TARGET=-march=armv8-a -mcpu=cortex-a57 -O3'
'CXXFLAGS_FOR_TARGET=-march=armv8-a -mcpu=cortex-a57 -O3'
--prefix=/home/gcc-aarch64/6.2.0-linux-gnu --target=aarch64-linux-gnu
--with-sysroot=/home/gcc-aarch64/6.2.0-linux-gnu/sysroot --enable-__cxa_atexit
--with-gnu-as --with-gnu-ld --enable-shared --disable-libssp
--disable-libmudflap --enable-languages=c,c++,fortran --disable-libsanitizer
--disable-nls
