This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug tree-optimization/69710] performance issue with SP Linpack with Autovectorization


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69710

--- Comment #1 from Doug Gilmore <doug.gilmore at imgtec dot com> ---
Created attachment 37615
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37615&action=edit
daxpy for DP (previous was for SP)

Compilation example:

arm-linux-gnueabihf-gcc -O3 -save-temps daxpy.c saxpy.c -c -mfpu=neon \
  -fdump-tree-{vect,ivopts}-{verbose,details} -fdump-tree-{slp1,optimized} \
  -fsched-verbose=9 -fdump-rtl-sched{1,2} -marm \
  -funsafe-math-optimizations -funroll-all-loops

Note that ARMv7 Neon has no double-precision (DP) vector arithmetic, so
daxpy.s won't contain autovectorized code.

I haven't built a top-of-tree (ToT) compiler for aarch64-linux-gnu, but I
suspect that there you will see autovectorized code in daxpy.s with
reasonable schedules (loads moved above stores).
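
For readers without the attachments at hand, the kernels under discussion are presumably of the following shape. This is only a minimal sketch, not the contents of the attached daxpy.c and saxpy.c: the function signatures, the unit stride, and the use of restrict are assumptions made for illustration. The Linpack versions normally also take stride arguments.

```c
#include <stddef.h>

/* Hypothetical stand-in for saxpy.c: single-precision y = a*x + y.
   restrict is an assumption here; it tells the compiler that x and y
   do not alias, which makes the loop easier to vectorize. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Hypothetical stand-in for daxpy.c: the same loop in double precision.
   With -mfpu=neon on ARMv7 this loop stays scalar, because Neon there
   has no double-precision vector arithmetic. */
void daxpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Compiling a file like this with the command above and comparing the sched1/sched2 dumps for the two functions should show whether loads are hoisted above the stores in each case.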
