This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/69710] performance issue with SP Linpack with Autovectorization
- From: "doug.gilmore at imgtec dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Sat, 06 Feb 2016 21:45:40 +0000
- Subject: [Bug tree-optimization/69710] performance issue with SP Linpack with Autovectorization
- Auto-submitted: auto-generated
- References: <bug-69710-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69710
--- Comment #1 from Doug Gilmore <doug.gilmore at imgtec dot com> ---
Created attachment 37615
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37615&action=edit
daxpy for DP (previous was for SP)
Compilation example:
arm-linux-gnueabihf-gcc -O3 -save-temps daxpy.c saxpy.c -c -mfpu=neon \
-fdump-tree-{vect,ivopts}-{verbose,details} -fdump-tree-{slp1,optimized} \
-fsched-verbose=9 \
-fdump-rtl-sched{1,2} -marm -funsafe-math-optimizations -funroll-all-loops
Note that NEON on this target does not support double precision (DP), so
daxpy.s won't contain autovectorized code.
I haven't built a top-of-tree (ToT) compiler for aarch64-linux-gnu, but I
suspect that there you will see autovectorized code in daxpy.s, with
reasonable schedules being produced (loads moved above stores).
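
For readers without the attachments at hand: the kernels under discussion are presumably the usual Linpack axpy loops, y = a*x + y, in single precision (saxpy) and double precision (daxpy). The following is only a minimal sketch under that assumption. The signatures, parameter names, and use of restrict are illustrative and are not taken from attachment 37615 or the earlier SP attachment.

```c
#include <stddef.h>

/* Hypothetical stand-in for the SP test case: y[i] += a * x[i].
 * restrict is an assumption made here so the vectorizer does not
 * need a runtime alias check; the real attachment may differ. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Hypothetical stand-in for the DP test case: the same loop on doubles. */
void daxpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

With the command line above, the vect dump for a loop of the first kind would be the place to check whether vectorization happened, and the sched1/sched2 RTL dumps the place to check whether loads are hoisted above stores.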