This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503

--- Comment #14 from Evandro Menezes <e.menezes at samsung dot com> ---
Compiling the test case above with just -O2, I can reproduce the code I
mentioned initially and easily measure its cycle count on the target using
perf.

The binary created by GCC runs in about 447000 user cycles and the one created
by LLVM, in about 499000 user cycles.  In other words, fused multiply-add is a
win on A57.

Looking further into why Geekbench's {D,S}GEMM performs worse with GCC than
with LLVM, both using "-Ofast", I found that GCC fails to vectorize the loop in
"gemm_block_kernel", while LLVM does.
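
For illustration only, here is a minimal sketch of the kind of multiply-accumulate
loop nest a blocked GEMM kernel typically contains.  It is not the Geekbench
source: the function name "gemm_block_sketch", its parameters, and the row-major
layout are all assumptions.  Whether a given compiler vectorizes a loop of this
shape can usually be checked with GCC's -fopt-info-vec-missed or Clang's
-Rpass-missed=loop-vectorize.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a blocked GEMM inner kernel (assumed shape,
 * not taken from Geekbench).  Computes C += A * B for row-major
 * matrices: A is n x k, B is k x m, C is n x m.
 */
void gemm_block_sketch(size_t n, size_t m, size_t k,
                       const float *restrict a,
                       const float *restrict b,
                       float *restrict c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t p = 0; p < k; p++) {
            /* Loop-invariant scalar from A for the innermost loop. */
            float aip = a[i * k + p];

            /*
             * Unit-stride multiply-accumulate over a row of B and C.
             * This is the statement a compiler may turn into fused
             * multiply-add instructions, scalar or vector.
             */
            for (size_t j = 0; j < m; j++)
                c[i * m + j] += aip * b[p * m + j];
        }
    }
}
```

The "restrict" qualifiers are part of the assumption: without them a compiler
may have to prove or check at run time that C does not alias A or B before it
can vectorize the innermost loop.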

I should've done a more detailed analysis of this issue before submitting this
bug, sorry.

