This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations
- From: "e.menezes at samsung dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 22 Oct 2014 16:36:13 +0000
- Subject: [Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations
- Auto-submitted: auto-generated
- References: <bug-63503-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503
--- Comment #14 from Evandro Menezes <e.menezes at samsung dot com> ---
Compiling the test case above with just -O2, I can reproduce the code I
mentioned initially and easily measure, using perf, how many cycles it takes
to run on the target.
The binary created by GCC runs in about 447000 user cycles and the one created
by LLVM in about 499000 user cycles, so the GCC binary needs roughly 10% fewer
cycles. In other words, fused multiply-add is a win on A57.
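
As a rough illustration of this kind of comparison, here is a minimal sketch.
It is not the test case from this bug: the function name "dot_product" and the
kernel itself are hypothetical, chosen only because the multiply-accumulate in
the loop body is something a compiler may contract into a fused multiply-add.

```c
#include <stddef.h>

/* Hypothetical kernel, not the test case attached to this bug.
 * The statement "acc += a[i] * b[i]" is a candidate for contraction
 * into a single fused multiply-add instruction. */
double dot_product(const double *a, const double *b, size_t n)
{
    double acc = 0.0;

    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];   /* multiply, then accumulate */

    return acc;
}
```

Assuming a driver that calls this in a loop, building it once with plain -O2
and once with "-O2 -ffp-contract=off" should give one binary with the fused
form and one with separate multiply and add, and something like
"perf stat -e cycles:u ./a.out" would then report user cycles for each.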
Looking further into why Geekbench's {D,S}GEMM performs worse with GCC than
with LLVM, both using "-Ofast", I found that GCC fails to vectorize the loop in
"gemm_block_kernel", while LLVM does.
I should've done a more detailed analysis of this issue before submitting this
bug, sorry.
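
For the vectorization point about "gemm_block_kernel", a minimal sketch of how
one might check whether a GEMM-style loop gets vectorized. This is not
Geekbench's code: the function "gemm_block_sketch", its signature and the
row-major square-matrix layout are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a GEMM block kernel: C += A * B for
 * row-major n x n matrices. The "restrict" qualifiers tell the
 * compiler the arrays do not alias, which helps the vectorizer. */
void gemm_block_sketch(size_t n, const float *restrict a,
                       const float *restrict b, float *restrict c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < n; k++) {
            float aik = a[i * n + k];

            /* Innermost loop: unit-stride multiply-accumulate,
             * the usual candidate for vectorization. */
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}
```

With GCC, adding "-fopt-info-vec" or "-fopt-info-vec-missed" to "-Ofast" reports
which loops were or were not vectorized; with Clang, "-Rpass=loop-vectorize"
and "-Rpass-missed=loop-vectorize" give the corresponding remarks.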