This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations

From: "e.menezes at samsung dot com" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Wed, 22 Oct 2014 16:36:13 +0000
Subject: [Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations
Auto-submitted: auto-generated
References: <bug-63503-4 at http dot gcc dot gnu dot org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503

--- Comment #14 from Evandro Menezes <e.menezes at samsung dot com> ---
Compiling the test case above with just -O2, I can reproduce the code I
mentioned initially and easily measure its cycle count on the target using
perf.

The binary created by GCC runs in about 447000 user cycles and the one created
by LLVM in about 499000 user cycles, so GCC's takes roughly 10% fewer cycles.
IOW, fused multiply-add is a win on A57.
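
For readers without the original test case at hand, the following is a minimal
sketch of the kind of loop this comparison is about. It is NOT the test case
from this bug: the function name and the build commands in the comments are
illustrative assumptions. The multiply and add are written separately in the
source, and the compiler decides whether to contract them into a fused
multiply-add.

```c
#include <stddef.h>

/* Hypothetical kernel, not the test case attached to this bug.
 *
 * The body is a separate multiply and add.  With GCC's default
 * floating-point contraction, -O2 may turn it into a single fused
 * multiply-add (fmadd on AArch64); building with -ffp-contract=off
 * should keep the two operations separate, which gives two binaries
 * to compare:
 *
 *   gcc -O2 kernel.c main.c -o fused
 *   gcc -O2 -ffp-contract=off kernel.c main.c -o unfused
 *
 * User-mode cycles can then be counted with, for example,
 *   perf stat -e cycles:u ./fused
 */
double dot_sketch(const double *a, const double *b, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];   /* candidate for contraction into fmadd */
    return acc;
}
```

Whether contraction actually happens depends on the compiler version and
flags, so the generated assembly should be checked rather than assumed.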

Looking further into why Geekbench's {D,S}GEMM performs worse with GCC than
with LLVM, both using "-Ofast", I found that GCC fails to vectorize the loop
in "gemm_block_kernel", while LLVM does.
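
The source of "gemm_block_kernel" is not shown here, so the following is only
a hypothetical stand-in for a GEMM inner kernel, written to show how
vectorization can be checked with each compiler. The function name, the
row-major layout and the use of restrict are all assumptions made for the
sketch.

```c
#include <stddef.h>

/* Hypothetical stand-in, NOT Geekbench's gemm_block_kernel.
 * Computes C += A * B for row-major n x n matrices.
 *
 * To see whether each compiler vectorizes the inner loop, its
 * optimization reports can be requested, for example:
 *
 *   gcc   -Ofast -fopt-info-vec -fopt-info-vec-missed -c gemm.c
 *   clang -Ofast -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -c gemm.c
 */
void gemm_kernel_sketch(size_t n, const double *restrict a,
                        const double *restrict b, double *restrict c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < n; k++) {
            const double aik = a[i * n + k];
            /* Unit-stride inner loop: the usual vectorization candidate. */
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}
```

A "missed" remark from one compiler and a "vectorized" remark from the other
on the same loop would match the difference described above, although the
real kernel may fail to vectorize for reasons this sketch does not reproduce.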

I should've done a more detailed analysis of this issue before submitting this
bug, sorry.

