[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations

e.menezes at samsung dot com gcc-bugzilla@gcc.gnu.org
Tue Oct 28 20:57:00 GMT 2014


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503

--- Comment #21 from Evandro <e.menezes at samsung dot com> ---
(In reply to ramana.radhakrishnan@arm.com from comment #20)
> What's the kind of performance delta you see if you managed to unroll 
> the loop just a wee bit ? Probably not much looking at the code produced 
> here.

Comparing cycle counts on Juno for the matrix multiplication test program
above, built with -Ofast and different unrolling settings:

-fno-unroll-loops: 592000
-funroll-loops --param max-unroll-times=2: 594000
-funroll-loops --param max-unroll-times=4: 592000
-funroll-loops: 590000 (implies --param max-unroll-times=8)
-funroll-loops --param max-unroll-times=16: 581000
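Expressing each count relative to the -fno-unroll-loops baseline shows how small these differences are. The sketch below is only arithmetic on the numbers listed above. It is not part of the original measurement harness, and the helper name `delta_percent` is hypothetical.

```python
# Cycle counts from the Juno runs listed above (matrix multiplication test, -Ofast).
RESULTS = [
    ("-fno-unroll-loops", 592000),
    ("-funroll-loops --param max-unroll-times=2", 594000),
    ("-funroll-loops --param max-unroll-times=4", 592000),
    ("-funroll-loops (max-unroll-times=8)", 590000),
    ("-funroll-loops --param max-unroll-times=16", 581000),
]


def delta_percent(baseline: int, cycles: int) -> float:
    """Hypothetical helper: percentage change in cycles vs. baseline (negative = faster)."""
    return 100.0 * (cycles - baseline) / baseline


if __name__ == "__main__":
    baseline = RESULTS[0][1]  # -fno-unroll-loops
    for flags, cycles in RESULTS:
        print(f"{flags:<45} {cycles:>7} cycles {delta_percent(baseline, cycles):+6.2f}%")
```

Even the 16x setting saves only about 1.9% of the cycles, while the 2x setting is marginally slower than not unrolling at all.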

It seems to me that without effective induction-variable optimization (ivopts)
in place, loops have to be unrolled very aggressively to make any difference in
this case, at a large cost in code size.


