[Bug c/79359] Squaring a complex float gives inefficient code with or without -ffast-math
drraph at gmail dot com
gcc-bugzilla@gcc.gnu.org
Sun Feb 5 20:36:00 GMT 2017
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79359
--- Comment #1 from Raphael C <drraph at gmail dot com> ---
In case it's of any help, here is an explanation of the assembly that ICC gives
with -fp-model strict.
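R = the real part and C = the imaginary part. Here "x" just means don't know or unused.
We start with xmm0 = {x, x, C, R}, with lanes listed from highest to lowest.
The desired output is (R+iC)^2 = R^2 - C^2 + 2RCi, i.e. real part RR-CC and imaginary part 2RC.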
vmovsldup xmm1, xmm0 # xmm1 = { x, x, R, R }
vmovshdup xmm2, xmm0 # xmm2 = { x, x, C, C }
vshufps xmm3, xmm0, xmm0, 177 # xmm3 = { x, x, R, C } (177 = 0xB1 swaps each adjacent pair of lanes)
vmulps xmm4, xmm1, xmm0 # xmm4 = { x, x, RC, RR }
vmulps xmm5, xmm2, xmm3 # xmm5 = { x, x, RC, CC }
vaddsubps xmm0, xmm4, xmm5 # xmm0 = { x, x, 2RC, RR-CC }
ret
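
To double-check the lane bookkeeping above, here is a rough scalar model of the same sequence in plain C. It is only a sketch: the vec4 type and the helper names (movsldup, shufps, square_complex_lanes, check_example and so on) are made up for illustration and are not GCC or ICC APIs, the unused upper lanes are arbitrarily set to 0, and it models which lane feeds which result rather than strict floating-point semantics.

```c
#include <assert.h>

/* A 128-bit register modelled as four float lanes; lane[0] is the lowest. */
typedef struct { float lane[4]; } vec4;

/* Models vmovsldup: each even lane is copied into the odd lane above it. */
static vec4 movsldup(vec4 a) {
    vec4 r = {{ a.lane[0], a.lane[0], a.lane[2], a.lane[2] }};
    return r;
}

/* Models vmovshdup: each odd lane is copied into the even lane below it. */
static vec4 movshdup(vec4 a) {
    vec4 r = {{ a.lane[1], a.lane[1], a.lane[3], a.lane[3] }};
    return r;
}

/* Models vshufps: the two low result lanes are picked from a, the two
   high ones from b, using two bits of imm per result lane. */
static vec4 shufps(vec4 a, vec4 b, unsigned imm) {
    vec4 r = {{ a.lane[imm & 3],        a.lane[(imm >> 2) & 3],
                b.lane[(imm >> 4) & 3], b.lane[(imm >> 6) & 3] }};
    return r;
}

/* Models vmulps: lane-wise multiply. */
static vec4 mulps(vec4 a, vec4 b) {
    vec4 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

/* Models vaddsubps: subtract in even lanes, add in odd lanes. */
static vec4 addsubps(vec4 a, vec4 b) {
    vec4 r = {{ a.lane[0] - b.lane[0], a.lane[1] + b.lane[1],
                a.lane[2] - b.lane[2], a.lane[3] + b.lane[3] }};
    return r;
}

/* Hypothetical driver that follows the listing instruction by instruction. */
void square_complex_lanes(float re, float im, float *out_re, float *out_im) {
    vec4 xmm0 = {{ re, im, 0.0f, 0.0f }};   /* { x, x, C, R }, x set to 0 */
    vec4 xmm1 = movsldup(xmm0);             /* { x, x, R, R }   */
    vec4 xmm2 = movshdup(xmm0);             /* { x, x, C, C }   */
    vec4 xmm3 = shufps(xmm0, xmm0, 177);    /* { x, x, R, C }   */
    vec4 xmm4 = mulps(xmm1, xmm0);          /* { x, x, RC, RR } */
    vec4 xmm5 = mulps(xmm2, xmm3);          /* { x, x, RC, CC } */
    xmm0 = addsubps(xmm4, xmm5);            /* { x, x, 2RC, RR-CC } */
    *out_re = xmm0.lane[0];
    *out_im = xmm0.lane[1];
}

/* Example: (3 + 2i)^2 = 5 + 12i. */
void check_example(void) {
    float re, im;
    square_complex_lanes(3.0f, 2.0f, &re, &im);
    assert(re == 5.0f);
    assert(im == 12.0f);
}
```

The inputs in check_example are small integers so every intermediate product is exact in float and the comparisons can use ==.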