[Bug tree-optimization/53395] [4.8 Regression] The LAPACK functions i(d|s)amax are more than two times slower after revision 187183
dominiq at lps dot ens.fr
gcc-bugzilla@gcc.gnu.org
Fri May 18 11:54:00 GMT 2012
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53395
--- Comment #1 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-05-18 11:24:06 UTC ---
The assembly code for -O3 is almost the same for revisions 187182 and 187183.
However, with '-O3 -ffast-math', revision 187182 gives the following for the loop
L12:
movapd %xmm2, %xmm1
L9:
movsd 8(%rsi), %xmm0
andpd %xmm3, %xmm0
comisd %xmm0, %xmm1
movapd %xmm0, %xmm2
maxsd %xmm1, %xmm2
cmovb %edx, %eax
addl $1, %edx
addq $8, %rsi
cmpl %ecx, %edx
jne L12
while revision 187183 gives
L6:
movapd %xmm2, %xmm1
L3:
movsd 8(%rsi), %xmm0
movapd %xmm1, %xmm3
andpd %xmm4, %xmm0
comisd %xmm0, %xmm1
movapd %xmm0, %xmm2
cmplesd %xmm1, %xmm2
cmovb %edx, %eax
addl $1, %edx
addq $8, %rsi
cmpl %ecx, %edx
andpd %xmm2, %xmm3
andnpd %xmm0, %xmm2
orpd %xmm3, %xmm2
jne L6
(for the latter, -ffast-math only changes ucomisd to comisd).
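
For readers comparing the two listings, here is a rough C sketch of what the loops appear to compute. It is not the LAPACK/BLAS source and not GCC's output; the names abs_bits, select_max and idamax_sketch are hypothetical. The select_max helper mirrors the cmplesd/andpd/andnpd/orpd sequence of revision 187183, which yields the same value as the single maxsd of revision 187182 for ordinary (non-NaN) inputs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: |x| by clearing the sign bit, like the andpd with a
   constant mask at the top of both loops. */
static double abs_bits(double x)
{
    uint64_t b;
    memcpy(&b, &x, sizeof b);
    b &= ~((uint64_t)1 << 63);
    memcpy(&x, &b, sizeof x);
    return x;
}

/* Hypothetical helper: max(cur, x) built from a compare mask and bitwise
   selects, in the style of the revision 187183 loop. */
static double select_max(double cur, double x)
{
    uint64_t c, v, mask, r;
    double out;
    memcpy(&c, &cur, sizeof c);
    memcpy(&v, &x, sizeof v);
    mask = (x <= cur) ? ~(uint64_t)0 : 0;  /* cmplesd: all ones if x <= cur */
    r = (c & mask) | (v & ~mask);          /* andpd, andnpd, orpd */
    memcpy(&out, &r, sizeof out);
    return out;
}

/* Sketch of an idamax-style search: returns the 1-based index of the first
   element with the largest absolute value. Assumes n > 0 and unit stride. */
static int idamax_sketch(int n, const double *x)
{
    assert(n > 0);
    int best = 1;
    double dmax = abs_bits(x[0]);
    for (int i = 1; i < n; i++) {
        double a = abs_bits(x[i]);
        if (a > dmax)                /* comisd + cmovb: update the index */
            best = i + 1;
        dmax = select_max(dmax, a);  /* r187182 does this with one maxsd */
    }
    return best;
}
```

In this sketch the running maximum is carried from one iteration to the next, so the extra instructions in the 187183 version lengthen that loop-carried dependency chain; whether this fully accounts for the measured slowdown is not established here.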