This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Optimizing hypot()
- From: Richard Guenther <rguenth at tat dot physik dot uni-tuebingen dot de>
- To: gcc at gcc dot gnu dot org
- Date: Thu, 24 Jul 2003 00:27:50 +0200 (CEST)
- Subject: Optimizing hypot()
Hi!
Looking at the asm output for the following C snippet, I wonder why the two
functions don't end up the same.
Compiled with gcc-3.4 -S -O3 -ffast-math -march=pentium3 -msse -mfpmath=sse
#include <math.h>
float hypot1(float x, float y)
{
    return sqrtf(x*x+y*y);
}
float hypot2(float x, float y)
{
    return hypotf(x, y);
}
The first is expanded partly to SSE instructions plus an fsqrt; the second is
probably inlined from libc(?) and uses only traditional FPU instructions.
If I remove the #include <math.h>, the first function is expanded fully to
SSE, including sqrtss, while the second one is not recognized as a builtin.
Also, the second one looks like this:
hypot2:
        pushl   %ebp            #
        movl    %esp, %ebp      #,
        subl    $24, %esp       #,
        flds    12(%ebp)        # y
        fstpl   8(%esp)         #
        flds    8(%ebp)         # x
        fstpl   (%esp)          #
        call    hypotf          #
        cvtsi2ss %eax, %xmm1    # tmp64,
        movss   %xmm1, -4(%ebp) #,
        flds    -4(%ebp)        #
        leave
        ret
Note the bogus temporary xmm1 use at the end.
I think hypot2 could be improved a lot if it were recognized as a builtin and
optimized for SSE/SSE2. Is anyone working on such transformations?
Thanks,
Richard.