This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Optimizing hypot()


Hi!

Looking at the asm output for the following C snippet, I wonder why the two
functions don't end up the same.

Compiled with gcc-3.4 -S -O3 -ffast-math -march=pentium3 -msse -mfpmath=sse

#include <math.h>

float hypot1(float x, float y)
{
        return sqrtf(x*x+y*y);
}

float hypot2(float x, float y)
{
        return hypotf(x, y);
}

The first is expanded partly to SSE plus an fsqrt; the second is probably
inlined from libc(?) with only traditional FPU instructions.

If I remove the #include <math.h>, the first function is expanded fully to
SSE, including sqrtss, while the second one is not recognized as a builtin.

Also, the second one then looks like this:

hypot2:
        pushl   %ebp    #
        movl    %esp, %ebp      #,
        subl    $24, %esp       #,
        flds    12(%ebp)        # y
        fstpl   8(%esp) #
        flds    8(%ebp) # x
        fstpl   (%esp)  #
        call    hypotf  #
        cvtsi2ss        %eax, %xmm1     # tmp64,
        movss   %xmm1, -4(%ebp) #,
        flds    -4(%ebp)        #
        leave
        ret

Note the bogus temporary xmm1 use at the end.

I think hypot2 could be improved a lot if it were recognized as a builtin and
optimized for SSE/SSE2. Is anyone working on such transformations?

Thanks,

Richard.


