This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: PR 15492: floating-point arguments are loaded too early to x87 stack


* Uros Bizjak:

> Current (Aug. 19) mainline CVS gcc generates:
>
> with "gcc -O2 -fomit-frame-pointer":
> test:
>        fldl    4(%esp)
>        fldl    12(%esp)
>        fxch    %st(1)
>        fmul    %st(0), %st
>        fxch    %st(1)
>        fmul    %st(0), %st
>        faddp   %st, %st(1)
>        ret
>
> and without optimization, "gcc -fomit-frame-pointer":
> test:
>        fldl    4(%esp)
>        fmull   4(%esp)
>        fldl    12(%esp)
>        fmull   12(%esp)
>        faddp   %st, %st(1)
>        ret
>
> According to "How to optimize for the Pentium family of microprocessors" 
> by Agner Fog, "fld r/m32/m64" consumes one clock cycle on P1, PMMX, 
> PPRO, P2, P3 and P4 in all its forms. As shown above, gcc actually
> de-optimizes the code with "-O2".

This is simply not true.  The code generated with -O2 actually runs
faster, even though it contains more instructions.
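
The source of the test case is not quoted in this message. The fmul/fmul/faddp
sequence and the two 8-byte arguments at 4(%esp) and 12(%esp) suggest something
like the sketch below, which is an assumption reconstructed from the assembly,
not the original file. The time_test() helper is hypothetical and only shows
one rough way to compare the two code sequences by measurement rather than by
counting instructions.

```c
#include <time.h>

/* Assumed reconstruction of the test function: a*a + b*b.
   The name "test" comes from the assembly label; the body is a guess. */
double test(double a, double b)
{
    return a * a + b * b;
}

/* Hypothetical helper, not part of the original report: returns the CPU
   time in seconds spent on `iterations` calls of test(). */
double time_test(long iterations)
{
    volatile double sink = 0.0;  /* keeps the results from being discarded */
    clock_t start = clock();

    for (long i = 0; i < iterations; i++)
        sink += test((double)i, 2.0);

    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

To look at the generated code, compile with "gcc -O2 -fomit-frame-pointer -S".
On an x86-64 host, x87 code is only produced with something like "-m32" or
"-mfpmath=387". At -O2 the compiler may inline test() into the loop, so the
timing is only meaningful if test() is kept out of line, for example by placing
it in a separate file.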

> This shows how serious the problem could be:
> gcc -ffast-math -S -O2 almabench.c
> grep fxch almabench.s | wc -l
>    114

On modern x86 CPUs, fxch is executed at instruction decoding time by
renaming floating-point registers.  It only costs execution time if
the instruction decoder cannot keep up with the rest of the pipeline (or
if the working set exceeds the size of the processor's trace cache, if
there is one).

