This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: PR 15492: floating-point arguments are loaded too early to x87stack


Florian Weimer wrote:


On modern x86 CPUs, fxch is executed at instruction decoding time by
renaming floating point registers. It only costs execution time if
the instruction decoder cannot keep up with the remaining pipeline (or
if the working set exceeds the size of the processor's trace cache, if
there is one).


fxch is only part of the problem. Another problem is that more fp-stack registers are used in the -O2 case. Consider another example from PR 15492:

#include <math.h>

double test1 (double a, int x, double b, int y, double c)
{
       return sin (c) + tan (b) / sqrt (a) + x * fabs (b) + y;
}

test1 ( with -O2 ):
       fldl    4(%esp)        #st(0)
       fsqrt
       fldl    16(%esp)    #st(0) st(1)
       fld     %st(0)        #st(0) st(1) st(2)
       fldl    28(%esp)    #st(0) st(1) st(2) st(3)
       fxch    %st(2)
       fabs
       fxch    %st(3)
       fdivrl  .LC1        #st(0) st(1) st(2) st(3)
       fxch    %st(1)
       fptan            #st(0) st(1) st(2) st(3) st(4)
       fstp    %st(0)        #st(0) st(1) st(2) st(3)
       fxch    %st(2)
       fsin            #st(0) st(1) st(2) st(3)
       fxch    %st(2)
       fmulp   %st, %st(1)    #st(0) st(1) st(2)
       faddp   %st, %st(1)    #st(0) st(1)
       fildl   12(%esp)    #st(0) st(1) st(2)
       fmulp   %st, %st(2)    #st(0) st(1)
       faddp   %st, %st(1)    #st(0)
       fildl   24(%esp)    #st(0) st(1)
       faddp   %st, %st(1)    #st(0)
       ret

test1 ( without -O2 ):
       fldl    28(%esp)    #st(0)
       fsin
       fldl    16(%esp)    #st(0) st(1)
       fptan            #st(0) st(1) st(2)
       fstp    %st(0)        #st(0) st(1)
       fldl    4(%esp)        #st(0) st(1) st(2)
       fsqrt
       fdivrp  %st, %st(1)    #st(0) st(1)
       faddp   %st, %st(1)    #st(0)
       fildl   12(%esp)    #st(0) st(1)
       fldl    16(%esp)    #st(0) st(1) st(2)
       fabs
       fmulp   %st, %st(1)    #st(0) st(1)
       faddp   %st, %st(1)    #st(0)
       fildl   24(%esp)    #st(0) st(1)
       faddp   %st, %st(1)    #st(0)
       ret

The more complex the function, the more stack space is wasted. There is no problem with the fadd, fsub, fmul and fdiv insns, because they can take one of their operands directly from memory. OTOH, arguments to fsin, fcos, etc. are always loaded at the beginning of the function. In the optimized (-O2) case, the argument to the fsin insn occupies an fp-stack slot for half of the function's calculation, although it could be loaded just in front of the fsin insn.
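
To make the difference in stack usage easy to check, here is a rough C sketch that replays the mnemonics of the two listings above and tracks the fp-stack depth. It is only an illustration: stack_delta, peak_depth and the two arrays are names made up for this mail, and the push/pop table covers only the insns that appear in these listings (fptan counts as a push because it leaves 1.0 on top of the result).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Net change in x87 stack depth for the mnemonics used in the two
   listings.  Everything else that appears there (fsqrt, fabs, fsin,
   fxch, fdivrl with a memory operand, ret) leaves the depth unchanged. */
int stack_delta(const char *insn)
{
    static const char *const pushes[] = { "fldl", "fld", "fildl", "fptan" };
    static const char *const pops[] = { "fstp", "fmulp", "faddp", "fdivrp" };
    size_t i;

    for (i = 0; i < sizeof pushes / sizeof pushes[0]; i++)
        if (strcmp(insn, pushes[i]) == 0)
            return 1;
    for (i = 0; i < sizeof pops / sizeof pops[0]; i++)
        if (strcmp(insn, pops[i]) == 0)
            return -1;
    return 0;
}

/* Highest number of occupied fp-stack slots reached by a sequence. */
int peak_depth(const char *const *insns, size_t n)
{
    int depth = 0, peak = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        depth += stack_delta(insns[i]);
        assert(depth >= 0 && depth <= 8);   /* the x87 stack has 8 slots */
        if (depth > peak)
            peak = depth;
    }
    return peak;
}

/* Mnemonics of the -O2 listing, in order. */
const char *const with_o2[] = {
    "fldl", "fsqrt", "fldl", "fld", "fldl", "fxch", "fabs", "fxch",
    "fdivrl", "fxch", "fptan", "fstp", "fxch", "fsin", "fxch", "fmulp",
    "faddp", "fildl", "fmulp", "faddp", "fildl", "faddp", "ret"
};

/* Mnemonics of the listing without -O2, in order. */
const char *const without_o2[] = {
    "fldl", "fsin", "fldl", "fptan", "fstp", "fldl", "fsqrt", "fdivrp",
    "faddp", "fildl", "fldl", "fabs", "fmulp", "faddp", "fildl", "faddp",
    "ret"
};
```

Calling peak_depth on with_o2 gives 5 and on without_o2 gives 3, which agrees with the depth comments in the listings: the -O2 code needs two more fp-stack slots for the same expression.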

Perhaps this problem could be solved by introducing an expander for fsin & co. in i386.md that expands the "fsin (memory)" pattern into an "arg load; sin" sequence? The combiner could then combine the "arg load; sin" sequence into a "*fsin_m" pattern when appropriate, and this pseudo-insn would afterwards be expanded back to "fldl; fsin". This would IMHO place the load directly in front of fsin when appropriate.

Uros.

