This is the mail archive of the
mailing list for the GCC project.
Re: [BENCHMARK]-mfpmath=sse should disable x387 intrinsics
> Quoting Roger Sayle <email@example.com>:
> > But now the best bit, for which I'll thank you in advance. In looking
> > at so much floating point code, it's become apparent that GCC's
> > reg-stack.c pass can do a much better job at shuffling floating point
> > registers. I was up late last night working on an improvement/rewrite
> > of change_stack that should reduce the number of fxch instructions we
> > generate, and replace more uses of "fstp %st(x)" with "ffreep %st(0)"
> > (which is faster on AMD processors). I know there are PRs in this area,
> > so these changes might even make it into GCC v4.0.
> Perhaps a great number of fxch instructions can be eliminated by loading x87
> registers at the appropriate time, not at the beginning of the function.
> At least this situation could easily be resolved:
> fldl 4(%esp)
> fldl 12(%esp)
> fxch %st(1)
> Instead of inserting an fxch instruction, the positions of the two fldls could be exchanged.
In general it would be really cool if reg-stack were able to do
instruction scheduling in a way that minimizes the need for fxch (reordering
flds is not the only thing it could do), but it is a pretty
involved project. For a single basic block the optimal ordering can be
found in linear time, if I remember correctly; I dunno about the whole CFG...
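
To make the quoted fldl/fldl/fxch case concrete, here is a minimal sketch of that
one rewrite as a peephole over a toy instruction list. It is not reg-stack.c code
and not how GCC represents instructions: the tuple encoding and the names run and
swap_loads are hypothetical, chosen only for illustration. It also assumes the two
loads are independent memory reads with nothing between them, which is the only
case where swapping them is obviously safe.

```python
def run(instrs, memory):
    """Simulate a tiny x87 subset (fldl, fxch) on a list where stack[0] is %st(0)."""
    stack = []
    for op, arg in instrs:
        if op == "fldl":
            # A load pushes the value; it becomes the new %st(0).
            stack.insert(0, memory[arg])
        elif op == "fxch":
            # fxch %st(i) exchanges %st(0) with %st(i).
            stack[0], stack[arg] = stack[arg], stack[0]
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return stack


def swap_loads(instrs):
    """Rewrite 'fldl A; fldl B; fxch %st(1)' into 'fldl B; fldl A'."""
    out = []
    i = 0
    while i < len(instrs):
        window = instrs[i:i + 3]
        if (len(window) == 3
                and window[0][0] == "fldl"
                and window[1][0] == "fldl"
                and window[2] == ("fxch", 1)):
            # Emit the loads in the opposite order and drop the fxch.
            out += [window[1], window[0]]
            i += 3
        else:
            out.append(instrs[i])
            i += 1
    return out


# The sequence from the quoted message; the memory values are made up.
original = [("fldl", "4(%esp)"), ("fldl", "12(%esp)"), ("fxch", 1)]
memory = {"4(%esp)": 1.5, "12(%esp)": 2.5}
optimized = swap_loads(original)

# Both sequences leave the register stack in the same state.
assert run(original, memory) == run(optimized, memory)
```

A real pass would also have to check that the swap does not change which value
other instructions expect in each stack slot, which is part of why the general
scheduling problem is more involved than this single pattern.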