This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH,RFC] Disallow reordering of x87 insns while scheduling


On Sat, 16 Apr 2005, Richard Henderson wrote:
> On Sat, Apr 16, 2005 at 08:58:46PM -0600, Roger Sayle wrote:
> > This is clearly a problem for code size, so perhaps the patch below
> > should be restricted to -Os.  However, on many members of the IA-32
> > family, there's also performance consequences for these fxch insns.
>
> Are there?  AFAIK, from Pentium 1 on they are pretty much free.

To take Pentium 1 as an example, whilst FXCH instructions can be free
under special conditions, i.e. they must appear between *two* suitable
floating point instructions, it's not uncommon that the instruction
immediately before or after is either an integer or a non-pairable
FP insn.

In the small example I gave:

foo:    flds    b
        fadds   a
        fldz
        fucompp

vs.

foo:    flds    b
        fldz
        fxch    %st(1)
        fadds   a
        fxch    %st(1)
        fucompp

The fldz instruction on Pentium1 is not pairable, so the first fxch
costs a single cycle, whilst the second fxch is indeed free.  By
contrast, loading the zero late hides the latency of the addition, so
the first sequence is faster on Pentium1.  (Interestingly, this is what
GCC generates with -mtune=pentium, but for unrelated reasons.)
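
As a rough illustration of the pairing argument above, here is a toy
Python sketch that counts the fxch insns which are not free in each
sequence.  The rule it encodes (an fxch is free only when the insns
immediately before and after it are both pairable FP insns) and the
PAIRABLE set are assumptions taken from this example alone, not from
Intel's documentation, and the function name is hypothetical.

```python
# Toy model only.  PAIRABLE is an assumption inferred from the example
# in this message (fadds and fucompp pair with fxch, fldz does not).
PAIRABLE = {"fadds", "fucompp"}


def fxch_cycles(insns):
    """Count the fxch insns that are not free under the toy rule.

    An fxch is treated as free only when the insns immediately before
    and after it are both in PAIRABLE; otherwise it costs one cycle.
    """
    cost = 0
    for i, op in enumerate(insns):
        if op != "fxch":
            continue
        prev_ok = i > 0 and insns[i - 1] in PAIRABLE
        next_ok = i + 1 < len(insns) and insns[i + 1] in PAIRABLE
        if not (prev_ok and next_ok):
            cost += 1
    return cost


# The two sequences from the example, mnemonics only.
first = ["flds", "fadds", "fldz", "fucompp"]
second = ["flds", "fldz", "fxch", "fadds", "fxch", "fucompp"]

print(fxch_cycles(first), fxch_cycles(second))
```

Under these assumptions, only the fxch that follows fldz in the second
sequence is charged a cycle.  The sketch ignores latency hiding, so it
captures just the fxch part of the comparison.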


Whilst the common belief is that FXCH is free, empirical evidence
from analysis of compiler quality on numerical benchmarks shows the
strange trend that the number of FXCH insns approximates the quality
of x87 code.  For example, see the following analysis, by the ATLAS
folks, of an old (since fixed) GCC regression on relatively recent
IA-32 CPUs (Pentium3 and Athlon, where FXCH should be free):
http://www.cs.utk.edu/~rwhaley/ATLAS/gcc30.html


Of course, the FXCH count hypothesis, if confirmed, doesn't necessarily
indicate that the FXCH isn't free, but is perhaps symptomatic of poor
register allocation, cache size effects from larger code, reduction in
the size of the out-of-order execution lookahead buffer or overly
aggressive scheduling.


The real problem with the example above is that GCC's current DFA
scheduler is (i) greedy (it reorders insns to schedule available insns
early even if there's no net benefit) and (ii) doesn't have a cost
model to reflect that its greedy choice is often slightly inferior.
Clearly, there is *a* cost for an fxch, even if only when optimizing
for size or targeting Pentium1 (or earlier), so it can't be claimed
that it's always free or that this problem doesn't need to be
addressed.


I concede that scheduling FP instructions should/could help Pentium1,
but equally, GCC's poor scheduling can also hurt its performance.
Unfortunately, I don't have an old Pentium1 machine to benchmark on,
and given their age and Intel's FDIV bug recall, I'm not sure how
many are left in circulation as a fraction of GCC's target demographic.

Let me know how you propose we should proceed.

Roger
--

