This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [Committed] Don't hoist FP constants on x87


On 8/25/06, Roger Sayle <roger@eyesopen.com> wrote:

It turns out that there's already code to penalize speculatively
executing assignments to stack registers in ifcvt.c's function
cheap_bb_rtx_cost_p which adds COSTS_N_INSNS(1) to the cost of
the move.  This is clearly insufficient, and the patch below which
prevents speculatively executing x87 instructions should resolve
the PR.

I wonder could you please repeat your timings with this patch and
see if this works for you?

With your patch the "if" part indeed compiles to the "optimal" code (only one fldz). Execution time also improves to the same time as in case of gcse.c change.


The more fundamental problem is that if-conversion needs to be made
smarter about when speculative execution is performance win.  For
a start it should pay attention to the branch prediction probabilities.
However, a more subtle fix for this PR may also be to introduce a new
UNCOND_BRANCH_COST target macro.  The decision of whether to speculatively
execute the "else" block is based upon whether it is cheaper than the
unconditional branch that it saves.  Currently the code uses BRANCH_COST,
which is average cost of a conditional branch, which is often much
more expensive than an unconditional branch which should perfectly
predicted by the CPU.  Indeed unconditional jumps may often be
completely eliminated by the compiler's basic block reordering.

Hence, for the combined costs of "fldz" and "fstp", which are currently
estimated to be COSTS_N_INSNS(2) total, is currently <= BRANCH_COST, which
may be as high as COSTS_N_INSNS(5) when tuning for the athlon!  Using
UNCOND_BRANCH_COST as a limit, which should be around COSTS_N_INSNS(1)
on x86 would help.  Of course improving the middle-end's rtx_cost
interface so that the fstp (pop) insn has the correct latency of six on
P4 would also prevent us from making these poor speculation choices.

I'll admit I should also improve IS_STACK_MODE to better account for
SSE.

Let me know what you think.

Perhaps we can introduce costs of FP loads/moves/stores into ix86_rtx_costs() and calculate correct costs there? Currently, ix86_rtx_costs() handles only loads of FP constants, costs of FPmoves (and SSE moves) are simply ignored, although they are clearly defined in struct processor_cost for every target. By introducing correct costs, we could also calculate exact target dependant cost of compensating fstp insn (currently, we blindly add COSTS_N_INSN(1)).

Uros.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]