This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Add floating point timings to rs6000_rtx_costs
- From: David Edelsohn <dje at watson dot ibm dot com>
- To: Roger Sayle <roger at eyesopen dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Mon, 05 Jul 2004 19:28:47 -0400
- Subject: Re: [PATCH] Add floating point timings to rs6000_rtx_costs
- References: <Pine.LNX.4.44.0407041150560.8253-100000@www.eyesopen.com>
Why does the patch use the FP instruction latency for the cost?
The values will be used with the COSTS_N_INSNS() macro. I think the
values should be the latency of the class of instruction divided by either
the latency of a simple FP instruction or the latency of a simple FXP
(fixed-point) instruction. In other words, the cost should be scaled with
respect to the cost of a single instruction. This is what my colleagues
and I did for the POWER4/POWER5 integer multiply.
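A minimal sketch of the scaling described above, for illustration only.
COSTS_N_INSNS is reproduced here as it is defined in GCC's rtl.h;
scaled_cost() is a hypothetical helper that is not part of the patch, and
rounding the ratio up is an assumption rather than anything the patch
specifies.

```c
#include <assert.h>

/* Reproduced from GCC's rtl.h: the cost of N simple instructions. */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical helper (not in the patch): express an instruction's
   latency as a multiple of the latency of a simple instruction, then
   convert that multiple to a cost.  Rounding up is an assumption.  */
int scaled_cost(int latency, int base_latency)
{
    assert(base_latency > 0);
    int n = (latency + base_latency - 1) / base_latency;
    return COSTS_N_INSNS(n);
}
```

With made-up numbers, an operation with a latency of 30 cycles measured
against a simple instruction with a latency of 6 cycles would be costed
as COSTS_N_INSNS(5) rather than COSTS_N_INSNS(30).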
Also, the ChangeLog has a typo referring to "ppc640_cost".
> One quick question from a middle-end optimization perspective: What
> is the behaviour of the rs6000's single precision FP operations when
> the values in floating point registers aren't previously rounded to
> single precision? Because "fmuls" is cheaper than "fmul", it might
> make sense to optimize (float)((double)x * (double)y) with -ffast-math.
PowerPC processors always hold floating point values in FPRs as
64-bit quantities. The value can always be used as an input to any
floating point operation. The operation is performed, the result is
rounded to the appropriate precision, and the value is stored in the
result register. If an operand has excess precision, that extra precision
is used in the operation.
Some processors implement an early exit of the single precision FP
multiply operation when the additional precision will not be visible.
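For reference, the expression from the quoted question can be written
out in C as below. This is only a sketch, and the function names are
hypothetical. It covers the case where both operands are already valid
single precision values: the double product of two 24-bit significands
is then exact, so rounding it to float gives the same result as a single
precision multiply. It cannot express the excess-precision case described
above, because a C float argument is already rounded.

```c
/* Product computed directly in single precision. */
float mul_single(float x, float y)
{
    return x * y;
}

/* Product computed in double precision, then rounded to single:
   the form (float)((double)x * (double)y) from the question. */
float mul_via_double(float x, float y)
{
    return (float)((double)x * (double)y);
}
```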
David