This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Add floating point timings to rs6000_rtx_costs


	Why does the patch use the raw FP instruction latency as the cost?
The values are passed to the COSTS_N_INSNS() macro, so I think each
value should be the latency of that class of instruction divided by the
latency of either a simple FP instruction or a simple FXP instruction.
In other words, the cost should be scaled relative to the cost of a
single instruction.  This is what my colleagues and I did for the
POWER4/POWER5 integer multiply.
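
	A rough sketch of the scaling I have in mind, for illustration only.
The helper names, the rounding-up choice, and any latencies passed in are
hypothetical and not taken from the patch; COSTS_N_INSNS is redefined
locally, mirroring its definition in GCC's rtl.h, so the fragment stands
alone.

```c
#include <assert.h>

/* Local copy for illustration; mirrors GCC's rtl.h definition,
   where one simple instruction costs 4 units. */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical helper: express a latency as a count of "simple"
   instructions.  Rounding up is an assumption made here so that a
   slower instruction never looks as cheap as a simple one. */
static int scaled_insn_count(int latency, int simple_latency)
{
    assert(latency > 0 && simple_latency > 0);
    return (latency + simple_latency - 1) / simple_latency;
}

/* Hypothetical helper: the value that would be handed back as the cost. */
static int scaled_cost(int latency, int simple_latency)
{
    return COSTS_N_INSNS(scaled_insn_count(latency, simple_latency));
}
```

With made-up latencies, scaled_cost(30, 3) yields COSTS_N_INSNS(10), where
passing the raw latency would have yielded COSTS_N_INSNS(30).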

	Also, the ChangeLog has a typo referring to "ppc640_cost".

> One quick question from a middle-end optimization perspective:  What
> is the behaviour of the rs6000's single precision FP operations when
> the values in floating point registers aren't previously rounded to
> single precision?  Because "fmuls" is cheaper than "fmul", it might
> make sense to optimize (float)((double)x * (double)y) with -ffast-math.

	PowerPC processors always hold floating point values in FPRs as
64-bit quantities.  Any such value can be used as an input to any
floating point operation.  The operation is performed, the result is
rounded to the appropriate precision, and the value is stored in the
result register.  If an operand has excess precision, that excess
precision is used in the operation.  Some processors implement an early
exit of the single precision FP multiply operation when the additional
precision will not be visible.
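
	For reference, the two forms in the quoted question can be written
out in C as below.  This is only an illustrative sketch: the function
names are hypothetical, and the code says nothing about how a given
compiler and target lower each form.  On a host with IEEE 754 float and
double, I would expect both forms to agree when x and y are already
single precision values, since a 24-bit by 24-bit product fits exactly
in a double's 53-bit significand.  The excess-precision case described
above arises only when an FPR holds a value that was not previously
rounded to single.

```c
#include <assert.h>

/* The expression from the quoted question: widen both operands,
   multiply in double, then narrow the result to float. */
static float mul_via_double(float x, float y)
{
    return (float)((double)x * (double)y);
}

/* The candidate replacement: a single precision multiply. */
static float mul_single(float x, float y)
{
    return x * y;
}

/* Returns 1 when both forms produce the same float.
   NaN inputs are rejected because NaN never compares equal. */
static int products_agree(float x, float y)
{
    assert(x == x && y == y);
    return mul_via_double(x, y) == mul_single(x, y);
}
```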

David

