[PATCH] implement rint{,f,l}, floor{,f,l}. ceil{,f,l}, trunc{,f,l} and nearbyint{,f,l} as x87 built-in functions

Uros Bizjak uros@kss-loka.si
Thu Aug 26 12:31:00 GMT 2004


Roger Sayle wrote:

>Ideally the OPTIMIZE_MODE_SWITCHING would be used to control
>the placement of the "fldcw" instructions that actually change
>the mode.  Currently, every single fistp is paired with its
>own fldcw save/restore pair.  This causes much more work than
>necessary for two sequential floors for example.
>
>  
>
That would be nice, but current algorithm for mode switching checks, 
which mode is needed for an instruction and then generates appropriate 
instruction sequence to switch that to that mode.

>At present OPTIMIZE_MODE_SWITCHING is used purely to avoid recalculating
>the bit pattern that needs to loaded into the x87 control world.
>
>Even in this respect things can be improved.  The OPTIMIZE_MODE_SWITCHING
>machinery could be improved to pass EMIT_MODE_SET both the new and the
>original modes that are being switched.  Currently, the i386 backend's
>emit_i387_cw_initialization inserts a "fstcw" instruction on every mode
>change.  This is only required on transitions from FP_CW_UNINITIALIZED
>or FP_CW_ANY which may have changed the FP control word.  Otherwise the
>"fstcw" that's used to determine the FLOOR mask can be reused to calculate
>the CEIL mask.
>
>  
>
EMIT_MODE_SET is used to generate fldcw! This is already working and it 
generates only one fldcw for multiple mode switches. To handle fstcw 
instructions (which are now blindly inserted by a conversion instruction 
itself) another machinery would be needed to track _mode_changes_. 
Unfortunatelly any insn without i387_cw attribute changes mode to 
I387_CW_ANY...

Perhaps this can be solved by emitting "fldcw (new_mode)" just before 
fistp/frndint insn only iff mode state is changed between
 I387_CW_{UNINITIALIZED, FLOOR, CEIL, TRUNC}. "call" or "asm" 
instruction should change mode to   I387_CW_UNINITIALIZED, which when 
hanged to from other 3 modes would emit "fldcw (old_mode)" in front of 
them. The problematic instruction could be rint() and nearbyint() 
instructions (which round to current mode), but this can be solved by 
changing their i387_cw attribute to I387_CW_STORED to force emitting of 
fldcw (old_mode) in front of them.

To summarise: a state is needed to track mode changes and:
-state should be initialized to "ANY".
-when state is changed from {ANY, UNINITIALIZED} to {FLOOR, CEIL, 
TRUNC}, EMIT_MODE_SET would be generated (as it is case with my patch), 
"and fldcw (new_mode);fistp/frndint" would be emitted.
- when state is changed from {FLOOR, CEIL, TRUNC} to {UNINITIALIZED, 
STORED} a "fldcw (old mode)" would be generated.
- mode can be changed to UNINITIALIZED state by "asm" or "call" pattern.
- mode can be changed to STORED state by rint or nearbyint insn.

I think it would work. Ah, and UNSPEC_FRNDINT inside exp* patterns would 
have to be changed to UNSPEC_FRNDINT_TRUNC.

>Even preserving the i386's current FP mode switching implementation,
>you'll need to add length attributes to your frndintxf4_* define_insns.
>These macro patterns are oversized which will confuse GCC's branch
>range prediction and other parts of the compiler.  The exisiting
>fistp patterns don't need this as the default "length" calculation
>already includes the fldcw/fistp/fldcw sequence.
>
>;; The (bounding maximum) length of an instruction in bytes.
>;; ??? fistp is in fact fldcw/fistp/fldcw sequence.  Later we may want
>;; to split it and compute proper length as for other insns.
>
>
>Your new define_insns have no length and only type attribute "fpspc".
>  
>
Yes... However, do you have any preferred value, which should be used?

>Finally, as hinted in the .md file comment above, it would be
>preferable if the new frndintxf4_* and the exisiting fix_trunc*
>patterns could become expanders themselves that decompose into
>their three constituent instructions.  This would allow better
>scheduling, and perhaps even the use of CSE/GCSE to optimize mode
>switching instead of optimize_mode_switching.
>  
>
Yes, but these instructions are (fp)->(int), whereas new instructions 
are (fp)->(fp). There is already a problem with trunc/btrunc issue 
(fixed by my patch). However, it is a long term plan to unify these 
expanders (at least lrint would need it, and (int)trunc(x) instructions 
would also benefit from this unification).

>The only show stopper, for me, with your patch are the missing
>"length" attributes, but clearly we can do much better than the
>current scheme, though I appreciate you're mostly following
>existing precedent.  These are really backend issues/preferences,
>so hopefully RTH will comment whether the patch is good as is,
>needs the lengths fixed, or needs more significant changes.
>  
>
As shown above, fldcw optimization could be implemented on top of my 
patch, and ATM unification of expanders would be too hard. I suggest to 
put patch in the mainline, then implement fldcw optimization and then 
implement unification of expanders.

>I hope this helps.
>  
>
Yes, thanks for suggestions!

Uros.



More information about the Gcc-patches mailing list