[PATCH] implement rint{,f,l}, floor{,f,l}. ceil{,f,l}, trunc{,f,l} and nearbyint{,f,l} as x87 built-in functions
Uros Bizjak
uros@kss-loka.si
Thu Aug 26 12:31:00 GMT 2004
- Previous message (by thread): [PATCH] implement rint{,f,l}, floor{,f,l}. ceil{,f,l}, trunc{,f,l} and nearbyint{,f,l} as x87 built-in functions
- Next message (by thread): [PATCH] implement rint{,f,l}, floor{,f,l}. ceil{,f,l}, trunc{,f,l} and nearbyint{,f,l} as x87 built-in functions
- Messages sorted by:
[ date ]
[ thread ]
[ subject ]
[ author ]
Roger Sayle wrote:
>Ideally the OPTIMIZE_MODE_SWITCHING would be used to control
>the placement of the "fldcw" instructions that actually change
>the mode. Currently, every single fistp is paired with its
>own fldcw save/restore pair. This causes much more work than
>necessary for two sequential floors for example.
>
>
>
That would be nice, but current algorithm for mode switching checks,
which mode is needed for an instruction and then generates appropriate
instruction sequence to switch that to that mode.
>At present OPTIMIZE_MODE_SWITCHING is used purely to avoid recalculating
>the bit pattern that needs to loaded into the x87 control world.
>
>Even in this respect things can be improved. The OPTIMIZE_MODE_SWITCHING
>machinery could be improved to pass EMIT_MODE_SET both the new and the
>original modes that are being switched. Currently, the i386 backend's
>emit_i387_cw_initialization inserts a "fstcw" instruction on every mode
>change. This is only required on transitions from FP_CW_UNINITIALIZED
>or FP_CW_ANY which may have changed the FP control word. Otherwise the
>"fstcw" that's used to determine the FLOOR mask can be reused to calculate
>the CEIL mask.
>
>
>
EMIT_MODE_SET is used to generate fldcw! This is already working and it
generates only one fldcw for multiple mode switches. To handle fstcw
instructions (which are now blindly inserted by a conversion instruction
itself) another machinery would be needed to track _mode_changes_.
Unfortunatelly any insn without i387_cw attribute changes mode to
I387_CW_ANY...
Perhaps this can be solved by emitting "fldcw (new_mode)" just before
fistp/frndint insn only iff mode state is changed between
I387_CW_{UNINITIALIZED, FLOOR, CEIL, TRUNC}. "call" or "asm"
instruction should change mode to I387_CW_UNINITIALIZED, which when
hanged to from other 3 modes would emit "fldcw (old_mode)" in front of
them. The problematic instruction could be rint() and nearbyint()
instructions (which round to current mode), but this can be solved by
changing their i387_cw attribute to I387_CW_STORED to force emitting of
fldcw (old_mode) in front of them.
To summarise: a state is needed to track mode changes and:
-state should be initialized to "ANY".
-when state is changed from {ANY, UNINITIALIZED} to {FLOOR, CEIL,
TRUNC}, EMIT_MODE_SET would be generated (as it is case with my patch),
"and fldcw (new_mode);fistp/frndint" would be emitted.
- when state is changed from {FLOOR, CEIL, TRUNC} to {UNINITIALIZED,
STORED} a "fldcw (old mode)" would be generated.
- mode can be changed to UNINITIALIZED state by "asm" or "call" pattern.
- mode can be changed to STORED state by rint or nearbyint insn.
I think it would work. Ah, and UNSPEC_FRNDINT inside exp* patterns would
have to be changed to UNSPEC_FRNDINT_TRUNC.
>Even preserving the i386's current FP mode switching implementation,
>you'll need to add length attributes to your frndintxf4_* define_insns.
>These macro patterns are oversized which will confuse GCC's branch
>range prediction and other parts of the compiler. The exisiting
>fistp patterns don't need this as the default "length" calculation
>already includes the fldcw/fistp/fldcw sequence.
>
>;; The (bounding maximum) length of an instruction in bytes.
>;; ??? fistp is in fact fldcw/fistp/fldcw sequence. Later we may want
>;; to split it and compute proper length as for other insns.
>
>
>Your new define_insns have no length and only type attribute "fpspc".
>
>
Yes... However, do you have any preferred value, which should be used?
>Finally, as hinted in the .md file comment above, it would be
>preferable if the new frndintxf4_* and the exisiting fix_trunc*
>patterns could become expanders themselves that decompose into
>their three constituent instructions. This would allow better
>scheduling, and perhaps even the use of CSE/GCSE to optimize mode
>switching instead of optimize_mode_switching.
>
>
Yes, but these instructions are (fp)->(int), whereas new instructions
are (fp)->(fp). There is already a problem with trunc/btrunc issue
(fixed by my patch). However, it is a long term plan to unify these
expanders (at least lrint would need it, and (int)trunc(x) instructions
would also benefit from this unification).
>The only show stopper, for me, with your patch are the missing
>"length" attributes, but clearly we can do much better than the
>current scheme, though I appreciate you're mostly following
>existing precedent. These are really backend issues/preferences,
>so hopefully RTH will comment whether the patch is good as is,
>needs the lengths fixed, or needs more significant changes.
>
>
As shown above, fldcw optimization could be implemented on top of my
patch, and ATM unification of expanders would be too hard. I suggest to
put patch in the mainline, then implement fldcw optimization and then
implement unification of expanders.
>I hope this helps.
>
>
Yes, thanks for suggestions!
Uros.
- Previous message (by thread): [PATCH] implement rint{,f,l}, floor{,f,l}. ceil{,f,l}, trunc{,f,l} and nearbyint{,f,l} as x87 built-in functions
- Next message (by thread): [PATCH] implement rint{,f,l}, floor{,f,l}. ceil{,f,l}, trunc{,f,l} and nearbyint{,f,l} as x87 built-in functions
- Messages sorted by:
[ date ]
[ thread ]
[ subject ]
[ author ]
More information about the Gcc-patches
mailing list