extend fwprop optimization
Wed Feb 27 18:37:00 GMT 2013
On Tue, Feb 26, 2013 at 2:59 AM, Steven Bosscher <firstname.lastname@example.org> wrote:
> On Tue, Feb 26, 2013 at 2:12 AM, Wei Mi wrote:
>> But it is not a good transformation unless we know insn split will
>> change a << (b & 63) to a << b. Here we want to see what the rtl looks
>> like after insn splitting during fwprop cost estimation (we call
>> split_insns in estimate_split_and_peephole(), but do not actually
>> split insns in this phase).
> So you're splitting to find out that the shift is truncated to 5 or 6
> bits. That looks like what you really want is to have
> SHIFT_COUNT_TRUNCATED working for your target. It isn't defined for
> /* Define if shifts truncate the shift count which implies one can
> omit a sign-extension or zero-extension of a shift count.
> On i386, shifts do truncate the count. But bit test instructions
> take the modulo of the bit offset operand. */
> /* #define SHIFT_COUNT_TRUNCATED */
> Perhaps SHIFT_COUNT_TRUNCATED should be turned into a target hook, and
> take an rtx_code (or a pattern) to let the target decide whether a
> truncation is applicable or not.
> This is a target thing, so perhaps Uros has some ideas about this.
> I'm guessing cse.c would then handle your code transformation already,
> or can be made to do so without a lot of extra work, e.g. teach
> fold_rtx about such (shift (...) (and (...))) transformations that are
> really truncations.
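To make the truncation discussed above concrete, here is a minimal C
sketch (not GCC code). It assumes a 64-bit shift whose hardware uses only
the low 6 bits of the count; shl64_truncating and check_mask_redundant
are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model of a 64-bit left shift on hardware that truncates the
   count to its low 6 bits.  Plain C "a << count" is undefined for
   count >= 64, so the model applies the mask explicitly.  */
static uint64_t
shl64_truncating (uint64_t a, uint64_t count)
{
  return a << (count & 63);
}

/* Under that model an explicit "& 63" on the count changes nothing,
   which is why a << (b & 63) can become a << b.  */
static void
check_mask_redundant (uint64_t a)
{
  for (uint64_t b = 0; b < 256; b++)
    assert (shl64_truncating (a, b & 63) == shl64_truncating (a, b));
}
```

Calling check_mask_redundant on any value should pass silently. The
simplification is only valid when the instruction really truncates its
count, which is the property SHIFT_COUNT_TRUNCATED is meant to describe.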
Thanks for pointing out fold_rtx. I took a look at it and cse
yesterday, and I agree with you that fold_rtx could be extended to
handle the motivating case. But I still think the fwprop extension
could be useful, for two reasons:
1. fold_rtx doesn't handle all the propagation-simplification tasks.
It only handles some typical cases. I don't think cse wants to become
cumbersome by including all of fwprop's functionality. The fwprop
extension tries to handle the propagation-simplification problem in
general. I think cse contains fold_rtx partly because the existing
fwprop and combine are not ideal. If fwprop could handle the general
case, cse could simply focus on finding common subexpressions.
2. fold_rtx does the simplification based only on the current insn,
while the fwprop extension considers the def-use group as a whole.
When all the uses could be propagated, we have two choices: a) do all
the propagations and then delete the def insn, even if some
propagations may not be beneficial; b) do only the beneficial
propagations and leave the def insn in place. The fwprop extension has
a cost model to choose which way to go.
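The choice between (a) and (b) can be sketched roughly as below. This is
only an illustration of the idea, not the code in the patch: the enum,
choose_propagation, the per-use gain array and def_cost are all
hypothetical, and the real cost model may weigh things differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for choices (a) and (b).  */
enum prop_choice { PROP_ALL_DELETE_DEF, PROP_BENEFICIAL_KEEP_DEF };

/* gain[i] is an assumed benefit of propagating the def into use i
   (negative when the propagation makes that use more expensive).
   def_cost is the assumed saving from deleting the def insn.  */
static enum prop_choice
choose_propagation (const int *gain, size_t n_uses, int def_cost)
{
  int total_all = def_cost;   /* (a): rewrite every use, delete the def.  */
  int total_beneficial = 0;   /* (b): rewrite profitable uses, keep the def.  */

  assert (gain != NULL || n_uses == 0);

  for (size_t i = 0; i < n_uses; i++)
    {
      total_all += gain[i];
      if (gain[i] > 0)
        total_beneficial += gain[i];
    }

  /* Prefer deleting the def when it is at least as good overall.  */
  return total_all >= total_beneficial
         ? PROP_ALL_DELETE_DEF : PROP_BENEFICIAL_KEEP_DEF;
}
```

With gains {3, -1} and def_cost 2, doing all propagations wins because
removing the def outweighs the one bad use. With gains {3, -5} it is
better to keep the def and propagate only into the first use.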
What do you think?