This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: fwprop and CSE const anchor opt
Thank you very much. This was very informative.
Richard Sandiford writes:
> If we have an instruction:
>
> A: (set (reg Z) (plus (reg X) (const_int 0xdeadbeef)))
>
> we will need to use something like:
>
> (set (reg Y) (const_int 0xdead0000))
> (set (reg Y) (ior (reg Y) (const_int 0xbeef)))
> B: (set (reg Z) (plus (reg X) (reg Y)))
>
> But if A is in a loop, the Y loads can be hoisted, and the cost
> of A is effectively the same as the cost of B. In other words,
> the (il)legitimacy of the constant operand doesn't really matter.
My guess is that, since A is not a recognizable insn, this is relevant at RTL
expansion. Is this correct?
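
A minimal C sketch of the hoisting argument quoted above, not taken from GCC itself. It assumes a target that cannot encode 0xdeadbeef as an immediate; the function names are hypothetical. The first function corresponds to form A, the second to form B with the two Y loads moved out of the loop, so the per-iteration work is the same in both.

```c
#include <stddef.h>
#include <stdint.h>

/* Form A: the large constant appears directly as an operand in the loop. */
void add_const_inline(uint32_t *z, const uint32_t *x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        z[i] = x[i] + 0xdeadbeefu;
}

/* Form B: the constant is built in two steps outside the loop,
   mirroring the two (set (reg Y) ...) instructions. */
void add_const_hoisted(uint32_t *z, const uint32_t *x, size_t n)
{
    uint32_t y = 0xdead0000u;  /* (set (reg Y) (const_int 0xdead0000)) */
    y |= 0xbeefu;              /* (set (reg Y) (ior (reg Y) (const_int 0xbeef))) */

    for (size_t i = 0; i < n; i++)
        z[i] = x[i] + y;       /* B: (set (reg Z) (plus (reg X) (reg Y))) */
}
```

Once the loads of Y are outside the loop, each iteration executes a single add in either form, which is why the two costs are treated as effectively equal.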
> In summary, the current costs generally work because:
>
> (a) We _usually_ only apply costs to arbitrary instructions
> (rather than candidate instruction patterns) before
> loop optimisation.
I don't think I understand this point. I see that the cost is typically
queried before loop optimization, but I don't understand the distinction
between "arbitrary instructions" and "candidate instruction patterns".
Can you please explain the difference?
> (b) It doesn't matter what we return for invalid candidate
> instruction patterns, because recog will reject them anyway.
>
> So I suppose my next question is: are you seeing this problem with cse1
> or cse2? The reasoning behind the zero cost might still be valid for
> REG_EQUAL notes in cse1. However, it's probably not right for cse2,
> which runs after loop hoisting.
I am seeing it with both, so at least for cse2 we could fix it this way.
> Perhaps we could add some kind of context parameter to rtx_costs
> to choose between the hoisting and non-hoisting cost. As well as
> helping with your case, it could let us use the non-hoisting cost
> before loop optimisation in cases where the insn isn't going to
> go in a loop. The drawback is that we then have to replicate
> even more of the .md file in rtx_costs.
>
> Alternatively, perhaps we could just assume that rtx_costs always
> returns the hoisted cost when optimising for speed, in which case
> I think your alternative solution would be theoretically correct
> (i.e. not a hack ;)).
OK, I think I am going to propose this in the patch then. It might still be
interesting to experiment with providing more context to rtx_costs.
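
To make the "context parameter" idea concrete, here is a toy C sketch. It is not the real rtx_costs interface: the enum, the function names and the cost values are all hypothetical, and the target is assumed to accept only signed 16-bit immediates and to need two instructions to build any other constant.

```c
#include <stdint.h>

/* Hypothetical context: may the caller assume the constant load
   will be hoisted out of a loop? */
enum toy_cost_context {
    TOY_COST_HOISTED,
    TOY_COST_NOT_HOISTED
};

/* Assumed target property: only signed 16-bit immediates are legitimate. */
static int toy_fits_imm16(int64_t value)
{
    return value >= -32768 && value <= 32767;
}

/* Extra cost, in instructions, of using VALUE as an operand. */
int toy_const_operand_cost(int64_t value, enum toy_cost_context ctx)
{
    if (toy_fits_imm16(value))
        return 0;   /* encodable directly in the instruction */
    if (ctx == TOY_COST_HOISTED)
        return 0;   /* set-up is assumed to be paid once, outside the loop */
    return 2;       /* load the high part, then IOR in the low part */
}
```

With such a parameter, a pass running after loop hoisting (cse2, say) could ask for the non-hoisted cost, while earlier passes keep the zero cost.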
> E.g. suppose we're deciding how to implement an in-loop multiplication.
> We calculate the cost of a multiplication instruction vs. the cost of a
> shift/add sequence, but we don't consider whether any of the backend-specific
> shift/add set-up instructions could be hoisted. This would lead to us
> using multiplication insns in cases where we don't want to.
>
> (This was one of the most common situations in which the zero cost helped.)
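
For reference, the two alternatives being costed for a constant multiplication look roughly like this in C. This is only an illustration with hypothetical function names and an arbitrarily chosen multiplier of 10; it says nothing about which set-up instructions a particular backend would need.

```c
#include <stdint.h>

/* Alternative 1: a single multiplication instruction. */
uint32_t mul_by_10(uint32_t x)
{
    return x * 10u;
}

/* Alternative 2: a shift/add sequence, using 10*x = 8*x + 2*x. */
uint32_t shift_add_by_10(uint32_t x)
{
    return (x << 3) + (x << 1);
}
```

Both functions compute the same value modulo 2^32; the question in the quoted text is only which form is cheaper once any hoistable set-up is taken into account.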
I am not sure I understand this. Why would we decide to hoist suboperations
of a multiplication? If it is loop-variant then even the suboperations are
loop-variant whereas if it is loop-invariant then we can hoist the whole
operation. What am I missing?
Adam