This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH] Reduce the number of extraneous rtl when expanding multiply
- From: Roger Sayle <roger at eyesopen dot com>
- To: Andrew Pinski <pinskia at physics dot uc dot edu>
- Cc: Steven Bosscher <stevenb at suse dot de>, <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 9 Nov 2004 20:41:39 -0700 (MST)
- Subject: Re: [PATCH] Reduce the number of extraneous rtl when expanding multiply
On Tue, 9 Nov 2004, Andrew Pinski wrote:
> * expmed.c (expand_mult_const): If we have an alg_m as the first
> operation and there is only one other operation, then don't copy
> the register into a new one.
I don't think this is safe. I was just about to approve this patch
(under the condition that you mention PR middle-end/18293 in the
ChangeLog) when it struck me that we can't use op0 directly as
the accumulator, as op0's pseudo mustn't be modified, but the
accum pseudo may potentially be modified in place.
Consider the possibility where op0 is the pseudo corresponding to
a user-declared register variable upon calling expand_mult_const.
If the second step in the "struct algorithm" is anything other than
alg_shift, when not optimizing, we can call force_operand with
accum (== op0) as target. This will destructively modify op0,
which may still be live outside of the call to expand_mult_const.
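To illustrate the hazard, here is a toy model in plain C, not GCC
internals. The "pseudo" type and both function names are hypothetical;
a pseudo register is modelled as a slot holding a value, and x * 3 is
computed as x + (x << 1) with the result written into the accumulator.

```c
/* Hypothetical stand-in for a pseudo register: a slot holding a value.  */
typedef struct { long value; } pseudo;

/* x * 3 with op0 reused as the accumulator (accum == op0).
   Writing the add step's result into accum overwrites op0 itself.  */
long mult3_reuse_op0(pseudo *op0)
{
    pseudo *accum = op0;                               /* no copy made */
    accum->value = accum->value + (op0->value << 1);   /* clobbers op0 */
    return accum->value;
}

/* x * 3 with op0 first copied into a fresh accumulator.
   The in-place update only touches the temporary.  */
long mult3_copy_op0(const pseudo *op0)
{
    pseudo accum = *op0;                               /* reg-reg copy */
    accum.value = accum.value + (op0->value << 1);     /* op0 untouched */
    return accum.value;
}
```

Both functions return the right product, but after mult3_reuse_op0 the
caller's variable holds 3*x instead of x, which is the problem if it is
still live afterwards.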
Interestingly, for the test case in the PR, the second (and final)
step in the algorithm is alg_shift, so this approach is safe...
I think a better solution to this particular PR, which would also
further improve compile-time performance and memory usage, would
be to special case multiplications by powers of two at the start
of expand_mult, just before the current call to choose_mult_variant,
and instead directly call expand_shift (LSHIFT_EXPR, ...). This
bypasses synth_mult (which is starting to show up in profiling)
for the common cases of multiplications by 1, 2, 4, 8, 16 etc...
expand_shift contains the necessary shift by addition optimizations
and should always be the solution ultimately chosen by synth_mult
for powers of two. It also has the added benefit of avoiding the
reg-reg copy to a temporary accumulator.
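The shape of that special case can be sketched in plain C on ordinary
integers rather than rtl. Both function names below are hypothetical,
the shift stands in for the expand_shift call, and the plain multiply
stands in for the existing choose_mult_variant/synth_mult path; it is
an illustration of the idea, not a patch against expmed.c.

```c
/* Hypothetical helper: return log2 (c) if c is an exact power of two,
   otherwise -1.  */
int exact_log2_sketch(unsigned long c)
{
    int n = 0;

    if (c == 0 || (c & (c - 1)) != 0)
        return -1;              /* zero, or more than one bit set */
    while ((c >>= 1) != 0)
        n++;
    return n;
}

/* Multiply x by the constant c, taking the shift fast path first.  */
unsigned long mult_const_sketch(unsigned long x, unsigned long c)
{
    int shift = exact_log2_sketch(c);

    if (shift >= 0)
        return x << shift;      /* stands in for expand_shift */
    return x * c;               /* stands in for the general path */
}
```

The check costs one test per constant multiply, and for 1, 2, 4, 8, ...
the general algorithm search is skipped entirely.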
Sorry for not noticing the flaw in your original solution earlier.
Do you agree with the above analysis and/or benefits of the above