Re: [PATCH] Tweak signed division by power of two


On Tue, 13 Jul 2004, Richard Henderson wrote:
> On Tue, Jul 13, 2004 at 11:48:46AM -0600, Roger Sayle wrote:
> > 	* expmed.c (expand_sdiv_pow2): New function to expand signed division
> > 	by a positive power of two, split out from expand_divmod.  Provide
> > 	an alternate implementation when shifts are expensive.  Lower the
> > 	threshold for using a branchless implementation to BRANCH_COST >= 2.
> > 	(expand_divmod): Call expand_sdiv_pow2 for suitable divisions.
>
> Hum.  There's a lot of tweaking that could be done here.
>
> > +       if (shift_cost[mode][ushift] > COSTS_N_INSNS (1))
>
> (1) AND with non-register operands is not necessarily cost 1.
>     In particular, you'd want to consider the cost of loading
>     the constant.

Indeed.  As I mentioned in my e-mail, this is why I used
"> COSTS_N_INSNS (1)" rather than ">= COSTS_N_INSNS (1)": although
the RSHIFT and the AND both have immediate operands, most architectures
can be relied upon to handle the smaller shift operand in a single
instruction.

Ahh, looking at alpha.c's alpha_rtx_cost_data, I see the concern.
The EV4 has COSTS_N_INSNS(2) for a shift, and the EV5 has
COSTS_N_INSNS(1)+1.  So my "> COSTS_N_INSNS(1)" isn't equivalent
to ">= COSTS_N_INSNS(2)" as I'd assumed.  Given that the Pentium4
has a shift cost of COSTS_N_INSNS(4), would you be happy changing
the above test to:

   if (shift_cost[mode][ushift] >= COSTS_N_INSNS (2))


This achieves perfect selectivity between the (b) and (c) cases
in your e-mail.  With this change, on alpha we'll always choose
the faster of (b) and (c), and when the latencies are equivalent,
we'll choose the shorter.
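
For reference, my reading of the two alternatives this test selects
between, written out as C for a 32-bit mode (only a sketch with
invented names; the real code emits RTL, and the sign mask may come
from a set-on-less-than rather than a shift):

   /* Sketch only: assumes 1 <= logd < 32, d == 1 << logd, and
      arithmetic right shifts of signed values.  */
   int
   sdiv_pow2_shift (int x, int logd)
   {
     int mask = x >> 31;                                  /* -1 if x < 0, else 0 */
     int bias = (int) ((unsigned) mask >> (32 - logd));   /* d - 1 if x < 0, else 0 */
     return (x + bias) >> logd;
   }

   int
   sdiv_pow2_and (int x, int logd, int d)
   {
     int mask = x >> 31;          /* -1 if x < 0, else 0 */
     int bias = mask & (d - 1);   /* AND with the d - 1 immediate replaces the shift */
     return (x + bias) >> logd;
   }

With the amended test, the AND form is chosen exactly when the extra
shift would cost two or more insns.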


> > +   label = gen_label_rtx ();
> > +   temp = copy_to_mode_reg (mode, op0);
> > +   do_cmp_and_jump (temp, const0_rtx, GE, mode, label);
> > +   expand_inc (temp, GEN_INT (d - 1));
> > +   emit_label (label);
> > +   return expand_shift (RSHIFT_EXPR, mode, temp, shift, NULL_RTX, 0);
>
> (2) This sequence is particularly good for conditional-move targets.
>     For /1024, on alpha, we can choose from...
> Given equivalent latencies, I'd prefer the smallest alternative.

This is trickier.  The huge BRANCH_COST of six means that we've always
generated a straight-line sequence as initial RTL, and I doubt that
if-conversion or combine is anywhere near clever enough to recover the
conditional move sequence.
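
For comparison, the quoted branching expansion corresponds roughly to
the following C (again only a sketch, with an invented name):

   int
   sdiv_pow2_branch (int x, int logd, int d)
   {
     int temp = x;
     if (temp < 0)          /* do_cmp_and_jump branches past the add when x >= 0 */
       temp += d - 1;
     return temp >> logd;   /* arithmetic right shift, d == 1 << logd */
   }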

I'll be happy to investigate a follow-up patch to check whether the
target has a suitable/cheap conditional move instruction, and use
your proposed (a) sequence instead.  For the EV4, EV5 and EV6 timings
and sizes you gave, the conditional move is always preferable.  The
clean-up of moving this functionality out of expand_divmod into the
new expand_sdiv_pow2 will make such experimentation much easier.
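
Roughly the shape I have in mind, at the C level (the exact (a)
sequence isn't reproduced in the quote above, so this is only an
approximation, with an invented name):

   int
   sdiv_pow2_cmov (int x, int logd, int d)
   {
     int temp = x + (d - 1);   /* speculatively biased value */
     if (x < 0)                /* a cmov target folds this into a conditional move */
       x = temp;
     return x >> logd;         /* arithmetic right shift, d == 1 << logd */
   }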

With the proposed amendment above, this patch should now generate
alpha/x86 code at least as good as before, and in the EV4 and
Pentium4 cases better than we were doing previously.

Ok for mainline with the above change?

Roger
--

