This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] PR middle-end/18293: Fast-path expand_mult for 2^N

On Tue, 7 Dec 2004, Zack Weinberg wrote:
> Can I suggest instead a structure like this:
>    if (coeff == 0)
>      return const0_rtx;
>    else if (coeff == 1)
>      return op0;
>    else if (EXACT_POWER_OF_2_OR_ZERO_P (coeff))
>      return expand_shift (LSHIFT_EXPR, mode, op0,
>                           build_int_cst (NULL_TREE, floor_log2 (coeff)),
>                           target, unsignedp);
>    else
>      {
>        mult_cost = rtx_cost (gen_rtx_MULT (mode, op0, op1), SET);
>        if (choose_mult_variant (mode, coeff, &algorithm, &variant, mult_cost))
>          return expand_mult_const (mode, op0, coeff, target,
>                                    &algorithm, variant);
>      }
> I find this somewhat easier to read, and it is also arguably more
> efficient for the *0, *1 cases while being no less efficient for
> anything else.

Unfortunately, whilst admittedly being arguably more efficient for
the *0 and *1 cases, which should hopefully never occur in practice,
it's slower for the non-power-of-two cases, which should be far more

The intention was that the only potential slow-down to the original
code was the EXACT_POWER_OF_2_OR_ZERO_P test, so that for the majority
of multiplier values (3,5,6,7,9...) there's only a single additional
test on the critical path.  I think the name "EXACT_POWER_OF_2_OR_ZERO_P"
makes it clear that the condition includes zero.  I would also hope that
the multipliers zero and one wouldn't occur frequently enough to merit
testing them on the critial path.

On a more general note, though not a complaint of your suggestions,
is there a GNU style guideline for the use of "else", as above?
It turns out that the compiler has a much harder time of processing
"else if" structures, than when they are not required.  The initial
RTL we generate contains unreachable jumps, and far more labels and
basic blocks than strictly necessary.  Additionally, the C++ parser
performs much more memory allocation of scopes, and at the generic-level
we end up creating potentially unbounded trees that blow up the stack
on some platforms.  c.f. PR middle-end/12454.

In summary, GCC currently compiles

	if (coeff == 0)
	  return const0_rtx;
	if (coeff == 1)
	  return op0;

faster and with less memory allocation than

	if (coeff == 0)
	  return const0_rtx;
	else if (coeff == 1)
	  return op0;

[But perhaps my understanding of GCC's middle-end is unduly influencing
my preferred coding style].  My understanding was that we don't currently
mandate this apsect of programming style, instead leaving it to individual
author's discretion.

> [We do still strength-reduce x*2 all the way to x+x, right?]
Yes, we do.  expand_shift contains the optimization that x<<1 is x+x,
and even that x<<2 is t=x+x,t+t on platforms where rtx_costs
of N additions is lower than the rtx_cost of a left shift by N

I suspect that for gcc 4.1 and later we'll canonicalize to multiplication
at the tree-level (x+x -> x*2, x<<2 -> x*4, etc...) to simplify the task
of fold, tree-ssa and the loop optimizers.  We'll then lower these all
back to shifts and additions during RTL expansion.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]