This is the mail archive of the gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] PR middle-end/18293: Fast-path expand_mult for 2^N
- From: Roger Sayle <roger at eyesopen dot com>
- To: Paul Schlie <schlie at comcast dot net>
- Cc: zack at codesourcery dot com, <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 8 Dec 2004 06:40:41 -0700 (MST)
- Subject: Re: [PATCH] PR middle-end/18293: Fast-path expand_mult for 2^N
On Wed, 8 Dec 2004, Paul Schlie wrote:
> > Roger Sayle wrote:
> >> Zack Weinberg wrote:
> >> [We do still strength-reduce x*2 all the way to x+x, right?]
> > Yes, we do. expand_shift contains the optimization that x<<1 is x+x,
> > and even that x<<2 is t=x+x,t+t on platforms where rtx_costs
> > of N additions is lower than the rtx_cost of a left shift by N
> > bits.
> - wouldn't ever expect it to be advantageous, even on 8-bit
> single-bit-shift per cycle targets.
On the Intel Pentium 4, an addition has a latency of a single cycle,
but a shift by a constant has a latency of four. This means that a
multiplication by eight is faster when implemented as three consecutive
additions than as a single shift instruction. The P4 is a significant
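
The two ways of multiplying by eight compared above can be sketched in C.
This is a minimal illustration assuming 32-bit unsigned operands; the
function names are hypothetical, and an optimizing compiler may rewrite
either form, so it shows the arithmetic identity rather than the RTL that
expand_shift actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply by 8 with a single left shift by a constant. */
static uint32_t mul8_shift(uint32_t x)
{
    return x << 3;
}

/* Multiply by 8 with three dependent additions: x -> 2x -> 4x -> 8x. */
static uint32_t mul8_adds(uint32_t x)
{
    uint32_t t = x + x;   /* 2x */
    t = t + t;            /* 4x */
    t = t + t;            /* 8x */
    return t;
}

/* Both forms agree for every input, including values that wrap
   modulo 2^32; a few spot checks are made here. */
static void check_mul8(void)
{
    const uint32_t samples[] = { 0u, 1u, 5u, 0x20000000u, 0xFFFFFFFFu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(mul8_shift(samples[i]) == mul8_adds(samples[i]));
}
```

The three additions form a dependent chain, so at one cycle each they
total three cycles, against the four cycles quoted for the constant shift.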
> However on a similar topic, has any further thought been given to
> (bool)(x & <pow2-const>) => (bool)(x >> <log2-const>)
> as at best it's a target specific optimization, and generally grossly
> counter productive on targets which do not shift efficiently? [and
> doubly painful if the (x >> <log2-const>) expression is needlessly
> promoted to int on small targets]?
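
For reference, the two forms of the single-bit test in the quoted text can
be sketched in C. This assumes the shifted form is implicitly masked with 1
(the quoted shorthand omits that step), uses bit 3 purely as an example, and
uses hypothetical function names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Test bit 3 by masking with the power-of-two constant 8. */
static bool bit3_mask(uint8_t x)
{
    return (x & 0x08u) != 0;
}

/* Test bit 3 by shifting it down to bit 0 and masking with 1.
   In C, x is promoted to int before the shift is applied. */
static bool bit3_shift(uint8_t x)
{
    return ((x >> 3) & 1u) != 0;
}

/* The two forms agree for every 8-bit input. */
static void check_bit3(void)
{
    for (unsigned v = 0; v < 256; v++)
        assert(bit3_mask((uint8_t)v) == bit3_shift((uint8_t)v));
}
```

The results are identical; the difference is only in cost. On a target
without an efficient multi-bit shift, the masked form needs a single AND,
while the shifted form may need several single-bit shifts.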
I agree that this performance regression still needs to be fixed
for 4.0.0, but I've not yet confirmed that the RTL if-conversion
pass will catch this optimization when appropriate. Without it
we'll just be exchanging one performance regression for another.
I promise to look into it today. Can you remind me of the PR#?