This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Fix PR target/28946
- From: Roger Sayle <roger at eyesopen dot com>
- To: Richard Earnshaw <rearnsha at arm dot com>
- Cc: Uros Bizjak <ubizjak at gmail dot com>, gcc patches <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 5 Sep 2006 09:59:09 -0600 (MDT)
- Subject: Re: [PATCH] Fix PR target/28946
Hi Richard,
On Tue, 5 Sep 2006, Richard Earnshaw wrote:
> > Taking a small step backwards perhaps we're missing a completely
> > different optimization here... if (((unsigned)x >> 5) != 0) could
> > probably be better expanded/transformed as if ((unsigned) x >= 32),
> > especially on pentium-4s where the cost of a shift is significant.
>
> GCC for ARM already generates a near optimal sequence for this, namely:
>
> fct:
> movs r0, r0, lsr #5
> beq .L2
> b fct1
> .L2:
> b fct2
>
> which uses the following pattern in the MD file
>
> (define_insn "*shiftsi3_compare0_scratch"
> [(set (reg:CC_NOOV CC_REGNUM)
> (compare:CC_NOOV (match_operator:SI 3 "shift_operator"
> [(match_operand:SI 1 "s_register_operand" "r")
> (match_operand:SI 2 "arm_rhs_operand" "rM")])
> (const_int 0)))
> (clobber (match_scratch:SI 0 "=r"))]
> "TARGET_ARM"
> "mov%?s\\t%0, %1%S3"
>
> so I'm not sure why the x86 can't do something similar.
Indeed, an x86 pattern to recognize this case would be a good solution.
> I'm concerned about trying to convert this to a comparison with a
> constant. Non-small constants are very expensive to generate in Thumb
> state.
Interesting.
Roger's third law of optimization: For every pessimization, there is an
equal and opposite optimization (and vice versa). So if the expression
"x >= 32" is better written as "(x >> 5) != 0" on ARM, that would make a
good missed optimization PR.
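For reference, the two forms under discussion can be sketched in C as
below. This is only an illustration, not the test case from the PR; the
names shift_form, compare_form and check_equivalence are made up for
this sketch.

```c
#include <assert.h>
#include <limits.h>

/* Shift form: nonzero exactly when some bit above bit 4 is set. */
static int shift_form(unsigned int x)
{
    return (x >> 5) != 0;
}

/* Compare form: the same predicate written as an unsigned comparison. */
static int compare_form(unsigned int x)
{
    return x >= 32u;
}

/* Hypothetical helper: check that both forms agree around the
   threshold and at the top of the unsigned range. */
static void check_equivalence(void)
{
    unsigned int x;

    for (x = 0; x < 1024u; x++)
        assert(shift_form(x) == compare_form(x));

    assert(shift_form(UINT_MAX) == compare_form(UINT_MAX));
}
```

Calling check_equivalence() from main should complete without tripping
an assertion, since the two forms are equivalent for unsigned operands.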
I notice that these two equivalent expressions generate different
code on both ARM and Thumb. Unfortunately, I can't immediately
tell that the shift is better, as Thumb seems to generate (or need?) an
additional comparison after the shift (not present in ARM mode):
mainline -O2 -mthumb:

        lsr     r0, r0, #5
        cmp     r0, #0
        beq     .L2
vs.
        cmp     r0, #31
        bls     .L2

and mainline -O2:

        movs    r0, r0, lsr #5
        beq     .L2
vs.
        cmp     r0, #31
        bls     .L2

Perhaps there are instruction encoding or timing/scheduling issues that
might make one more advantageous than the other, with and without -Os?
For the RTL optimizers, at least, "x >= 32" should be easier to reason
about and therefore a preferable canonical form (provided backends can
emit it optimally as an insn variant).
I need to read up on how Thumb synthesizes small constants, and therefore
which values are potentially problematic.
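As a rough first cut, assuming that the Thumb-1 cmp instruction accepts
only an unsigned 8-bit immediate (0..255), the sketch below checks which
power-of-two thresholds yield a comparison constant that can be encoded
directly. The helper names are hypothetical, and the sketch says nothing
about constants that can be built cheaply with an extra instruction.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: a Thumb-1 cmp immediate is an unsigned 8-bit value. */
static bool fits_thumb1_imm8(unsigned int c)
{
    return c <= 255u;
}

/* Constant used by "cmp r0, #(2^k - 1)" followed by "bls" when
   testing x >= 2^k.  Only meaningful for 1 <= k <= 31. */
static unsigned int compare_constant(unsigned int k)
{
    return (1u << k) - 1u;
}

/* True when the comparison form needs no separate constant load
   under the 8-bit immediate assumption above. */
static bool compare_is_direct(unsigned int k)
{
    return fits_thumb1_imm8(compare_constant(k));
}
```

Under that assumption, the "x >= 32" case (k = 5, constant 31) is
encodable directly, while thresholds of 512 and above are not.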
Roger