This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH] Fix PR target/28946
- From: Richard Earnshaw <rearnsha at arm dot com>
- To: Roger Sayle <roger at eyesopen dot com>
- Cc: Uros Bizjak <ubizjak at gmail dot com>, gcc patches <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 06 Sep 2006 11:22:37 +0100
- Subject: Re: [PATCH] Fix PR target/28946
- References: <Pine.LNX.firstname.lastname@example.org>
On Tue, 2006-09-05 at 16:59, Roger Sayle wrote:
> Hi Richard,
> On Tue, 5 Sep 2006, Richard Earnshaw wrote:
> > > Taking a small step backwards perhaps we're missing a completely
> > > different optimization here... if (((unsigned)x >> 5) != 0) could
> > > probably be better expanded/transformed as if ((unsigned) x >= 32),
> > > especially on pentium-4s where the cost of a shift is significant.
> > GCC for ARM already generates a near optimal sequence for this, namely:
> > fct:
> >         movs    r0, r0, lsr #5
> >         beq     .L2
> >         b       fct1
> > .L2:
> >         b       fct2
> > which uses the following pattern in the MD file
> > (define_insn "*shiftsi3_compare0_scratch"
> >   [(set (reg:CC_NOOV CC_REGNUM)
> >         (compare:CC_NOOV (match_operator:SI 3 "shift_operator"
> >                           [(match_operand:SI 1 "s_register_operand" "r")
> >                            (match_operand:SI 2 "arm_rhs_operand" "rM")])
> >                          (const_int 0)))
> >    (clobber (match_scratch:SI 0 "=r"))]
> >   "TARGET_ARM"
> >   "mov%?s\\t%0, %1%S3"
> > so I'm not sure why the x86 can't do something similar.
> Indeed, an x86 pattern to recognize this case would be a good solution.
> > I'm concerned about trying to convert this to a comparison with a
> > constant. Non-small constants are very expensive to generate in Thumb
> > state.
> Roger's third law of optimization: For every pessimization, there is an
> equal and opposite optimization (and vice versa). So if the expression
> "x >= 32" is better written as "(x >> 5) != 0" on ARM, that would make a
> good missed optimization PR.
> I notice that these two equivalent expressions generate different
> code both on ARM and on Thumb. Unfortunately, I can't immediately
> tell whether the shift is better, as Thumb seems to generate (need?) an
> additional comparison after the shift (not present in ARM mode):
> mainline -O2 -mthumb:
>         (x >> 5) != 0                   x >= 32
>         lsr     r0, r0, #5              cmp     r0, #31
>         cmp     r0, #0                  bls     .L2
>         beq     .L2
> and mainline -O2
>         (x >> 5) != 0                   x >= 32
>         movs    r0, r0, lsr #5          cmp     r0, #31
>         beq     .L2                     bls     .L2
> Perhaps there are instruction encoding or timing/scheduling issues that
> might make one more advantageous than the other, with and without -Os?
> For the RTL optimizers, at least, "x >= 32" should be easier to reason
> about and therefore a preferable canonical form (provided backends can
> emit it optimally as an insn variant).
> I need to read up on how thumb synthesizes small constants, and therefore
> which values are potentially problematic.
The Thumb compare-immediate instruction only supports constants up to
255. I guess when we can use that instruction it will be preferable,
since it doesn't require a scratch that can be clobbered; however, when
we can't, we then need either three instructions (load 1 into a register,
shift it to the right position, then do the compare) or two instructions
plus a literal pool entry.
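As a rough illustration of the three cases just described, here is a small C
sketch that classifies a comparison constant by the sequence Thumb would need.
The enum and function names are hypothetical and are not taken from GCC. The
sketch also assumes that the "load and shift" form is tried for any 8-bit value
shifted left, which is a generalization of the "load 1" case mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the three cases described in the text. */
enum thumb_cmp_strategy {
    CMP_IMMEDIATE,    /* cmp rN, #imm8: one instruction, no scratch register */
    CMP_MOV_SHIFT,    /* mov + lsl + cmp: three instructions, one scratch */
    CMP_LITERAL_POOL  /* ldr + cmp: two instructions plus a literal pool entry */
};

/* Classify an unsigned comparison constant.
   Assumption: any 8-bit value shifted left counts as "mov then shift",
   not only a single set bit. */
enum thumb_cmp_strategy classify_thumb_cmp(uint32_t c)
{
    uint32_t v = c;

    if (c <= 255u)
        return CMP_IMMEDIATE;

    /* c > 255 here, so v is non-zero and this loop terminates. */
    while ((v & 1u) == 0u)
        v >>= 1;

    if (v <= 255u)
        return CMP_MOV_SHIFT;

    return CMP_LITERAL_POOL;
}

/* A few spot checks of the classification. */
void classify_thumb_cmp_examples(void)
{
    assert(classify_thumb_cmp(31u) == CMP_IMMEDIATE);
    assert(classify_thumb_cmp(0x10000u) == CMP_MOV_SHIFT);
    assert(classify_thumb_cmp(0x12345u) == CMP_LITERAL_POOL);
}
```

Under these assumptions the constant 31 used for "x >= 32" falls in the cheap
case, while a bound such as 0x10000 already needs a scratch register.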
The cmp #0 in the thumb case above is just a missed back-end
optimization (the lsr will set the Z flag just fine). Since gcc -mthumb
doesn't model the condition code register at all, I need to create
special cbranch variants for all these special cases. That's not
particularly nice, but the absence of non-flag-setting move and add
instructions means that there's no other way of doing it that reload can
cope with. However, this specific case isn't particularly bad since
there are no potential output-reload cases I need to deal with...
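To make the redundancy of the cmp #0 concrete, here is a minimal C model of the
Z flag only. It is a sketch with hypothetical names, not a description of how
the back end represents anything. It checks that the flag left by the shift is
the same one the compare would produce, and that "Z clear" after the shift
agrees with the unsigned test x >= 32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the one flag that matters here. */
struct flags {
    int z;  /* 1 when the last flag-setting result was zero */
};

/* Models Thumb "lsr rd, rm, #n" for 1 <= n <= 31: shifts right and
   sets Z from the result. */
uint32_t lsr_setting_flags(uint32_t x, unsigned n, struct flags *f)
{
    uint32_t r = x >> n;
    f->z = (r == 0u);
    return r;
}

/* Models "cmp r, #0": Z is set exactly when r is zero. */
void cmp_zero(uint32_t r, struct flags *f)
{
    f->z = (r == 0u);
}

/* Returns 1 when, for this x, the cmp #0 leaves Z unchanged after the
   shift and "Z clear" matches the unsigned comparison x >= 32. */
int cmp_is_redundant(uint32_t x)
{
    struct flags after_lsr;
    struct flags after_cmp;
    uint32_t r = lsr_setting_flags(x, 5u, &after_lsr);

    after_cmp = after_lsr;
    cmp_zero(r, &after_cmp);

    return after_lsr.z == after_cmp.z && (!after_lsr.z) == (x >= 32u);
}

/* Spot checks around the boundary value. */
void cmp_is_redundant_examples(void)
{
    assert(cmp_is_redundant(31u));
    assert(cmp_is_redundant(32u));
}
```

The model ignores the N and C flags, which the branch in this case does not
read.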