This is the mail archive of the mailing list for the GCC project.

Re: [PATCH] Fix PR target/28946

Hi Richard,

On Tue, 5 Sep 2006, Richard Earnshaw wrote:
> > Taking a small step backwards perhaps we're missing a completely
> > different optimization here...  if (((unsigned)x >> 5) != 0) could
> > probably be better expanded/transformed as if ((unsigned) x >= 32),
> > especially on pentium-4s where the cost of a shift is significant.
> GCC for ARM already generates a near optimal sequence for this, namely:
> fct:
>         movs    r0, r0, lsr #5
>         beq     .L2
>         b       fct1
> .L2:
>         b       fct2
> which uses the following pattern in the MD file
> (define_insn "*shiftsi3_compare0_scratch"
>   [(set (reg:CC_NOOV CC_REGNUM)
> 	(compare:CC_NOOV (match_operator:SI 3 "shift_operator"
> 			  [(match_operand:SI 1 "s_register_operand" "r")
> 			   (match_operand:SI 2 "arm_rhs_operand" "rM")])
> 			 (const_int 0)))
>    (clobber (match_scratch:SI 0 "=r"))]
>   "mov%?s\\t%0, %1%S3"
> so I'm not sure why the x86 can't do something similar.

Indeed, an x86 pattern to recognize this case would be a good solution.

> I'm concerned about trying to convert this to a comparison with a
> constant.  Non-small constants are very expensive to generate in Thumb
> state.


Roger's third law of optimization: For every pessimization, there is an
equal and opposite optimization (and vice versa).  So if the expression
"x >= 32" is better written as "(x >> 5) != 0" on ARM, that would make a
good missed optimization PR.

I notice that these two equivalent expressions generate different
code in both ARM and Thumb modes.  Unfortunately, I can't immediately
tell that the shift is better, as Thumb seems to generate (or need?) an
additional comparison after the shift that is not present in ARM mode:

mainline -O2 -mthumb:

        lsr     r0, r0, #5
        cmp     r0, #0
        beq     .L2

        cmp     r0, #31
        bls     .L2

and mainline -O2 (ARM mode):

        movs    r0, r0, lsr #5
        beq     .L2

        cmp     r0, #31
        bls     .L2

Perhaps there are instruction encoding or timing/scheduling issues that
might make one more advantageous than the other, with and without -Os?
For the RTL optimizers, at least, "x >= 32" should be easier to reason
about and therefore a preferable canonical form (provided backends can
emit it optimally as an insn variant).

I need to read up on how Thumb synthesizes small constants, and therefore
which values are potentially problematic.

