[PATCH] Fix PR target/28946

Richard Earnshaw rearnsha@arm.com
Tue Sep 5 14:32:00 GMT 2006


On Tue, 2006-09-05 at 14:20, Roger Sayle wrote:
> On Tue, 5 Sep 2006, Uros Bizjak wrote:
> > 2006-09-06 Uros Bizjak <uros@kss-loka.si>
> >
> > 	PR target/28946
> > 	* combine.c (try_combine): Force PARALLEL of comparison and
> > 	arithmetic insn even if arithmetic result is not used.
> >
> > 	* gcc.target/i386/pr28946.c: New test.
> 
> I was going to point out that a generic change to combine like this
> really needs more testing than C & C++ on x86, especially during
> stage 3.  However, from your latest comments in the bugzilla PR it
> looks like you've already discovered an issue with the use of "and"
> vs "test".
> 
> Taking a small step backwards perhaps we're missing a completely
> different optimization here...  if (((unsigned)x >> 5) != 0) could
> probably be better expanded/transformed as if ((unsigned) x >= 32),
> especially on pentium-4s where the cost of a shift is significant.
> 
> A less intrusive patch/workaround for the 4.0 and 4.1 branches might
> be to add a peephole2 to recognize the "shrl $foo, reg; testl reg, reg"
> sequence and simplify it.  Less than ideal, but unlikely to change
> anything other than the affected code.
> 
> However if we stick with a combine solution, this might be one of
> those instances where we need to attempt to recognize the combination
> directly (to catch testl or a mythical shift-compare, ARM?), and if
> that fails, try again with a parallel containing the original SET.

GCC for ARM already generates a near optimal sequence for this, namely:

fct:
        movs    r0, r0, lsr #5
        beq     .L2
        b       fct1
.L2:
        b       fct2

which uses the following pattern in the MD file:

(define_insn "*shiftsi3_compare0_scratch"
  [(set (reg:CC_NOOV CC_REGNUM)
	(compare:CC_NOOV (match_operator:SI 3 "shift_operator"
			  [(match_operand:SI 1 "s_register_operand" "r")
			   (match_operand:SI 2 "arm_rhs_operand" "rM")])
			 (const_int 0)))
   (clobber (match_scratch:SI 0 "=r"))]
  "TARGET_ARM"
  "mov%?s\\t%0, %1%S3"

so I'm not sure why the x86 can't do something similar.

I'm concerned about trying to convert this to a comparison with a
constant.  Non-small constants are very expensive to generate in Thumb
state.
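
For reference, the equivalence behind the suggested transformation can be
sketched in C roughly as below.  The function names are hypothetical and
are not taken from pr28946.c; this only illustrates that, for an unsigned
operand, testing the shifted value against zero and comparing against
1 << 5 give the same answer.  The constant in the second form grows with
the shift count, which is where the cost concern comes from.

```c
#include <limits.h>

/* Hypothetical helper: the shift-then-test form from the PR discussion. */
int nonzero_after_shift(unsigned int x)
{
    return (x >> 5) != 0;
}

/* Hypothetical helper: the proposed compare-with-constant form.
   32 is 1 << 5; a larger shift count needs a larger constant. */
int at_least_32(unsigned int x)
{
    return x >= 32u;
}

/* Returns 1 if both forms agree on values around the boundary and at
   the extremes of the unsigned range, 0 otherwise.  A spot check on
   sample values, not a proof. */
int forms_agree(void)
{
    static const unsigned int samples[] = {
        0u, 1u, 31u, 32u, 33u, 63u, 64u, UINT_MAX - 1u, UINT_MAX
    };
    unsigned int i;

    for (i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        if (nonzero_after_shift(samples[i]) != at_least_32(samples[i]))
            return 0;
    }
    return 1;
}
```

Which of the two forms is cheaper depends on the target: the first maps
onto a flag-setting shift where one exists, the second needs the constant
to be encodable or loaded separately.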


