This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH] Fix PR target/28946
- From: Richard Earnshaw <rearnsha at arm dot com>
- To: Roger Sayle <roger at eyesopen dot com>
- Cc: Uros Bizjak <ubizjak at gmail dot com>, gcc patches <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 05 Sep 2006 15:32:13 +0100
- Subject: Re: [PATCH] Fix PR target/28946
- References: <Pine.LNX.firstname.lastname@example.org>
On Tue, 2006-09-05 at 14:20, Roger Sayle wrote:
> On Tue, 5 Sep 2006, Uros Bizjak wrote:
> > 2006-09-06 Uros Bizjak <email@example.com>
> > PR target/28946
> > * combine.c (try_combine): Force PARALLEL of comparison and
> > arithmetic insn even if arithmetic result is not used.
> > * gcc.target/i386/pr28946.c: New test.
> I was going to point out that a generic change to combine like this
> really needs more testing than C & C++ on x86, especially during
> stage 3. However, from your latest comments in the bugzilla PR it
> looks like you've already discovered an issue with the use of "and"
> vs "test".
> Taking a small step backwards, perhaps we're missing a completely
> different optimization here... if (((unsigned)x >> 5) != 0) could
> probably be better expanded/transformed as if ((unsigned) x >= 32),
> especially on pentium-4s where the cost of a shift is significant.
> A less intrusive patch/workaround for the 4.0 and 4.1 branches might
> be to add a peephole2 to recognize the "shrl $foo, reg; testl reg, reg"
> sequence and simplify it. Less than ideal, but unlikely to change
> anything other than the affected code.
> However if we stick with a combine solution, this might be one of
> those instances where we need to attempt to recognize the combination
> directly (to catch testl or a mythical shift-compare, ARM?), and if
> that fails, try again with a parallel containing the original SET.
GCC for ARM already generates a near optimal sequence for this, namely:
	movs	r0, r0, lsr #5
which uses the following pattern in the MD file:
  [(set (reg:CC_NOOV CC_REGNUM)
	(compare:CC_NOOV (match_operator:SI 3 "shift_operator"
			  [(match_operand:SI 1 "s_register_operand" "r")
			   (match_operand:SI 2 "arm_rhs_operand" "rM")])
			 (const_int 0)))
   (clobber (match_scratch:SI 0 "=r"))]
so I'm not sure why the x86 can't do something similar.
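For reference, the two source-level forms discussed in the quoted text, the shift-and-test from the PR and the unsigned comparison against 32, can be checked against each other with a small C sketch. The function names shift_test, const_test and forms_agree are hypothetical and chosen only for illustration; the sketch shows that the two forms agree on boundary values, and says nothing about which one is cheaper on a given target.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Form from the PR: shift right by 5, then test the result against zero. */
static int shift_test(unsigned x)
{
    return (x >> 5) != 0;
}

/* Suggested alternative: one unsigned comparison with 1 << 5. */
static int const_test(unsigned x)
{
    return x >= 32u;
}

/* Hypothetical helper: returns 1 if both forms agree on the sampled
   boundary values, 0 otherwise. */
static int forms_agree(void)
{
    static const unsigned samples[] = {
        0u, 1u, 31u, 32u, 33u, 63u, 64u, 1000u, UINT_MAX - 1u, UINT_MAX
    };
    size_t i;

    for (i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        if (shift_test(samples[i]) != const_test(samples[i]))
            return 0;
    }
    return 1;
}
```

Both forms are true exactly when some bit at position 5 or above is set, so the choice between them is purely a matter of target cost: the constant 32 is cheap here, but a larger shift count gives a larger constant.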
I'm concerned about trying to convert this to a comparison with a
constant. Non-small constants are very expensive to generate in Thumb