This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

RE: RFC: Combine of compare & and oddity

From: "Wilco Dijkstra" <wdijkstr at arm dot com>
To: "'Segher Boessenkool'" <segher at kernel dot crashing dot org>
Cc: "'GCC Patches'" <gcc-patches at gcc dot gnu dot org>
Date: Thu, 3 Sep 2015 12:43:34 +0100
Subject: RE: RFC: Combine of compare & and oddity
Authentication-results: sourceware.org; auth=none
References: <000e01d0e5a2$1e2f66b0$5a8e3410$ at com> <20150902184747 dot GA7676 at gate dot crashing dot org>

> Segher Boessenkool wrote:
> Hi Wilco,
> 
> On Wed, Sep 02, 2015 at 06:09:24PM +0100, Wilco Dijkstra wrote:
> > Combine canonicalizes certain AND masks in a comparison with zero into extracts of the
> widest
> > register type. During matching these are expanded into a very inefficient sequence that
> fails to
> > match. For example (x & 2) == 0 is matched in combine like this:
> >
> > Failed to match this instruction:
> > (set (reg:CC 66 cc)
> >     (compare:CC (zero_extract:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
> >             (const_int 1 [0x1])
> >             (const_int 1 [0x1]))
> >         (const_int 0 [0])))
> 
> Yes.  Some processors even have specific instructions to do this.

However there are 2 issues with this, one is the spurious subreg, the other is 
that only a subset of legal zero_extracts are tried (only single bit and the
degenerate case of zero_extract with shift of 0 - which I think should not be a
zero_extract). All other AND immediate remain as AND. 

So to emit an AND on targets without such specific instructions, or where such 
instructions are more expensive than a simple AND (*), you need now at least 3 different 
backend patterns for any instruction that can emit AND immediate...

(*) I think that is another issue in combine - when both alternatives match you
want to select the lowest cost one, not the first one that matches.

> > Failed to match this instruction:
> > (set (reg:CC 66 cc)
> >     (compare:CC (and:DI (lshiftrt:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
> >                 (const_int 1 [0x1]))
> >             (const_int 1 [0x1]))
> >         (const_int 0 [0])))
> 
> This is after r223067.  Combine tests only one "final" instruction; that
> revision rewrites zero_ext* if it doesn't match and tries again.  This
> helps for processors that can do more generic masks (like rs6000, and I
> believe also aarch64?): without it, you need many more patterns to match
> all the zero_ext{ract,end} cases.

But there are more efficient ways to emit single bit and masks tests that apply
to most CPUs rather than doing something specific that works for just one target
only. For example single bit test is a simple shift into carry flag or into the 
sign bit, and for mask tests, you shift out all the non-mask bits.

So my question is, is it combine's job to try all possible permutations that
constitute a bit or mask test? Or would it be better to let each target decide
on how to canonicalize bit tests and only try that alternative?

> > Neither matches the AArch64 patterns for ANDS/TST (which is just compare and AND). If the
> immediate
> > is not a power of 2 or a power of 2 -1 then it matches correctly as expected.
> >
> > I don't understand how ((x >> 1) & 1) != 0 could be a useful expansion
> 
> It is zero_extract(x,1,1) really.  This is convenient for (old and embedded)
> processors that have special bit-test instructions.  If we now want combine
> to not do this, we'll have to update all backends that rely on it.

Would any backend actually rely on this given it only does some specific masks,
has a redundant shift with 0 for the mask case and the odd subreg as well?

> > (it even uses shifts by 0 at
> > times which are unlikely to ever match anything).
> 
> It matches fine on some targets.  It certainly looks silly, and could be
> expressed simpler.  Patch should be simple; watch this space / remind me /
> or file a PR.

I don't think this issue matters as it seems unlikely it ever matches.

> > Why does combine not try to match the obvious (x &
> > C) != 0 case? Single-bit and mask tests are very common, so this blocks efficient code
> generation on
> > many targets.
> 
> They are common, and many processors had instructions for them, which is
> (supposedly) why combine special-cased them.

Yes, but that doesn't mean (x & C) != 0 shouldn't be tried as well...

> > It's trivial to change change_zero_ext to expand extracts always into AND and remove the
> redundant
> > subreg.
> 
> Not really trivial...  Think about comparisons...
> 
> int32_t x;
> ((x >> 31) & 1) > 0;
> // is not the same as
> (x & 0x80000000) > 0; // signed comparison
> 
> (You do not easily know what the COMPARE is used for).

Indeed if you had a zero_extract that included the sign-bit then you may have to adjust
the compare condition. However it looks like that case can't happen - (x & 0x80000000) 
comparisons with have the AND optimized away much earlier.

> > However wouldn't it make more sense to never special case certain AND immediate in the first
> > place?
> 
> Yes, but we need to make sure no targets regress (i.e. by rewriting patterns
> for those that do).  We need to know the impact of such a change, at the least.

The alternative would be let the target decide how to canonicalize bit tests. 
That seems like a better solution than trying multiple possibilities which can never 
match on most targets.

Wilco

Follow-Ups:
- Re: RFC: Combine of compare & and oddity
  - From: Segher Boessenkool

References:
- RFC: Combine of compare & and oddity
  - From: Wilco Dijkstra
- Re: RFC: Combine of compare & and oddity
  - From: Segher Boessenkool

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]