RFC: Combine of compare & and oddity

Jeff Law law@redhat.com
Thu Sep 3 16:15:00 GMT 2015

On 09/03/2015 08:59 AM, Wilco Dijkstra wrote:
>> Segher Boessenkool wrote:
>> On Thu, Sep 03, 2015 at 12:43:34PM +0100, Wilco Dijkstra wrote:
>>>>> Combine canonicalizes certain AND masks in a comparison with zero into extracts of the
>>>>> widest register type. During matching these are expanded into a very inefficient
>>>>> sequence that fails to match. For example (x & 2) == 0 is matched in combine like this:
>>>>> Failed to match this instruction:
>>>>> (set (reg:CC 66 cc)
>>>>>      (compare:CC (zero_extract:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
>>>>>              (const_int 1 [0x1])
>>>>>              (const_int 1 [0x1]))
>>>>>          (const_int 0 [0])))
>>>> Yes.  Some processors even have specific instructions to do this.
>>> However there are 2 issues with this; one is the spurious subreg,
>> Combine didn't make that up out of thin air; something already used
>> DImode here.  It could simplify it to SImode in this case, that is
>> true, don't know why it doesn't; it isn't necessarily faster code to
>> do so, it can be slower, it might not match, etc.
> The relevant RTL instructions on AArch64 are:
> (insn 8 3 25 2 (set (reg:SI 77 [ D.2705 ])
>          (and:SI (reg/v:SI 76 [ xD.2641 ])
>              (const_int 2 [0x2]))) tmp5.c:122 452 {andsi3}
>       (nil))
>   (insn 26 25 27 2 (set (reg:CC 66 cc)
>          (compare:CC (reg:SI 77 [ D.2705 ])
>              (const_int 0 [0]))) tmp5.c:122 377 {*cmpsi}
>       (expr_list:REG_DEAD (reg:SI 77 [ D.2705 ])
>          (nil)))
> I don't see anything using DI...
Is this aarch64?  Might be the case that combine wants to do everything 
in word_mode for some reason or another.

> Yes, so it looks like some other backends match the odd pattern and then have another
> pattern change it back into the canonical AND/TST form during the split phase (maybe
> the subreg confuses register allocation or blocks other optimizations). This all seems
> a lot of unnecessary complexity for a few special immediates when there is a much
> simpler solution...
subregs get in the way in various places.  So if we can avoid generating 
them, then that's good.

>>> But there are more efficient ways to emit single-bit and mask tests that apply
>>> to most CPUs, rather than doing something specific that works for just one
>>> target. For example, a single-bit test is a simple shift into the carry flag or
>>> into the sign bit, and for mask tests, you shift out all the non-mask bits.
>> Most of those are quite target-specific.  Some others are already done,
>> and/or done by other passes.
I wouldn't go that far.  Many targets have simple, cheap ways to do 
single bit testing.  And there are some targets where trying to shift a 
bit into the carry flag would be *horribly* bad performance-wise as they 
only have single bit shifters.

> But what combine does here is even more target-specific. Shifts and AND setting flags
> are universally supported on targets with a condition code register.
> Bitfield test/extract instructions are more rare, and when they are supported, they
> may well be more expensive.
I don't think it's anywhere near that clear cut.

>>> So my question is, is it combine's job to try all possible permutations that
>>> constitute a bit or mask test?
>> Combine converts the merged instructions to what it thinks is the
>> canonical or cheapest form, and uses that.  It does not try multiple
>> options (the zero_ext* -> and+shift rewriting is not changing the
>> semantics of the pattern at all).
> But the change from AND to zero_extract is already changing semantics...
It shouldn't be changing semantics -- changing form != changing semantics.

>> Combine does not try multiple options.
> I'm not following - combine tries zero_extract and shift+AND - that's 2 options.
> If that is feasible then adding a 3rd option should be possible.
Combine will stop once it finds a match.  It may, in some circumstances, 
try more than one representation to find a match, but that's the 
exception rather than the rule.

>> You will end up with a *lot* of target hooks like this.  It will also
>> make testing harder (less coverage).  I am not sure that is a good idea.
> We certainly need a lot more target hooks in general so GCC can do the right thing
> (rather than using costs inconsistently all over the place). But that's a different
> discussion...
Let's be very careful here: target hooks aren't always the solution. 
I'd rather see the costing models fixed and used consistently across 
the board.  But frankly, I don't know how to fix the costing models.

