[PATCH] match.pd: undistribute (a << s) & C, when C = (M << s) and exact_log2(M - 1)

Jim Wilson jimw@sifive.com
Wed Nov 11 21:43:00 GMT 2020


On Wed, Nov 11, 2020 at 2:55 AM Jakub Jelinek via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> On Wed, Nov 11, 2020 at 11:43:34AM +0100, Philipp Tomsich wrote:
> > The patch addresses this by disallowing that rule, if an exact
> power-of-2 is
> > seen as C1.  The reason why I would prefer to have this canonicalised the
> > same way the (X & C1) * C2 is canonicalised, is that cleaning this up
> during
> > combine is more difficult on some architectures that require multiple
> insns
> > to represent the shifted constant (i.e. C1 << C2).
>
> As I said, it is better to decide which one is better before or during
> expansion based on target costs, sure, combine can't catch everything.
>

it could be fixed in combine if we allowed 4 instructions to split into 3.
We allow combinations of 4 insns, but we only allow splits into 2 insns as
far as I know.

Trying 7, 8, 9 -> 10:
    7: r80:DI=0x1
    8: r81:DI=r80:DI<<0x23
      REG_DEAD r80:DI
      REG_EQUAL 0x800000000
    9: r79:DI=r81:DI-0x8
      REG_DEAD r81:DI
      REG_EQUAL 0x7fffffff8
   10: r77:DI=r78:DI&r79:DI
      REG_DEAD r79:DI
      REG_DEAD r78:DI
Failed to match this instruction:
(set (reg:DI 77)
    (and:DI (reg:DI 78)
        (const_int 34359738360 [0x7fffffff8])))

The AND operation can be implemented with 3 shifts, a left shift to clear
the upper bits, a right shift to clear the lower bits, and then another
left shift to shift it back to position.  We are then left with 4 shifts,
and we can have a combiner pattern to match those 4 shifts and reduce to
2.  But this would require combine.c changes to work.  Unless maybe we
don't split the pattern into 3 insns and accept it as is, but there is risk
that this could result in worse code.

Or alternatively, maybe we could have an ANDDI3 expander which accepts mask
constants like this and emits the 3 shifts directly instead of forcing
the constant to a register.  Then we just need to be able to recognize that
these 3 shifts plus the fourth one can be combined into 2 shifts which
might work already, and if not it should be a simple combiner pattern.
This doesn't help if the same RTL can be created after initial RTL
expansion.

With the draft B extension we do this operation with a single instruction,
but we should get this right for systems without the B extension also.

Jim


More information about the Gcc-patches mailing list