This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: Validity of SUBREG+AND-imm transformations

From: Kyrill Tkachov <kyrylo dot tkachov at foss dot arm dot com>
To: Jeff Law <law at redhat dot com>, "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
Date: Mon, 29 Feb 2016 10:51:24 +0000
Subject: Re: Validity of SUBREG+AND-imm transformations
Authentication-results: sourceware.org; auth=none
References: <56D055C6 dot 6040800 at foss dot arm dot com> <56D0C2A1 dot 1070501 at redhat dot com>

Hi Jeff,

On 26/02/16 21:24, Jeff Law wrote:

On 02/26/2016 06:40 AM, Kyrill Tkachov wrote:

Hi all,

I'm looking at a case where some RTL passes create an RTL expression of
the form:
(subreg:QI (and:SI (reg:SI x1)
             (const_int 31)) 0)

which I'd like to simplify to:
(and:QI (subreg:QI (reg:SI x1) 0)
         (const_int 31))

I can think of cases where the first is better and other cases where the second is better -- a lot depends on context.  I don't have a good sense for which is better in general.

Note that as-written these don't trigger the subtle issues in what happens with upper bits.  That's more for extensions.

(subreg:SI (whatever:QI))

vs

{zero,sign}_extend:SI (whatever:QI))

vs

(and:SI (subreg:SI (whatever:QI) (const_int 0x255)))


The first leave the bits beyond QI as "undefined" and sometimes (but I doubt all that often in practice) the compiler will use the undefined nature of those bits to enable optimizations.


The second & 3rd variants crisply define the upper bits.


Thanks for the explanation.


It's easy enough to express in RTL but I'm trying to convince myself on
its validity.
I know there are some subtle points in this area. combine_simplify_rtx
in combine.c
has a comment:
       /* Note that we cannot do any narrowing for non-constants since
      we might have been counting on using the fact that some bits were
      zero.  We now do this in the SET.  */

That comment makes no sense.  Unfortunately it goes back to a change from Kenner in 1994 -- which predates having patch discussions here and consistently adding tests to the testsuite.

The code used to do this:


-      if (GET_MODE_CLASS (mode) == MODE_INT
-         && GET_MODE_CLASS (GET_MODE (SUBREG_REG (x))) == MODE_INT
-         && GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (SUBREG_REG (x)))
-         && subreg_lowpart_p (x))
-       return force_to_mode (SUBREG_REG (x), mode, GET_MODE_MASK (mode),
-                             NULL_RTX, 0);

Which appears to check that we've got a narrowing subreg expression, and if we do try to force the SUBREG_REG into the right mode using force_to_mode.

But if we had a narrowing SUBREG_REG, then I can't see how anything would have been dependign on the upper bits being zero.


and if I try to implement this transformation in simplify_subreg from
simplify-rtx.c
I get some cases where combine goes into an infinite recursion in
simplify_comparison
because it tries to do:

       /* If this is (and:M1 (subreg:M1 X:M2 0) (const_int C1)) where C1
          fits in both M1 and M2 and the SUBREG is either paradoxical
          or represents the low part, permute the SUBREG and the AND
          and try again.  */

Right. I think you just end up ping-ponging between the two equivalent representations. Which may indeed argue that the existing representation is preferred and we should look deeper into why the existing representation isn't beinghandled as well as it should be.


After a bit more experimentation I found it's easy enough to avoid looping on this spot.
Just check if the gen_lowpart part of that transformation returned something different from the original
expression...


Performing this transformation would help a lot with recognition of some
patterns that
I'm working on, so would it be acceptable to teach combine or
simplify-rtx to do this?

How does it help recognition?   What kinds of patterns are you trying to recognize?


On aarch64 the normal integer-side variable shift/rotate instructions truncate their shift amount.
However, we can't enable SHIFT_COUNT_TRUNCATED by default on aarch64 because some of the alternatives
in our shift patterns use the vector shift instructions, which don't truncate the shift amount.
So I'm trying to add a combine pattern to eliminate redundant truncations where possible when using
the normal shift instructions.

So I'm trying to create a define_insn to match something like:
  [(set (match_operand:SI 0 "register_operand" "=r")
    (ashift:SI
      (match_operand:SI 1 "register_operand" "r")
      (and:QI
        (match_operand:QI 2 "register_operand" "r")
        (match_operand:QI 3 "const_int_operand" "n"))))]


where operands[3] is 31 for SImode. The 'and' expression has to be in QImode because our shift expanders
expand the shift amount to QImode.
However, when combine tries to combine the and-mask operation with the shift, for example from the code:
unsigned f1(unsigned x, int y) { return x << (y & 31); }

the expression it tries to match is:
(set (reg/i:SI 0 x0)
    (ashift:SI (reg:SI 0 x0 [ x ])
        (subreg:QI (and:SI (reg:SI 1 x1 [ y ])
                (const_int 31 [0x1f])) 0)))

That is, the subreg is not propagated inside. I think the define_insn pattern for that would look
somewhat unnatural, so I'm trying to get the subreg to be propagated to the reg inside.

Thanks,
Kyrill

References:
- Validity of SUBREG+AND-imm transformations
  - From: Kyrill Tkachov
- Re: Validity of SUBREG+AND-imm transformations
  - From: Jeff Law

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]