[Bug target/70119] New: AArch64 should take advantage of implicit truncation of variable shift amount without defining SHIFT_COUNT_TRUNCATED

ktkachov at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Mon Mar 7 10:41:00 GMT 2016


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70119

            Bug ID: 70119
           Summary: AArch64 should take advantage of implicit truncation
                    of variable shift amount without defining
                    SHIFT_COUNT_TRUNCATED
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

Consider the testcases:

unsigned f1(unsigned x, int y) { return x << (y & 31); }


unsigned long f2(unsigned long x, int y) { return x << (y & 63); }

unsigned long
f3 (unsigned long bit_addr)
{
  unsigned long bitnumb = bit_addr & 63;
  return (1L << bitnumb);
}

Currently, at -O2, we generate:
f1:
        and     w1, w1, 31
        lsl     w0, w0, w1
        ret
f2:
        and     w1, w1, 63
        lsl     x0, x0, x1
        ret
f3:
        and     x0, x0, 63
        mov     x1, 1
        lsl     x0, x1, x0
        ret

The masking of the shift amount could be omitted because the lsl instruction
(like the other shift/rotate instructions) implicitly truncates its shift
amount to the register width.
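With that truncation exploited, the and instructions above become redundant;
f1, for example, should reduce to:
f1:
        lsl     w0, w0, w1
        ret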

GCC could figure that out if we defined SHIFT_COUNT_TRUNCATED, but we can't do
that when TARGET_SIMD is enabled because the variable shift patterns have
alternatives that perform the shifts on the vector registers, and those
instructions don't truncate their shift amount.

A simple solution is to write a pattern for combine that catches a shift/rotate
whose amount is masked by an AND-immediate and emits the plain ALU shift/rotate
instruction:
(set (reg:SI 1)
     (ashift:SI (reg:SI 2)
                (and:QI (reg:QI 3) (const_int 31))))

The AND operation is in QImode because the shift expanders expand the shift
amount to a QImode value.
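
A minimal sketch of such a pattern for aarch64.md, covering only SImode shifts
(the pattern name is hypothetical, and a real implementation would use mode and
code iterators to cover all the shift/rotate variants):

;; Hypothetical: match a shift whose QImode amount is masked with 31 and
;; emit the plain ALU lsl, which performs the truncation implicitly.
(define_insn "*aarch64_ashlsi3_mask"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (ashift:SI
          (match_operand:SI 1 "register_operand" "r")
          (and:QI (match_operand:QI 2 "register_operand" "r")
                  (const_int 31))))]
  ""
  "lsl\\t%w0, %w1, %w2"
  [(set_attr "type" "shift_reg")]
)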

This doesn't quite work, however. During combine the midend creates a subreg of
the whole AND expression for the shift amount:
(subreg:QI (and:SI (reg:SI x1)
                   (const_int 31)) 0)

instead of propagating the subreg inside the AND.
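Propagating the subreg inside would instead give a form the pattern above can
match, presumably:

(and:QI (subreg:QI (reg:SI x1) 0)
        (const_int 31))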
Some discussion at:
https://gcc.gnu.org/ml/gcc/2016-02/msg00357.html
(thread continues into 2016-03)

One solution could be to teach simplify-rtx to move the subreg inside.
Another proposed solution is to teach the backend to match different modes for
the shift amounts.

However, I haven't had much luck implementing that idea.
The "ashl" standard name must expand to a single mode for the shift amount, and
any explicit masking operation (like in the testcases) that is propagated into
the shift amount must be forced to that mode. For QImode and SImode shift
amounts I'm seeing the same issue as above (a subreg of an AND-immediate), and
for DImode shift amounts I see zero_extends of SImode rtxes being created for
the shift amount, which also don't match.
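The DImode failure presumably involves rtxes of the form:

(zero_extend:DI (and:SI (reg:SI x1)
                        (const_int 63)))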

