
Re: [GCC][PATCH][Aarch64] Exploiting BFXIL when OR-ing two AND-operations with appropriate bitmasks


Hi Sam,

On 13/07/18 17:09, Sam Tebbs wrote:
Hi all,

This patch adds an optimisation that exploits the AArch64 BFXIL instruction
when OR-ing the results of two bitwise AND operations with non-overlapping
bitmasks (e.g. (a & 0xFFFF0000) | (b & 0x0000FFFF)).

Example:

unsigned long long combine(unsigned long long a, unsigned long long b) {
   return (a & 0xffffffff00000000ll) | (b & 0x00000000ffffffffll);
}

void read2(unsigned long long a, unsigned long long b, unsigned long long *c,
   unsigned long long *d) {
   *c = combine(a, b); *d = combine(b, a);
}

When compiled with -O2, read2 currently compiles to:

read2:
   and   x5, x1, #0xffffffff
   and   x4, x0, #0xffffffff00000000
   orr   x4, x4, x5
   and   x1, x1, #0xffffffff00000000
   and   x0, x0, #0xffffffff
   str   x4, [x2]
   orr   x0, x0, x1
   str   x0, [x3]
   ret

But with this patch it compiles to:

read2:
   mov   x4, x1
   bfxil x4, x0, 0, 32
   str   x4, [x2]
   bfxil x0, x1, 0, 32
   str   x0, [x3]
   ret

Bootstrapped and regtested on aarch64-none-linux-gnu and aarch64-none-elf with no regressions.

I am not a maintainer, but I have a question about this patch. I may be missing
something or reading it wrong, so feel free to point that out:

+(define_insn "*aarch64_bfxil"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+    (ior:DI (and:DI (match_operand:DI 1 "register_operand" "r")
+            (match_operand 3 "const_int_operand"))
+        (and:DI (match_operand:DI 2 "register_operand" "0")
+            (match_operand 4 "const_int_operand"))))]
+  "INTVAL (operands[3]) == ~INTVAL (operands[4])
+    && aarch64_is_left_consecutive (INTVAL (operands[3]))"
+  {
+    HOST_WIDE_INT op4 = INTVAL (operands[4]);
+    operands[3] = GEN_INT (64 - ceil_log2 (op4));
+    output_asm_insn ("bfxil\\t%0, %1, 0, %3", operands);

In the BFXIL you are reading the %3 least significant bits from operand 1 and
putting them in the least significant bits of %0. This means that the pattern
should be masking the 64-%3 most significant bits of %0 and
the %3 least significant bits of %1. So shouldn't operand 4 be the LEFT_CONSECUTIVE one?
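
For reference, here is a minimal C model of my reading of bfxil %0, %1, 0, width
(the names dst, src and width are illustrative, not from the patch):

/* BFXIL dst, src, 0, width: copy the low `width` bits of src into
   the low bits of dst; the upper 64-width bits of dst are kept.  */
unsigned long long bfxil_model(unsigned long long dst,
                               unsigned long long src,
                               unsigned int width) {
   unsigned long long low_mask =
      (width == 64) ? ~0ull : (1ull << width) - 1;
   return (dst & ~low_mask) | (src & low_mask);
}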

Could you please compare the output for a simpler version of your example above,
to make sure the generated assembly is equivalent before and after the patch:

void read2(unsigned long long a, unsigned long long b, unsigned long long *c) {
  *c = combine(a, b);
}


From the output quoted above:

read2:
  and   x5, x1, #0xffffffff
  and   x4, x0, #0xffffffff00000000
  orr   x4, x4, x5

read2:
  mov   x4, x1
  bfxil x4, x0, 0, 32

This does not seem equivalent to me.
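
To make that concrete, here is a self-contained sketch (the function name
patched and the test values are mine) comparing what the two sequences
compute; they disagree whenever the halves of a and b differ:

#include <stdio.h>

unsigned long long combine(unsigned long long a, unsigned long long b) {
   return (a & 0xffffffff00000000ll) | (b & 0x00000000ffffffffll);
}

/* What "mov x4, x1; bfxil x4, x0, 0, 32" computes: start from b (x1),
   then replace the low 32 bits with the low 32 bits of a (x0).  */
unsigned long long patched(unsigned long long a, unsigned long long b) {
   unsigned long long x4 = b;
   return (x4 & 0xffffffff00000000ull) | (a & 0x00000000ffffffffull);
}

int main(void) {
   unsigned long long a = 0x1111111122222222ull;
   unsigned long long b = 0x3333333344444444ull;
   /* prints 1111111144444444 vs 3333333322222222 */
   printf("%llx vs %llx\n", combine(a, b), patched(a, b));
   return 0;
}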

Thanks
Sudi

+    return "";
+  }
+  [(set_attr "type" "bfx")]
+)
gcc/
2018-07-11  Sam Tebbs  <sam.tebbs@arm.com>

         * config/aarch64/aarch64.md (*aarch64_bfxil, *aarch64_bfxil_alt):
         Define.
         * config/aarch64/aarch64-protos.h (aarch64_is_left_consecutive):
         Define.
         * config/aarch64/aarch64.c (aarch64_is_left_consecutive): New function.

gcc/testsuite
2018-07-11  Sam Tebbs  <sam.tebbs@arm.com>

         * gcc.target/aarch64/combine_bfxil.c: New file.
         * gcc.target/aarch64/combine_bfxil_2.c: New file.



