[GCC][PATCH][Aarch64] Exploiting BFXIL when OR-ing two AND-operations with appropriate bitmasks

Mon Jul 23 13:15:00 GMT 2018

On 07/23/2018 12:38 PM, Renlin Li wrote:
>> +(define_insn "*aarch64_bfxil<mode>"
>> +Â  [(set (match_operand:GPI 0 "register_operand" "=r,r")
>> +Â Â Â  (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "r,0")
>> +Â Â Â Â Â Â Â Â Â Â Â  (match_operand:GPI 3 "const_int_operand" "n, Ulc"))
>> +Â Â Â Â Â Â Â  (and:GPI (match_operand:GPI 2 "register_operand" "0,r")
>> +Â Â Â Â Â Â Â Â Â Â Â  (match_operand:GPI 4 "const_int_operand" "Ulc, n"))))]
>> +Â  "INTVAL (operands[3]) == ~INTVAL (operands[4])"
>> +Â  {
>> +Â Â Â  switch (which_alternative)
>> +Â Â Â  {
>> +Â Â Â Â Â  case 0:
>> +Â Â Â  operands[3] = GEN_INT (ceil_log2 (INTVAL (operands[3])));
>> +Â Â Â  return "bfxil\\t%<w>0, %<w>1, 0, %3";
>> +Â Â Â Â Â  case 1:
>> +Â Â Â  operands[3] = GEN_INT (ceil_log2 (INTVAL (operands[4])));
>> +Â Â Â  return "bfxil\\t%<w>0, %<w>2, 0, %3";
>> +Â Â Â Â Â  default:
>> +Â Â Â  gcc_unreachable ();
>> +Â Â Â  }
>> +Â  }
>> +Â  [(set_attr "type" "bfm")]
>> +)
>
> Hi Sam,
>
> Is it possible that operand[3] or operand[4] is 1?
>
> ceil_log2() could return 0 if the argument is 1.
> The width field of the resulting instruction will be 0. Is it still 
> correct?
>
> Regard,
> Renlin
>
>
Hi Renlin,

I think you have found an edge-case that I didn't think of and that the 
code would fail under. I have added a check for this situation and will 
send the updated patch soon.

Thanks,
Sam
>
> On 07/20/2018 10:33 AM, Sam Tebbs wrote:
>> Please disregard the original patch and see this updated version.
>>
>>
>> On 07/20/2018 10:31 AM, Sam Tebbs wrote:
>>> Hi all,
>>>
>>> Here is an updated patch that does the following:
>>>
>>> * Adds a new constraint in config/aarch64/constraints.md to check 
>>> for a constant integer that is left consecutive. This addresses 
>>> Richard Henderson's suggestion about combining the 
>>> aarch64_is_left_consecutive call and the const_int match in the 
>>> pattern.
>>>
>>> * Merges the two patterns defined into one.
>>>
>>> * Changes the pattern's type attribute to bfm.
>>>
>>> * Improved the comment above the aarch64_is_left_consecutive 
>>> implementation.
>>>
>>> * Makes the pattern use the GPI iterator to accept smaller integer 
>>> sizes (an extra test is added to check for this).
>>>
>>> * Improves the tests in combine_bfxil.c to ensure they aren't 
>>> optimised away and that they check for the pattern's correctness.
>>>
>>> Below is a new changelog and the example given before.
>>>
>>> Is this OK for trunk?
>>>
>>> This patch adds an optimisation that exploits the AArch64 BFXIL 
>>> instruction
>>> when or-ing the result of two bitwise and operations with 
>>> non-overlapping
>>> bitmasks (e.g. (a & 0xFFFF0000) | (b & 0x0000FFFF)).
>>>
>>> Example:
>>>
>>> unsigned long long combine(unsigned long long a, unsigned long long 
>>> b) {
>>> Â  return (a & 0xffffffff00000000ll) | (b & 0x00000000ffffffffll);
>>> }
>>>
>>> void read(unsigned long long a, unsigned long long b, unsigned long 
>>> long *c) {
>>> Â  *c = combine(a, b);
>>> }
>>>
>>> When compiled with -O2, read would result in:
>>>
>>> read:
>>> Â  andÂ Â  x5, x1, #0xffffffff
>>> Â  andÂ Â  x4, x0, #0xffffffff00000000
>>> Â  orrÂ Â  x4, x4, x5
>>> Â  strÂ Â  x4, [x2]
>>> Â  ret
>>>
>>> But with this patch results in:
>>>
>>> read:
>>> Â  movÂ Â Â  x4, x0
>>> Â  bfxilÂ Â Â  x4, x1, 0, 32
>>> Â  strÂ Â Â  x4, [x2]
>>> Â  ret
>>>
>>>
>>>
>>> Bootstrapped and regtested on aarch64-none-linux-gnu and 
>>> aarch64-none-elf with no regressions.
>>>
>>>
>>> gcc/
>>> 2018-07-11Â  Sam TebbsÂ  <sam.tebbs@arm.com>
>>>
>>> Â Â Â Â Â Â Â  PR target/85628
>>> Â Â Â Â Â Â Â  * config/aarch64/aarch64.md (*aarch64_bfxil):
>>> Â Â Â Â Â Â Â  Define.
>>> Â Â Â Â Â Â Â  * config/aarch64/constraints.md (Ulc): Define
>>> Â Â Â Â Â Â Â  * config/aarch64/aarch64-protos.h 
>>> (aarch64_is_left_consecutive):
>>> Â Â Â Â Â Â Â  Define.
>>> Â Â Â Â Â Â Â  * config/aarch64/aarch64.c (aarch64_is_left_consecutive): 
>>> New function.
>>>
>>> gcc/testsuite
>>> 2018-07-11Â  Sam TebbsÂ  <sam.tebbs@arm.com>
>>>
>>> Â Â Â Â Â Â Â  PR target/85628
>>> Â Â Â Â Â Â Â  * gcc.target/aarch64/combine_bfxil.c: New file.
>>> Â Â Â Â Â Â Â  * gcc.target/aarch64/combine_bfxil_2.c: New file.
>>>
>>>
>>> On 07/19/2018 02:02 PM, Sam Tebbs wrote:
>>>> Hi Richard,
>>>>
>>>> Thanks for the feedback. I find that using "is_left_consecutive" is 
>>>> more descriptive than checking for it being a power of 2 - 1, since 
>>>> it describes the requirement (having consecutive ones from the MSB) 
>>>> more explicitly. I would be happy to change it though if that is 
>>>> the consensus.
>>>>
>>>> I have addressed your point about just returning the string instead 
>>>> of using output_asm_insn and have changed it locally. I'll send an 
>>>> updated patch soon.
>>>>
>>>>
>>>> On 07/17/2018 02:33 AM, Richard Henderson wrote:
>>>>> On 07/16/2018 10:10 AM, Sam Tebbs wrote:
>>>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>>>> @@ -1439,6 +1439,14 @@ aarch64_hard_regno_caller_save_mode 
>>>>>> (unsigned regno, unsigned,
>>>>>> Â Â Â Â Â  return SImode;
>>>>>> Â  }
>>>>>> Â  +/* Implement IS_LEFT_CONSECUTIVE.Â  Check if an integer's bits 
>>>>>> are consecutive
>>>>>> +Â Â  ones from the MSB.Â  */
>>>>>> +bool
>>>>>> +aarch64_is_left_consecutive (HOST_WIDE_INT i)
>>>>>> +{
>>>>>> +Â  return (i | (i - 1)) == HOST_WIDE_INT_M1;
>>>>>> +}
>>>>>> +
>>>>> ...
>>>>>> +(define_insn "*aarch64_bfxil"
>>>>>> +Â  [(set (match_operand:DI 0 "register_operand" "=r")
>>>>>> +Â Â Â  (ior:DI (and:DI (match_operand:DI 1 "register_operand" "r")
>>>>>> +Â Â Â Â Â Â Â Â Â Â Â  (match_operand 3 "const_int_operand"))
>>>>>> +Â Â Â Â Â Â Â  (and:DI (match_operand:DI 2 "register_operand" "0")
>>>>>> +Â Â Â Â Â Â Â Â Â Â Â  (match_operand 4 "const_int_operand"))))]
>>>>>> +Â  "INTVAL (operands[3]) == ~INTVAL (operands[4])
>>>>>> +Â Â Â  && aarch64_is_left_consecutive (INTVAL (operands[4]))"
>>>>> Better is to use a define_predicate to merge both that second test 
>>>>> and the
>>>>> const_int_operand.
>>>>>
>>>>> (I'm not sure about the "left_consecutive" language either.
>>>>> Isn't it more descriptive to say that op3 is a power of 2 minus 1?)
>>>>>
>>>>> (define_predicate "pow2m1_operand"
>>>>> Â Â  (and (match_code "const_int")
>>>>> Â Â Â Â Â Â Â  (match_test "exact_pow2 (INTVAL(op) + 1) > 0")))
>>>>>
>>>>> and use
>>>>>
>>>>> Â Â  (match_operand:DI 3 "pow2m1_operand")
>>>>>
>>>>> and then just the
>>>>>
>>>>> Â Â  INTVAL (operands[3]) == ~INTVAL (operands[4])
>>>>>
>>>>> test.
>>>>>
>>>>> Also, don't omit the modes for the constants.
>>>>> Also, there's no reason this applies only to DI mode;
>>>>> use the GPI iterator and %<w> in the output template.
>>>>>
>>>>>> +Â Â Â  HOST_WIDE_INT op3 = INTVAL (operands[3]);
>>>>>> +Â Â Â  operands[3] = GEN_INT (ceil_log2 (op3));
>>>>>> +Â Â Â  output_asm_insn ("bfxil\\t%0, %1, 0, %3", operands);
>>>>>> +Â Â Â  return "";
>>>>> You can just return the string that you passed to output_asm_insn.
>>>>>
>>>>>> +Â  }
>>>>>> +Â  [(set_attr "type" "bfx")]
>>>>> The other aliases of the BFM insn use type "bfm";
>>>>> "bfx" appears to be aliases of UBFM and SBFM.
>>>>> Not that it appears to matter to the scheduling
>>>>> descriptions, but it is inconsistent.
>>>>>
>>>>>
>>>>> r~
>>>>
>>>
>>