[GCC][PATCH][Aarch64] Exploiting BFXIL when OR-ing two AND-operations with appropriate bitmasks

Sam Tebbs sam.tebbs@arm.com
Mon Jul 30 11:31:00 GMT 2018


Hi all,

This update fixes an issue where the predicate would match but during 
register allocation the constraints wouldn't match and so an internal 
compiler error would be raised. This was fixed by adding two 
left_consecutive checks to the pattern's predicate to stop it from 
matching and causing the issue described above. The changelog and 
description from before still apply.

Thanks,
Sam


On 07/24/2018 05:23 PM, Sam Tebbs wrote:
>
>
> On 07/23/2018 02:14 PM, Sam Tebbs wrote:
>>
>>
>> On 07/23/2018 12:38 PM, Renlin Li wrote:
>>>> +(define_insn "*aarch64_bfxil<mode>"
>>>> +  [(set (match_operand:GPI 0 "register_operand" "=r,r")
>>>> +    (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "r,0")
>>>> +            (match_operand:GPI 3 "const_int_operand" "n, Ulc"))
>>>> +        (and:GPI (match_operand:GPI 2 "register_operand" "0,r")
>>>> +            (match_operand:GPI 4 "const_int_operand" "Ulc, n"))))]
>>>> +  "INTVAL (operands[3]) == ~INTVAL (operands[4])"
>>>> +  {
>>>> +    switch (which_alternative)
>>>> +    {
>>>> +      case 0:
>>>> +    operands[3] = GEN_INT (ceil_log2 (INTVAL (operands[3])));
>>>> +    return "bfxil\\t%<w>0, %<w>1, 0, %3";
>>>> +      case 1:
>>>> +    operands[3] = GEN_INT (ceil_log2 (INTVAL (operands[4])));
>>>> +    return "bfxil\\t%<w>0, %<w>2, 0, %3";
>>>> +      default:
>>>> +    gcc_unreachable ();
>>>> +    }
>>>> +  }
>>>> +  [(set_attr "type" "bfm")]
>>>> +)
>>>
>>> Hi Sam,
>>>
>>> Is it possible that operand[3] or operand[4] is 1?
>>>
>>> ceil_log2() could return 0 if the argument is 1.
>>> The width field of the resulting instruction will be 0. Is it still 
>>> correct?
>>>
>>> Regard,
>>> Renlin
>>>
>>>
>> Hi Renlin,
>>
>> I think you have found an edge-case that I didn't think of and that 
>> the code would fail under. I have added a check for this situation 
>> and will send the updated patch soon.
>>
>> Thanks,
>> Sam
>
> Here is the updated patch that fixes the problem outlined by Renlin, 
> and adds another testcase to check for the same issue. The changelog 
> and example from my earlier email (sent on 07/20/2018 10:31 AM) still 
> apply.
>
> Thanks,
> Sam
>
>>>
>>> On 07/20/2018 10:33 AM, Sam Tebbs wrote:
>>>> Please disregard the original patch and see this updated version.
>>>>
>>>>
>>>> On 07/20/2018 10:31 AM, Sam Tebbs wrote:
>>>>> Hi all,
>>>>>
>>>>> Here is an updated patch that does the following:
>>>>>
>>>>> * Adds a new constraint in config/aarch64/constraints.md to check 
>>>>> for a constant integer that is left consecutive. This addresses 
>>>>> Richard Henderson's suggestion about combining the 
>>>>> aarch64_is_left_consecutive call and the const_int match in the 
>>>>> pattern.
>>>>>
>>>>> * Merges the two patterns defined into one.
>>>>>
>>>>> * Changes the pattern's type attribute to bfm.
>>>>>
>>>>> * Improved the comment above the aarch64_is_left_consecutive 
>>>>> implementation.
>>>>>
>>>>> * Makes the pattern use the GPI iterator to accept smaller integer 
>>>>> sizes (an extra test is added to check for this).
>>>>>
>>>>> * Improves the tests in combine_bfxil.c to ensure they aren't 
>>>>> optimised away and that they check for the pattern's correctness.
>>>>>
>>>>> Below is a new changelog and the example given before.
>>>>>
>>>>> Is this OK for trunk?
>>>>>
>>>>> This patch adds an optimisation that exploits the AArch64 BFXIL 
>>>>> instruction
>>>>> when or-ing the result of two bitwise and operations with 
>>>>> non-overlapping
>>>>> bitmasks (e.g. (a & 0xFFFF0000) | (b & 0x0000FFFF)).
>>>>>
>>>>> Example:
>>>>>
>>>>> unsigned long long combine(unsigned long long a, unsigned long 
>>>>> long b) {
>>>>>   return (a & 0xffffffff00000000ll) | (b & 0x00000000ffffffffll);
>>>>> }
>>>>>
>>>>> void read(unsigned long long a, unsigned long long b, unsigned 
>>>>> long long *c) {
>>>>>   *c = combine(a, b);
>>>>> }
>>>>>
>>>>> When compiled with -O2, read would result in:
>>>>>
>>>>> read:
>>>>>   and   x5, x1, #0xffffffff
>>>>>   and   x4, x0, #0xffffffff00000000
>>>>>   orr   x4, x4, x5
>>>>>   str   x4, [x2]
>>>>>   ret
>>>>>
>>>>> But with this patch results in:
>>>>>
>>>>> read:
>>>>>   mov    x4, x0
>>>>>   bfxil    x4, x1, 0, 32
>>>>>   str    x4, [x2]
>>>>>   ret
>>>>>
>>>>>
>>>>>
>>>>> Bootstrapped and regtested on aarch64-none-linux-gnu and 
>>>>> aarch64-none-elf with no regressions.
>>>>>
>>>>>
>>>>> gcc/
>>>>> 2018-07-11  Sam Tebbs  <sam.tebbs@arm.com>
>>>>>
>>>>>         PR target/85628
>>>>>         * config/aarch64/aarch64.md (*aarch64_bfxil):
>>>>>         Define.
>>>>>         * config/aarch64/constraints.md (Ulc): Define
>>>>>         * config/aarch64/aarch64-protos.h 
>>>>> (aarch64_is_left_consecutive):
>>>>>         Define.
>>>>>         * config/aarch64/aarch64.c (aarch64_is_left_consecutive): 
>>>>> New function.
>>>>>
>>>>> gcc/testsuite
>>>>> 2018-07-11  Sam Tebbs  <sam.tebbs@arm.com>
>>>>>
>>>>>         PR target/85628
>>>>>         * gcc.target/aarch64/combine_bfxil.c: New file.
>>>>>         * gcc.target/aarch64/combine_bfxil_2.c: New file.
>>>>>
>>>>>
>>>>> On 07/19/2018 02:02 PM, Sam Tebbs wrote:
>>>>>> Hi Richard,
>>>>>>
>>>>>> Thanks for the feedback. I find that using "is_left_consecutive" 
>>>>>> is more descriptive than checking for it being a power of 2 - 1, 
>>>>>> since it describes the requirement (having consecutive ones from 
>>>>>> the MSB) more explicitly. I would be happy to change it though if 
>>>>>> that is the consensus.
>>>>>>
>>>>>> I have addressed your point about just returning the string 
>>>>>> instead of using output_asm_insn and have changed it locally. 
>>>>>> I'll send an updated patch soon.
>>>>>>
>>>>>>
>>>>>> On 07/17/2018 02:33 AM, Richard Henderson wrote:
>>>>>>> On 07/16/2018 10:10 AM, Sam Tebbs wrote:
>>>>>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>>>>>> @@ -1439,6 +1439,14 @@ aarch64_hard_regno_caller_save_mode 
>>>>>>>> (unsigned regno, unsigned,
>>>>>>>>       return SImode;
>>>>>>>>   }
>>>>>>>>   +/* Implement IS_LEFT_CONSECUTIVE.  Check if an integer's 
>>>>>>>> bits are consecutive
>>>>>>>> +   ones from the MSB.  */
>>>>>>>> +bool
>>>>>>>> +aarch64_is_left_consecutive (HOST_WIDE_INT i)
>>>>>>>> +{
>>>>>>>> +  return (i | (i - 1)) == HOST_WIDE_INT_M1;
>>>>>>>> +}
>>>>>>>> +
>>>>>>> ...
>>>>>>>> +(define_insn "*aarch64_bfxil"
>>>>>>>> +  [(set (match_operand:DI 0 "register_operand" "=r")
>>>>>>>> +    (ior:DI (and:DI (match_operand:DI 1 "register_operand" "r")
>>>>>>>> +            (match_operand 3 "const_int_operand"))
>>>>>>>> +        (and:DI (match_operand:DI 2 "register_operand" "0")
>>>>>>>> +            (match_operand 4 "const_int_operand"))))]
>>>>>>>> +  "INTVAL (operands[3]) == ~INTVAL (operands[4])
>>>>>>>> +    && aarch64_is_left_consecutive (INTVAL (operands[4]))"
>>>>>>> Better is to use a define_predicate to merge both that second 
>>>>>>> test and the
>>>>>>> const_int_operand.
>>>>>>>
>>>>>>> (I'm not sure about the "left_consecutive" language either.
>>>>>>> Isn't it more descriptive to say that op3 is a power of 2 minus 1?)
>>>>>>>
>>>>>>> (define_predicate "pow2m1_operand"
>>>>>>>    (and (match_code "const_int")
>>>>>>>         (match_test "exact_pow2 (INTVAL(op) + 1) > 0")))
>>>>>>>
>>>>>>> and use
>>>>>>>
>>>>>>>    (match_operand:DI 3 "pow2m1_operand")
>>>>>>>
>>>>>>> and then just the
>>>>>>>
>>>>>>>    INTVAL (operands[3]) == ~INTVAL (operands[4])
>>>>>>>
>>>>>>> test.
>>>>>>>
>>>>>>> Also, don't omit the modes for the constants.
>>>>>>> Also, there's no reason this applies only to DI mode;
>>>>>>> use the GPI iterator and %<w> in the output template.
>>>>>>>
>>>>>>>> +    HOST_WIDE_INT op3 = INTVAL (operands[3]);
>>>>>>>> +    operands[3] = GEN_INT (ceil_log2 (op3));
>>>>>>>> +    output_asm_insn ("bfxil\\t%0, %1, 0, %3", operands);
>>>>>>>> +    return "";
>>>>>>> You can just return the string that you passed to output_asm_insn.
>>>>>>>
>>>>>>>> +  }
>>>>>>>> +  [(set_attr "type" "bfx")]
>>>>>>> The other aliases of the BFM insn use type "bfm";
>>>>>>> "bfx" appears to be aliases of UBFM and SBFM.
>>>>>>> Not that it appears to matter to the scheduling
>>>>>>> descriptions, but it is inconsistent.
>>>>>>>
>>>>>>>
>>>>>>> r~
>>>>>>
>>>>>
>>>>
>>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: rb9634.patch
Type: text/x-patch
Size: 6831 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20180730/1fe33844/attachment.bin>


More information about the Gcc-patches mailing list