[Aarch64][PATCH] Improve Logical And Immediate Expressions

Wed Nov 23 16:30:00 GMT 2016

Hi Christophe,

This is not a regression per se; the patch causes the test case to generate one less instruction overall, but one additional 'and'. Trunk before the patch (-O2):

foo:
	and	w0, w0, 255
	lsl	w1, w0, 20
	orr	w0, w1, w0, lsl 8
	mov	w1, 65024
	movk	w1, 0xfe0, lsl 16
	and	w0, w0, w1
	ret

The new trunk aarch64 compiler with the patch generates fewer instructions but one additional 'and'

foo:
	and	w0, w0, 255
	lsl	w1, w0, 20
	orr	w0, w1, w0, lsl 8
	and	x0, x0, 268434944
	and	x0, x0, -2031617
	ret

-----Original Message-----
From: Christophe Lyon [mailto:christophe.lyon@linaro.org] 
Sent: Wednesday, November 23, 2016 6:42 AM
To: James Greenhalgh
Cc: Michael Collison; gcc-patches@gcc.gnu.org; nd
Subject: Re: [Aarch64][PATCH] Improve Logical And Immediate Expressions

Hi Michael,

On 21 November 2016 at 10:52, James Greenhalgh <james.greenhalgh@arm.com> wrote:
> On Fri, Nov 18, 2016 at 07:41:58AM +0000, Michael Collison wrote:
>> James,
>>
>> I incorporated all your suggestions, and successfully bootstrapped 
>> and re-ran the testsuite.
>>
>> Okay for trunk?
>>
>> 2016-11-18  Michael Collison  <michael.collison@arm.com>
>>
>>       * config/aarch64/aarch64-protos.h
>>       (aarch64_and_split_imm1, aarch64_and_split_imm2)
>>       (aarch64_and_bitmask_imm): New prototypes
>>       * config/aarch64/aarch64.c (aarch64_and_split_imm1):
>>       New overloaded function to create bit mask covering the
>>       lowest to highest bits set.
>>       (aarch64_and_split_imm2): New overloaded functions to create bit
>>       mask of zeros between first and last bit set.
>>       (aarch64_and_bitmask_imm): New function to determine if a integer
>>       is a valid two instruction "and" operation.
>>       * config/aarch64/aarch64.md:(and<mode>3): New define_insn and _split
>>       allowing wider range of constants with "and" operations.
>>       * (ior<mode>3, xor<mode>3): Use new LOGICAL2 iterator to prevent
>>       "and" operator from matching restricted constant range used for
>>       ior and xor operators.
>>       * config/aarch64/constraints.md (UsO constraint): New SImode constraint
>>       for constants in "and" operantions.
>>       (UsP constraint): New DImode constraint for constants in "and" operations.
>>       * config/aarch64/iterators.md (lconst2): New mode iterator.
>>       (LOGICAL2): New code iterator.
>>       * config/aarch64/predicates.md (aarch64_logical_and_immediate): New
>>       predicate
>>       (aarch64_logical_and_operand): New predicate allowing extended constants
>>       for "and" operations.
>>       * testsuite/gcc.target/aarch64/and_const.c: New test to verify
>>       additional constants are recognized and fewer instructions generated.
>>       * testsuite/gcc.target/aarch64/and_const2.c: New test to verify
>>       additional constants are recognized and fewer instructions generated.
>>
>
>> diff --git a/gcc/config/aarch64/aarch64-protos.h 
>> b/gcc/config/aarch64/aarch64-protos.h
>> index 3cdd69b..7ef8cdf 100644
>> --- a/gcc/config/aarch64/aarch64-protos.h
>> +++ b/gcc/config/aarch64/aarch64-protos.h
>> @@ -300,6 +300,9 @@ extern struct tune_params aarch64_tune_params;  
>> HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, 
>> unsigned);  int aarch64_get_condition_code (rtx);  bool 
>> aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
>> +unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT 
>> +val_in); unsigned HOST_WIDE_INT aarch64_and_split_imm2 
>> +(HOST_WIDE_INT val_in); bool aarch64_and_bitmask_imm (unsigned 
>> +HOST_WIDE_INT val_in, machine_mode mode);
>>  int aarch64_branch_cost (bool, bool);  enum aarch64_symbol_type 
>> aarch64_classify_symbolic_expression (rtx);  bool 
>> aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT); diff --git 
>> a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 
>> 3e663eb..8e33c40 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -3600,6 +3600,43 @@ aarch64_bitmask_imm (HOST_WIDE_INT val_in, machine_mode mode)
>>    return val == mask * bitmask_imm_mul[__builtin_clz (bits) - 26];  
>> }
>>
>> +/* Create mask of ones, covering the lowest to highest bits set in 
>> +VAL_IN.  */
>> +
>> +unsigned HOST_WIDE_INT
>> +aarch64_and_split_imm1 (HOST_WIDE_INT val_in) {
>> +  int lowest_bit_set = ctz_hwi (val_in);
>> +  int highest_bit_set = floor_log2 (val_in);
>> +  gcc_assert (val_in != 0);
>
> The comment above the function should make this precondition clear. Or 
> you could pick a well defined behaviour for when val_in == 0 (return 
> 0?), if that fits the rest of the algorithm.
>
> Otherwise, this patch looks OK to me.
>
> Thanks,
> James
>

This patch (r242739) causes a regression on aarch64:

  gcc.dg/zero_bits_compound-2.c scan-assembler-times \\(and: 2 now fails.

Christophe.