This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

RE: [Aarch64][PATCH] Improve Logical And Immediate Expressions

From: Michael Collison <Michael dot Collison at arm dot com>
To: James Greenhalgh <James dot Greenhalgh at arm dot com>
Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, nd <nd at arm dot com>
Date: Fri, 18 Nov 2016 07:41:58 +0000
Subject: RE: [Aarch64][PATCH] Improve Logical And Immediate Expressions
Authentication-results: sourceware.org; auth=none
Authentication-results: spf=none (sender IP is ) smtp.mailfrom=Michael dot Collison at arm dot com;
Nodisclaimer: True
References: <HE1PR0802MB237722F87B6EA9A17703214395AA0@HE1PR0802MB2377.eurprd08.prod.outlook.com> <20161117162603.GB14894@arm.com>
Spamdiagnosticmetadata: NSPM
Spamdiagnosticoutput: 1:99

James,

I incorporated all your suggestions, and successfully bootstrapped and re-ran the testsuite.

Okay for trunk?

2016-11-18  Michael Collison  <michael.collison@arm.com>

	* config/aarch64/aarch64-protos.h
	(aarch64_and_split_imm1, aarch64_and_split_imm2)
	(aarch64_and_bitmask_imm): New prototypes
	* config/aarch64/aarch64.c (aarch64_and_split_imm1):
	New overloaded function to create bit mask covering the
	lowest to highest bits set.
	(aarch64_and_split_imm2): New overloaded functions to create bit
	mask of zeros between first and last bit set.
	(aarch64_and_bitmask_imm): New function to determine if a integer
	is a valid two instruction "and" operation.
	* config/aarch64/aarch64.md:(and<mode>3): New define_insn and _split
	allowing wider range of constants with "and" operations.
	* (ior<mode>3, xor<mode>3): Use new LOGICAL2 iterator to prevent
	"and" operator from matching restricted constant range used for
	ior and xor operators.
	* config/aarch64/constraints.md (UsO constraint): New SImode constraint
	for constants in "and" operantions.
	(UsP constraint): New DImode constraint for constants in "and" operations.
	* config/aarch64/iterators.md (lconst2): New mode iterator.
	(LOGICAL2): New code iterator.
	* config/aarch64/predicates.md (aarch64_logical_and_immediate): New
	predicate
	(aarch64_logical_and_operand): New predicate allowing extended constants
	for "and" operations.
	* testsuite/gcc.target/aarch64/and_const.c: New test to verify
	additional constants are recognized and fewer instructions generated.
	* testsuite/gcc.target/aarch64/and_const2.c: New test to verify
	additional constants are recognized and fewer instructions generated.

-----Original Message-----
From: James Greenhalgh [mailto:james.greenhalgh@arm.com] 
Sent: Thursday, November 17, 2016 9:26 AM
To: Michael Collison
Cc: gcc-patches@gcc.gnu.org; nd
Subject: Re: [Aarch64][PATCH] Improve Logical And Immediate Expressions

On Thu, Oct 27, 2016 at 08:44:02PM +0000, Michael Collison wrote:
> This patch is designed to improve code generation for "and" 
> instructions with certain immediate operands.
> 
> For the following test case:
> 
> int f2(int x)
> {
>    x &= 0x0ffffff8;
> 
>    x &= 0xff001fff;
> 
>    return x;
> }
> 
> the trunk aarch64 compiler generates:
> 
> mov	w1, 8184
> movk	w1, 0xf00, lsl 16
> and	w0, w0, w1
> 
> We can generate one fewer instruction if we recognize certain 
> constants. With the attached patch the current trunk compiler generates:
> 
> and	w0, w0, 268435448
> and	w0, w0, -16769025
> 
> Bootstrapped and make check successfully completed with no regressions 
> on aarch64-linux-gnu.
> 
> Okay for trunk?

Hi Michael,

I have some minor comments in line.

> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 3cdd69b..7ef8cdf 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -300,6 +300,9 @@ extern struct tune_params aarch64_tune_params;  
> HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);  
> int aarch64_get_condition_code (rtx);  bool aarch64_bitmask_imm 
> (HOST_WIDE_INT val, machine_mode);
> +unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in); 
> +unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in); 
> +bool aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, 
> +machine_mode mode);
>  int aarch64_branch_cost (bool, bool);  enum aarch64_symbol_type 
> aarch64_classify_symbolic_expression (rtx);  bool 
> aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT); diff --git 
> a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 
> 3e663eb..db82c5c 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -3600,6 +3600,46 @@ aarch64_bitmask_imm (HOST_WIDE_INT val_in, machine_mode mode)
>    return val == mask * bitmask_imm_mul[__builtin_clz (bits) - 26];  }
>  
> +/* Create mask of ones, covering the lowest to highest bits set in 
> +VAL_IN.  */
> +
> +unsigned HOST_WIDE_INT
> +aarch64_and_split_imm1 (HOST_WIDE_INT val_in) {
> +  int first_bit_set = ctz_hwi (val_in);
> +  int last_bit_set = floor_log2 (val_in);
> +  gcc_assert (first_bit_set < last_bit_set);

This assert can never trigger (by definition) unless VAL_IN == 0 (in which case floor_log2 = -1, ctz_hwi = HOST_BITS_PER_WORD_SIZE). If that's the case you're testing for, then this assert would be more explicit as

 gcc_assert (val_in != 0)

I'd suggest renaming these to lowest_bit_set and highest_bit_set, as first and last are ambiguous (you have them the opposite way round from what I'd consider first [highest, thus leading bits] and last [lowest/trailing bits]).

> +
> +  return ((HOST_WIDE_INT_UC (2) << last_bit_set) -
> +	  (HOST_WIDE_INT_1U << first_bit_set)); }
> +
> +/* Create constant with zero bits between first and last bit set and one
> +   bits elsewhere.  */

I can't parse this comment in relation to what the code below does.
Looking at the code, you're building a new constant where the bits between the first and last bits are unmodified, and all other bits are set to one.

i.e. for 0000 1010 1000 you'd generate 1111 1010 1111.

> +unsigned HOST_WIDE_INT
> +aarch64_and_split_imm2 (HOST_WIDE_INT val_in) {
> +  return val_in | ~aarch64_and_split_imm1 (val_in); }
> +
> +/* Return true if VAL_IN is a valid 'and' bitmask immediate.  */
> +
> +bool
> +aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, machine_mode 
> +mode) {
> +  if (aarch64_bitmask_imm (val_in, mode))
> +    return false;
> +
> +  if (aarch64_move_imm (val_in, mode))
> +    return false;
> +
> +  unsigned HOST_WIDE_INT imm2 = aarch64_and_split_imm2 (val_in);
> +
> +  if (!aarch64_bitmask_imm (imm2, mode))
> +    return false;
> +
> +  return true;

You can replace these four lines with:

  return aarch64_bitmask_imm (imm2, mode);

> +}
>  
>  /* Return true if val is an immediate that can be loaded into a
>     register in a single instruction.  */

> diff --git a/gcc/testsuite/gcc.target/aarch64/and_const.c 
> b/gcc/testsuite/gcc.target/aarch64/and_const.c
> new file mode 100644
> index 0000000..9c377d7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/and_const.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +int f2 (int x)
> +{
> +   x &= 0x0ffffff8;
> +
> +   x &= 0xff001fff;
> +
> +   return x;
> +}

It would be good to have a test for the DImode cases too (using long long).

Thanks,
James

Attachment: gnutools_5860_ipreview4.patch
Description: gnutools_5860_ipreview4.patch

Follow-Ups:
- Re: [Aarch64][PATCH] Improve Logical And Immediate Expressions
  - From: James Greenhalgh

References:
- Re: [Aarch64][PATCH] Improve Logical And Immediate Expressions
  - From: James Greenhalgh

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]