This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [AArch64, PATCH] Improve Neon store of zero


Hi,

I have addressed the issues you raised below.

Is the amended patch OK for trunk?

Thanks,

Jackson.

On 09/12/2017 05:28 PM, James Greenhalgh wrote:
On Wed, Sep 06, 2017 at 10:02:52AM +0100, Jackson Woodruff wrote:
Hi all,

I've attached a new patch that addresses some of the issues raised with
my original patch.

On 08/23/2017 03:35 PM, Wilco Dijkstra wrote:
Richard Sandiford wrote:

Sorry for only noticing now, but the call to aarch64_legitimate_address_p
is asking whether the MEM itself is a legitimate LDP/STP address.  Also,
it might be better to pass false for strict_p, since this can be called
before RA.  So maybe:

     if (GET_CODE (operands[0]) == MEM
	&& !(aarch64_simd_imm_zero (operands[1], <MODE>mode)
	     && aarch64_mem_pair_operand (operands[0], <MODE>mode)))

There were also some issues with the choice of mode for the call to
aarch64_mem_pair_operand.

For a 128-bit wide mode, we want to check `aarch64_mem_pair_operand
(operands[0], DImode)`, since DImode is what the stp will operate on.

For a 64-bit wide mode, we don't need to do that check because a normal
`str` can be issued.

I've updated the condition accordingly.
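
To make that concrete, here is a sketch mirroring the updated testcase further
down (the function names and offsets are mine and purely illustrative):

#include <arm_neon.h>

void
store_q_zero (uint32x4_t *p)
{
  /* 128-bit zero store: the zero can be written as a pair of DImode
     xzr stores, e.g.  stp xzr, xzr, [x0, 64],  which is why the
     aarch64_mem_pair_operand check is done in DImode.  */
  uint32x4_t x = { 0, 0, 0, 0 };
  p[4] = x;
}

void
store_d_zero (float32x2_t *p)
{
  /* 64-bit zero store: a normal str xzr (e.g.  str xzr, [x0, 3200])
     can be issued, so no pair-operand check is needed.  */
  float32x2_t x = { 0.0, 0.0 };
  p[400] = x;
}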


Is there any reason for doing this check at all (or at least this early during
expand)?

Not doing this check means that the zero is forced into a register, so
we then carry around a bit more RTL and rely on combine to merge things.
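
For a 128-bit store of zero, the difference is roughly the following (the
unoptimised sequence is my assumption of typical instruction selection, not
taken from the patch):

void
store_zero (uint32x4_t *p)
{
  /* With the check in the expander the zero stays an immediate, and the
     store can be emitted straight from xzr:
         stp  xzr, xzr, [x0]
     Without the check the zero is first forced into a SIMD register,
     e.g.
         movi v0.4s, 0
         str  q0, [x0]
     and we rely on combine to merge the two back into the xzr form.  */
  uint32x4_t zero = { 0, 0, 0, 0 };
  *p = zero;
}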


There is a similar issue with this part:

   (define_insn "*aarch64_simd_mov<mode>"
     [(set (match_operand:VQ 0 "nonimmediate_operand"
-		"=w, m,  w, ?r, ?w, ?r, w")
+		"=w, Ump,  m,  w, ?r, ?w, ?r, w")

The Ump causes the instruction to always split off the address offset. Ump
cannot be used in patterns that are generated before register allocation as it
also calls aarch64_legitimate_address_p with strict_p set to true.

I've changed the constraint to a new constraint 'Umq', which acts the same as
Ump but calls aarch64_legitimate_address_p with strict_p set to false and
passes DImode as the mode.

This looks mostly OK to me, but this conditional:

+  if (GET_CODE (operands[0]) == MEM
+      && !(aarch64_simd_imm_zero (operands[1], <MODE>mode)
+	   && ((GET_MODE_SIZE (<MODE>mode) == 16
+		&& aarch64_mem_pair_operand (operands[0], DImode))
+	       || GET_MODE_SIZE (<MODE>mode) == 8)))

Has grown a bit too big in such a general pattern to live without a comment
explaining what is going on.

+(define_memory_constraint "Umq"
+  "@internal
+   A memory address which uses a base register with an offset small enough for
+   a load/store pair operation in DI mode."
+   (and (match_code "mem")
+	(match_test "aarch64_legitimate_address_p (DImode, XEXP (op, 0),
+						   PARALLEL, 0)")))

And here you want 'false' rather than '0'.

I'll happily merge the patch with those changes; please send an update.

Thanks,
James



ChangeLog:

gcc/

2017-08-29  Jackson Woodruff  <jackson.woodruff@arm.com>

	* config/aarch64/constraints.md (Umq): New constraint.
	* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<mode>):
	Change to use Umq.
	(mov<mode>): Update condition.

gcc/testsuite/

2017-08-29  Jackson Woodruff  <jackson.woodruff@arm.com>

	* gcc.target/aarch64/simd/vect_str_zero.c:
	Update testcase.

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index f3e084f8778d70c82823b92fa80ff96021ad26db..c20e513f59a35f3410eae3eb0fdc2fc86352a9fc 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -23,10 +23,17 @@
 	(match_operand:VALL_F16 1 "general_operand" ""))]
   "TARGET_SIMD"
   "
-    if (GET_CODE (operands[0]) == MEM
-	&& !(aarch64_simd_imm_zero (operands[1], <MODE>mode)
-	     && aarch64_legitimate_address_p (<MODE>mode, operands[0],
-					      PARALLEL, 1)))
+  /* Force the operand into a register if it is not an
+     immediate whose use can be replaced with xzr.
+     If the mode is 16 bytes wide, then we will be doing
+     a stp in DI mode, so we check the validity of that.
+     If the mode is 8 bytes wide, then we will be doing a
+     normal str, so the check need not apply.  */
+  if (GET_CODE (operands[0]) == MEM
+      && !(aarch64_simd_imm_zero (operands[1], <MODE>mode)
+	   && ((GET_MODE_SIZE (<MODE>mode) == 16
+		&& aarch64_mem_pair_operand (operands[0], DImode))
+	       || GET_MODE_SIZE (<MODE>mode) == 8)))
       operands[1] = force_reg (<MODE>mode, operands[1]);
   "
 )
@@ -126,7 +133,7 @@
 
 (define_insn "*aarch64_simd_mov<mode>"
   [(set (match_operand:VQ 0 "nonimmediate_operand"
-		"=w, Ump,  m,  w, ?r, ?w, ?r, w")
+		"=w, Umq,  m,  w, ?r, ?w, ?r, w")
 	(match_operand:VQ 1 "general_operand"
 		"m,  Dz, w,  w,  w,  r,  r, Dn"))]
   "TARGET_SIMD
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index 9ce3d4efaf31a301dfb7c1772a6b685fb2cbd2ee..3649fb48a33454c208a6b81e051fdd316c495710 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -156,6 +156,14 @@
  (and (match_code "mem")
       (match_test "REG_P (XEXP (op, 0))")))
 
+(define_memory_constraint "Umq"
+  "@internal
+   A memory address which uses a base register with an offset small enough for
+   a load/store pair operation in DI mode."
+   (and (match_code "mem")
+	(match_test "aarch64_legitimate_address_p (DImode, XEXP (op, 0),
+						   PARALLEL, false)")))
+
 (define_memory_constraint "Ump"
   "@internal
   A memory address suitable for a load/store pair operation."
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vect_str_zero.c b/gcc/testsuite/gcc.target/aarch64/simd/vect_str_zero.c
index 07198de109432b530745cc540790303ae0245efb..00cbf20a0b293e71ed713f0c08d89d8a525fa785 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vect_str_zero.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vect_str_zero.c
@@ -7,7 +7,7 @@ void
 f (uint32x4_t *p)
 {
   uint32x4_t x = { 0, 0, 0, 0};
-  p[1] = x;
+  p[4] = x;
 
   /* { dg-final { scan-assembler "stp\txzr, xzr," } } */
 }
@@ -16,7 +16,9 @@ void
 g (float32x2_t *p)
 {
   float32x2_t x = {0.0, 0.0};
-  p[0] = x;
+  p[400] = x;
 
   /* { dg-final { scan-assembler "str\txzr, " } } */
 }
+
+/* { dg-final { scan-assembler-not "add\tx\[0-9\]\+, x0, \[0-9\]+" } } */
