This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][GCC][AARCH64] Use STLUR for atomic_store


Hi Matthew

On 02/08/18 17:26, matthew.malcomson@arm.com wrote:
Use the STLUR instruction introduced in Armv8.4-a.
This insruction has the store-release semantic like STLR but can take a
9-bit unscaled signed immediate offset.

Example test case:
```
void
foo ()
{
     int32_t *atomic_vals = calloc (4, sizeof (int32_t));
     atomic_store_explicit (atomic_vals + 1, 2, memory_order_release);
}
```

Before patch generates
```
foo:
	stp	x29, x30, [sp, -16]!
	mov	x1, 4
	mov	x0, x1
	mov	x29, sp
	bl	calloc
	mov	w1, 2
	add	x0, x0, 4
	stlr	w1, [x0]
	ldp	x29, x30, [sp], 16
	ret
```

After patch generates
```
foo:
	stp	x29, x30, [sp, -16]!
	mov	x1, 4
	mov	x0, x1
	mov	x29, sp
	bl	calloc
	mov	w1, 2
	stlur	w1, [x0, 4]
	ldp	x29, x30, [sp], 16
	ret
```

Full bootstrap and regression test done on aarch64.

Ok for trunk?

gcc/
2018-07-26  Matthew Malcomson  <matthew.malcomson@arm.com>

         * config/aarch64/aarch64-protos.h
         (aarch64_offset_9bit_signed_unscaled_p): New declaration.
         * config/aarch64/aarch64.c
         (aarch64_offset_9bit_signed_unscaled_p): Rename from
         offset_9bit_signed_unscaled_p.
         * config/aarch64/aarch64.h (TARGET_ARMV8_4): Add feature macro.
         * config/aarch64/atomics.md (atomic_store<mode>): Allow offset
         and use stlur.
         * config/aarch64/constraints.md (Ust): New constraint.
         * config/aarch64/predicates.md.
         (aarch64_sync_or_stlur_memory_operand): New predicate.

gcc/testsuite/
2018-07-26  Matthew Malcomson  <matthew.malcomson@arm.com>

	* gcc.target/aarch64/atomic-store.c: New.


Thank you for doing this. I am not a maintainer but I have a few nits on
this patch:

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index af5db9c595385f7586692258f750b6aceb3ed9c8..630a75bf776fcdc374aa9ffa4bb020fea3719320 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -393,6 +393,7 @@ void aarch64_split_add_offset (scalar_int_mode, rtx, rtx, rtx, rtx, rtx);
 bool aarch64_mov_operand_p (rtx, machine_mode);
...
-static inline bool
-offset_9bit_signed_unscaled_p (machine_mode mode ATTRIBUTE_UNUSED,
+bool
+aarch64_offset_9bit_signed_unscaled_p (machine_mode mode ATTRIBUTE_UNUSED,
 			       poly_int64 offset)

This needs to be aligned with the first argument

...

@@ -5837,7 +5837,7 @@ aarch64_classify_address (struct aarch64_address_info *info,
 	     ldr/str instructions (only big endian will get here).  */
 	  if (mode == CImode)
 	    return (aarch64_offset_7bit_signed_scaled_p (TImode, offset)
-		    && (offset_9bit_signed_unscaled_p (V16QImode, offset + 32)
+		    && (aarch64_offset_9bit_signed_unscaled_p (V16QImode, offset + 32)

This is not less that 80 characters

...

+;; STLUR instruction constraint requires Armv8.4
+(define_special_memory_constraint "Ust"
+ "@internal
+ A memory address suitable for use with an stlur instruction."
+  (and (match_operand 0 "aarch64_sync_or_stlur_memory_operand")
+       (match_test "TARGET_ARMV8_4")))
+

You are already checking for TARGET_ARMV8_4 inside
aarch64_sync_or_stlur_memory_operand. Also see my comment below for this
function.

...

+;; True if the operand is memory reference valid for one of a str or stlur
+;; operation.
+(define_predicate "aarch64_sync_or_stlur_memory_operand"
+  (ior (match_operand 0 "aarch64_sync_memory_operand")
+       (and (match_operand 0 "memory_operand")
+	    (match_code "plus" "0")
+	    (match_code "reg" "00")
+	    (match_code "const_int" "01")))
+{
+  if (aarch64_sync_memory_operand (op, mode))
+    return true;
+
+  if (!TARGET_ARMV8_4)
+    return false;
+
+  rtx mem_op = XEXP (op, 0);
+  rtx plus_op0 = XEXP (mem_op, 0);
+  rtx plus_op1 = XEXP (mem_op, 1);
+
+  if (GET_MODE (plus_op0) != DImode)
+    return false;
+
+  poly_int64 offset;
+  poly_int_rtx_p (plus_op1, &offset);
+  return aarch64_offset_9bit_signed_unscaled_p (mode, offset);
+})
+

This predicate body makes it a bit mixed up with the two type of
operands that you want to test especially looking at it from the
constraint check perspective. I am assuming you would not want to use
the non-immediate form of stlur and instead only use it in the
form:
STLUR <Wt>, [<Xn|SP>, #<simm>]
and use stlr for no immediate alternative. Thus the constraint does not
need to check for aarch64_sync_memory_operand.

My suggestion would be to make this operand check separate. Something
like:

+(define_predicate "aarch64_sync_or_stlur_memory_operand"
+  (ior (match_operand 0 "aarch64_sync_memory_operand")
+       (match_operand 0 "aarch64_stlur_memory_operand")))

Where you define aarch64_stlur_memory_operand as

+bool aarch64_stlur_memory_operand (rtx op)
+{
+  if (!TARGET_ARMV8_4)
+    return false;
+
+  rtx mem_op = XEXP (op, 0);
+  rtx plus_op0 = XEXP (mem_op, 0);
+  rtx plus_op1 = XEXP (mem_op, 1);
+
+  if (GET_MODE (plus_op0) != DImode)
+    return false;
+
+  poly_int64 offset;
+  poly_int_rtx_p (plus_op1, &offset);
+  return aarch64_offset_9bit_signed_unscaled_p (mode, offset);
+}

and use the same for the constraint.

+(define_special_memory_constraint "Ust"
+ "@internal
+ A memory address suitable for use with an stlur instruction."
+  (match_operand 0 "aarch64_stlur_memory_operand")

...

diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-store.c b/gcc/testsuite/gcc.target/aarch64/atomic-store.c
new file mode 100644
index 0000000000000000000000000000000000000000..0ffc88029bbfe748cd0fc3b6ff4fdd47373b0bb5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/atomic-store.c
...
+void
+foo_toolarge_offset ()
+{
+  int64_t *atomic_vals = calloc (4, sizeof (int64_t));
+  /* 9bit signed unscaled immediate => largest representable value +255.
+     32 * 8 == 256  */
+  atomic_store_explicit (atomic_vals + 32, 2, memory_order_release);
+}

Maybe its a good idea to check for the largest representable value and
the smallest representable value? Update this test to
foo_boundary_offset() and check for both sides of the two boundary?

Thanks
Sudi


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]