[PATCH v2] ARM: thumb1: Use LDMIA/STMIA for DI/DF loads/stores

Richard Earnshaw (lists) Richard.Earnshaw@arm.com
Wed Jun 19 12:19:43 GMT 2024


On 18/06/2024 19:14, Siarhei Volkau wrote:
> If the address register is dead after load/store operation it looks
> beneficial to use LDMIA/STMIA instead of pair of LDR/STR instructions,
> at least if optimizing for size.
> 
> Changes v1 -> v2:
>  - switching to peephole2 approach
>  - added test case
> 
> gcc/ChangeLog:
> 
>         * config/arm/thumb1.md (peephole2 to rewrite DI/DF load): New.
>         (peephole2 to rewrite DI/DF store): New.
>         (thumb1_movdi_insn): Handle overlapped regs ldmia case.
>         (thumb_movdf_insn): Likewise.
> 
> 	* config/arm/iterators.md (DIDF): New.
> 
> gcc/testsuite:
> 
>         * gcc.target/arm/thumb1-load-store-64bit.c: Add new test.
> 
> Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
> ---
>  gcc/config/arm/iterators.md                   |  3 +++
>  gcc/config/arm/thumb1.md                      | 27 ++++++++++++++++++-
>  .../gcc.target/arm/thumb1-load-store-64bit.c  | 16 +++++++++++
>  3 files changed, 45 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
> 
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 8d066fcf05d..09046bff83b 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -50,6 +50,9 @@ (define_mode_iterator QHSD [QI HI SI DI])
>  ;; A list of the 32bit and 64bit integer modes
>  (define_mode_iterator SIDI [SI DI])
>  
> +;; A list of the 64bit modes for thumb1.
> +(define_mode_iterator DIDF [DI DF])
> +
>  ;; A list of atomic compare and swap success return modes
>  (define_mode_iterator CCSI [(CC_Z "TARGET_32BIT") (SI "TARGET_THUMB1")])
>  
> diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> index d7074b43f60..ed4b706773a 100644
> --- a/gcc/config/arm/thumb1.md
> +++ b/gcc/config/arm/thumb1.md
> @@ -633,6 +633,8 @@ (define_insn "*thumb1_movdi_insn"
>        gcc_assert (TARGET_HAVE_MOVT);
>        return \"movw\\t%Q0, %L1\;movs\\tR0, #0\";
>      case 4:
> +      if (reg_overlap_mentioned_p (operands[0], operands[1]))
> +	return \"ldmia\\t%m1, {%0, %H0}\";

See below for why I don't think this is a case we need to consider here.

>        return \"ldmia\\t%1, {%0, %H0}\";
>      case 5:
>        return \"stmia\\t%0, {%1, %H1}\";
> @@ -966,6 +968,8 @@ (define_insn "*thumb_movdf_insn"
>  	return \"adds\\t%0, %1, #0\;adds\\t%H0, %H1, #0\";
>        return \"adds\\t%H0, %H1, #0\;adds\\t%0, %1, #0\";
>      case 1:
> +      if (reg_overlap_mentioned_p (operands[0], operands[1]))
> +	return \"ldmia\\t%m1, {%0, %H0}\";
>        return \"ldmia\\t%1, {%0, %H0}\";
>      case 2:
>        return \"stmia\\t%0, {%1, %H1}\";
> @@ -2055,4 +2059,25 @@ (define_insn "thumb1_stack_protect_test_insn"
>     (set_attr "conds" "clob")
>     (set_attr "type" "multiple")]
>  )
> -

> +
> +;; match patterns usable by ldmia/stmia
> +(define_peephole2
> +  [(set (match_operand:DIDF 0 "low_register_operand" "")
> +	(mem:DIDF (match_operand:SI 1 "low_register_operand")))]
> +  "TARGET_THUMB1
> +   && (peep2_reg_dead_p (1, operands[1])
> +       || REGNO (operands[0]) + 1 == REGNO (operands[1]))"

I don't understand this second condition (partial overlap of the base address with the value loaded), what are you guarding against here?  The instruction specification says that there is no writeback if the base register has any overlap with any of the loaded registers, so really we should reject the peephole if that's true (we'd have invalid RTL as well as there would end up being two writes to the same register).

> +  [(set (match_dup 0)
> +	(mem:DIDF (post_inc:SI (match_dup 1))))]
> +  ""

This is not enough, unfortunately.  MEM() objects carry attributes about the memory accessed (alias sets, known alignment, etc) and these will not be propagated correctly if you rewrite the pattern this way.  The correct solution is to match the entire mem as operand1, then use change_address to rewrite that.  Something like:

     operands[1] = change_address (operands[1], VOIDmode,
				   gen_rtx_POST_INC (SImode,
						     XEXP (operands[1], 0)));

> +)
> +
> +(define_peephole2
> +  [(set (mem:DIDF (match_operand:SI 1 "low_register_operand"))
> +	(match_operand:DIDF 0 "low_register_operand" ""))]
> +  "TARGET_THUMB1
> +   && peep2_reg_dead_p (1, operands[1])"
> +  [(set (mem:DIDF (post_inc:SI (match_dup 1)))
> +	(match_dup 0))]

There's a similar address rewriting issue here as well, but we also have to guard against the (unlikely) case where the base register is part of the value stored if it isn't the lowest numbered register in the list.  In that case the value stored would be undefined.  That is:

  stmia r0!, {r0, r1}

is ok, but

  stmia r1!, {r0, r1}

is not.

> +  ""
> +)
> diff --git a/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
> new file mode 100644
> index 00000000000..167fa9ec876
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mthumb -Os" }  */
> +/* { dg-require-effective-target arm_thumb1_ok } */
> +
> +void copy_df(double *dst, const double *src)
> +{
> +    *dst = *src;
> +}
> +
> +void copy_di(unsigned long long *dst, const unsigned long long *src)
> +{
> +    *dst = *src;
> +}
> +
> +/* { dg-final { scan-assembler-times "ldmia\tr\[0-7\]" 2 } } */
> +/* { dg-final { scan-assembler-times "stmia\tr\[0-7\]!" 2 } } */

R.



More information about the Gcc-patches mailing list