Bug 40457 - use stm and ldm to access consecutive memory words
Summary: use stm and ldm to access consecutive memory words
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.5.0
Importance: P3 enhancement
Target Milestone: 4.6.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: 16996
Reported: 2009-06-16 09:06 UTC by Carrot
Modified: 2023-01-14 00:16 UTC

See Also:
Host: i686-linux
Target: arm-eabi
Build: i686-linux
Known to work:
Known to fail:
Last reconfirmed: 2009-06-19 10:18:34


Attachments
test case (78 bytes, application/octet-stream)
2009-06-16 09:11 UTC, Carrot

Description Carrot 2009-06-16 09:06:49 UTC
Current gcc can't make use of stm and ldm to reduce code size.
Comment 1 Carrot 2009-06-16 09:11:07 UTC
Created attachment 18005
test case

For this function

void foo(int* p)
{
  p[0] = 1;
  p[1] = 2;
}

gcc generates:

mov     r1, #1
mov     r3, #2
str     r1, [r0]
str     r3, [r0, #4]
bx      lr

We could use one stm instruction to replace the two str instructions.

For the second case:

int bar(int* p)
{
  int x = p[0] + p[1];
  return x;
}

gcc generates:

ldr     r2, [r0, #4]
ldr     r3, [r0]
add     r0, r2, r3
bx      lr

In this case we can use one ldm to replace the two ldr instructions.
Comment 2 Ramana Radhakrishnan 2009-06-16 10:03:48 UTC
Could you check why store_multiple_sequence doesn't find this in the peephole in the ARM backend?

GCC does generate stm and ldm instructions in a number of other places using the old peephole and the store_multiple_sequence function. It might be worth digging into this further.
Comment 3 Andrew Pinski 2009-06-16 15:19:56 UTC
I think these peepholes should also be moved over to peephole2 if that area is being touched.
Comment 4 Richard Earnshaw 2009-06-16 15:50:51 UTC
You haven't specified what compilation options you were using.
Comment 5 Uroš Bizjak 2009-06-16 18:16:51 UTC
(In reply to comment #2)
> Could you check to see why store_multiple_sequence doesn't find this in the
> peephole in the ARM backend ? 

Registers also need to be consecutive, starting from certain register, i.e.:

str     r1, [r0]
str     r2, [r0, #4]

and

ldr     r3, [r0, #4]
ldr     r2, [r0]

Comment 6 Richard Earnshaw 2009-06-17 08:40:46 UTC
Subject: Re: use stm and ldm to access consecutive memory words

> ------- Comment #5 from ubizjak at gmail dot com  2009-06-16 18:16 -------
> Registers also need to be consecutive, starting from certain register, i.e.:
> 
> str     r1, [r0]
> str     r2, [r0, #4]

No, register numbers simply need to be ascending and loaded from
consecutive memory addresses, so {r0, r2, r3, r5} is valid, but {r2, r5,
r0, r3} is not.
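
(Editorial illustration, not part of the original comment.) The rule stated above can be sketched in C roughly as follows. The struct and function names are hypothetical and this is not GCC's load_multiple_sequence; it assumes the accesses share one base register and are already sorted by offset.

```c
#include <stdbool.h>
#include <stddef.h>

/* One word-sized access: register number and byte offset from the
   shared base register.  Hypothetical type, for illustration only. */
struct word_access {
    int reg;
    int offset;
};

/* Returns true when the accesses (sorted by offset) touch consecutive
   words and use strictly ascending register numbers, which is the
   condition described in the comment above. */
bool ldm_candidate(const struct word_access *a, size_t n)
{
    if (n < 2)
        return false;               /* nothing to merge */
    for (size_t i = 1; i < n; i++) {
        if (a[i].offset != a[i - 1].offset + 4)
            return false;           /* not consecutive words */
        if (a[i].reg <= a[i - 1].reg)
            return false;           /* register numbers not ascending */
    }
    return true;
}
```

Under this sketch {r0, r2, r3, r5} at offsets 0, 4, 8, 12 is accepted, {r2, r5, r0, r3} is rejected, and the ldr pair from the second test case (r3 at offset 0, r2 at offset 4) is rejected as well.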



Comment 7 Carrot 2009-06-17 09:30:06 UTC
My command-line options are -O2 -Os -mthumb

The compiler didn't run into load_multiple_sequence and store_multiple_sequence. The peephole rules specified it applies to TARGET_ARM only. Is there any special reason we didn't enable it in thumb mode?

For the ascending register number, do we have any code to rename a set of registers to make them ascending? In the generated code for the second function, the register numbers have different order compared with memory offsets.

ldr     r2, [r0, #4]
ldr     r3, [r0]
Comment 8 Ramana Radhakrishnan 2009-06-17 09:49:27 UTC
(In reply to comment #7)
> My command line option is -O2 -Os -mthumb
> 
> The compiler didn't run into load_multiple_sequence and
> store_multiple_sequence. The peephole rules specified it applies to TARGET_ARM
> only. Is there any special reason we didn't enable it in thumb mode?

ldms and stms in thumb mode overwrite the base register that is used for addresses and that's why this is not enabled for thumb mode. One could write a peephole2 pattern that checked for liveness of the base register - if the base register were dead after the instruction, then there could be an ldm or stm peepholed.
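
(Editorial illustration, not part of the original comment.) A minimal C model of the writeback behaviour described above, assuming the Thumb-1 form "stmia base!, {regs}". The function name is hypothetical; it only shows why the old base value is lost after the instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Models "stmia base!, {regs}": stores n words at ascending addresses
   and returns the value written back to the base register.  The
   caller's old base value is gone afterwards, so the merge is only
   safe when nothing later reads it (the base register is dead). */
uint32_t *model_stmia_writeback(uint32_t *base, const uint32_t *values,
                                size_t n)
{
    for (size_t i = 0; i < n; i++)
        base[i] = values[i];        /* one word per listed register */
    return base + n;                /* base advanced by 4 * n bytes */
}
```

In foo() from comment 1, p is not used after the two stores, so a liveness check of this kind would allow the stm there.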


> 
> For the ascending register number, do we have any code to rename a set of
> registers to make them ascending? In the generated code for the second
> function, the register numbers have different order compared with memory
> offsets.
> 
> ldr     r2, [r0, #4]
> ldr     r3, [r0]
> 

Not that I am aware of. It's as good as renaming, but done to allow combinations to happen. It might be useful with PR9831 as well, but it is a separate problem.
Comment 9 Ramana Radhakrishnan 2010-03-02 07:37:35 UTC
Not working on this.
Comment 10 Jie Zhang 2010-03-16 09:08:12 UTC
Not working on this.
Comment 12 Bernd Schmidt 2010-08-02 10:07:05 UTC
Subject: Bug 40457

Author: bernds
Date: Mon Aug  2 10:06:47 2010
New Revision: 162815

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162815
Log:
	PR target/40457
	* config/arm/arm.h (arm_regs_in_sequence): Declare.
	* config/arm/arm-protos.h (emit_ldm_seq, emit_stm_seq,
	load_multiple_sequence, store_multiple_sequence): Delete
	declarations.
	(arm_gen_load_multiple, arm_gen_store_multiple): Adjust
	declarations.
	* config/arm/ldmstm.md: New file.
	* config/arm/arm.c (arm_regs_in_sequence): New array.
	(load_multiple_sequence): Now static.  New args SAVED_ORDER,
	CHECK_REGS.  All callers changed.
	If SAVED_ORDER is nonnull, copy the computed order into it.
	If CHECK_REGS is false, don't sort REGS.  Handle Thumb mode.
	(store_multiple_sequence): Now static.  New args NOPS_TOTAL,
	SAVED_ORDER, REG_RTXS and CHECK_REGS.  All callers changed.
	If SAVED_ORDER is nonnull, copy the computed order into it.
	If CHECK_REGS is false, don't sort REGS.  Set up REG_RTXS just
	like REGS.  Handle Thumb mode.
	(arm_gen_load_multiple_1): New function, broken out of
	arm_gen_load_multiple.
	(arm_gen_store_multiple_1): New function, broken out of
	arm_gen_store_multiple.
	(arm_gen_multiple_op): New function, with code from
	arm_gen_load_multiple and arm_gen_store_multiple moved here.
	(arm_gen_load_multiple, arm_gen_store_multiple): Now just
	wrappers around arm_gen_multiple_op.  Remove argument UP, all callers
	changed.
	(gen_ldm_seq, gen_stm_seq, gen_const_stm_seq): New functions.
	* config/arm/predicates.md (commutative_binary_operator): New.
	(load_multiple_operation, store_multiple_operation): Handle more
	variants of these patterns with different starting offsets.  Handle
	Thumb-1.
	* config/arm/arm.md: Include "ldmstm.md".
	(ldmsi_postinc4, ldmsi_postinc4_thumb1, ldmsi_postinc3, ldmsi_postinc2,
	ldmsi4, ldmsi3, ldmsi2, stmsi_postinc4, stmsi_postinc4_thumb1,
	stmsi_postinc3, stmsi_postinc2, stmsi4, stmsi3, stmsi2 and related
	peepholes): Delete.
	* config/arm/ldmstm.md: New file.
	* config/arm/arm-ldmstm.ml: New file.

testsuite/
	PR target/40457
	* gcc.target/arm/pr40457-1.c: New test.
	* gcc.target/arm/pr40457-2.c: New test.


Added:
    trunk/gcc/config/arm/arm-ldmstm.ml
    trunk/gcc/config/arm/ldmstm.md
    trunk/gcc/testsuite/gcc.target/arm/pr40457-1.c
    trunk/gcc/testsuite/gcc.target/arm/pr40457-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/arm/arm-protos.h
    trunk/gcc/config/arm/arm.c
    trunk/gcc/config/arm/arm.h
    trunk/gcc/config/arm/arm.md
    trunk/gcc/config/arm/predicates.md
    trunk/gcc/testsuite/ChangeLog

Comment 13 Bernd Schmidt 2010-09-29 20:06:58 UTC
Author: bernds
Date: Wed Sep 29 20:06:55 2010
New Revision: 164732

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=164732
Log:
	PR target/40457
	* postreload.c (move2add_use_add2_insn): Use full_costs for
	comparison.
	(move2add_use_add3_insn): Likewise.
	(reload_cse_move2add): Likewise.
	* rtlanal.c (get_full_rtx_cost): New function.
	* rtl.h (struct full_rtx_costs): New.
	(init_costs_to_max, init_costs_to_zero, costs_lt_p,
	costs_add_n_insns): New inline functions.
	(get_full_rtx_cost): Declare.

testsuite/
	PR target/40457
	* gcc.target/arm/pr40457-3.c: New test.


Added:
    trunk/gcc/testsuite/gcc.target/arm/pr40457-3.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/postreload.c
    trunk/gcc/rtl.h
    trunk/gcc/rtlanal.c
    trunk/gcc/testsuite/ChangeLog
Comment 14 Ramana Radhakrishnan 2011-02-01 01:10:53 UTC
Now fixed, I think, looking at the output from trunk.
Comment 15 CVS Commits 2023-01-14 00:16:44 UTC
The master branch has been updated by Alexandre Oliva <aoliva@gcc.gnu.org>:

https://gcc.gnu.org/g:acddf6665f067bc98a2529a699b1d4509a7387cb

commit r13-5160-gacddf6665f067bc98a2529a699b1d4509a7387cb
Author: Alexandre Oliva <oliva@adacore.com>
Date:   Fri Jan 13 21:15:41 2023 -0300

    [PR40457] [arm] expand SI-aligned movdi into pair of movsi
    
    When expanding a misaligned DImode move, emit aligned SImode moves if
    the parts are sufficiently aligned.  This enables neighboring stores
    to be peephole-combined into stm, as expected by the PR40457 testcase,
    even after SLP vectorizes the originally aligned SImode stores into a
    misaligned DImode store.
    
    
    for  gcc/ChangeLog
    
            PR target/40457
            * config/arm/arm.md (movmisaligndi): Prefer aligned SImode
            moves.
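
(Editorial illustration, not part of the commit.) The idea in the commit message above can be pictured at the C level as splitting one 64-bit store into two aligned 32-bit stores. The helper name is hypothetical, little-endian word order is assumed, and this is not the actual arm.md change.

```c
#include <stdint.h>

/* Stores a 64-bit value through a pointer that is only known to be
   4-byte aligned, using two aligned 32-bit stores.  Two neighboring
   word stores like these are what a peephole can combine into stm. */
void store_di_as_two_si(uint32_t *p, uint64_t v)
{
    p[0] = (uint32_t)v;             /* low word at the lower address */
    p[1] = (uint32_t)(v >> 32);     /* high word in the next word */
}
```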