Current gcc can't make use of stm and ldm to reduce code size.
Created attachment 18005
test case

For this function

    void foo(int* p)
    {
      p[0] = 1;
      p[1] = 2;
    }

gcc generates:

    mov r1, #1
    mov r3, #2
    str r1, [r0]
    str r3, [r0, #4]
    bx lr

One stm instruction could replace the two str instructions.

For the second case:

    int bar(int* p)
    {
      int x = p[0] + p[1];
      return x;
    }

gcc generates:

    ldr r2, [r0, #4]
    ldr r3, [r0]
    add r0, r2, r3
    bx lr

In this case one ldm could replace the two ldr instructions.
Could you check why store_multiple_sequence doesn't find this in the peephole in the ARM backend? GCC does generate stm and ldm in a number of other places using the old peephole and the store_multiple_sequence function. It might be worth digging into this further.
I think these peepholes should also be moved over to peephole2 if that area is being touched.
You haven't specified what compilation options you were using.
(In reply to comment #2)
> Could you check to see why store_multiple_sequence doesn't find this in the
> peephole in the ARM backend ?

Registers also need to be consecutive, starting from a certain register, i.e.:

    str r1, [r0]
    str r2, [r0, #4]

and

    ldr r3, [r0, #4]
    ldr r2, [r0]
Subject: Re: use stm and ldm to access consecutive memory words

> ------- Comment #5 from ubizjak at gmail dot com 2009-06-16 18:16 -------
> Registers also need to be consecutive, starting from certain register, i.e.:
>
> str r1, [r0]
> str r2, [r0, #4]

No, register numbers simply need to be ascending and loaded from consecutive memory addresses, so {r0, r2, r3, r5} is valid, but {r2, r5, r0, r3} is not.
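To make the rule above concrete, here is a minimal stand-alone sketch of the check. It is not GCC code: the struct `word_access` and the helper `can_form_multiple` are hypothetical names introduced only for illustration, and the accesses are assumed to be passed already sorted by memory offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One single-word access: register number and byte offset from the base.
   Hypothetical type, for illustration only. */
struct word_access {
    int reg;     /* register number, e.g. 2 for r2 */
    int offset;  /* byte offset from the base register */
};

/* Hypothetical helper, not GCC code.  The accesses must be given in
   order of increasing memory offset.  Returns true when the register
   numbers are strictly ascending and the words are consecutive. */
bool can_form_multiple(const struct word_access *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n < 2)
        return false;           /* nothing to combine */
    for (size_t i = 1; i < n; i++) {
        if (a[i].reg <= a[i - 1].reg)
            return false;       /* register numbers must ascend */
        if (a[i].offset != a[i - 1].offset + 4)
            return false;       /* memory words must be consecutive */
    }
    return true;
}
```

Under this model {r0, r2, r3, r5} at offsets 0, 4, 8, 12 is accepted, while the pair from bar() (r3 at offset 0, r2 at offset 4) is rejected because the register numbers descend as the address increases.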
My command line options are -O2 -Os -mthumb.

The compiler never reaches load_multiple_sequence and store_multiple_sequence. The peephole rules specify that they apply to TARGET_ARM only. Is there any special reason we didn't enable them in thumb mode?

As for the ascending register numbers, do we have any code that renames a set of registers to make them ascending? In the generated code for the second function, the register numbers are in a different order from the memory offsets:

    ldr r2, [r0, #4]
    ldr r3, [r0]
(In reply to comment #7)
> My command line option is -O2 -Os -mthumb
>
> The compiler didn't run into load_multiple_sequence and
> store_multiple_sequence. The peephole rules specified it applies to TARGET_ARM
> only. Is there any special reason we didn't enable it in thumb mode?

ldm and stm in thumb mode overwrite the base register used for the address, which is why this is not enabled for thumb mode. One could write a peephole2 pattern that checks the liveness of the base register: if the base register is dead after the instruction, an ldm or stm could be peepholed.

> For the ascending register number, do we have any code to rename a set of
> registers to make them ascending? In the generated code for the second
> function, the register numbers have different order compared with memory
> offsets.
>
> ldr r2, [r0, #4]
> ldr r3, [r0]

Not that I am aware of. It's as good as renaming, but to allow combinations to happen. It might be useful with PR9831 as well, but that is a separate problem.
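The liveness condition suggested above can be modelled in a few lines. This is only a sketch under stated assumptions, not the peephole2 machinery: `base_dead_after` is a hypothetical helper, and liveness is represented as a plain bitmask in which bit N means register rN is still needed after the candidate instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not GCC code: bit N of live_after is set when
   register rN is still needed after the candidate ldm/stm.  Returns
   true when the base register is dead, so overwriting it is harmless. */
bool base_dead_after(int base_reg, uint32_t live_after)
{
    assert(base_reg >= 0 && base_reg < 16);   /* r0..r15 */
    return (live_after & (UINT32_C(1) << base_reg)) == 0;
}
```

For bar(), the base r0 is redefined by the add right after the two loads, so only r2 and r3 are live at that point and the check passes. If the pointer in r0 were used again afterwards, the check would fail and the separate ldr instructions would have to stay.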
Not working on this.
http://gcc.gnu.org/ml/gcc-patches/2010-04/msg01231.html
Subject: Bug 40457

Author: bernds
Date: Mon Aug 2 10:06:47 2010
New Revision: 162815

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162815
Log:
    PR target/40457
    * config/arm/arm.h (arm_regs_in_sequence): Declare.
    * config/arm/arm-protos.h (emit_ldm_seq, emit_stm_seq,
    load_multiple_sequence, store_multiple_sequence): Delete
    declarations.
    (arm_gen_load_multiple, arm_gen_store_multiple): Adjust
    declarations.
    * config/arm/ldmstm.md: New file.
    * config/arm/arm.c (arm_regs_in_sequence): New array.
    (load_multiple_sequence): Now static.  New args SAVED_ORDER,
    CHECK_REGS.  All callers changed.
    If SAVED_ORDER is nonnull, copy the computed order into it.
    If CHECK_REGS is false, don't sort REGS.  Handle Thumb mode.
    (store_multiple_sequence): Now static.  New args NOPS_TOTAL,
    SAVED_ORDER, REG_RTXS and CHECK_REGS.  All callers changed.
    If SAVED_ORDER is nonnull, copy the computed order into it.
    If CHECK_REGS is false, don't sort REGS.  Set up REG_RTXS just
    like REGS.  Handle Thumb mode.
    (arm_gen_load_multiple_1): New function, broken out of
    arm_gen_load_multiple.
    (arm_gen_store_multiple_1): New function, broken out of
    arm_gen_store_multiple.
    (arm_gen_multiple_op): New function, with code from
    arm_gen_load_multiple and arm_gen_store_multiple moved here.
    (arm_gen_load_multiple, arm_gen_store_multiple): Now just
    wrappers around arm_gen_multiple_op.  Remove argument UP, all
    callers changed.
    (gen_ldm_seq, gen_stm_seq, gen_const_stm_seq): New functions.
    * config/arm/predicates.md (commutative_binary_operator): New.
    (load_multiple_operation, store_multiple_operation): Handle more
    variants of these patterns with different starting offsets.
    Handle Thumb-1.
    * config/arm/arm.md: Include "ldmstm.md".
    (ldmsi_postinc4, ldmsi_postinc4_thumb1, ldmsi_postinc3,
    ldmsi_postinc2, ldmsi4, ldmsi3, ldmsi2, stmsi_postinc4,
    stmsi_postinc4_thumb1, stmsi_postinc3, stmsi_postinc2, stmsi4,
    stmsi3, stmsi2 and related peepholes): Delete.
    * config/arm/ldmstm.md: New file.
    * config/arm/arm-ldmstm.ml: New file.

testsuite/
    PR target/40457
    * gcc.target/arm/pr40457-1.c: New test.
    * gcc.target/arm/pr40457-2.c: New test.

Added:
    trunk/gcc/config/arm/arm-ldmstm.ml
    trunk/gcc/config/arm/ldmstm.md
    trunk/gcc/testsuite/gcc.target/arm/pr40457-1.c
    trunk/gcc/testsuite/gcc.target/arm/pr40457-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/arm/arm-protos.h
    trunk/gcc/config/arm/arm.c
    trunk/gcc/config/arm/arm.h
    trunk/gcc/config/arm/arm.md
    trunk/gcc/config/arm/predicates.md
    trunk/gcc/testsuite/ChangeLog
Author: bernds
Date: Wed Sep 29 20:06:55 2010
New Revision: 164732

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=164732
Log:
    PR target/40457
    * postreload.c (move2add_use_add2_insn): Use full_costs for
    comparison.
    (move2add_use_add3_insn): Likewise.
    (reload_cse_move2add): Likewise.
    * rtlanal.c (get_full_rtx_cost): New function.
    * rtl.h (struct full_rtx_costs): New.
    (init_costs_to_max, init_costs_to_zero, costs_lt_p,
    costs_add_n_insns): New inline functions.
    (get_full_rtx_cost): Declare.

testsuite/
    PR target/40457
    * gcc.target/arm/pr40457-3.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/arm/pr40457-3.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/postreload.c
    trunk/gcc/rtl.h
    trunk/gcc/rtlanal.c
    trunk/gcc/testsuite/ChangeLog
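As a rough illustration of what a "full cost" comparison might look like, here is a stand-alone model. The names `full_rtx_costs` and `costs_lt_p` come from the ChangeLog above; the fields `speed` and `size` and the function body are assumptions made for this sketch and should not be read as the committed code.

```c
#include <stdbool.h>

/* Stand-alone model.  The struct name follows the ChangeLog; the two
   fields are an assumption for illustration. */
struct full_rtx_costs {
    int speed;  /* cost when optimizing for speed */
    int size;   /* cost when optimizing for size */
};

/* Assumed semantics: compare on the metric being optimized for and use
   the other metric only to break ties. */
bool costs_lt_p(const struct full_rtx_costs *a,
                const struct full_rtx_costs *b, bool speed)
{
    if (speed)
        return a->speed < b->speed
               || (a->speed == b->speed && a->size < b->size);
    return a->size < b->size
           || (a->size == b->size && a->speed < b->speed);
}
```

With a tie-break like this, two sequences of equal speed cost are still ordered by size, which is the kind of distinction that matters for a code-size report such as this one.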
I think this is now fixed, judging by the output from trunk.
The master branch has been updated by Alexandre Oliva <aoliva@gcc.gnu.org>:

https://gcc.gnu.org/g:acddf6665f067bc98a2529a699b1d4509a7387cb

commit r13-5160-gacddf6665f067bc98a2529a699b1d4509a7387cb
Author: Alexandre Oliva <oliva@adacore.com>
Date: Fri Jan 13 21:15:41 2023 -0300

    [PR40457] [arm] expand SI-aligned movdi into pair of movsi

    When expanding a misaligned DImode move, emit aligned SImode moves if
    the parts are sufficiently aligned. This enables neighboring stores
    to be peephole-combined into stm, as expected by the PR40457 testcase,
    even after SLP vectorizes the originally aligned SImode stores into a
    misaligned DImode store.

    for gcc/ChangeLog

        PR target/40457
        * config/arm/arm.md (movmisaligndi): Prefer aligned SImode moves.
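A source-level analogy of that expansion, as a sketch only: the function `store_di_as_two_si` is hypothetical and simply shows a 64-bit value written through a 4-byte-aligned pointer as two 32-bit stores, low word first as on a little-endian target. The actual change operates on RTL in arm.md, not on C source.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only.  dst is 4-byte aligned
   but not necessarily 8-byte aligned, so the 64-bit value is written
   as two aligned 32-bit stores (low word first). */
void store_di_as_two_si(uint32_t *dst, uint64_t value)
{
    dst[0] = (uint32_t)value;          /* low 32 bits */
    dst[1] = (uint32_t)(value >> 32);  /* high 32 bits */
}
```

Calling it with the value 0x0000000200000001 leaves 1 in dst[0] and 2 in dst[1], the same memory contents that foo() from the original report produces, and two neighboring word stores of this shape are what the stm peephole can combine.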