This is the mail archive of the mailing list for the GCC project.

ARM ldm/stm peepholes

This is a new version of the ldm/stm peepholes patch I posted as a WIP
earlier.  This is now in a state where I think it's an improvement over
the existing code and suitable for review.

The motivation here is PR40457, which notes that we cannot convert
multiple ldr/str instructions into ldm/stm for Thumb-1.  To do that, we
need to be able to show that the base register dies, as Thumb-1 only
supports the updating version of these instructions.  This means we have
to use peephole2 instead of peephole.
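
To make the condition concrete, here is a minimal sketch in Python.  It is
not code from the patch: the names Load, can_use_ldm and base_dead_after are
hypothetical, and it models only the ldmia form with the checks mentioned
above (same base, consecutive word offsets, ascending register numbers, and
a dying base register for Thumb-1).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Load:
    dest: int    # destination register number
    base: int    # base register number
    offset: int  # byte offset from the base register


def can_use_ldm(loads: List[Load], thumb1: bool, base_dead_after: bool) -> bool:
    """Simplified model: could this run of ldr instructions become one ldmia?"""
    if len(loads) < 2:
        return False
    base = loads[0].base
    if any(l.base != base for l in loads):
        return False
    # Simplification: conservatively reject a load that overwrites the base.
    if any(l.dest == base for l in loads):
        return False
    ordered = sorted(loads, key=lambda l: l.offset)
    # ldmia reads consecutive words starting at the base address.
    if [l.offset for l in ordered] != [4 * i for i in range(len(ordered))]:
        return False
    # Lower-numbered registers must pair with lower addresses.
    dests = [l.dest for l in ordered]
    if any(a >= b for a, b in zip(dests, dests[1:])):
        return False
    # Thumb-1 only has the updating form, so the base register is clobbered
    # and must not be needed afterwards.
    if thumb1 and not base_dead_after:
        return False
    return True
```

For example, two loads from [r4] and [r4, #4] into r0 and r1 qualify in ARM
mode regardless of the base register's liveness, but in Thumb-1 mode only
when r4 is known to be dead afterwards.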

In addition to converting the existing peepholes, I've also added new
classes which try to optimize special situations requested in the PR.

1. Constant to memory moves: if all the input registers of an stm are
dead after the instruction, it doesn't much matter which value goes into
which register.  This can also use peephole2's ability to allocate free
registers:

        mov     ip, #1
-       str     ip, [sp, #0]
-       mov     ip, #0
-       str     ip, [sp, #4]
+       mov     lr, #0
+       stmia   sp, {ip, lr}

2. When loading registers for use in a commutative operation, their
order also doesn't matter if they are dead afterwards.
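
Both cases rely on the fact that ldm/stm pair lower-numbered registers with
lower addresses, so a sequence only fits if the register order matches the
address order; when the registers are dead afterwards, we are free to pick
an order that does.  A minimal Python sketch of the constant-store case
follows.  It is only an illustration, not the patch's logic: the function
name assign_stm_registers and its arguments are hypothetical, and it assumes
the stmia form with word offsets starting at 0.

```python
from typing import Dict, List, Tuple


def assign_stm_registers(values_by_offset: Dict[int, int],
                         free_regs: List[int]) -> List[Tuple[int, int, int]]:
    """Choose a register for each constant so a single stmia stores them all.

    Returns (register, constant, offset) triples in ascending register order.
    """
    offsets = sorted(values_by_offset)
    # stmia writes consecutive words starting at the base address.
    if offsets != [4 * i for i in range(len(offsets))]:
        raise ValueError("offsets must be consecutive words starting at 0")
    if len(free_regs) < len(offsets):
        raise ValueError("not enough free registers")
    # The lowest-numbered register is stored at the lowest address, so hand
    # out the free registers in ascending order alongside ascending offsets.
    regs = sorted(free_regs)[:len(offsets)]
    return [(reg, values_by_offset[off], off) for reg, off in zip(regs, offsets)]
```

With ip (r12) and lr (r14) available and the constants 1 and 0 destined for
offsets 0 and 4, assign_stm_registers({0: 1, 4: 0}, [14, 12]) puts 1 in r12
and 0 in r14, which corresponds to the stmia in the example above.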

I don't know whether to include the generator program as documentation
or as the master copy, or whether to leave it out altogether.  It might
make sense to keep it if we want to increase the limit of 4 instructions
per pattern.

The patch may need more tuning to decide when to use these instructions;
I don't think I have enough information available to make these
decisions, so suggestions are welcome.  For now, this uses the same
heuristics as before, but it may trigger in slightly more situations.

I've tested this repeatedly with arm-none-linux-gnueabi QEMU testsuite
runs.  Yesterday, after fixing an aliasing issue, I managed to get a
successful SPEC2000 run on a Cortex-A9 board (with a 4.4-based compiler;
overall performance minimally higher than before); since then I've only
made cleanups and added comments.  Once approved, I'll rerun all tests
before committing.


Attachment: ldmstm-v6e.diff
Description: Text document
