WIP patch, RFC: ARM ldm/stm peephole rewrite

Richard Earnshaw rearnsha@arm.com
Thu Mar 25 12:10:00 GMT 2010


On Wed, 2010-03-24 at 23:52 +0100, Bernd Schmidt wrote:
> This is a work-in-progress patch.  I'm posting it in the hope of getting
> early feedback from the ARM maintainers.
> 
> The motivation for this patch is to be able to generate ldm/stm insns in
> Thumb mode, where ldmia/stmia always update the base register.  To do
> this, it helps to be able to use peep2_reg_dead_p, so the peepholes need
> to be converted to define_peephole2.  In the future, some other
> peepholes I plan to write could use peephole2's mechanism to allocate
> free registers.
> 
> Not all forms of ldm/stm can be represented by existing patterns, and
> since there's a lot of them I've written a small generator program.
> 
> In several places, the code to generate ldm/stm instructions is guarded
> by TARGET_ARM.  I've left this in place, but shouldn't it be TARGET_32BIT?
> 
> The patch depends on the three bug fixes I've posted previously:
>   http://gcc.gnu.org/ml/gcc-patches/2010-03/msg01160.html
>   http://gcc.gnu.org/ml/gcc-patches/2010-03/msg01161.html
>   http://gcc.gnu.org/ml/gcc-patches/2010-03/msg01162.html
> 
> In the last test run, the results were mostly good, but I've identified
> another bug, this time in loop-invariant.c, which breaks match_dups.
> I'm currently testing a fix for that.

A number of comments spring to mind.

1) Please no more ML in the ARM backend.  It's bad enough that the neon
patterns are generated that way, but that's a massively bigger chunk of
code.  This bit seems more like doing it for its own sake.  It creates a
barrier to maintenance for non-ML programmers and doesn't really bring
any benefits here (doesn't generate multiple files from the one source
file, for example).

2) Why are you generating the .md file fragment with the GPL exception
clause?  This file is part of core GCC, not a library.

3) IIRC LDMIB with write-back trips a bug on some StrongARM chips (sadly
not yet obsolete).  So anything up to ARMv4 needs to avoid this.

4) LDM is not always the best thing.  On more recent chips, for a small
number of regs it can be slower than individual loads, and only once
you go above 4 regs might it break even.  On some cores it may never be
faster than using LDRD (especially if the address is 64-bit aligned).
Don't forget that in Thumb-2 LDRD can take any two independent registers
with no ordering constraint.
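
As a rough illustration of points 3 and 4, the decision could be sketched
in C along the following lines.  This is not GCC code: the struct, the
enum and the function name are all hypothetical, the threshold of 4 regs
is simply the break-even figure mentioned above, and requiring 64-bit
alignment before picking LDRD is a conservative assumption rather than an
architectural rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of a candidate load-multiple.
   None of these names exist in GCC.  */
struct ldm_candidate
{
  int nregs;                /* number of registers to load */
  int arch_version;         /* 4 for ARMv4, 7 for ARMv7, ... */
  bool is_ldmib;            /* increment-before addressing */
  bool writeback;           /* base register is updated */
  bool small_ldm_slow;      /* tuning: short LDMs lose to single loads */
  bool have_ldrd;           /* LDRD is available on this target */
  bool addr_64bit_aligned;  /* address known to be 64-bit aligned */
};

enum load_choice { USE_LDM, USE_LDRD, USE_SINGLE_LOADS };

/* Assumed break-even point, taken from the "above 4 regs" remark.  */
#define LDM_BREAK_EVEN_REGS 4

enum load_choice
choose_load_form (const struct ldm_candidate *c)
{
  assert (c->nregs >= 1);

  /* A single register never needs a load-multiple.  */
  if (c->nregs < 2)
    return USE_SINGLE_LOADS;

  /* Point 3: avoid LDMIB with write-back on anything up to ARMv4.  */
  bool ldm_ok = !(c->is_ldmib && c->writeback && c->arch_version <= 4);

  /* Point 4: on cores tuned this way, short LDMs are not worth it.  */
  bool too_short = c->small_ldm_slow && c->nregs <= LDM_BREAK_EVEN_REGS;

  if (ldm_ok && !too_short)
    return USE_LDM;

  /* Two registers: fall back to LDRD when it is available and the
     address is aligned (the alignment test is an assumption here).  */
  if (c->nregs == 2 && c->have_ldrd && c->addr_64bit_aligned)
    return USE_LDRD;

  return USE_SINGLE_LOADS;
}
```

In a real backend the tuning flag would presumably come from the per-core
cost tables rather than a boolean, and the Thumb-2 case could relax the
LDRD test further since any two independent registers are allowed there.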

Finally, I'm presuming that as a WIP this isn't aimed at 4.5. 

R.


