arm load/store multiple
Richard Earnshaw
rearnsha@arm.com
Wed Nov 17 04:00:00 GMT 1999
The problem is constraining the register allocator to do the right thing.
To be able to use ldm/stm the registers must be in ascending order (though
not necessarily contiguous) for ascending memory addresses. Since I can't
see any way to constrain the register allocator to do this, we can only
use post-reload peepholes to try to patch up appropriate sequences of
loads and stores; but this tends to be very inefficient, particularly if
the scheduler has moved things around and broken up sequences of loads and
stores.
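To make the ordering constraint concrete, here is a minimal sketch in C of the check a peephole would have to pass before merging. It is only an illustration: struct mem_op and can_merge_multiple are hypothetical names, not anything in the GCC sources, and it assumes word-sized accesses off a single base register, already listed in ascending address order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one word-sized load or store:
   the register number and the byte offset from a common base. */
struct mem_op {
    int reg;
    int offset;
};

/* Returns true if ops[0..n-1], given in ascending address order,
   could be expressed as one ldm/stm: the words must be adjacent in
   memory and the register numbers must strictly ascend with the
   addresses (they need not be contiguous). */
bool can_merge_multiple(const struct mem_op *ops, size_t n)
{
    assert(ops != NULL);
    if (n < 2)
        return false;               /* nothing to merge */
    for (size_t i = 1; i < n; i++) {
        if (ops[i].offset != ops[i - 1].offset + 4)
            return false;           /* gap in memory */
        if (ops[i].reg <= ops[i - 1].reg)
            return false;           /* register order does not follow address order */
    }
    return true;
}
```

Applied to the quoted example below, the load of r1, r5, ip from offsets 12, 16, 20 passes this check, while the six stores taken together (r1, r4, ip, r5, lr, r2 at offsets 8 to 28) fail it, because ip (r12) at offset 16 is followed by r5 at offset 20.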
The only other way to get ldm/stm sequences is to catch them during rtl
generation and use explicit hard registers (this is used to handle block
copies), but then the explicit hard registers can hurt register
allocation, so it is a trade-off between the number of loads that can be
merged and the impact on the number of registers used -- the current code
uses 4 loads at a time, which has about the same impact on register
allocation as a function call (e.g. a call to memcpy).
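As a rough model of that trade-off, the following C sketch copies a block four words at a time, with four locals standing in for the four explicit hard registers. It is not the actual expander in the ARM port; copy_words_grouped and WORDS_PER_GROUP are names made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed group size: four words per ldm/stm pair, as described above. */
enum { WORDS_PER_GROUP = 4 };

/* Copies nwords words from src to dst in groups of four, and returns
   the number of full groups copied. Each group models one ldmia
   followed by one stmia using four fixed registers. */
size_t copy_words_grouped(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    assert(nwords == 0 || (dst != NULL && src != NULL));
    size_t i = 0;
    size_t groups = 0;
    for (; i + WORDS_PER_GROUP <= nwords; i += WORDS_PER_GROUP) {
        /* models: ldmia src!, {r0-r3} */
        uint32_t r0 = src[i], r1 = src[i + 1], r2 = src[i + 2], r3 = src[i + 3];
        /* models: stmia dst!, {r0-r3} */
        dst[i] = r0;
        dst[i + 1] = r1;
        dst[i + 2] = r2;
        dst[i + 3] = r3;
        groups++;
    }
    /* Leftover words are moved one at a time. */
    for (; i < nwords; i++)
        dst[i] = src[i];
    return groups;
}
```

A larger group would mean fewer instructions per block, but it would also tie up more hard registers across the copy, which is the cost being weighed here.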
If someone can think of a better way of representing this, so that some
pre-regalloc pass can combine loads in a way that will force the register
allocator to do the right thing, then I'm open to suggestions.
Another useful extension might be to have block spills/restores when we
run out of registers, but that would need a lot of work in reload.
Richard.
> I was thinking that gcc should use the load/store multiple
> instructions for:
>
> int *ip;
>
> main() {
> int i1, i2, i3, i4, i5, i6;
> i1 = ip[2];
> i2 = ip[3];
> i3 = ip[4];
> i4 = ip[5];
> i5 = ip[6];
> i6 = ip[7];
> asm volatile ("# bla bla");
> ip[2] = i2;
> ip[3] = i1;
> ip[4] = i4;
> ip[5] = i3;
> ip[6] = i6;
> ip[7] = i5;
> }
>
> type of code, but yet, it seems to have difficulty using them. Since
> I am not an arm head, I'm not certain that this would be a win. Since
> I haven't tracked it down, I don't know why it isn't using stm/ldm
> instructions more often. I checked the arm port and it seemed as
> though the code tries to group 2, 3 and 4 loads that are adjacent into
> one instruction. And yet, I see:
>
> ldr r2, L3
> ldr r3, [r2, #0]
> ldr r4, [r3, #8]
> add r1, r3, #12
> ldmia r1, {r1, r5, ip} @ phole ldm
> add r2, r3, #24
> ldmia r2, {r2, lr} @ phole ldm
> # bla bla
> str r2, [r3, #28]
> str r1, [r3, #8]
> str r4, [r3, #12]
> str ip, [r3, #16]
> str r5, [r3, #20]
> str lr, [r3, #24]
>
> As I explore around with different sample code, I see that the
> compiler does try a little bit, but just doesn't succeed often.
>
> I can't help but wonder if these types of instructions are supported
> in the right fashion in the md file.
>
> Anyway, thought I would forward this along.