Bug 9831 - [ARM] Peephole for multiple load/store could be more effective.
Status: RESOLVED WONTFIX
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 3.3
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2003-02-24 15:26 UTC by gertom
Modified: 2019-05-17 22:16 UTC

See Also:
Host:
Target: arm-*-elf
Build:
Known to work:
Known to fail:
Last reconfirmed: 2005-12-09 04:27:54


Attachments
multiple-load-store.tar.gz (580 bytes, application/x-gzip )
2003-05-21 15:17 UTC, gertom
Testcase for gcc 4.4.0 (100 bytes, text/x-csrc)
2009-04-14 20:04 UTC, Alexandre Pereira Nunes

Description gertom 2003-02-24 15:26:00 UTC
When consecutive loads read from consecutive memory locations but the base address is not held in a register (e.g. the loads use a label, which is converted to PC-relative loads), the corresponding peephole patterns do not optimize them. The pattern matches, but no load-multiple instruction is generated. The same applies to stores.

In the attached modified assembly code, the 4 load instructions are replaced by an address computation and a load-multiple instruction (note that no additional register is required).

Release:
gcc version 3.3 20030217 (prerelease)

Environment:
BUILD & HOST: Linux 2.4.20 i686 unknown
TARGET: arm-unknown-elf

How-To-Repeat:
gcc -S -Os 01.i

// 01.i

# 1 "01.c"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "01.c"
int f(int, int, int, int);

void foo ()
{
  f(12345,238764,2345234, 83746556);
}
Comment 1 Dara Hazeghi 2003-05-26 19:32:40 UTC
Hello,

I can confirm that this problem is still present on gcc 3.3 branch and mainline (20030512).

Dara
Comment 2 Andrew Pinski 2003-05-26 19:34:23 UTC
See Dara's comment.
Comment 3 Ramana Radhakrishnan 2009-03-13 10:54:18 UTC
(In reply to comment #2)
> See Dara's comment.

This still occurs today:

foo:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        ldr     r0, .L3
        ldr     r1, .L3+4
        ldr     r2, .L3+8
        ldr     r3, .L3+12
        b       f
.L4:
        .align  2
.L3:
        .word   12345
        .word   238764
        .word   2345234
        .word   83746556
        .size   foo, .-foo
        .ident  "GCC: (GNU) 4.4.0 20090312 (experimental)"
        .section        .note.GNU-stack,"",%progbits
Comment 4 Alexandre Pereira Nunes 2009-04-14 20:04:08 UTC
Created attachment 17638
Testcase for gcc 4.4.0
Comment 5 Alexandre Pereira Nunes 2009-04-14 20:07:29 UTC
See the attached pqp.c file.

With gcc 4.3.3, on such simplistic examples, the ldm and stm peepholes work:

sum:
        ldr     r2, .L3
        ldmia   r2, {r1, r3}    @ phole ldm
        add     r3, r0, r3
        add     r0, r0, r1
        stmia   r2, {r0, r3}    @ phole stm
        bx      lr


With gcc 4.4.0 branch, built on 20090413, it fails:

sum:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        ldr     r3, .L3
        ldr     r2, [r3, #0]
        ldr     r1, [r3, #4]
        add     r2, r0, r2
        add     r1, r0, r1
        str     r1, [r3, #4]
        str     r2, [r3, #0]
        bx      lr
Comment 6 Alexandre Pereira Nunes 2009-04-14 20:11:38 UTC
(In reply to comment #5)
> See the attached pqp.c file.
> 
> [cut]
> 
> With gcc 4.4.0 branch, built on 20090413, it fails:
> 

This seems to be caused by the order in which registers are allocated. If I swap the source lines so that they operate in the reverse order:

 hehe.y += pqp;
 hehe.x += pqp;

Then 4.4.0 20090413 generates optimized code:

        ldr     r3, .L3
        ldmia   r3, {r1, r2}    @ phole ldm
        add     r2, r0, r2
        add     r1, r0, r1
        stmia   r3, {r1, r2}    @ phole stm
        bx      lr

With this ordering, gcc 4.3.3 does not :-) Funny thing, isn't it?
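
For readers without access to the attachment, here is a hypothetical reconstruction of the test case. It is inferred only from the two source lines quoted above and the `sum` label in the assembly; the struct tag `pair`, the `int` types, and the function signature are assumptions, and the real pqp.c may differ.

```c
/* Hypothetical reconstruction of pqp.c; not the actual attachment. */
struct pair {
    int x;  /* assumed to sit at offset 0 */
    int y;  /* assumed to sit at offset 4 */
};

struct pair hehe;

/* Adds pqp to both fields, in the order used in comment 5. */
void sum(int pqp)
{
    hehe.x += pqp;
    hehe.y += pqp;
}
```

Swapping the two statements in `sum` gives the reversed variant discussed in this comment.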

Comment 7 Ramana Radhakrishnan 2009-06-16 10:01:26 UTC
(In reply to comment #5)
> See the attached pqp.c file.
> 
> With gcc 4.3.3, on such simplistic examples, the ldm and stm peepholes work:
> 
> sum:
>         ldr     r2, .L3
>         ldmia   r2, {r1, r3}    @ phole ldm
>         add     r3, r0, r3
>         add     r0, r0, r1
>         stmia   r2, {r0, r3}    @ phole stm
>         bx      lr
> 
> 
> With gcc 4.4.0 branch, built on 20090413, it fails:
> 
> sum:
>         @ args = 0, pretend = 0, frame = 0
>         @ frame_needed = 0, uses_anonymous_args = 0
>         @ link register save eliminated.
>         ldr     r3, .L3
>         ldr     r2, [r3, #0]
>         ldr     r1, [r3, #4]
>         add     r2, r0, r2
>         add     r1, r0, r1
>         str     r1, [r3, #4]
>         str     r2, [r3, #0]
>         bx      lr
> 


We can't use stm or ldm in the second case because ldm and stm require the lowest-numbered register to go to the lowest memory address. This is an artifact of the register allocator choosing a different order for the registers in such cases.

ldm and stm are not easily describable in the RTL backend and are semi-bolted on top of the existing infrastructure using peepholes.
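
The register-ordering constraint can be illustrated with a small, simplified model. This is only a sketch of the rule as stated in this comment, not GCC's actual peephole check; the names `mem_access` and `can_merge_into_ldm_stm` are made up for illustration, and real ldm/stm formation has further conditions (addressing mode, base register, tuning) that are ignored here.

```c
#include <stdbool.h>
#include <stddef.h>

/* One word-sized load or store relative to a shared base register. */
struct mem_access {
    int reg;    /* core register number being loaded or stored */
    int offset; /* byte offset from the base register */
};

/*
 * Simplified model, not GCC's actual check: returns true when the
 * accesses could be expressed as a single ldm/stm, i.e. when ascending
 * register numbers correspond to ascending, contiguous word addresses.
 */
bool can_merge_into_ldm_stm(const struct mem_access *acc, size_t n)
{
    if (n < 2)
        return false;

    int min_off = acc[0].offset;
    int max_off = acc[0].offset;

    for (size_t i = 0; i < n; i++) {
        if (acc[i].offset < min_off) min_off = acc[i].offset;
        if (acc[i].offset > max_off) max_off = acc[i].offset;

        for (size_t j = i + 1; j < n; j++) {
            /* Each register and each address may appear only once. */
            if (acc[i].reg == acc[j].reg || acc[i].offset == acc[j].offset)
                return false;
            /* Register order must agree with address order. */
            if ((acc[i].reg < acc[j].reg) != (acc[i].offset < acc[j].offset))
                return false;
        }
    }

    /* Every access must be word-aligned relative to the lowest address. */
    for (size_t i = 0; i < n; i++)
        if ((acc[i].offset - min_off) % 4 != 0)
            return false;

    /* n distinct word slots spanning exactly n words means no gaps. */
    return max_off - min_off == 4 * (int)(n - 1);
}
```

Applied to the output in comment 5, the 4.3.3 allocation (r1 at #0, r3 at #4) passes this check, while the 4.4.0 allocation (r2 at #0, r1 at #4) fails because the higher-numbered register holds the lower address.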


Comment 8 Wilco 2019-05-17 22:16:11 UTC
There doesn't appear to be anything that can be improved here. Literal pool loads can't be easily peepholed into LDM, and there aren't many opportunities anyway.