[Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM

siarhei.siamashka at gmail dot com gcc-bugzilla@gcc.gnu.org
Thu Dec 20 05:47:00 GMT 2012


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294

--- Comment #10 from Siarhei Siamashka <siarhei.siamashka at gmail dot com> 2012-12-20 05:47:30 UTC ---
(In reply to comment #9)

Here are some performance measurements (with the buffer resident in L1 cache):

> $ arm-none-eabi-gcc-4.7.2 -O2 -mcpu=cortex-a8 -c test.c
> $ objdump -d test.o
> 
> 00000000 <fill>:
>    0:    e2511010     subs    r1, r1, #16
>    4:    412fff1e     bxmi    lr
>    8:    e2511010     subs    r1, r1, #16
>    c:    e1c020f0     strd    r2, [r0]
>   10:    e1c020f8     strd    r2, [r0, #8]
>   14:    e2800010     add    r0, r0, #16
>   18:    5afffffa     bpl    8 <fill+0x8>
>   1c:    e12fff1e     bx    lr

Cortex-A8  - 5   cycles per iteration
Cortex-A9  - 4.5 cycles per iteration
Cortex-A15 - 3   cycles per iteration

> $ arm-none-eabi-gcc-4.8.0 -O2 -mcpu=cortex-a8 -c test.c
> $ objdump -d test.o
> 
> 00000000 <fill>:
>    0:    e351000f     cmp    r1, #15
>    4:    d12fff1e     bxle    lr
>    8:    e2411010     sub    r1, r1, #16
>    c:    e280c010     add    ip, r0, #16
>   10:    e3c1100f     bic    r1, r1, #15
>   14:    e08c1001     add    r1, ip, r1
>   18:    e1c020f0     strd    r2, [r0]
>   1c:    e2800010     add    r0, r0, #16
>   20:    e14020f8     strd    r2, [r0, #-8]
>   24:    e1500001     cmp    r0, r1
>   28:    1afffffa     bne    18 <fill+0x18>
>   2c:    e12fff1e     bx    lr

Cortex-A8  - 6 cycles per iteration
Cortex-A9  - 4 cycles per iteration
Cortex-A15 - 3 cycles per iteration

Ideally, we could have expected something like the following code for the
inner loop, with post-indexed stores and no separate pointer increment:

1:      strd    V, [BUF], #8
        subs    N, N, #16
        strd    V, [BUF], #8
        bpl    1b

Cortex-A8  - 4 cycles per iteration
Cortex-A9  - 4 cycles per iteration
Cortex-A15 - 2.5 cycles per iteration
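
For reference, here is a minimal C sketch of a function that would plausibly
compile to the listings above. It is reconstructed from the disassembly only;
the actual test.c is not quoted in this comment, so the signature and the
parameter names (buf, n, v) are assumptions. Under the ARM EABI, buf would
arrive in r0, the byte count n in r1, and the 64-bit value v in the r2:r3 pair,
which matches the registers used by the strd instructions.

```c
#include <stdint.h>

/*
 * Hypothetical reconstruction of fill(), not the original test.c.
 * n is a byte count; each iteration writes 16 bytes (two 64-bit stores),
 * mirroring the "subs r1, r1, #16" / "bpl" loop control in the 4.7.2 output.
 */
void fill(int64_t *buf, int n, int64_t v)
{
    while ((n -= 16) >= 0) {
        *buf++ = v;  /* store, then advance: a post-indexed strd candidate */
        *buf++ = v;  /* second store of the same 16-byte chunk */
    }
}
```

Each `*buf++ = v` is a store followed by a pointer bump of 8 bytes, which is
the pattern that a post-modify addressing mode such as `strd V, [BUF], #8`
can express in a single instruction.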


