[Bug rtl-optimization/29294] 4.1, 4.2 (possibly 4.0?) not finding postmodify address mode on ARM
siarhei.siamashka at gmail dot com
gcc-bugzilla@gcc.gnu.org
Thu Dec 20 05:47:00 GMT 2012
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294
--- Comment #10 from Siarhei Siamashka <siarhei.siamashka at gmail dot com> 2012-12-20 05:47:30 UTC ---
(In reply to comment #9)
And here are some performance measurements (with the buffer resident in L1 cache):
> $ arm-none-eabi-gcc-4.7.2 -O2 -mcpu=cortex-a8 -c test.c
> $ objdump -d test.o
>
> 00000000 <fill>:
> 0: e2511010 subs r1, r1, #16
> 4: 412fff1e bxmi lr
> 8: e2511010 subs r1, r1, #16
> c: e1c020f0 strd r2, [r0]
> 10: e1c020f8 strd r2, [r0, #8]
> 14: e2800010 add r0, r0, #16
> 18: 5afffffa bpl 8 <fill+0x8>
> 1c: e12fff1e bx lr
Cortex-A8 - 5 cycles per iteration
Cortex-A9 - 4.5 cycles per iteration
Cortex-A15 - 3 cycles per iteration
> $ arm-none-eabi-gcc-4.8.0 -O2 -mcpu=cortex-a8 -c test.c
> $ objdump -d test.o
>
> 00000000 <fill>:
> 0: e351000f cmp r1, #15
> 4: d12fff1e bxle lr
> 8: e2411010 sub r1, r1, #16
> c: e280c010 add ip, r0, #16
> 10: e3c1100f bic r1, r1, #15
> 14: e08c1001 add r1, ip, r1
> 18: e1c020f0 strd r2, [r0]
> 1c: e2800010 add r0, r0, #16
> 20: e14020f8 strd r2, [r0, #-8]
> 24: e1500001 cmp r0, r1
> 28: 1afffffa bne 18 <fill+0x18>
> 2c: e12fff1e bx lr
Cortex-A8 - 6 cycles per iteration
Cortex-A9 - 4 cycles per iteration
Cortex-A15 - 3 cycles per iteration
We could instead have expected something like the following code for the inner
loop, using post-indexed stores:
1:      strd    V, [BUF], #8
        subs    N, N, #16
        strd    V, [BUF], #8
        bpl     1b
Cortex-A8 - 4 cycles per iteration
Cortex-A9 - 4 cycles per iteration
Cortex-A15 - 2.5 cycles per iteration
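
For readers without test.c at hand, here is a minimal sketch of a C function
that is consistent with the disassembly above. It is a reconstruction, not
necessarily the exact source that was compiled: the parameter names and types
are assumptions inferred from the register usage (r0 = buffer pointer,
r1 = byte count, r2:r3 = 64-bit fill value).

```c
#include <stdint.h>

/*
 * Hypothetical reconstruction of fill() from the objdump listings.
 * Assumed mapping: buf -> r0, n -> r1 (a byte count), v -> r2:r3.
 * Each iteration consumes 16 bytes of the count and performs two
 * 64-bit stores, matching the two strd instructions per loop.
 */
void fill(uint64_t *buf, int n, uint64_t v)
{
    while ((n -= 16) >= 0) {
        *buf++ = v;   /* candidate for: strd V, [BUF], #8 */
        *buf++ = v;   /* candidate for: strd V, [BUF], #8 */
    }
}
```

Under these assumptions each iteration writes 16 bytes, so the cycle counts
above are per 16 bytes stored. Both pointer increments are natural candidates
for the post-indexed addressing mode.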