Hello,
The attached pipeline patch is intended to change the following generated code
    ldr     r5, [r4, #12]
    adds    r2, r2, #16
    str     r5, [r3, #8]
to
    ldr     r5, [r4, #12]
    str     r5, [r3, #8]
    adds    r2, r2, #16
The reason is that an STR can start in the second cycle of the preceding
LDR, which takes 2 cycles, as long as the result of the LDR isn't used as
the memory address of the STR.
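To make the cycle argument concrete, here is a rough sketch of the cost
model in Python. It is not part of the patch and not how the scheduler
computes latencies. The 2-cycle cost for STR and the 1-cycle cost for ADDS
are assumptions made for illustration, and the function name
estimate_cycles and the tuple layout are hypothetical.

```python
# Simplified cycle model for illustration only. Assumed costs: LDR and STR
# take 2 cycles each, any other instruction takes 1 cycle.
MEM_OPS = {"ldr", "str"}


def estimate_cycles(seq):
    """Estimate cycles for a list of (opcode, data_reg, base_reg) tuples."""
    total = 0
    prev = None
    for op, data_reg, base_reg in seq:
        cost = 2 if op in MEM_OPS else 1
        # An STR directly after an LDR can start in the LDR's second cycle,
        # unless the loaded register is used as the STR's address base.
        if (prev is not None and prev[0] == "ldr" and op == "str"
                and prev[1] != base_reg):
            cost -= 1
        total += cost
        prev = (op, data_reg, base_reg)
    return total


# ldr r5, [r4, #12]; adds r2, r2, #16; str r5, [r3, #8]
before = [("ldr", "r5", "r4"), ("adds", "r2", None), ("str", "r5", "r3")]
# ldr r5, [r4, #12]; str r5, [r3, #8]; adds r2, r2, #16
after = [("ldr", "r5", "r4"), ("str", "r5", "r3"), ("adds", "r2", None)]
# Address dependency: the STR uses the loaded r3 as its base register.
dependent = [("ldr", "r3", "r4"), ("str", "r5", "r3")]

print(estimate_cycles(before), estimate_cycles(after),
      estimate_cycles(dependent))
```

Under these assumptions the reordered sequence saves one cycle, and the
saving disappears when the STR address depends on the loaded value.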
Tested with various benchmarks on a Cortex-M4 MPS. Except for one
regression caused by register allocation, the benchmarks show either a
performance improvement or no change.
Is it OK for trunk?
BR,
Terry
2013-03-29  Terry Guo  <terry.guo@arm.com>

	* gcc/config/arm/cortex-m4.md: New bypass to tune LDR/STR
	pairs.