Compiling the attached test case with -Os -mthumb, gcc generates:

        add     r1, r1, #40
        mov     r3, r0
        ldrb    r2, [r1]
        add     r3, r3, #40
        strb    r2, [r3]
        @ sp needed for prologue
        bx      lr

When the options are changed to -O2 -mthumb, gcc generates:

        mov     r3, #40
        ldrb    r2, [r1, r3]
        strb    r2, [r0, r3]
        @ sp needed for prologue
        bx      lr

The -O2 sequence is both smaller and faster. Comparing the IL dumped with the two option sets, all tree expressions are identical; the first difference appears after RTL expansion.
Created attachment 19184 (test case)
I suspect the rtx costs are wrong when they are queried for size rather than speed.
This only happens with trunk, so it is a performance and size regression. The correct fix might be to define thumb1-specific size costs; at the moment thumb1_rtx_costs is used for both speed and size.