ARM: code size increase starting from gcc 10
Richard Earnshaw
Richard.Earnshaw@foss.arm.com
Fri Mar 11 15:20:25 GMT 2022
On 11/03/2022 09:57, Gabriele Favalessa via Gcc-help wrote:
> Hi,
>
> up to gcc 9 this function
>
> #include <stdint.h>
> #include <stdbool.h>
>
> bool f() {
>   return *(volatile uint32_t*)0x42143fa8 == 0;
> }
>
> compiles (arm-none-eabi-gcc -mcpu=cortex-m4 -Os) to:
>
>    0:   4b02        ldr   r3, [pc, #8]    ; (c <f+0xc>)
>    2:   6818        ldr   r0, [r3, #0]
>    4:   fab0 f080   clz   r0, r0
>    8:   0940        lsrs  r0, r0, #5
>    a:   4770        bx    lr
>    c:   42143fa8    .word 0x42143fa8
>
> Starting with gcc 10 it compiles to:
>
>    0:   4b03        ldr    r3, [pc, #12]     ; (10 <f+0x10>)
>    2:   f8d3 0fa8   ldr.w  r0, [r3, #4008]   ; 0xfa8
>    6:   fab0 f080   clz    r0, r0
>    a:   0940        lsrs   r0, r0, #5
>    c:   4770        bx     lr
>    e:   bf00        nop
>   10:   42143000    .word  0x42143000
>
> Questions:
>
> 1) why don't newer gcc versions generate the smallest possible code in
> spite of -Os?
The compiler is trying to identify opportunities to generate better code
for the more common case where several accesses go to nearby addresses.
For example, if your testcase is changed to:
int f() {
  return (*(volatile unsigned*)0x42143fa8
          + *(volatile unsigned*)0x42143e00) == 0;
}
Then we see:
        ldr     r3, .L2
        ldr     r2, [r3, #4008]
        ldr     r3, [r3, #3584]
        cmn     r2, r3
        ite     eq
        moveq   r0, #1
        movne   r0, #0
        bx      lr
.L3:
        .align  2
.L2:
        .word   1108619264      @ 0x42143000
being generated, which is clearly better than loading two completely
different constants from the literal pool to use as bases:
(gcc-9):
        ldr     r3, .L2
        ldr     r2, .L2+4
        ldr     r3, [r3]
        ldr     r2, [r2]
        cmn     r3, r2
        ite     eq
        moveq   r0, #1
        movne   r0, #0
        bx      lr
.L3:
        .align  2
.L2:
        .word   1108623272      @ 0x42143fa8
        .word   1108622848      @ 0x42143e00
Unfortunately, the code that does this has limited visibility of which
other operations may be accessing nearby memory, so it is not able to
work out the optimal choice for every case.
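
The arithmetic behind the listings above can be sketched in C. This is
only an illustration of the base-plus-offset split that the generated
code exhibits, not GCC's actual implementation; the names split_addr,
split_address and can_share_base are hypothetical, and the 12-bit offset
width is an assumption taken from the range of the 32-bit Thumb-2
"ldr Rt, [Rn, #imm12]" encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the load uses an unsigned 12-bit immediate offset (0..4095),
   as in the 32-bit Thumb-2 "ldr Rt, [Rn, #imm12]" form seen in the listing. */
#define OFFSET_BITS 12u
#define OFFSET_MASK ((UINT32_C(1) << OFFSET_BITS) - 1u)

/* Hypothetical helper type for this sketch; not a GCC data structure. */
struct split_addr {
    uint32_t base;   /* value that would be placed in the literal pool */
    uint32_t offset; /* immediate that would be encoded in the load */
};

/* Split an absolute address into a 4 KiB-aligned base and a small offset. */
static struct split_addr split_address(uint32_t addr)
{
    struct split_addr s;
    s.base = addr & ~OFFSET_MASK;
    s.offset = addr & OFFSET_MASK;
    assert(s.base + s.offset == addr); /* the split must be lossless */
    return s;
}

/* Two accesses can reuse one literal-pool entry when their bases match. */
static int can_share_base(uint32_t a, uint32_t b)
{
    return split_address(a).base == split_address(b).base;
}
```

Under these assumptions, split_address(0x42143fa8) gives base 0x42143000
(1108619264) and offset 4008, and split_address(0x42143e00) gives the same
base with offset 3584, matching the single .word and the two offsets in
the gcc 10 output. With only one access there is nothing to share, so the
split just forces the wider ldr.w encoding.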
> 2) is there a way to get the smaller code with newer gcc versions?
Unfortunately, no. At least not at present.
R.
>
> Thanks
>
> Gabriele