When an array element is accessed in a loop, gcc applies the following optimization: it computes the address of the first element to be accessed before the loop, and then increments or decrements that address by the element size on each iteration. This makes the loop itself faster and shorter than with -O1, but the overall code size increases, which is the opposite of what -Os is for. Compare the output of arm-elf-gcc -S -g0 -Os with the output of arm-elf-gcc -S -g0 -O1. GCC should produce the latter kind of output with -Os.

Release:
gcc version 3.3 20030210 (prerelease)

Environment:
BUILD & HOST: Linux 2.4.20 i686 unknown
TARGET: arm-unknown-elf

How-To-Repeat:
arm-elf-gcc -S -g0 -Os

// 01.c:
# 1 "01.c"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "01.c"
typedef struct {
    int si1;
    short ss1;
    char sc1;
    int si2;
    char sc2;
    short ss2;
    short ss3;
} st;

int f1(st* p, int c, int n)
{
    int i;
    for (i = c-1; i >= 0; i--) {
        if (p[i].si1 == n) {
            return 1;
        }
    }
    return 0;
}
The same is true on PPC with the mainline (20030529):

[omni:~/src/gccPRs] pinskia% gcc -O1 -S -o - pr9723.c
    .text
    .align 2
    .globl _f1
_f1:
    mr r2,r3
L9:
    addic. r4,r4,-1
    blt- cr0,L8
    mulli r0,r4,20
    lwzx r0,r2,r0
    li r3,1
    cmpw cr7,r0,r5
    beqlr- cr7
    b L9
L8:
    li r3,0
    blr

[omni:~/src/gccPRs] pinskia% gcc -Os -S -o - pr9723.c
    .text
    .align 2
    .globl _f1
_f1:
    addic. r0,r4,-1
    mr r2,r3
    blt- cr0,L8
    mulli r0,r0,20
    add r4,r0,r3
L6:
    lwz r0,0(r4)
    addi r4,r4,-20
    cmpw cr6,r4,r2
    li r3,1
    cmpw cr7,r0,r5
    beqlr- cr7
    bge+ cr6,L6
L8:
    li r3,0
    blr
On i686 I get the following code sizes (with -fomit-frame-pointer):

   text    data     bss     dec     hex filename
     66       0       0      66      42 O2.o
     48       0       0      48      30 Os.o

For AMD64 I get the following:

   text    data     bss     dec     hex filename
    118       0       0     118      76 O2.o
     83       0       0      83      53 Os.o

So, is there still a problem for ARM and PPC?
(In reply to comment #2)
-Os is even worse on the mainline now:

_f1:
    addi r0,r4,-1
    addi r4,r4,1
    cmpwi cr7,r0,-1
    mulli r0,r0,20
    mtctr r4
    add r3,r3,r0
    bge+ cr7,L2
    li r0,1
    mtctr r0
    b L2
L3:
    lwz r0,0(r3)
    addi r3,r3,-20
    cmpw cr7,r0,r5
    bne+ cr7,L2
    li r3,1
    blr
L2:
    bdnz L3
    li r3,0
    blr
Now one more instruction on PPC:

_f1:
    addi r0,r4,-1
    cmpwi cr7,r0,-1
    mulli r0,r4,20
    addi r4,r4,1
    add r3,r3,r0
    mtctr r4
    addi r3,r3,-20
    bge+ cr7,L2
    li r0,1
    mtctr r0
    b L2
L3:
    lwz r0,0(r3)
    addi r3,r3,-20
    cmpw cr7,r0,r5
    bne+ cr7,L2
    li r3,1
    blr
L2:
    bdnz L3
    li r3,0
    blr
For arm and ppc, trunk today at -Os produces smaller code than -O1 and -O2.