The inlining performed with -Os has often a tendency to increase the code size for the AVR target, so it misses the entire point behind using -Os. This might be related to bug #30908.
Created attachment 13345 [details] Test case for bug 66690.
Created attachment 13346 [details] Generated assembly code with -Os.
Created attachment 13347 [details] Generated assembly code with -Os -fno-inline.
This code snippet can also be run through the i386 compiler (even though the generated code will obviously be nonsensical). I've only got an older version of that compiler at hand: gcc41 (GCC) 4.1.2 20061229 (prerelease) but even that one generates smaller code without the inlining: j@uriah 197% gcc41 -Os -fno-inline -S bug66690.c j@uriah 198% as bug66690.s j@uriah 199% size text data bss dec hex filename 141 0 0 141 8d a.out j@uriah 200% gcc41 -Os -S bug66690.c j@uriah 201% as bug66690.s j@uriah 202% size text data bss dec hex filename 182 0 0 182 b6 a.out
Inlining decisions are based on heuristics. What works for one target may not work quite as well for another. In this case, it seems that for AVR the heuristics are not the best. You can tune the heuristics for this target and let the target options override the default heuristics parameters.
(In reply to comment #5) > Inlining decisions are based on heuristics. What works for one > target may not work quite as well for another. In this case, it > seems that for AVR the heuristics are not the best. You can tune > the heuristics for this target and let the target options override > the default heuristics parameters. But what if *all* targets appear to suffer from pessimization? I think it cannot be called a bug of a particular target then. Just for a completely different test, I compiled a GCC 4.1.1 (since I had that source code around here) for a sparc64 target. The results support those of avr and i386: % /tmp/sparc64/bin/sparc64-unknown-linux-gcc -Os -c bug66690.c % /tmp/sparc64/bin/sparc64-unknown-linux-size bug66690.o text data bss dec hex filename 212 0 0 212 d4 bug66690.o % /tmp/sparc64/bin/sparc64-unknown-linux-gcc -Os -fno-inline -c bug66690.c % /tmp/sparc64/bin/sparc64-unknown-linux-size bug66690.o text data bss dec hex filename 124 0 0 124 7c bug66690.o So is there a *single* target where inlining on that code would really save space? In all cases so far, the -fno-inline code saved a dramatical amount of space, compared to the default -Os version. Also, as you mention the target code has a chance to tune this (and I know there are a lot of complaints from AVR users about pessimizations caused by GCC 4.x, compared to 3.x), can you give me a hint about where to look for these knobs? I might give it a try to see whether I can find a more optimal set of parameters.
Changed target triplet from avr-*-* to *-*-* as obviously, at least some of GCC's mainstream targets are affected by that bug as well (perhaps even *any* target).
Same results on current trunk. Early inlining is already doing it because we think putchs size (4 insns) when inlined will reduce the compilation units size by 4 insns (the out-of-line copy of putch). putchs IL before inlining looks like <bb 2>: <bb 4>: D.1182_2 ={v} *43B; D.1183_3 = (int) D.1182_2; D.1184_4 = D.1183_3 & 32; if (D.1184_4 == 0) goto <bb 4>; else goto <bb 3>; <bb 3>: ch.0_7 = (volatile unsigned char) ch_6(D); *44B ={v} ch.0_7; return; we count INDIRECT_REFs as having no cost, the only thing what counts is the BIT_AND_EXPR and the comparison. Constants and registers also have no cost. Considering the (currently) recursive structure of estimate_num_insns_1 it is non-trivial to adjust the container-like reference trees (but the INDIREC_REF ones).
gcc version 4.6.0 20100907 (experimental) is not performing the inline in -Os for the test case: #define UCSRA (*(volatile unsigned char *)0x2B) #define UDRE 5 #define UDR (*(volatile unsigned char *)0x2C) static void putch(char ch); void putch(char ch) { while (!(UCSRA & (1<<UDRE))); UDR = ch; } int main(void) { putch(0); putch(1); return 0; } The generated assembly with avr-gcc -S -Os test.c is: putch: /* prologue: function */ /* frame size = 0 */ /* stack size = 0 */ .L__stack_usage = 0 .L2: sbis 43-0x20,5 rjmp .L2 out 44-0x20,r24 /* epilogue start */ ret .size putch, .-putch .global main .type main, @function main: /* prologue: function */ /* frame size = 0 */ /* stack size = 0 */ .L__stack_usage = 0 ldi r24,lo8(0) rcall putch ldi r24,lo8(1) rcall putch ldi r24,lo8(0) ldi r25,hi8(0) /* epilogue start */ ret
Hi, I just comitted patch supposed to address similar problem as in Comment 9. On x86_64 I now get: main: .LFB1: .L2: movb 43, %al testb $32, %al je .L2 movb $0, 44 .L3: movb 43, %al testb $32, %al je .L3 movb $1, 44 xorl %eax, %eax ret Can you, please, test if current mainline behave more resonably?
(In reply to comment #10) > Can you, please, test if current mainline behave more resonably? Well, I had to fix bug #46426 first ... Yes, that looks good now. Compiling the original testcase with just -Os (no -fno-inline involved) results in: .type putch, @function putch: /* prologue: function */ /* frame size = 0 */ /* stack size = 0 */ .L__stack_usage = 0 .L2: sbis 43-0x20,5 rjmp .L2 out 44-0x20,r24 /* epilogue start */ ret .size putch, .-putch .global main .type main, @function main: /* prologue: function */ /* frame size = 0 */ /* stack size = 0 */ .L__stack_usage = 0 ldi r24,lo8(0) rcall putch ldi r24,lo8(1) rcall putch ldi r24,lo8(2) rcall putch ldi r24,lo8(3) rcall putch ldi r24,lo8(4) rcall putch ldi r24,lo8(5) rcall putch ldi r24,lo8(6) rcall putch ldi r24,lo8(7) rcall putch ldi r24,lo8(8) rcall putch ldi r24,lo8(9) rcall putch ldi r24,lo8(0) ldi r25,hi8(0) /* epilogue start */ ret .size main, .-main That's the same what has originally been compiled when applying -fno-inline. It does not solve the issue from bug #30908 though. The version compiled with -DNOINLINE there is still 4 bytes smaller than the default version (which has already been the case in GCC 4.3.4). However, both versions are now already considerably smaller than they used to be in 4.3.4 (40/44 bytes vs. 52/56 bytes in 4.3.4).
OK, fixed;The other problem is already tracked in bug #30908