Using this version/config:
~~~~~~~~~~~~~~~~~~~~~~~~~~
Using built-in specs.
Target: avr
Configured with: ../gcc-4.1.2/configure --prefix=/c/WinAVR --target=avr --enable-languages=c,c++ --with-dwarf2 --enable-win32-registry=WinAVR-20070525 --disable-nls --with-gmp=/usr/local --with-mpfr=/usr/local --enable-doc --disable-libssp
Thread model: single
gcc version 4.1.2 (WinAVR 20070525)
~~~~~~~~~~~~~~~~~~~~~~~~~~
Using this command line to compile:
avr-gcc -S -Os test.c -mmcu=atmega16
~~~~~~~~~~~~~~~~~~~~~~~~~~
The test case:

extern unsigned char foo(unsigned char in);

unsigned char test2(unsigned char input)
{
    input += foo(0xA); //use input
    foo(0xA); //make sure input must be saved over the call
    return input;
}

The assembler output:
/* prologue: frame size=0 */
    push r16
    push r17       <<Useless
/* prologue end (size=2) */
    mov r17,r24
    ldi r24,lo8(10)
    call foo
    mov r16,r24    <<Why?? add r17,r24 is much better
    ldi r24,lo8(10)
    call foo
    add r17,r16    <<Could be gone if above statement used
    mov r24,r17
    clr r25
/* epilogue: frame size=0 */
    pop r17
    pop r16        <<Useless
    ret

The add is delayed until after the last call, but this requires saving an extra register across that call.
So delaying the add introduces:
- an extra push/pop pair
- an extra mov instruction
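To spell out the two orderings at the source level, here is a rough sketch. It is not compiler output, and a compiler is free to emit identical code for both functions; it only illustrates why the delayed add needs a second value kept live across the second call. The foo() body is a hypothetical stub so the sketch links on a host (the real foo() is external in the report), and the names test2_delayed and test2_eager are made up for this illustration.

```c
/* Hypothetical stand-in for the external foo() from the report,
   provided only so this sketch can be built and run on a host. */
unsigned char foo(unsigned char in)
{
    return (unsigned char)(in + 1);
}

/* Source-level picture of the 4.1.2 output: the first result is parked
   in a second call-saved register and the add happens after the second
   call, so both input and first are live across that call. */
unsigned char test2_delayed(unsigned char input)
{
    unsigned char first = foo(0xA);          /* mov r16,r24 */
    foo(0xA);
    return (unsigned char)(input + first);   /* add r17,r16 */
}

/* The ordering asked for in the report: add right after the first call,
   so only input is live across the second call. */
unsigned char test2_eager(unsigned char input)
{
    input = (unsigned char)(input + foo(0xA));   /* add r17,r24 */
    foo(0xA);
    return input;
}
```

Both functions return the same value for every input; the only difference is how many values must survive the second call to foo().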
Created attachment 14054
Example C source showing non-optimal code
4.3.0 20070817 snapshot generates this for the testcase:

test2:
    push r16
    push r17
/* prologue: function */
/* frame size = 0 */
    mov r16,r24
    ldi r24,lo8(10)
    call foo
    mov r17,r24
    ldi r24,lo8(10)
    call foo
    mov r24,r16
    add r24,r17
/* epilogue start */
    pop r17
    pop r16
    ret
(In reply to comment #2)
> 4.3.0 20070817 snapshot generates this for the testcase:
> <snip>

Well, at least the extra clr r25 is gone...
I just tried some simpler code:

extern unsigned char foo();

unsigned char test(unsigned char input)
{
    return input += foo();
}

The result is:
/* prologue: frame size=0 */
    push r17
/* prologue end (size=1) */
    mov r17,r24
    call foo
    add r17,r24    <<Could do "add r24,r17"
    mov r24,r17    <<This could then be gone
    clr r25        <<This is maybe gone in 4.3.0??
/* epilogue: frame size=0 */
    pop r17
    ret
/* epilogue end (size=2) */

Here the add is also done non-optimally. So maybe solving this would also prevent the extra register save?
(In reply to comment #3)

4.3.0 20070817 snapshot produces this for the second test case:

test:
    push r17
/* prologue: function */
/* frame size = 0 */
    mov r17,r24
    call foo
    add r24,r17
/* epilogue start */
    pop r17
    ret

So the second test case is optimized correctly when we get to 4.3.0.
Created attachment 24271
Assembler output with 4.7.0 r173649

This code is as you expected.
Closing this issue as resolved+worksforme.

4.7.0 generates reasonable code without any overhead; see attachment http://gcc.gnu.org/bugzilla/attachment.cgi?id=24271, generated with -Os (the output with -O1 and -O2 is the same).