Let's look at this:

long foo(long a, long b, long c, uint8_t d){
    if(d){
        return a+b;
    }else{
        return a-c;
    }
}

The listing reports this:

long foo(long a, long b, long c, uint8_t d){
  4e:   cf 92           push    r12     ; All these registers are pushed
  50:   ef 92           push    r14     ; even though it is unnecessary.
  52:   ff 92           push    r15
  54:   0f 93           push    r16
  56:   1f 93           push    r17
    if(d){
  58:   cc 20           and     r12, r12
  5a:   29 f0           breq    .+10            ; 0x66 <foo+0x18>
        return a+b;
  5c:   62 0f           add     r22, r18
  5e:   73 1f           adc     r23, r19
  60:   84 1f           adc     r24, r20
  62:   95 1f           adc     r25, r21
  64:   04 c0           rjmp    .+8             ; 0x6e <foo+0x20>
    }else{
        return a-c;
  66:   6e 19           sub     r22, r14
  68:   7f 09           sbc     r23, r15
  6a:   80 0b           sbc     r24, r16
  6c:   91 0b           sbc     r25, r17
  6e:   1f 91           pop     r17     ; And they are restored
  70:   0f 91           pop     r16     ; even though they are not changed.
  72:   ff 90           pop     r15
  74:   ef 90           pop     r14
  76:   cf 90           pop     r12
  78:   08 95           ret

Throughout execution the low registers (r3-r17) are always zero, and they are never changed anywhere in the whole file, not even in the function itself. So it is useless to push and pop them; we are only losing time, code space and RAM.

Please excuse my bad bug-reporting style; this is my first report. For further explanation I can recommend the German site where this problem is being discussed: http://www.roboternetz.de/phpBB2/viewtopic.php?p=300953

I hope you can fix this.
Michael
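The listing is easier to read once one knows where avr-gcc places each argument. The following is a hypothetical C model of the register-argument rule as described in the avr-gcc ABI documentation; it is an assumption, not code taken from GCC, it ignores varargs and zero-sized arguments, and the names `avr_arg_reg` and `avr_call_saved` are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the avr-gcc register-argument rule (an assumption
   based on the published avr-gcc ABI, not code taken from GCC): start at
   r26; for each argument, round its size up to an even number of bytes and
   subtract it; if the result is still >= r8, the argument's low byte is
   passed in that register and the remaining bytes in the registers above
   it. Otherwise this argument and all later ones are passed on the stack. */
enum { ARG_REG_START = 26, ARG_REG_MIN = 8 };

/* Returns the register holding the argument's low byte, or -1 if the
   argument goes on the stack. *next must start at ARG_REG_START. */
static int avr_arg_reg(int size, int *next)
{
    assert(size > 0);                  /* zero-sized arguments not modelled */
    int rounded = (size + 1) & ~1;     /* odd sizes are rounded up to even */
    if (*next - rounded < ARG_REG_MIN) {
        *next = 0;                     /* every later argument spills too */
        return -1;
    }
    *next -= rounded;
    return *next;
}

/* Call-saved registers under the same assumed ABI: r2..r17, r28, r29. */
static bool avr_call_saved(int reg)
{
    return (reg >= 2 && reg <= 17) || reg == 28 || reg == 29;
}
```

Calling `avr_arg_reg` with sizes 4, 4, 4 and 1 (starting from `ARG_REG_START`) gives r22, r18, r14 and r12 for `a`, `b`, `c` and `d`. That matches the listing: `c` (r14-r17) and `d` (r12) sit in call-saved registers, and those are exactly the five registers pushed in the prologue.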
Console:
=========================================================================
root@slax:/mnt/sda1_removable/avr/gcc_schlecht# make
-------- begin --------
* Individual makefile for AvrLiveCD
* Avr-Gcc version:
avr-gcc (GCC) 4.1.2
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
*
---------------
avr-size -d main.elf -t
avr-size: 'main.elf': No such file
      0       0       0       0       0 (TOTALS)

Compiling: main.c
avr-gcc -c -mmcu=attiny26 -I. -g -DF_CPU=1000000UL -I -Os -funsigned-char -funsigned-bitfields -fpack-struct -fshort-enums -Wall -Wstrict-prototypes -L /usr/local/bin/lib/gcc/avr/4.1.1 -Wa,-adhlns=main.lst -I/usr/local/bin/avr/include/ -std=gnu99 -MD -MP -MF .dep/main.o.d main.c -o main.o
main.c:32:2: warning: no newline at end of file

Linking: main.elf
avr-gcc -mmcu=attiny26 -I. -g -DF_CPU=1000000UL -I -Os -funsigned-char -funsigned-bitfields -fpack-struct -fshort-enums -Wall -Wstrict-prototypes -L /usr/local/bin/lib/gcc/avr/4.1.1 -Wa,-adhlns=main.o -I/usr/local/bin/avr/include/ -std=gnu99 -MD -MP -MF .dep/main.elf.d main.o --output main.elf -Wl,-Map=main.map,--cref

Creating load file for Flash: main.hex
avr-objcopy -O ihex -R .eeprom main.elf main.hex

Creating load file for EEPROM: main.eep
avr-objcopy -j .eeprom --set-section-flags .eeprom=alloc,load \
--change-section-lma .eeprom=0 -O ihex main.elf main.eep
avr-objcopy: there are no sections to be copied!
avr-objcopy: --change-section-lma .eeprom=0x00000000 never used
make: [main.eep] Error 1 (ignored)

Creating Extended Listing: main.lss
avr-objdump -h -S main.elf > main.lss

Creating Symbol Table: main.sym
avr-nm -n main.elf > main.sym

avr-size -d main.elf -t
   text    data     bss     dec     hex filename
    228       0       0     228      e4 main.elf
    228       0       0     228      e4 (TOTALS)
-------- end --------

root@slax:/mnt/sda1_removable/avr/gcc_schlecht# make main.i
avr-gcc -E -mmcu=attiny26 -I. -g -DF_CPU=1000000UL -I -Os -funsigned-char -funsigned-bitfields -fpack-struct -fshort-enums -Wall -Wstrict-prototypes -L /usr/local/bin/lib/gcc/avr/4.1.1 -Wa,-adhlns=main.lst -I/usr/local/bin/avr/include/ -std=gnu99 main.c -o main.i
main.c:32:2: warning: no newline at end of file
root@slax:/mnt/sda1_removable/avr/gcc_schlecht# make main.s
avr-gcc -S -mmcu=attiny26 -I. -g -DF_CPU=1000000UL -I -Os -funsigned-char -funsigned-bitfields -fpack-struct -fshort-enums -Wall -Wstrict-prototypes -L /usr/local/bin/lib/gcc/avr/4.1.1 -Wa,-adhlns=main.lst -I/usr/local/bin/avr/include/ -std=gnu99 -MD -MP -MF .dep/main.s.d main.c -o main.s
main.c:32:2: warning: no newline at end of file
root@slax:/mnt/sda1_removable/avr/gcc_schlecht#
=======================================================
main.c
=======================================================
//General Avrincludes
#include <avr/io.h>

long foo(long a, long b, long c, uint8_t d){
    if(d){
        return a+b;
    }else{
        return a-c;
    }
}

long foo_rec(long a){
    if(a==4){
        return foo_rec(a-1)+2;
    }
    return 1;
}

long foo_rec2(long a, long b){
    if(!b){
        return foo_rec2(a+2,b+4);
    }else{
        return a+b+4;
    }
}

int main(void){
    return 0;
}
========================================================
The preprocessed file:
========================================================
# 1 "main.c"
# 1 "/mnt/sda1_removable/avr/gcc_schlecht//"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "main.c"
# 1 "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/avr/io.h" 1 3
# 87 "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/avr/io.h" 3
# 1 "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/avr/sfr_defs.h" 1 3
# 126 "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/avr/sfr_defs.h" 3
# 1 "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/inttypes.h" 1 3
# 37 "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/inttypes.h" 3
# 1 "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/stdint.h" 1 3
# 121 "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/stdint.h" 3
typedef int int8_t __attribute__((__mode__(__QI__)));
typedef unsigned int uint8_t __attribute__((__mode__(__QI__)));
typedef int int16_t __attribute__ ((__mode__ (__HI__)));
typedef unsigned int uint16_t __attribute__ ((__mode__ (__HI__)));
typedef int int32_t __attribute__ ((__mode__ (__SI__)));
typedef unsigned int uint32_t __attribute__ ((__mode__ (__SI__)));
typedef int int64_t __attribute__((__mode__(__DI__)));
typedef unsigned int uint64_t __attribute__((__mode__(__DI__)));
# 142 "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/stdint.h" 3
typedef int16_t intptr_t;
typedef uint16_t uintptr_t;
# 159 "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/stdint.h" 3
typedef int8_t int_least8_t;
typedef uint8_t uint_least8_t;
typedef int16_t int_least16_t;
typedef uint16_t uint_least16_t;
typedef int32_t int_least32_t;
typedef uint32_t uint_least32_t;
typedef int64_t int_least64_t;
typedef uint64_t uint_least64_t;
# 213 "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/stdint.h" 3
typedef int8_t int_fast8_t;
typedef uint8_t uint_fast8_t;
typedef int16_t int_fast16_t;
typedef uint16_t uint_fast16_t;
typedef int32_t int_fast32_t;
typedef uint32_t uint_fast32_t;
typedef int64_t int_fast64_t;
typedef uint64_t uint_fast64_t;
# 273 "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/stdint.h" 3
typedef int64_t intmax_t;
typedef uint64_t uintmax_t;
# 38 "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/inttypes.h" 2 3
# 77 "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/inttypes.h" 3
typedef int32_t int_farptr_t;
typedef uint32_t uint_farptr_t;
# 127 "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/avr/sfr_defs.h" 2 3
# 88 "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/avr/io.h" 2 3
# 312 "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/avr/io.h" 3
# 1 "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/avr/iotn26.h" 1 3
# 313 "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/avr/io.h" 2 3
# 360 "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/avr/io.h" 3
# 1 "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/avr/portpins.h" 1 3
# 361 "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/avr/io.h" 2 3
# 370 "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/avr/io.h" 3
# 1 "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/avr/version.h" 1 3
# 371 "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/avr/io.h" 2 3
# 3 "main.c" 2

long foo(long a, long b, long c, uint8_t d){
    if(d){
        return a+b;
    }else{
        return a-c;
    }
}

long foo_rec(long a){
    if(a==4){
        return foo_rec(a-1)+2;
    }
    return 1;
}

long foo_rec2(long a, long b){
    if(!b){
        return foo_rec2(a+2,b+4);
    }else{
        return a+b+4;
    }
}

int main(void){
    return 0;
}
==============================================================
The assembler file:
==============================================================
    .file "main.c"
    .arch attiny26
__SREG__ = 0x3f
__SP_H__ = 0x3e
__SP_L__ = 0x3d
__tmp_reg__ = 0
__zero_reg__ = 1
    .global __do_copy_data
    .global __do_clear_bss
    .stabs "/mnt/sda1_removable/avr/gcc_schlecht/",100,0,2,.Ltext0
    .stabs "main.c",100,0,2,.Ltext0
    .text
.Ltext0:
    .stabs "gcc2_compiled.",60,0,0,0
    .stabs "int:t(0,1)=r(0,1);-32768;32767;",128,0,0,0
    .stabs "char:t(0,2)=@s8;r(0,2);0;255;",128,0,0,0
    .stabs "long int:t(0,3)=@s32;r(0,3);020000000000;017777777777;",128,0,0,0
    .stabs "unsigned int:t(0,4)=r(0,4);0;0177777;",128,0,0,0
    .stabs "long unsigned int:t(0,5)=@s32;r(0,5);0;037777777777;",128,0,0,0
    .stabs "long long int:t(0,6)=@s64;r(0,6);01000000000000000000000;0777777777777777777777;",128,0,0,0
    .stabs "long long unsigned int:t(0,7)=@s64;r(0,7);0;01777777777777777777777;",128,0,0,0
    .stabs "short int:t(0,8)=r(0,8);-32768;32767;",128,0,0,0
    .stabs "short unsigned int:t(0,9)=r(0,9);0;0177777;",128,0,0,0
    .stabs "signed char:t(0,10)=@s8;r(0,10);-128;127;",128,0,0,0
    .stabs "unsigned char:t(0,11)=@s8;r(0,11);0;255;",128,0,0,0
    .stabs "float:t(0,12)=r(0,1);4;0;",128,0,0,0
    .stabs "double:t(0,13)=r(0,1);4;0;",128,0,0,0
    .stabs "long double:t(0,14)=r(0,1);4;0;",128,0,0,0
    .stabs "void:t(0,15)=(0,15)",128,0,0,0
    .stabs "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/avr/io.h",130,0,0,0
    .stabs "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/avr/sfr_defs.h",130,0,0,0
    .stabs "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/inttypes.h",130,0,0,0
    .stabs "/usr/local/lib/gcc/avr/4.1.2/../../../../avr/include/stdint.h",130,0,0,0
    .stabs "int8_t:t(4,1)=(0,10)",128,0,121,0
    .stabs "uint8_t:t(4,2)=(0,11)",128,0,122,0
    .stabs "int16_t:t(4,3)=(0,1)",128,0,123,0
    .stabs "uint16_t:t(4,4)=(0,4)",128,0,124,0
    .stabs "int32_t:t(4,5)=(0,3)",128,0,125,0
    .stabs "uint32_t:t(4,6)=(0,5)",128,0,126,0
    .stabs "int64_t:t(4,7)=(0,6)",128,0,128,0
    .stabs "uint64_t:t(4,8)=(0,7)",128,0,129,0
    .stabs "intptr_t:t(4,9)=(4,3)",128,0,142,0
    .stabs "uintptr_t:t(4,10)=(4,4)",128,0,147,0
    .stabs "int_least8_t:t(4,11)=(4,1)",128,0,159,0
    .stabs "uint_least8_t:t(4,12)=(4,2)",128,0,164,0
    .stabs "int_least16_t:t(4,13)=(4,3)",128,0,169,0
    .stabs "uint_least16_t:t(4,14)=(4,4)",128,0,174,0
    .stabs "int_least32_t:t(4,15)=(4,5)",128,0,179,0
    .stabs "uint_least32_t:t(4,16)=(4,6)",128,0,184,0
    .stabs "int_least64_t:t(4,17)=(4,7)",128,0,192,0
    .stabs "uint_least64_t:t(4,18)=(4,8)",128,0,199,0
    .stabs "int_fast8_t:t(4,19)=(4,1)",128,0,213,0
    .stabs "uint_fast8_t:t(4,20)=(4,2)",128,0,218,0
    .stabs "int_fast16_t:t(4,21)=(4,3)",128,0,223,0
    .stabs "uint_fast16_t:t(4,22)=(4,4)",128,0,228,0
    .stabs "int_fast32_t:t(4,23)=(4,5)",128,0,233,0
    .stabs "uint_fast32_t:t(4,24)=(4,6)",128,0,238,0
    .stabs "int_fast64_t:t(4,25)=(4,7)",128,0,246,0
    .stabs "uint_fast64_t:t(4,26)=(4,8)",128,0,253,0
    .stabs "intmax_t:t(4,27)=(4,7)",128,0,273,0
    .stabs "uintmax_t:t(4,28)=(4,8)",128,0,278,0
    .stabn 162,0,0,0
    .stabs "int_farptr_t:t(3,1)=(4,5)",128,0,77,0
    .stabs "uint_farptr_t:t(3,2)=(4,6)",128,0,81,0
    .stabn 162,0,0,0
    .stabn 162,0,0,0
    .stabn 162,0,0,0
    .stabs "foo:F(0,3)",36,0,5,foo
    .stabs "a:P(0,3)",64,0,5,22
    .stabs "b:P(0,3)",64,0,5,18
    .stabs "c:P(0,3)",64,0,5,14
    .stabs "d:P(4,2)",64,0,5,12
.global foo
    .type foo, @function
foo:
    .stabd 46,0,0
    .stabn 68,0,5,.LM0-foo
.LM0:
/* prologue: frame size=0 */
    push r12
    push r14
    push r15
    push r16
    push r17
/* prologue end (size=5) */
    .stabn 68,0,6,.LM1-foo
.LM1:
    tst r12
    breq .L2
    .stabn 68,0,7,.LM2-foo
.LM2:
    add r22,r18
    adc r23,r19
    adc r24,r20
    adc r25,r21
    rjmp .L4
.L2:
    .stabn 68,0,9,.LM3-foo
.LM3:
    sub r22,r14
    sbc r23,r15
    sbc r24,r16
    sbc r25,r17
.L4:
/* epilogue: frame size=0 */
    pop r17
    pop r16
    pop r15
    pop r14
    pop r12
    ret
/* epilogue end (size=6) */
/* function foo size 22 (11) */
    .size foo, .-foo
.Lscope0:
    .stabs "",36,0,0,.Lscope0-foo
    .stabd 78,0,0
    .stabs "foo_rec:F(0,3)",36,0,13,foo_rec
    .stabs "a:P(0,3)",64,0,13,22
.global foo_rec
    .type foo_rec, @function
foo_rec:
    .stabd 46,0,0
    .stabn 68,0,13,.LM4-foo_rec
.LM4:
/* prologue: frame size=0 */
/* prologue end (size=0) */
    .stabn 68,0,14,.LM5-foo_rec
.LM5:
    cpi r22,lo8(4)
    cpc r23,__zero_reg__
    cpc r24,__zero_reg__
    cpc r25,__zero_reg__
    breq .L8
    .stabn 68,0,14,.LM6-foo_rec
.LM6:
    ldi r22,lo8(0)
    ldi r23,hi8(0)
    ldi r24,hlo8(0)
    ldi r25,hhi8(0)
    rjmp .L10
.L8:
    ldi r22,lo8(2)
    ldi r23,hi8(2)
    ldi r24,hlo8(2)
    ldi r25,hhi8(2)
.L10:
    subi r22,lo8(-(1))
    sbci r23,hi8(-(1))
    sbci r24,hlo8(-(1))
    sbci r25,hhi8(-(1))
/* epilogue: frame size=0 */
    ret
/* epilogue end (size=1) */
/* function foo_rec size 19 (18) */
    .size foo_rec, .-foo_rec
.Lscope1:
    .stabs "",36,0,0,.Lscope1-foo_rec
    .stabd 78,0,0
    .stabs "foo_rec2:F(0,3)",36,0,20,foo_rec2
    .stabs "a:P(0,3)",64,0,20,22
    .stabs "b:P(0,3)",64,0,20,18
.global foo_rec2
    .type foo_rec2, @function
foo_rec2:
    .stabd 46,0,0
    .stabn 68,0,20,.LM7-foo_rec2
.LM7:
/* prologue: frame size=0 */
/* prologue end (size=0) */
    .stabn 68,0,21,.LM8-foo_rec2
.LM8:
    cp r18,__zero_reg__
    cpc r19,__zero_reg__
    cpc r20,__zero_reg__
    cpc r21,__zero_reg__
    brne .L14
    .stabn 68,0,22,.LM9-foo_rec2
.LM9:
    subi r22,lo8(-(2))
    sbci r23,hi8(-(2))
    sbci r24,hlo8(-(2))
    sbci r25,hhi8(-(2))
    ldi r18,lo8(4)
    ldi r19,hi8(4)
    ldi r20,hlo8(4)
    ldi r21,hhi8(4)
.L14:
    subi r22,lo8(-(4))
    sbci r23,hi8(-(4))
    sbci r24,hlo8(-(4))
    sbci r25,hhi8(-(4))
    add r18,r22
    adc r19,r23
    adc r20,r24
    adc r21,r25
    .stabn 68,0,26,.LM10-foo_rec2
.LM10:
    mov r25,r21
    mov r24,r20
    mov r23,r19
    mov r22,r18
/* epilogue: frame size=0 */
    ret
/* epilogue end (size=1) */
/* function foo_rec2 size 26 (25) */
    .size foo_rec2, .-foo_rec2
.Lscope2:
    .stabs "",36,0,0,.Lscope2-foo_rec2
    .stabd 78,0,0
    .stabs "main:F(0,1)",36,0,28,main
.global main
    .type main, @function
main:
    .stabd 46,0,0
    .stabn 68,0,28,.LM11-main
.LM11:
/* prologue: frame size=0 */
    ldi r28,lo8(__stack - 0)
    ldi r29,hi8(__stack - 0)
    out __SP_H__,r29
    out __SP_L__,r28
/* prologue end (size=4) */
    .stabn 68,0,32,.LM12-main
.LM12:
    ldi r24,lo8(0)
    ldi r25,hi8(0)
/* epilogue: frame size=0 */
    rjmp exit
/* epilogue end (size=1) */
/* function main size 7 (2) */
    .size main, .-main
.Lscope3:
    .stabs "",36,0,0,.Lscope3-main
    .stabd 78,0,0
    .stabs "",100,0,0,.Letext0
.Letext0:
/* File "main.c": code 74 = 0x004a ( 56), prologues 9, epilogues 9 */
=================================================
I hope these are all the necessary files.
Michael
Confirmed. 4.2.2 produces unnecessary pushes and pops. 4.3.0 produces worse code than 4.2.x and adds unnecessary moves. Adding the const or pure function attributes does not seem to help in 4.3.0.
The problem is caused by a bug in GCC's dataflow (DF) code, or at least by incorrect documentation regarding prologue/epilogue register saves/restores.

As specified in the internals manual, the AVR prologue/epilogue uses df_regs_ever_live_p(reg) to determine which registers should be saved on the stack (unless the register is a call_used_register). However, if a function has arguments that are passed in non-call_used registers (R8-R17), this test gives an incorrect result, and these registers will always be saved/restored by the prologue/epilogue. (Argument registers never need to be saved/restored.)

This problem only applies to targets that pass arguments in non-call_used registers.

Unfortunately, no part of GCC, including DF, appears to have the proper information available for direct use.

In the absence of a change to GCC, the target can determine which registers are REALLY used for arguments and exclude them from the saves/restores. This requires going through all function arguments again using the target argument macros.

I will post the patch when it has finished testing. But here is the key routine:

/* Sets the bits in *SET for the hard registers used to pass the
   arguments of the current function.  */
static void
avr_args (HARD_REG_SET *set)
{
  int reg;
  int i;
  rtx arg;
  CUMULATIVE_ARGS cum;
  tree decl = DECL_ARGUMENTS (current_function_decl);

  INIT_CUMULATIVE_ARGS (cum, TREE_TYPE (current_function_decl), NULL_RTX,
                        decl, -1);

  for (; decl; decl = TREE_CHAIN (decl))
    {
      if (TREE_CODE (decl) == PARM_DECL
          && DECL_NAME (decl)
          && !DECL_ARTIFICIAL (decl))
        {
          enum machine_mode mode = DECL_MODE (decl);

          /* Get the argument RTX.  This target does not use the
             named attribute.  */
          arg = FUNCTION_ARG (cum, mode, DECL_ARG_TYPE (decl), 1);
          FUNCTION_ARG_ADVANCE (cum, mode, DECL_ARG_TYPE (decl), 1);

          /* FUNCTION_ARG yields NULL_RTX for stack-passed arguments.  */
          if (arg && REG_P (arg))
            {
              reg = REGNO (arg);
              for (i = 0; i < HARD_REGNO_NREGS (reg, mode); i++)
                if (set)
                  SET_HARD_REG_BIT (*set, reg + i);
            }
        }
    }
}
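The save decision that this routine is meant to refine can be modelled with plain bitmasks. The following is a hypothetical sketch of the idea described above, not GCC code: `regset`, `CALL_SAVED_MASK` and the helper functions are invented for illustration, and the call-saved set r2-r17, r28, r29 is assumed from the avr-gcc ABI. Keep in mind that the patch built on this idea was later reported to cause a wrong-code regression, so the "proposed" rule is the claim under test, not a verified fact.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register set: bit n stands for register rn (r0..r31). */
typedef uint32_t regset;

#define REG(n)          ((regset)1u << (n))
/* Assumed AVR call-saved registers: r2..r17 (bits 2-17), r28, r29. */
#define CALL_SAVED_MASK ((regset)(0x3FFFCu | REG(28) | REG(29)))

/* Set of `nregs` consecutive registers starting at `first`. */
static regset reg_range(int first, int nregs)
{
    assert(first >= 0 && nregs >= 0 && first + nregs <= 32);
    regset s = 0;
    for (int i = 0; i < nregs; i++)
        s |= REG(first + i);
    return s;
}

/* Current rule: every ever-live call-saved register gets pushed. */
static regset regs_to_save_current(regset ever_live)
{
    return ever_live & CALL_SAVED_MASK;
}

/* Proposed rule: additionally skip registers that only carry arguments. */
static regset regs_to_save_proposed(regset ever_live, regset arg_regs)
{
    return ever_live & CALL_SAVED_MASK & ~arg_regs;
}

/* Number of registers in a set (clears the lowest set bit each round). */
static int count_regs(regset s)
{
    int n = 0;
    for (; s; s &= s - 1)
        n++;
    return n;
}
```

For `foo`, whose arguments occupy r12 and r14-r25, the current rule selects r12 and r14-r17 (the five pushes in the listing), while the proposed rule selects nothing.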
Created attachment 15254: Patch to fix bug.
The patch causes a wrong-code regression. See WinAVR bug #1945375 on SourceForge: <http://sourceforge.net/tracker/index.php?func=detail&aid=1945375&group_id=68108&atid=520074>
Created attachment 15540: Partial solution using DF defs.
Attached is an INCOMPLETE attempt to fix this issue. Register saves appear to be OK. However, the same function is also required for the argument pointer elimination offset. It appears that DF chain info is not maintained when global.c uses this, so the offset used to access arguments on the stack does not reflect the final required value, and the generated code will fail.
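Why out-of-date save information breaks stack-argument access can be shown with a small arithmetic model. This is a hypothetical sketch under stated assumptions, not the real AVR INITIAL_ELIMINATION_OFFSET: it assumes the offset is the local frame size plus one byte per pushed register plus a 2-byte return address plus 1. The names `arg_pointer_offset` and `stale_offset_error` are invented for illustration.

```c
#include <assert.h>

/* Assumed 2-byte return address, as on small AVRs such as the ATtiny26. */
enum { AVR_PC_SIZE = 2 };

/* Hypothetical model (an assumption, not GCC's actual code): distance in
   bytes from the frame pointer to the first stack-passed argument. */
static int arg_pointer_offset(int frame_size, int saved_regs)
{
    assert(frame_size >= 0 && saved_regs >= 0);
    /* locals + one byte per pushed register + return address + 1,
       because PUSH stores at SP and then decrements it */
    return frame_size + saved_regs + AVR_PC_SIZE + 1;
}

/* Error in bytes when the offset was computed from a stale save count
   and is never recomputed from the final one. */
static int stale_offset_error(int frame_size, int stale_saved, int final_saved)
{
    return arg_pointer_offset(frame_size, stale_saved)
         - arg_pointer_offset(frame_size, final_saved);
}
```

If the offset were fixed while five registers were still expected to be pushed, but the final prologue pushed none, every stack-passed argument would be accessed five bytes away from its real location.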
The following information from Kenny Zadeck shows why the solution does not work. This limitation cannot be avoided at present without causing compile-time/memory regressions on other targets, so we will have to live with the overly cautious saving of registers.

> The target computes the offset (INITIAL_ELIMINATION_OFFSET). This is called several times during register allocation (no doubt because something changes). The offset is a function of the number of registers saved, so I used DF_REG_DEF_CHAIN to work out precisely which registers are saved. But this information is out of date, and so the offset is wrong.
> However, the info provided by df_regs_ever_live_p is updated.
>
> So I think the live info is updated in global.c but not the chains. Is there a sane way around this, or should I put this one on the "too difficult" list?
>
> best regards
>

There are three things to consider:

1) Incremental updating is turned off during global. This was perhaps a mistake, but what I did not want to get into was rescanning each insn that uses/defines a register whenever that register gets assigned. By turning off rescanning, each insn is only rescanned once, after all of its operands have had their registers assigned.

2) Turning this off is most likely the cause of your grief. It is possible that you could move the call that turns off scanning until later, after your bit of foolishness happens but before the actual registers are assigned, but the truth is that I do not really understand the information flow within global/reload, so I did not consider something like this.

3) Incremental scanning could be turned back on, but the cost is quite high because most insns have many operands and because reload can change the assignment of registers after global has finished.

Kenny
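Kenny's first and third points are about how many times insns get rescanned. The following is a hypothetical cost model, not GCC code: the function names are invented, each operand is assumed to be a distinct pseudo register that is assigned exactly once, and later reassignments by reload are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Incremental updating: an insn is rescanned every time one of its
   register operands is assigned a hard register. */
static size_t rescans_incremental(const int *operands_per_insn, size_t n_insns)
{
    assert(operands_per_insn != NULL || n_insns == 0);
    size_t total = 0;
    for (size_t i = 0; i < n_insns; i++) {
        assert(operands_per_insn[i] >= 0);
        total += (size_t)operands_per_insn[i];  /* one rescan per assignment */
    }
    return total;
}

/* Deferred updating: an insn is rescanned once, after all of its
   operands have been assigned. */
static size_t rescans_deferred(const int *operands_per_insn, size_t n_insns)
{
    assert(operands_per_insn != NULL || n_insns == 0);
    size_t total = 0;
    for (size_t i = 0; i < n_insns; i++)
        if (operands_per_insn[i] > 0)
            total++;                            /* one rescan per touched insn */
    return total;
}
```

For four insns with 3, 2, 0 and 4 register operands, incremental updating would rescan 9 times versus 3 times for deferred updating; registers reassigned by reload after global (point 3) would add further rescans that this model does not count.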
Closing as WONTFIX.