When using an unsigned char in a loop with limited range the variable is promoted to a 16 bit integer if the variable is passed to another function within the loop by value. Tested using -Os on ATMEGA162: #include <avr/io.h> int sub2(uint8_t); int main(void) { static uint8_t x; volatile uint8_t y; while(1) { for(x=0;x<128; x++) { y+=sub2(x); } } return 0; } int sub2(uint8_t xyz) { volatile uint8_t abc; while(xyz < 64) { xyz++; abc+=xyz; } return abc; }
I guess you want -fno-tree-loop-ivcanon.
I don't see it being promoted on x86-linux-gnu at the tree level.
Created attachment 14454 [details] Preprocessed testcase.
Created attachment 14455 [details] Assembly output of test case using 4.1.2. Maybe I'm wrong, but I don't even see it being promoted on the AVR. See attached assembly output from 4.1.2 with -dP.
Mike, can you provide additional information as to where the bug is?
(In reply to comment #5) > Mike, can you provide additional information as to where the bug is? > This is the assembly output I get: Note that r14,r15 is being reserved for variable x when only a single reg is needed. Strange enough if call sub2 with (x+1) then only one reg is used and the code decreases by 12 bytes. 000000e8 <main>: e8: ef 92 push r14 ea: ff 92 push r15 ec: 1f 93 push r17 ee: cf 93 push r28 f0: df 93 push r29 f2: cd b7 in r28, 0x3d ; 61 f4: de b7 in r29, 0x3e ; 62 f6: 21 97 sbiw r28, 0x01 ; 1 f8: 0f b6 in r0, 0x3f ; 63 fa: f8 94 cli fc: de bf out 0x3e, r29 ; 62 fe: 0f be out 0x3f, r0 ; 63 100: cd bf out 0x3d, r28 ; 61 102: ee 24 eor r14, r14 104: ff 24 eor r15, r15 106: 19 81 ldd r17, Y+1 ; 0x01 108: 8e 2d mov r24, r14 10a: 0e 94 57 00 call 0xae ; 0xae <sub2> 10e: 18 0f add r17, r24 110: 19 83 std Y+1, r17 ; 0x01 112: 08 94 sec 114: e1 1c adc r14, r1 116: f1 1c adc r15, r1 118: 80 e8 ldi r24, 0x80 ; 128 11a: e8 16 cp r14, r24 11c: f1 04 cpc r15, r1 11e: 99 f7 brne .-26 ; 0x106 <main+0x1e> 120: f0 cf rjmp .-32 ; 0x102 <main+0x1a> 00000122 <_exit>: 122: ff cf rjmp .-2 ; 0x122 <_exit>
With Mike's description in comment #6, confirmed on 4.1.2 and 4.2.2. AVR GCC 4.2.2 is worse than 4.1.2, in that even if sub2 is called with (x+1), the variable is still 16 bits.
(In reply to comment #7) > With Mike's description in comment #6, confirmed on 4.1.2 and 4.2.2. AVR GCC > 4.2.2 is worse than 4.1.2, in that even if sub2 is called with (x+1), the > variable is still 16 bits. > There is something more going on, this is the assembler output when sub2 is not in the same file, and calling sub2(x), i.e not x+1: =================================================================== push r17 push r28 push r29 in r28,__SP_L__ in r29,__SP_H__ sbiw r28,1 in __tmp_reg__,__SREG__ cli out __SP_H__,r29 out __SREG__,__tmp_reg__ out __SP_L__,r28 /* prologue end (size=11) */ .L3: sts x.1261,__zero_reg__ ldi r24,lo8(0) .L4: ldd r17,Y+1 call sub2 add r17,r24 std Y+1,r17 lds r24,x.1261 subi r24,lo8(-(1)) sts x.1261,r24 sbrs r24,7 rjmp .L4 rjmp .L3 =========================================================== Note that x is now living in memory (as one would expect when it's static in a function). However it's not when looking at the original report. Is this ok? The original post assembler places x in r14,r15. Does this still adhere the static keyword? I don't know, since it's value is always set at the start what's the point of being static? Unless main is called twice? But then x may not be in r14/r15! So it seems to me this is some inline optimation problem. Note that moving the sub2 function above main also gives other results?!? Is this standard behaviour? Also note that when making sub2 static will completely remove it but also remove x as static, is this allowed? Tested against avr-gcc 4.1.2
(In reply to comment #8) > (In reply to comment #7) > > With Mike's description in comment #6, confirmed on 4.1.2 and 4.2.2. AVR GCC > > 4.2.2 is worse than 4.1.2, in that even if sub2 is called with (x+1), the > > variable is still 16 bits. > > > > There is something more going on, this is the assembler output when sub2 is not > in the same file, and calling sub2(x), i.e not x+1: > =================================================================== I think you will also find that if x is changed from ststic to auto the same problem appears.
(In reply to comment #9) > > I think you will also find that if x is changed from ststic to auto the same > problem appears. > Ok, I tried to find the minimum test case. And it has nothing todo with static/volatile/inline etc. ============================================== int sub2(unsigned char); // external function void foo(void) { unsigned char x; for(x=0;x<128; x++) { //sub2(x); //x is becomes a int (16bit) sub2(x+1); //x is char (8bit) } } All is compiled using 4.1.2 and "avr-gcc -Wall -Os -mmcu=avr5 -S test.c" The output when compiling with "sub2(x)" ================================================= /* prologue: frame size=0 */ push r28 push r29 /* prologue end (size=2) */ ldi r28,lo8(0) ldi r29,hi8(0) .L2: mov r24,r28 ; << loads as a byte! call sub2 adiw r28,1 ; << increment as a int cpi r28,128 ; << compare as a int cpc r29,__zero_reg__ brne .L2 /* epilogue: frame size=0 */ pop r29 pop r28 ret The output when compiling with "sub2(x+1)" ================================================================ /* prologue: frame size=0 */ push r17 /* prologue end (size=1) */ ldi r17,lo8(0) .L2: subi r17,lo8(-(1)) mov r24,r17 call sub2 cpi r17,lo8(-128) brne .L2 /* epilogue: frame size=0 */ pop r17 ret =========================================================== From compiling with x as a int I got a sort of hint. When using x+1, x is actually loaded with 1 (and not zero). So i tried: ====================================================== void foo(void) { unsigned char x; for(x=1;x<129; x++) sub2(x); } And it gives the assembler loop check as for the "sub(x+1)" variant of course the loop is now effectivley the same as "sub2(x+1)". But the the incrementing of x is now done after the call, so it proofs that this is not the problem. I also tried different loop lengths, thinking 128 is the nasty one, but it did not help. ====================================================== /* prologue: frame size=0 */ push r17 /* prologue end (size=1) */ ldi r17,lo8(1) .L2: mov r24,r17 call sub2 subi r17,lo8(-(1)) cpi r17,lo8(-127) brne .L2 /* epilogue: frame size=0 */ pop r17 ret I am out of knowledge now, I do not know how to further debug GCC on this. Just hoping this will help someone to pinpoint the problem.
I just realised I did not tried hard enough to find the smallest case: =========================================== volatile unsigned char bar; void foo(void) { unsigned char x; for(x=0;x<128; x++) { //bar = x+1; bar = x; } } Looks like using the loop counter inside the loop causes the problem. Because when not using x, but just e.g. calling a function is also gives the smallest loop. Looking at the RTL output of test.c.00.expand. it allready is a QI vs HI there... so it's going wrong some where inside gcc I really have no knowledge off...
I have verified the attached test case and test case with other comments and found the code generated is correct i.e. the variable is not promoted to integer in gcc-4.3.3, gcc-4.4.3, gcc-4.5.0 and also the latest head. The assembly for the following piece of code: int sub2(unsigned char); // external function void foo(void) { unsigned char x; for(x=0;x<128; x++) { sub2(x); //x is becomes a int (16bit) // sub2(x+1); //x is char (8bit) } } in gcc-4.3.3 is: foo: push r17 /* prologue: function */ /* frame size = 0 */ ldi r17,lo8(0) .L2: mov r24,r17 rcall sub2 subi r17,lo8(-(1)) cpi r17,lo8(-128) brne .L2 /* epilogue start */ pop r17 ret .size foo, .-foo
Cloased as WORKS FOR ME. Using the following test case: volatile unsigned char bar; void foo(void) { unsigned char x; for(x=0;x<128; x++) { bar = x; } } int sub2 (unsigned char); void foo2 (void) { unsigned char x; for(x=0;x<128; x++) { sub2 (x); } } void foo21 (void) { unsigned char x; for(x=0;x<128; x++) { sub2 (x+1); } } And with avr-gcc 4.7 r176517 I get the following result (-Os -mmcu=atmega8). In no case there is a 16-bit loop variable used. foo: ldi r24,lo8(0) .L5: sts bar,r24 subi r24,lo8(-(1)) cpi r24,lo8(-128) brne .L5 ret foo2: push r28 ldi r28,lo8(0) .L8: mov r24,r28 rcall sub2 subi r28,lo8(-(1)) cpi r28,lo8(-128) brne .L8 pop r28 ret foo21: push r28 ldi r28,lo8(0) .L11: subi r28,lo8(-(1)) mov r24,r28 rcall sub2 cpi r28,lo8(-128) brne .L11 pop r28 ret .ident "GCC: (GNU) 4.7.0 20110720 (experimental)"