As the avr is an 8-bit processor, it is important to handle data as 8-bit when possible, even though the C standards force 8-bit data to be promoted to 16-bit ints in many situations. Common cases are when doing logic operations on chars - when legally valid (i.e., the results are the same as for 16-bit operations), the generated code should use 8-bit operations. In particular, logic operations using constants such as 0 and 0xff should be optimised. Optimisation works well in many cases, but fails when the expressions get slightly more complex: extern uint8_t data[64]; uint8_t bar(uint8_t x, uint8_t y) { return data[y ^ (x & 0x0f)]; } uint8_t bar2(uint8_t x, uint8_t y) { return data[(y ^ x) & 0x0f]; } 40 bar: 41 /* prologue: frame size=0 */ 42 /* prologue end (size=0) */ 43 0000 E82F mov r30,r24 ; x, x 44 0002 F0E0 ldi r31,lo8(0) ; x, 45 0004 EF70 andi r30,lo8(15) ; x, 46 0006 F070 andi r31,hi8(15) ; x, 47 0008 70E0 ldi r23,lo8(0) ; y, 48 000a E627 eor r30,r22 ; x, y 49 000c F727 eor r31,r23 ; x, y 50 000e E050 subi r30,lo8(-(data)) ; x, 51 0010 F040 sbci r31,hi8(-(data)) ; x, 52 0012 8081 ld r24,Z ; tmp51, data 53 0014 90E0 ldi r25,lo8(0) ; <result>, 54 /* epilogue: frame size=0 */ 55 0016 0895 ret 56 /* epilogue end (size=1) */ 57 /* function bar size 12 (11) */ 61 bar2: 62 /* prologue: frame size=0 */ 63 /* prologue end (size=0) */ 64 0018 E62F mov r30,r22 ; y, y 65 001a E827 eor r30,r24 ; y, x 66 001c F0E0 ldi r31,lo8(0) ; , 67 001e EF70 andi r30,lo8(15) ; tmp46, 68 0020 F070 andi r31,hi8(15) ; tmp46, 69 0022 E050 subi r30,lo8(-(data)) ; tmp46, 70 0024 F040 sbci r31,hi8(-(data)) ; tmp46, 71 0026 8081 ld r24,Z ; tmp50, data 72 0028 90E0 ldi r25,lo8(0) ; <result>, 73 /* epilogue: frame size=0 */ 74 002a 0895 ret 75 /* epilogue end (size=1) */ 76 /* function bar2 size 10 (9) */ The first function "bar" has several clear improvements - it does all the logic operations as 16-bit. In the second case, the "eor" is nicely handled as 8-bit, but the "& 0x0f" is done as 16-bit - there is an unnecessary "andi r31, hi8(15)" instruction.
There are many other cases where 8-bit optimisation is missing - C's integer promotion gets in the way. This is particularly common when dealing with a compile-time constant - there is no way in C to say that "0x3f" is 8-bit rather than a 16-bit int. Another example of code with this problem is: void foo(void) { static unsigned char count; if (++count & 0x3f) { PORTC &= ~0x01; } else { PORTC |= 0x01; } } Both the "&" and the comparison with zero are done as 16-bit. One work-around is to use this macro: #define const_byte(x) ({ static const __attribute__((__progmem__)) \ unsigned char v = x; v; }) Then we can write: #define const_byte(x) ({ static const __attribute__((__progmem__)) \ unsigned char v = x; v; }) uint8_t bar3(uint8_t x, uint8_t y) { return data[y ^ (x & const_byte(0x0f))]; } 147 bar3: 148 /* prologue: function */ 149 /* frame size = 0 */ 150 008c 8F70 andi r24,lo8(15) ; tmp45, 151 008e 8627 eor r24,r22 ; tmp45, y 152 0090 E0E0 ldi r30,lo8(data) ; tmp48, 153 0092 F0E0 ldi r31,hi8(data) ; tmp48, 154 0094 E80F add r30,r24 ; tmp48, tmp45 155 0096 F11D adc r31,__zero_reg__ ; tmp48 156 0098 8081 ld r24,Z ; , data 157 /* epilogue start */ 158 009a 0895 ret 160 As far as I can see, this generated code is optimal. The macro works because it forces the value to be 8-bit, rather than a 16-bit compile-time constant. However, the compiler is still smart enough to see that since it's a "const" with known value, it's value can be used directly. As a side effect, the static "variable" must be created somewhere - by using __progmen__, we create it in flash rather than wasting ram. Even that waste could be spared by garbage-collection linking, or by using a dedicated segment rather than .progmem.data.
I still see this in 4.6.1. (In reply to comment #1) > There are many other cases where 8-bit optimisation is missing - C's integer > promotion gets in the way. This is particularly common when dealing with a > compile-time constant - there is no way in C to say that "0x3f" is 8-bit rather > than a 16-bit int. > > Another example of code with this problem is: > > void foo(void) { > static unsigned char count; > > if (++count & 0x3f) { > PORTC &= ~0x01; > } else { > PORTC |= 0x01; > } > } > > Both the "&" and the comparison with zero are done as 16-bit. Please, don't use stuff like PORTC that is nor available to the general GCC developer. They won't be able to reproduce this artifact! This is not specific to the avr part and related to PR43088. Write yout test case like that: typedef unsigned char uint8_t; #define PORTC (*((uint8_t volatile*) 0x28)) void foo(void) { static unsigned char count; if (++count & 0x3f) { PORTC &= ~0x01; } else { PORTC |= 0x01; } } void foo3e(void) { static unsigned char count; if (++count & 0x3e) { PORTC &= ~0x01; } else { PORTC |= 0x01; } }
Compiled the following code from commant #0 and commant #1 with avr-gcc 4.7 (SVN 179594) #define uint8_t unsigned char #define PORTC (*((uint8_t volatile*) 0x28)) extern uint8_t data[64]; uint8_t bar(uint8_t x, uint8_t y) { return data[y ^ (x & 0x0f)]; } uint8_t bar2(uint8_t x, uint8_t y) { return data[(y ^ x) & 0x0f]; } void foo(void) { static unsigned char count; if (++count & 0x3f) { PORTC &= ~0x01; } else { PORTC |= 0x01; } } With -Os -dp yields the following result: bar: andi r24,lo8(15) eor r24,r22 mov r30,r24 ldi r31,lo8(0) subi r30,lo8(-(data)) sbci r31,hi8(-(data)) ld r24,Z ret bar2: eor r22,r24 andi r22,lo8(15) mov r30,r22 ldi r31,lo8(0) subi r30,lo8(-(data)) sbci r31,hi8(-(data)) ld r24,Z ret foo: lds r24,count.1232 subi r24,lo8(-(1)) sts count.1232,r24 andi r24,lo8(63) breq .L4 cbi 40-0x20,0 ret .L4: sbi 40-0x20,0 ret Thus, closing this PR as FIXED because the code is optimal and nothing remains to be improved.