Given the following input:

char greater_than_64(long long x, long long y) { return x > y; }
char greater_than_32(long x, long y) { return x > y; }

and running "avr-gcc -Os -c lol.c -o lol.o", I get the following output:

00000064 <greater_than_32>:
  64:	ef 92	push	r14
  66:	ff 92	push	r15
  68:	0f 93	push	r16
  6a:	1f 93	push	r17
  6c:	e2 2e	mov	r14, r18
  6e:	f3 2e	mov	r15, r19
  70:	04 2f	mov	r16, r20
  72:	15 2f	mov	r17, r21
  74:	31 e0	ldi	r19, 0x01	; 1
  76:	e6 16	cp	r14, r22
  78:	f7 06	cpc	r15, r23
  7a:	08 07	cpc	r16, r24
  7c:	19 07	cpc	r17, r25
  7e:	04 f0	brlt	.+0	; 0x80 <greater_than_32+0x1c>
  80:	30 e0	ldi	r19, 0x00	; 0
  82:	83 2f	mov	r24, r19
  84:	1f 91	pop	r17
  86:	0f 91	pop	r16
  88:	ff 90	pop	r15
  8a:	ef 90	pop	r14
  8c:	08 95	ret

This is much longer than necessary; the input values could be left in their original registers rather than moved before the comparison. I'm not sure why GCC does this. The 8-byte version is even worse:

00000000 <greater_than_64>:
   0:	af 92	push	r10
   2:	bf 92	push	r11
   4:	cf 92	push	r12
   6:	df 92	push	r13
   8:	ef 92	push	r14
   a:	ff 92	push	r15
   c:	0f 93	push	r16
   e:	1f 93	push	r17
  10:	e1 e0	ldi	r30, 0x01	; 1
  12:	19 17	cp	r17, r25
  14:	04 f0	brlt	.+0	; 0x16 <greater_than_64+0x16>
  16:	91 17	cp	r25, r17
  18:	01 f4	brne	.+0	; 0x1a <greater_than_64+0x1a>
  1a:	08 17	cp	r16, r24
  1c:	00 f0	brcs	.+0	; 0x1e <greater_than_64+0x1e>
  1e:	80 17	cp	r24, r16
  20:	01 f4	brne	.+0	; 0x22 <greater_than_64+0x22>
  22:	f7 16	cp	r15, r23
  24:	00 f0	brcs	.+0	; 0x26 <greater_than_64+0x26>
  26:	7f 15	cp	r23, r15
  28:	01 f4	brne	.+0	; 0x2a <greater_than_64+0x2a>
  2a:	e6 16	cp	r14, r22
  2c:	00 f0	brcs	.+0	; 0x2e <greater_than_64+0x2e>
  2e:	6e 15	cp	r22, r14
  30:	01 f4	brne	.+0	; 0x32 <greater_than_64+0x32>
  32:	d5 16	cp	r13, r21
  34:	00 f0	brcs	.+0	; 0x36 <greater_than_64+0x36>
  36:	5d 15	cp	r21, r13
  38:	01 f4	brne	.+0	; 0x3a <greater_than_64+0x3a>
  3a:	c4 16	cp	r12, r20
  3c:	00 f0	brcs	.+0	; 0x3e <greater_than_64+0x3e>
  3e:	4c 15	cp	r20, r12
  40:	01 f4	brne	.+0	; 0x42 <greater_than_64+0x42>
  42:	b3 16	cp	r11, r19
  44:	00 f0	brcs	.+0	; 0x46 <greater_than_64+0x46>
  46:	3b 15	cp	r19, r11
  48:	01 f4	brne	.+0	; 0x4a <greater_than_64+0x4a>
  4a:	a2 16	cp	r10, r18
  4c:	00 f0	brcs	.+0	; 0x4e <greater_than_64+0x4e>
  4e:	e0 e0	ldi	r30, 0x00	; 0
  50:	8e 2f	mov	r24, r30
  52:	1f 91	pop	r17
  54:	0f 91	pop	r16
  56:	ff 90	pop	r15
  58:	ef 90	pop	r14
  5a:	df 90	pop	r13
  5c:	cf 90	pop	r12
  5e:	bf 90	pop	r11
  60:	af 90	pop	r10
  62:	08 95	ret

Here there are many redundant jumps, as well as unnecessary pushes and pops. It seems these could be trivially eliminated with a peephole pass.

I'm running a build straight out of trunk as of today:

Using built-in specs.
COLLECT_GCC=avr-gcc
COLLECT_LTO_WRAPPER=/usr/local/avr/libexec/gcc/avr/4.5.0/lto-wrapper
Target: avr
Configured with: ../configure --prefix=/usr/local/avr --target=avr --enable-languages=c --disable-nls --disable-libssp --with-dwarf2 : (reconfigured) ../configure --prefix=/usr/local/avr --target=avr --enable-languages=c --disable-nls --disable-libssp --with-dwarf2
Thread model: single
gcc version 4.5.0 20100216 (experimental) (GCC)
Closed as WORKS FOR ME.

I compiled that with

$ avr-gcc-4.6.1 -dp -Os -S -mmcu=atmega88

and the result is:

greater_than_32:
	ldi r30,lo8(1)	 ;  7	*movqi/2	[length = 1]
	cp r18,r22	 ;  8	*cmpsi/2	[length = 4]
	cpc r19,r23
	cpc r20,r24
	cpc r21,r25
	brlt .L2	 ;  9	branch	[length = 1]
	ldi r30,lo8(0)	 ;  10	*movqi/1	[length = 1]
.L2:
	mov r24,r30	 ;  16	*movqi/1	[length = 1]
	ret	 ;  28	return	[length = 1]

For the 64-bit case, the user must be aware that avr-gcc will produce bulky code. Because of a shortage of avr-gcc contributors, the avr port maintainers regard missed 64-bit optimizations as WONTFIX.