GCC version: avr-gcc (GCC) 4.8.1

Compiling with either -O1 or -O2 optimizations enabled, tested with some ATMega and XMega targets.

Test case:

#include <stdint.h>

uint8_t volatile tmp;

__attribute__((noinline))
void test_64(uint64_t d64)
{
    if ((d64 & 0xFF800000UL) == 0xFF800000UL){
        tmp ++;
    }
}

__attribute__((noinline))
void test_32(uint32_t d32)
{
    if ((d32 & 0xFF800000UL) == 0xFF800000UL){
        tmp ++;
    }
}

int main(void)
{
    test_64(0);
    test_32(0);
    while(1);
}

A cut from the output assembly, showing the critical part (the code generated for test_64, test_32 and main):

00000228 <test_64>:
 228:	08 95       	ret

0000022a <test_32>:
 22a:	66 27       	eor	r22, r22
 22c:	77 27       	eor	r23, r23
 22e:	80 78       	andi	r24, 0x80	; 128
 230:	61 15       	cp	r22, r1
 232:	71 05       	cpc	r23, r1
 234:	80 48       	sbci	r24, 0x80	; 128
 236:	9f 4f       	sbci	r25, 0xFF	; 255
 238:	09 f0       	breq	.+2      	; 0x23c <test_32+0x12>
 23a:	08 95       	ret
 23c:	80 91 00 20 	lds	r24, 0x2000
 240:	8f 5f       	subi	r24, 0xFF	; 255
 242:	80 93 00 20 	sts	0x2000, r24
 246:	08 95       	ret

00000248 <main>:
 248:	20 e0       	ldi	r18, 0x00	; 0
 24a:	30 e0       	ldi	r19, 0x00	; 0
 24c:	40 e0       	ldi	r20, 0x00	; 0
 24e:	50 e0       	ldi	r21, 0x00	; 0
 250:	60 e0       	ldi	r22, 0x00	; 0
 252:	70 e0       	ldi	r23, 0x00	; 0
 254:	80 e0       	ldi	r24, 0x00	; 0
 256:	90 e0       	ldi	r25, 0x00	; 0
 258:	0e 94 14 01 	call	0x228	; 0x228 <test_64>
 25c:	60 e0       	ldi	r22, 0x00	; 0
 25e:	70 e0       	ldi	r23, 0x00	; 0
 260:	cb 01       	movw	r24, r22
 262:	0e 94 15 01 	call	0x22a	; 0x22a <test_32>
 266:	ff cf       	rjmp	.-2      	; 0x266 <main+0x1e>

It seems like the compiler incorrectly determines that the "if" is always false in the 64-bit case, and produces a corresponding result (a function doing nothing). A native version of GCC produces the expected code (the comparison being performed).
Note that GCC 4.8 is long out of maintenance and GCC 4.8.1 is a particularly old version from that branch. Please try a still-supported compiler, which would be GCC 6.4, GCC 7.3 or GCC 8.1. If you can't do that, please at least try the latest GCC 4.8 based compiler, which is GCC 4.8.5.
Tested on a different machine: avr-gcc (GCC) 4.9.2, which is what comes with Debian Jessie. The behavior is present (the function compiles to a single "ret").
I don't have reasonably easy access to a newer version, as there don't seem to be precompiled binaries available for Linux which I could try without much hassle. If someone has one lying around, I would appreciate it if they gave the sample a go with avr-gcc -S -O1 sample.c, where sample.c contains the code above. The resulting assembly file would show whether that particular GCC version is affected.
I tried it with the package offered by Microchip, which ships avr-gcc 5.4.0; the behavior is the same, the bug is present.
I received a test report with avr-gcc 8.1.0 at the -O2 optimization level: the behavior is present (https://www.avrfreaks.net/comment/2477081#comment-2477081).
Looks like a bug in the insn combiner, hence an rtl-optimization issue, not a target issue.

Test case:

typedef __UINT64_TYPE__ uint64_t;

char tmp;

void test_64 (uint64_t d64)
{
    if ((d64 & 0xFF800000UL) == 0xFF800000UL)
        tmp++;
}

Compiling with v8.0.1:

$ avr-gcc foo.c -Os -c -mmcu=avr5 -save-temps -dap

The .combine dump reads:

Trying 30 -> 31:
   30: {cc0=cmp(r18:DI,0xff800000);clobber scratch;}
      REG_DEAD r18:DI
   31: pc={(cc0!=0)?L38:pc}
      REG_BR_PROB 708669604
Successfully matched this instruction:
(set (pc)
    (label_ref:HI 38))
allowing combination of insns 30 and 31

i.e. the combiner combines the 64-bit comparison of reg:DI 18 against the constant with the conditional jump on CC0 into an UNCONDITIONAL jump. Hence anything that is used to set CC0 becomes unused and is thrown away in the remainder...

With -fdisable-rtl-combine the final asm looks correct and reads:

test_64:
	andi r20,lo8(-128)	 ;  16  [c=4 l=1]  andqi3/1
	ldi r18,0	 ;  22  [c=4 l=1]  movqi_insn/0
	ldi r19,0	 ;  23  [c=4 l=1]  movqi_insn/0
	ldi r22,0	 ;  26  [c=4 l=1]  movqi_insn/0
	ldi r23,0	 ;  27  [c=4 l=1]  movqi_insn/0
	ldi r24,0	 ;  28  [c=4 l=1]  movqi_insn/0
	ldi r25,0	 ;  29  [c=4 l=1]  movqi_insn/0
	cp r18,__zero_reg__	 ;  30  [c=4 l=8]  compare_const_di2
	cpc r19,__zero_reg__
	sbci r20,-128
	sbci r21,-1
	cpc r22,__zero_reg__
	cpc r23,__zero_reg__
	cpc r24,__zero_reg__
	cpc r25,__zero_reg__
	brne .L1	 ;  31  [c=16 l=1]  branch
	lds r24,tmp	 ;  33  [c=4 l=2]  movqi_insn/3
	subi r24,lo8(-(1))	 ;  34  [c=4 l=1]  addqi3/1
	sts tmp,r24	 ;  35  [c=4 l=2]  movqi_insn/2
.L1:
	ret	 ;  53  [c=0 l=1]  return

Insns 16..29 perform the AND of the 64-bit value held in r18...r25, insn 30 performs the comparison against the constant and sets CC0.
I suspect this is because we have hard regs here, not pseudos. Not a good idea in general, which is why other targets don't do this. Perhaps it is a mode mixup in the known value tracking? Confirmed.
Author: segher
Date: Thu Jul 26 10:16:48 2018
New Revision: 262994

URL: https://gcc.gnu.org/viewcvs?rev=262994&root=gcc&view=rev
Log:
combine: Another hard register problem (PR85805)

The current code in reg_nonzero_bits_for_combine allows using the
reg_stat info when last_set_mode is a different integer mode.  This is
completely wrong for non-pseudos.  For example, as in the PR, a value in
a DImode hard register is set by eight writes to its constituent QImode
parts.  The value written to the DImode is not the same as that written
to the lowest-numbered QImode!

	PR rtl-optimization/85805
	* combine.c (reg_nonzero_bits_for_combine): Only use the last set
	value for hard registers if that was written in the same mode.

Modified:
	trunk/gcc/ChangeLog
	trunk/gcc/combine.c
Fixed for 9+ then?
It is fixed for 9 yes, and I am still pondering it for 8. I guess that's not going to happen.
Segher: Then let's close it?
Yup, closing.
Author: segher
Date: Thu Feb 14 18:46:18 2019
New Revision: 268888

URL: https://gcc.gnu.org/viewcvs?rev=268888&root=gcc&view=rev
Log:
Backport from trunk
2018-07-26  Segher Boessenkool  <segher@kernel.crashing.org>

	PR rtl-optimization/85805
	* combine.c (reg_nonzero_bits_for_combine): Only use the last set
	value for hard registers if that was written in the same mode.

Modified:
	branches/gcc-8-branch/gcc/ChangeLog
	branches/gcc-8-branch/gcc/combine.c
I backported it anyway.