See below. This bug was found using Avrora bug is reproducible in AVR studio. The problem exists in HEAD but not apparently in the 4.3 series. The generated code looks to be wrong at -O2 and -Os. regehr@john-home:~$ avr-gcc -v Using built-in specs. Target: avr Configured with: ../configure --prefix=/home/regehr/avrgcc440/install --target=avr --enable-languages=c,c++ --disable-nls --disable-libssp : (reconfigured) ../configure --prefix=/home/regehr/avrgcc440/install --target=avr --enable-languages=c,c++ --disable-nls --disable-libssp : (reconfigured) ../configure --prefix=/home/regehr/avrgcc440/install --target=avr --enable-languages=c,c++ --disable-nls --disable-libssp Thread model: single gcc version 4.5.0 20090403 (experimental) (GCC) regehr@john-home:~$ avr-gcc -O1 -mmcu=atmega128 small.c -o small.elf regehr@john-home:~$ java avrora.Main -colors=false -monitors=break small.elf Avrora [Beta 1.7.107] - (c) 2003-2007 UCLA Compilers Group Loading small.elf...[OK: 0.060 seconds] =={ Simulation events }======================================================= Node Time Event ------------------------------------------------------------------------------ 0 178 break instruction @ 0x0130, r30:r31 = 0x0001 ============================================================================== Simulated time: 179 cycles Time for simulation: 0.061 seconds Total throughput: 0.0029344263 mhz regehr@john-home:~$ avr-gcc -O2 -mmcu=atmega128 small.c -o small.elf regehr@john-home:~$ java avrora.Main -colors=false -monitors=break small.elf Avrora [Beta 1.7.107] - (c) 2003-2007 UCLA Compilers Group Loading small.elf...[OK: 0.057 seconds] =={ Simulation events }======================================================= Node Time Event ------------------------------------------------------------------------------ 0 166 break instruction @ 0x0122, r30:r31 = 0x0C00 ============================================================================== Simulated time: 167 cycles Time for simulation: 0.061 seconds Total throughput: 0.002737705 mhz regehr@john-home:~$ cat small.c #include <stdint.h> #include <limits.h> #define safe_mod_macro_uint8_t_u_u(ui1,ui2) \ ((((uint8_t)(ui2)) == ((uint8_t)0)) \ ? ((uint8_t)(ui1)) \ : (((uint8_t)(ui1)) % ((uint8_t)(ui2)))) static uint8_t safe_mod_func_uint8_t_u_u (uint8_t _ui1, uint8_t _ui2) { return safe_mod_macro_uint8_t_u_u(_ui1,_ui2); } #define safe_lshift_macro_uint16_t_u_u(left,right) \ (((((unsigned int)(right)) >= sizeof(uint16_t)*CHAR_BIT) \ || (((uint16_t)(left)) > ((UINT16_MAX) >> ((unsigned int)(right))))) \ ? ((uint16_t)(left)) \ : (((uint16_t)(left)) << ((unsigned int)(right)))) static uint16_t safe_lshift_func_uint16_t_u_u(uint16_t _left, unsigned int _right) { return safe_lshift_macro_uint16_t_u_u(_left,_right); } int8_t func_7 (uint8_t p_8); int8_t func_7 (uint8_t p_8) { if (safe_mod_func_uint8_t_u_u (0xC, safe_lshift_func_uint16_t_u_u (p_8, p_8))) { return 0; } return 1; } static inline void platform_main_end(int8_t crc); static inline void platform_main_end(int8_t crc) { asm volatile("cli" "\n\t" "mov r30, %A0" "\n\t" "mov r31, %B0" "\n\t" "break" : : "r" (crc) : "memory" ); } int main (void) { platform_main_end (func_7(1)); return 0; }
To be clear: func_7() should return 1, but does not at -O2 and -Os. In the Avrora output the result of the function looks like this: r30:r31 = 0x0001 or r30:r31 = 0x0C00
The problem is it uses R22 as loop counter and as the shift counter. I could not (yet) reproduce the problem in a smaller example. This could be the same problem as: http://lists.gnu.org/archive/html/avr-gcc-list/2009-03/msg00203.html or http://lists.gnu.org/archive/html/avr-gcc-list/2009-03/msg00158.html ; (insn 16 15 17 main.c:24 (set (reg/v:HI 22 r22 [orig:42 _left ] [42]) ; (ashift:HI (reg/v:HI 22 r22 [orig:42 _left ] [42]) ; (reg:QI 22 r22 [orig:54 _right.2 ] [54]))) 54 {ashlhi3} (expr_list:REG_DEAD (reg:QI 18 r18 [orig:54 _right.2 ] [54]) ; (expr_list:REG_UNUSED (reg:QI 23 r23) ; (nil)))) rjmp 2f ; 16 ashlhi3/1 [length = 6] 1: lsl r22 rol r23 2: dec r22 brpl 1b An other interresting point is the rtl stating reg 23 is unused. Unfortunatly the avr backend can not remove the useless shift. But that is another issue.
(In reply to comment #2) > The problem is it uses R22 as loop counter and as the shift counter. I could > not (yet) reproduce the problem in a smaller example. This is my minimal test case: static uint16_t safe_lshift_func_uint16_t_u_u(uint16_t _left, unsigned int _right) { return _left << _right; } char func_7 (uint8_t p_8) { if(safe_lshift_func_uint16_t_u_u (p_8, p_8)) { return 0; } return 1; } Not using the function but shifting it directly solves the problem in this case. It seems to me this problem is triggered when the function is inlined. Note that this also fails for 4.1.2, 4.2.2, 4.3.0 and 4.3.2. (WinAvr20070525, 20071221, 20080610, 20081205) > This could be the same problem as: > http://lists.gnu.org/archive/html/avr-gcc-list/2009-03/msg00203.html > or > http://lists.gnu.org/archive/html/avr-gcc-list/2009-03/msg00158.html > > ; (insn 16 15 17 main.c:24 (set (reg/v:HI 22 r22 [orig:42 _left ] [42]) > ; (ashift:HI (reg/v:HI 22 r22 [orig:42 _left ] [42]) > ; (reg:QI 22 r22 [orig:54 _right.2 ] [54]))) 54 {ashlhi3} > (expr_list:REG_DEAD (reg:QI 18 r18 [orig:54 _right.2 ] [54]) > ; (expr_list:REG_UNUSED (reg:QI 23 r23) > ; (nil)))) > rjmp 2f ; 16 ashlhi3/1 [length = 6] > 1: lsl r22 > rol r23 > 2: dec r22 > brpl 1b > > An other interresting point is the rtl stating reg 23 is unused. Unfortunatly > the avr backend can not remove the useless shift. But that is another issue. >
This looks like a duplicate of bug #39386. *** This bug has been marked as a duplicate of 39386 ***
It looks like most of AVR shift/rotates are messed up. For the case we where we have non constant shifts, the peephole may grab a scratch register. In this case it look like it grabs one that is free afterwards and not before. Hence overlap issue The rotate split pattern problem is different as noted in message links. . In this case it is not apparent why the split is used for only different source/destination registers. If the split were constrained so that src=dest the overlap would be much easily to handle and it would seemingly produce better code for the common case where src=dest.