I've found what looks like a missed optimization in avr-gcc 3.3 20030512 uint8_t Encoder; Encoder = (PORTD >> 4) & 12; 0: 82 b3 in r24, 0x12 ; 18 2: 82 95 swap r24 4: 8f 70 andi r24, 0x0F ; 15 6: 8c 70 andi r24, 0x0C ; 12 The 4 bit shift is implemented as a swap/andi , but with an explicit andi afterwards, it seems like an obvious optimization to combine them. Trivial example code below, compiled with: avr-gcc -c e.c -g -O2 -mmcu=at90s8535 -o e.o #include <avr/io.h> #include <inttypes.h> int8_t test () { uint8_t Encoder; Encoder = (PORTD >> 4) & 12; return(Encoder); }
Can you provide the preprocessed source?
Subject: Re: [avr] gcc Double 'andi' missed optimization On Sat, 6 Jul 2003, pinskia at physics dot uc dot edu wrote: > ------- Additional Comments From pinskia at physics dot uc dot edu 2003-07-06 01:32 ------- > Can you provide the preprocessed source? Sure -- how do I do that ? ========================================================== Chris Candreva -- chris@westnet.com -- (914) 967-7816 WestNet Internet Services of Westchester http://www.westnet.com/
Read <http://gcc.gnu.org/bugs.html>. Also you can get the preprocessed source by adding - save-temps.
Subject: Re: [avr] gcc Double 'andi' missed optimization e.i -------------------------------- # 1 "e.c" # 1 "<built-in>" # 1 "<command line>" # 1 "e.c" # 1 "/usr/local/avr/include/avr/io.h" 1 3 # 81 "/usr/local/avr/include/avr/io.h" 3 # 1 "/usr/local/avr/include/avr/sfr_defs.h" 1 3 # 82 "/usr/local/avr/include/avr/io.h" 2 3 # 189 "/usr/local/avr/include/avr/io.h" 3 # 1 "/usr/local/avr/include/avr/io8535.h" 1 3 # 190 "/usr/local/avr/include/avr/io.h" 2 3 # 2 "e.c" 2 # 1 "/usr/local/avr/include/inttypes.h" 1 3 # 60 "/usr/local/avr/include/inttypes.h" 3 typedef signed char int8_t; typedef unsigned char uint8_t; # 83 "/usr/local/avr/include/inttypes.h" 3 typedef int int16_t; typedef unsigned int uint16_t; # 99 "/usr/local/avr/include/inttypes.h" 3 typedef long int32_t; typedef unsigned long uint32_t; # 117 "/usr/local/avr/include/inttypes.h" 3 typedef long long int64_t; typedef unsigned long long uint64_t; # 134 "/usr/local/avr/include/inttypes.h" 3 typedef int16_t intptr_t; typedef uint16_t uintptr_t; # 3 "e.c" 2 int8_t test () { uint8_t Encoder; Encoder = ((*(volatile unsigned char *)((0x12) + 0x20)) >> 4) & 12; return(Encoder); }
I can confirm this on the mainline (20030705): in r24,50-0x20 swap r24 andi r24,0x0f andi r24,lo8(12)
The AVR backend defines its lshrqi3 patterns as opaque "macro" sequences, hence the fact that "x >> 4" is implemented as a swap/rotate followed by an and instruction is never exposed to GCC's RTL optimizers. If it were, the two ands would be optimized away by CSE, combine or peephole, as they are for other targets. As there's nothing the middle-end can do unless the AVR backend is rewritten to use RTL expanders, I'm changing this PR's component to target/11259.
Bug still exists on 4.2.1.
Created attachment 14658 [details] example patch This patch is an example of the suggestion in comment #6. When compiling with -S -dp, it is clear why the code isn't optimized: "swap+andi" is one instruction. test: /* prologue: function */ /* frame size = 0 */ in r24,50-0x20 ; 6 *movqi/4 [length = 1] swap r24 ; 7 lshrqi3/5 [length = 2] andi r24,0x0f andi r24,lo8(12) ; 13 andqi3/2 [length = 1] /* epilogue start */ ret ; 26 return [length = 1] The patch makes two instructions out of "swap"+"andi" of which "andi" is optimized away: test: /* prologue: function */ /* frame size = 0 */ in r24,50-0x20 ; 6 *movqi/4 [length = 1] swap r24 ; 7 _rotlqi3_const4 [length = 1] andi r24,lo8(12) ; 14 andqi3/2 [length = 1] /* epilogue start */ ret ; 27 return [length = 1]
Subject: Bug 11259 Author: aesok Date: Fri Aug 22 21:24:56 2008 New Revision: 139502 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=139502 Log: PR target/11259 * config/avr/avr.md (UNSPEC_SWAP): New constants. (*swap): New insn pattern. (*ashlqi3): Rename from ashlqi3 insn pattern. (ashlqi3): New expanders. (*lshrqi3): Rename from lshrqi3 insn pattern. (lshrqi3): New expanders. (ashlqi3_const4, ashlqi3_const5, ashlqi3_const6, lshrqi3_const4, lshrqi3_const5, lshrqi3_const6): New splitters. (andi, ashlqi3_l_const4, ashlqi3_l_const5, ashlqi3_l_const6, lshrqi3_l_const4, lshrqi3_l_const5, lshrqi3_l_const6): Define peephole2 patterns. Modified: trunk/gcc/ChangeLog trunk/gcc/config/avr/avr.md
Fixed on 4.4.