This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: [avr-libc-dev] [bug #21623] boot.h: Use the "z" register constraint


Shaun Jackman wrote:

> I have also noticed that a series of
> p = buf; *p++; *p++; *p++;
> gets optimized to
> buf[0]; buf[1]; buf[2];
> which may be faster on some architectures, but loading constants is
> quite expensive on the AVR.

Phew, I just tried this:

================================
extern unsigned char foo2(char *);

unsigned char bar2(char *p)
{
  unsigned char tmp;
  tmp  = foo2(p++);
  tmp += foo2(p++);
  tmp += foo2(p++);
  tmp += foo2(p++);
  return tmp;
}
================================

Note this is compiled with avr-gcc-4.2.2 using:
avr-gcc -Wall -Os -mmcu=atmega16 -dp -S

================================
bar2:
/* prologue: frame size=0 */
   push r13
   push r14
   push r15
   push r16
   push r17
/* prologue end (size=5) */
   movw r16,r24     ;  52    *movhi/1    [length = 1]
   subi r16,lo8(-(1))     ;  11    *addhi3/4    [length = 2]
   sbci r17,hi8(-(1))
   call foo2     ;  13    call_value_insn/3    [length = 2]
   mov r13,r24     ;  14    *movqi/1    [length = 1]
   movw r14,r16     ;  53    *movhi/1    [length = 1]
   sec     ;  16    *addhi3/5    [length = 3]
   adc r14,__zero_reg__
   adc r15,__zero_reg__
   movw r24,r16     ;  17    *movhi/1    [length = 1]
   call foo2     ;  18    call_value_insn/3    [length = 2]
   mov r17,r24     ;  19    *movqi/1    [length = 1]
   movw r24,r14     ;  21    *movhi/1    [length = 1]
   call foo2     ;  22    call_value_insn/3    [length = 2]
   mov r16,r24     ;  23    *movqi/1    [length = 1]
   movw r24,r14     ;  54    *movhi/1    [length = 1]
   adiw r24,1     ;  26    *addhi3/2    [length = 1]
   call foo2     ;  27    call_value_insn/3    [length = 2]
   add r17,r13     ;  30    addqi3/1    [length = 1]
   add r17,r16     ;  32    addqi3/1    [length = 1]
   add r17,r24     ;  33    addqi3/1    [length = 1]
   mov r24,r17     ;  41    zero_extendqihi2/2    [length = 2]
   clr r25
/* epilogue: frame size=0 */
   pop r17
   pop r16
   pop r15
   pop r14
   pop r13
   ret

================================

What is going on here? I can imagine gcc not finding the (register allocation wise) optimal pattern:

movw r24, rtmp
adiw r24, 1
movw rtmp, r24

But now it has the pointer twice! Why?!? It gets OK when doing the last call, but there the ++ is useless. Note that the functionally equivalent version using ++p is slightly better.
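
For reference, here is a sketch of what a ++p variant of the function can look like. The name bar2_pre and the body given to foo2 are hypothetical stand-ins added only so the two versions can be compared on a host compiler; in the test case above foo2 is external. Both versions pass the same four addresses to foo2, they only differ in where the increment sits relative to each call.

```c
/* Hypothetical stand-in for the external foo2(): returns the byte at p. */
unsigned char foo2(char *p)
{
  return (unsigned char)*p;
}

/* Original form: post-increment inside each call. */
unsigned char bar2(char *p)
{
  unsigned char tmp;
  tmp  = foo2(p++);
  tmp += foo2(p++);
  tmp += foo2(p++);
  tmp += foo2(p++);
  return tmp;
}

/* Assumed ++p form: same arguments to foo2, but the pointer is
   incremented before each of the last three calls, and the
   trailing (useless) increment is gone. */
unsigned char bar2_pre(char *p)
{
  unsigned char tmp;
  tmp  = foo2(p);
  tmp += foo2(++p);
  tmp += foo2(++p);
  tmp += foo2(++p);
  return tmp;
}
```

Calling both on the same 4-byte buffer should give identical results. How much the generated AVR code differs will depend on the compiler version and flags.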

It would of course be optimal to use Y (r28:r29); then it would be a simple:

adiw r28, 1
movw r24, r28
call foo2
add r17, r24
adiw r28, 1
movw r24, r28
call foo2

This would save stack and code, but it's probably hard for gcc to figure out, since r28:r29 is normally the frame pointer... so only when it's unused as a frame pointer could gcc allocate it (and that would probably always be best) for a 16-bit variable or pointer. Maybe this idea is worth a look?

> I don't know a terrible lot about GCC
> optimisations, but I suspect it would be related to the constant pool
> management, to realise that we already have a 2 in the constant pool,
> and we can best introduce a 3 to the constant pool by incrementing 2.

This could also be the avr implementation not being open enough about the movhi insn for constants.
Since incrementing a 16-bit value is quite expensive in registers that can't take an immediate, loading it is actually not such a bad idea. But then again, we are talking about r30:r31 here... never mind...


HTH,

Wouter

