This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: [avr-libc-dev] [bug #21623] boot.h: Use the "z" register constraint
- From: Wouter van Gulik <avrmail at xs4all dot nl>
- To: Shaun Jackman <sjackman at gmail dot com>
- Cc: avr-libc-dev at nongnu dot org, gcc at gcc dot gnu dot org
- Date: Wed, 21 Nov 2007 22:47:58 +0100
- Subject: Re: [avr-libc-dev] [bug #21623] boot.h: Use the "z" register constraint
- References: <20071120-230614.sv63780.63200@savannah.nongnu.org> <20071120-201939.sv9370.67476@savannah.nongnu.org> <20071121-034953.sv63780.88444@savannah.nongnu.org> <4743E89B.90806@xs4all.nl> <4743FC78.9010401@xs4all.nl> <7f45d9390711211004h464ad2d3h743c6e8b90b3873e@mail.gmail.com>
Shaun Jackman wrote:
> I have also noticed that a series of
>   p = buf; *p++; *p++; *p++;
> gets optimized to
>   buf[0]; buf[1]; buf[2];
> which may be faster on some architectures, but loading constants is
> quite expensive on the AVR.
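The rewrite described in the quote can be sketched in C as two equivalent functions. The names sum_with_pointer, sum_with_offsets and check_same are hypothetical, and the bytes are summed only so the loads are not thrown away. Because both forms read the same three bytes, the compiler is free to emit either addressing style; the complaint is only about which one is cheaper on the AVR.

```c
#include <assert.h>

/* Form from the quoted report: walk one pointer through buf.
   The values are summed only so the loads are not discarded. */
unsigned sum_with_pointer(const unsigned char *buf)
{
    const unsigned char *p = buf;
    unsigned sum = 0;
    sum += *p++;
    sum += *p++;
    sum += *p++;
    return sum;
}

/* Form it reportedly gets rewritten into: constant offsets from buf. */
unsigned sum_with_offsets(const unsigned char *buf)
{
    return buf[0] + buf[1] + buf[2];
}

/* Both versions read the same three bytes, so they must agree. */
void check_same(const unsigned char *buf)
{
    assert(sum_with_pointer(buf) == sum_with_offsets(buf));
}
```

Any buffer of at least three bytes can be passed to check_same() on a host machine.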
Phew, I just tried this:
================================
extern unsigned char foo2(char *);
unsigned char bar2(char *p)
{
    unsigned char tmp;
    tmp = foo2(p++);
    tmp += foo2(p++);
    tmp += foo2(p++);
    tmp += foo2(p++);
    return tmp;
}
================================
Note: this was compiled with avr-gcc 4.2.2 using:
avr-gcc -Wall -Os -mmcu=atmega16 -dp -S
================================
bar2:
/* prologue: frame size=0 */
push r13
push r14
push r15
push r16
push r17
/* prologue end (size=5) */
movw r16,r24 ; 52 *movhi/1 [length = 1]
subi r16,lo8(-(1)) ; 11 *addhi3/4 [length = 2]
sbci r17,hi8(-(1))
call foo2 ; 13 call_value_insn/3 [length = 2]
mov r13,r24 ; 14 *movqi/1 [length = 1]
movw r14,r16 ; 53 *movhi/1 [length = 1]
sec ; 16 *addhi3/5 [length = 3]
adc r14,__zero_reg__
adc r15,__zero_reg__
movw r24,r16 ; 17 *movhi/1 [length = 1]
call foo2 ; 18 call_value_insn/3 [length = 2]
mov r17,r24 ; 19 *movqi/1 [length = 1]
movw r24,r14 ; 21 *movhi/1 [length = 1]
call foo2 ; 22 call_value_insn/3 [length = 2]
mov r16,r24 ; 23 *movqi/1 [length = 1]
movw r24,r14 ; 54 *movhi/1 [length = 1]
adiw r24,1 ; 26 *addhi3/2 [length = 1]
call foo2 ; 27 call_value_insn/3 [length = 2]
add r17,r13 ; 30 addqi3/1 [length = 1]
add r17,r16 ; 32 addqi3/1 [length = 1]
add r17,r24 ; 33 addqi3/1 [length = 1]
mov r24,r17 ; 41 zero_extendqihi2/2 [length = 2]
clr r25
/* epilogue: frame size=0 */
pop r17
pop r16
pop r15
pop r14
pop r13
ret
================================
What is going on here? I can imagine gcc not finding the optimal
pattern (register-allocation-wise):
movw r24, rtmp
adiw r24, 1
movw rtmp, r24
But now it keeps the pointer twice (p+1 in r16:r17 and p+2 in r14:r15)!
Why?!? It gets it right for the last call, but there the ++ is useless
anyway. Note that a functionally equivalent version using ++p compiles
slightly better.
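A host-side sketch to check the equivalence claim: foo2 is replaced by a hypothetical stub that records each argument's offset from the start of a buffer, and bar3 is a hypothetical pre-increment rewrite of bar2 (the name is ours, not from the thread). Both calls see p, p+1, p+2 and p+3, and in bar2 the pointer produced by the final p++ is never read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test harness state for the stub below. */
static const char *base;   /* start of the buffer passed to bar2/bar3 */
static ptrdiff_t seen[8];  /* offsets observed by foo2, in call order */
static int nseen;

/* Hypothetical stand-in for the external foo2(): records how far the
   argument is from base and returns that offset. */
unsigned char foo2(char *p)
{
    ptrdiff_t off = p - base;
    assert(nseen < (int)(sizeof seen / sizeof seen[0]));
    seen[nseen++] = off;
    return (unsigned char)off;
}

/* The post-increment version from the post. */
unsigned char bar2(char *p)
{
    unsigned char tmp;
    tmp = foo2(p++);
    tmp += foo2(p++);
    tmp += foo2(p++);
    tmp += foo2(p++); /* the incremented pointer is never used */
    return tmp;
}

/* Hypothetical pre-increment rewrite: same four arguments. */
unsigned char bar3(char *p)
{
    unsigned char tmp;
    tmp = foo2(p);
    tmp += foo2(++p);
    tmp += foo2(++p);
    tmp += foo2(++p);
    return tmp;
}
```

Set base to the buffer before calling either function; with a four-byte buffer both return 0 + 1 + 2 + 3.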
It would of course be optimal to use Y (r28:r29); then it would simply be:
adiw r28, 1
movw r24, r28
call foo2
add r17, r24
adiw r28, 1
movw r24, r28
call foo2
This saves both stack space and code, but it is probably hard for gcc to
figure out, since r28:r29 is normally the frame pointer. Only when the
frame pointer is unused could gcc allocate Y to a 16-bit variable or
pointer (and then it would probably always be the best choice). Maybe
this idea is worth a look?
> I don't know a terrible lot about GCC
> optimisations, but I suspect it would be related to the constant pool
> management, to realise that we already have a 2 in the constant pool,
> and we can best introduce a 3 to the constant pool by incrementing 2.
This could also be the AVR back end not being open enough about the
movhi insn for constants.
Since incrementing a 16-bit value is quite expensive in registers that
do not accept immediate operands, loading the constant is actually not
such a bad idea there. But then again, we are talking about r30:r31 here,
which supports adiw... never mind...
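A rough, hypothetical cost model for the increment-versus-load trade-off above. The increment costs are the [length = N] word counts from the listing (adiw 1, subi/sbci 2, sec/adc/adc 3); the reload cost assumes two ldi instructions into an immediate-capable pair such as r24:r25. It ignores cycle counts and any extra movw, so it only illustrates why loading wins for r0..r15 but not for r30:r31.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of 16-bit register pairs on the AVR. */
enum pair_class {
    ADIW_PAIR,  /* r24, r26, r28, r30: adiw available             */
    UPPER_PAIR, /* r16..r23: subi/sbci available, but no adiw     */
    LOWER_PAIR  /* r0..r15: no instructions with immediate operands */
};

/* Words needed to add 1 to a pointer held in a pair of this class,
   taken from the [length = N] annotations in the listing. */
static int increment_words(enum pair_class c)
{
    switch (c) {
    case ADIW_PAIR:  return 1; /* adiw rX,1           (insn 26) */
    case UPPER_PAIR: return 2; /* subi; sbci          (insn 11) */
    case LOWER_PAIR: return 3; /* sec; adc; adc       (insn 16) */
    }
    return -1; /* only reached for an out-of-range class */
}

/* Assumed cost of loading the constant address instead:
   ldi lo8(...) plus ldi hi8(...), one word each. */
enum { RELOAD_WORDS = 2 };

/* True when reloading the constant is strictly smaller than incrementing. */
static bool prefer_reload(enum pair_class c)
{
    int inc = increment_words(c);
    assert(inc > 0); /* reject out-of-range classes */
    return inc > RELOAD_WORDS;
}
```

Under this model only LOWER_PAIR favours reloading; for r30:r31 the single adiw is cheaper.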
HTH,
Wouter