This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: [avr-libc-dev] [bug #21623] boot.h: Use the "z" register constraint
- From: "Shaun Jackman" <sjackman at gmail dot com>
- To: avr-libc-dev at nongnu dot org, gcc at gcc dot gnu dot org
- Date: Wed, 21 Nov 2007 11:04:33 -0700
- Subject: Re: [avr-libc-dev] [bug #21623] boot.h: Use the "z" register constraint
- References: <20071120-230614.sv63780.63200@savannah.nongnu.org> <20071120-201939.sv9370.67476@savannah.nongnu.org> <20071121-034953.sv63780.88444@savannah.nongnu.org> <4743E89B.90806@xs4all.nl> <4743FC78.9010401@xs4all.nl>
- Reply-to: "Shaun Jackman" <sjackman at gmail dot com>
(cc'ing gcc@gcc.gnu.org)
On Nov 21, 2007 2:38 AM, Wouter van Gulik <avrmail@xs4all.nl> wrote:
> Also consider the fuse bit get routine. This scheme gives more knowledge
> to the compiler; unfortunately, gcc fails to see that the loading of r31
> can be done once:
>
> using this:
>
> =========================================================================
> static inline uint8_t boot_lock_fuse_bits_new(uint16_t address)
> {
>     uint8_t result;
>     /* make sure it's in the Z register, aka r30:r31 */
>     register uint16_t adr asm("r30") = address;
>
>     asm volatile(
>         "sts %1, %2\n\t"
>         "lpm %0, Z"
>         : "=r" (result)
>         : "i" (_SFR_MEM_ADDR(__SPM_REG)),
>           "r" ((uint8_t)__BOOT_LOCK_BITS_SET),
>           "z" (adr)
>         : "r0"
>     );
>     return result;
> }
>
> uint8_t bar(void)
> {
>     uint8_t temp;
>     uint16_t adr = 0;
>     temp = boot_lock_fuse_bits_new(adr++);
>     temp += boot_lock_fuse_bits_new(adr++);
>     temp += boot_lock_fuse_bits_new(adr++);
>     temp += boot_lock_fuse_bits_new(adr++);
>     return temp;
> }
>
> =========================================================================
>
> It gives this assembler output:
> .global bar
> .type bar, @function
> bar:
> /* prologue: frame size=0 */
> /* prologue end (size=0) */
> ldi r30,lo8(0) ; 8 *movhi/4 [length = 2]
> ldi r31,hi8(0)
> ldi r25,lo8(9) ; 10 *movqi/2 [length = 1]
> /* #APP */
> sts 87, r25
> lpm r24, Z
> /* #NOAPP */
> ldi r30,lo8(1) ; 16 *movhi/4 [length = 2]
> ldi r31,hi8(1)
> /* #APP */
> sts 87, r25
> lpm r30, Z
> /* #NOAPP */
> add r24,r30 ; 22 addqi3/1 [length = 1]
> ldi r30,lo8(2) ; 24 *movhi/4 [length = 2]
> ldi r31,hi8(2)
> /* #APP */
> sts 87, r25
> lpm r18, Z
> /* #NOAPP */
> ldi r30,lo8(3) ; 29 *movhi/4 [length = 2]
> ldi r31,hi8(3)
> /* #APP */
> sts 87, r25
> lpm r25, Z
> /* #NOAPP */
> add r25,r18 ; 36 addqi3/1 [length = 1]
> add r24,r25 ; 37 addqi3/1 [length = 1]
> clr r25 ; 45 zero_extendqihi2/1 [length = 1]
> /* epilogue: frame size=0 */
> ret
> /* epilogue end (size=1) */
> /* function bar size 30 (29) */
> .size bar, .-bar
>
>
> This is neither smaller nor faster, but it could have been if gcc would
> leave r31 alone, or do an adiw.
> I tried against 4.1.2 using -Wall -Os -mmcu=atmega16. Maybe 4.2.2 or
> 4.3.0 is better?
>
> It does, however, use r30 as an output, which could save some speed and
> code when no other register is available.
>
> HTH,
>
> Wouter
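To put rough numbers on "it could have been": a small tally of the Z setup cost for n consecutive reads, assuming the classic AVR figures of 1 word / 1 cycle for ldi and 1 word / 2 cycles for adiw. The struct and function names are made up for illustration, and only the address setup is counted, not the sts/lpm pairs.

```c
/* Size and time of one instruction (assumed classic AVR core figures). */
struct cost { unsigned words; unsigned cycles; };

static const struct cost LDI  = { 1, 1 };  /* load immediate, one register  */
static const struct cost ADIW = { 1, 2 };  /* add immediate to register pair */

/* As in the listing: ldi r30 and ldi r31 before each of the n reads. */
struct cost reload_both(unsigned n)
{
    struct cost c = { 2 * n * LDI.words, 2 * n * LDI.cycles };
    return c;
}

/* "Leave r31": load both halves once, then reload only r30 (n >= 1). */
struct cost reload_low_only(unsigned n)
{
    struct cost c = { (n + 1) * LDI.words, (n + 1) * LDI.cycles };
    return c;
}

/* "Do an adiw": load both halves once, then adiw r30,1 between reads (n >= 1). */
struct cost step_with_adiw(unsigned n)
{
    struct cost c = { 2 * LDI.words + (n - 1) * ADIW.words,
                      2 * LDI.cycles + (n - 1) * ADIW.cycles };
    return c;
}
```

For the four reads in bar() this gives 8 words / 8 cycles as compiled, 5 / 5 when only r30 is reloaded, and 5 / 8 with adiw. Under these assumptions adiw would save space but not time.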
I have also noticed that a series of
p = buf; *p++; *p++; *p++;
gets optimized to
buf[0]; buf[1]; buf[2];
which may be faster on some architectures, but loading constants is
quite expensive on the AVR. I don't know a great deal about GCC
optimisations, but I suspect it is related to constant pool
management: GCC would have to realise that we already have a 2 in the
constant pool, and that the cheapest way to get a 3 is to increment the 2.
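For concreteness, a minimal sketch of the two forms. The function names are made up, and volatile is only there so that the reads are not optimized away. Both functions return the same value; they differ only in how the addresses are formed.

```c
#include <stdint.h>

/* As written: one pointer, post-incremented after each read. */
uint8_t sum3_postinc(const volatile uint8_t *buf)
{
    const volatile uint8_t *p = buf;
    uint8_t sum = 0;
    sum += *p++;
    sum += *p++;
    sum += *p++;
    return sum;
}

/* What the optimizer may turn it into: base plus constant offsets. */
uint8_t sum3_indexed(const volatile uint8_t *buf)
{
    uint8_t sum = 0;
    sum += buf[0];
    sum += buf[1];
    sum += buf[2];
    return sum;
}
```

Compiling both with avr-gcc -Os -S and comparing the output should show whether a given GCC version keeps the post-increment form or materialises each address separately.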
Cheers,
Shaun