This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: [avr-libc-dev] [bug #21623] boot.h: Use the "z" register constraint


(cc'ing gcc@gcc.gnu.org)

On Nov 21, 2007 2:38 AM, Wouter van Gulik <avrmail@xs4all.nl> wrote:
> Also consider the fuse bit get routine. This scheme gives more knowledge
> to the compiler; unfortunately, gcc fails to see that the loading of r31
> can be done once:
>
> using this:
>
> =========================================================================
> static inline uint8_t boot_lock_fuse_bits_new(uint16_t address)
> {
>      uint8_t result;
>      register uint16_t adr asm("r30") = address; // make sure it's in the Z register, aka r30:r31
>
>      asm volatile(
>          "sts %1, %2\n\t"
>          "lpm %0, Z"
>          : "=r" (result)
>          : "i" (_SFR_MEM_ADDR(__SPM_REG)),
>            "r" ((uint8_t)__BOOT_LOCK_BITS_SET),
>            "z" (adr)
>          : "r0"
>      );
>      return result;
> }
>
> uint8_t bar(void)
> {
>         uint8_t temp;
>         uint16_t adr = 0;
>         temp  = boot_lock_fuse_bits_new(adr++);
>         temp += boot_lock_fuse_bits_new(adr++);
>         temp += boot_lock_fuse_bits_new(adr++);
>         temp += boot_lock_fuse_bits_new(adr++);
>         return temp;
> }
>
> =========================================================================
>
> It gives this assembler output:
> .global bar
>         .type   bar, @function
> bar:
> /* prologue: frame size=0 */
> /* prologue end (size=0) */
>         ldi r30,lo8(0)   ;  8   *movhi/4        [length = 2]
>         ldi r31,hi8(0)
>         ldi r25,lo8(9)   ;  10  *movqi/2        [length = 1]
> /* #APP */
>         sts 87, r25
>         lpm r24, Z
> /* #NOAPP */
>         ldi r30,lo8(1)   ;  16  *movhi/4        [length = 2]
>         ldi r31,hi8(1)
> /* #APP */
>         sts 87, r25
>         lpm r30, Z
> /* #NOAPP */
>         add r24,r30      ;  22  addqi3/1        [length = 1]
>         ldi r30,lo8(2)   ;  24  *movhi/4        [length = 2]
>         ldi r31,hi8(2)
> /* #APP */
>         sts 87, r25
>         lpm r18, Z
> /* #NOAPP */
>         ldi r30,lo8(3)   ;  29  *movhi/4        [length = 2]
>         ldi r31,hi8(3)
> /* #APP */
>         sts 87, r25
>         lpm r25, Z
> /* #NOAPP */
>         add r25,r18      ;  36  addqi3/1        [length = 1]
>         add r24,r25      ;  37  addqi3/1        [length = 1]
>         clr r25  ;  45  zero_extendqihi2/1      [length = 1]
> /* epilogue: frame size=0 */
>         ret
> /* epilogue end (size=1) */
> /* function bar size 30 (29) */
>         .size   bar, .-bar
>
>
> This is neither smaller nor faster, but it could have been if gcc would
> leave r31 alone or do an adiw.
> I tried this against 4.1.2 using -Wall -Os -mmcu=atmega16. Maybe 4.2.2 or
> 4.3.0 is better?
>
> It does, however, use r30 as an output, which could save some speed and
> code when no other register is available.
>
> HTH,
>
> Wouter

I have also noticed that a series of
p = buf; *p++; *p++; *p++;
gets optimized to
buf[0]; buf[1]; buf[2];
which may be faster on some architectures, but loading constants is
quite expensive on the AVR. I don't know a great deal about GCC
optimisations, but I suspect it would be related to constant pool
management: realising that we already have a 2 in the constant pool,
and that we can best introduce a 3 to the constant pool by incrementing
the 2.
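
To make the two forms concrete, here is a minimal sketch of the source
pattern I mean. The function names sum3_postinc and sum3_indexed are
made up for illustration and do not come from avr-libc or GCC; the
reads are summed only so that the compiler cannot discard them.

```c
#include <stdint.h>

/* Post-increment form: a single pointer is advanced after each read. */
static uint8_t sum3_postinc(const uint8_t *buf)
{
    const uint8_t *p = buf;
    uint8_t sum = 0;

    sum += *p++;
    sum += *p++;
    sum += *p++;
    return sum;
}

/* Indexed form: each read uses a constant offset from the base pointer. */
static uint8_t sum3_indexed(const uint8_t *buf)
{
    uint8_t sum = 0;

    sum += buf[0];
    sum += buf[1];
    sum += buf[2];
    return sum;
}
```

Both functions return the same value for the same buffer; the only
difference of interest is the code generated for the addressing, which
can be compared by compiling with -S at the same optimisation level.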

Cheers,
Shaun

