Bug#: 39819
Status: NEW
Assigned To: Not yet assigned to anyone <unassigned@gcc.gnu.org>
Reporter: David Brown <david.brown@hesbynett.no>

Description:   Last confirmed: 2009-08-21 19:28 Opened: 2009-04-19 21:09
avr-gcc misses a number of optimisations when copying 4-byte values or
assigning a single-byte value to 4-byte values.  The issue also applies to
values of other sizes, but since 4-byte values are common (32-bit ints and
floats, for example), it is especially relevant there.

In summary, the compiler tends to produce code that is either a series of
direct memory accesses, or uses indirect access (through Z) in a loop.  A
better choice would often be to set up Z as a pointer, then unroll the indirect
pointer loop.

All code was compiled using avr-gcc 4.3.2 from winavr-20090313, using -Os.

Look at the code:

typedef unsigned char uint8_t;
typedef unsigned long int uint32_t;

#define sz 4    /* array length; matches the four stores in the output below */

static uint8_t as[sz];
static uint8_t bs[sz];

void foo1(void) {
        for (uint8_t i = 0; i < sz; i++) {
                bs[i] = as[1];
        }
}

void foo2(void) {
        for (uint8_t i = 0; i < sz; i++) {
                *(bs + i) = *(as + 1);
        }
}

foo1 compiles to:

lds r24, as+1
sts bs, r24
sts bs+1, r24
sts bs+2, r24
sts bs+3, r24
ret

Excluding the "ret", this is 10 words and 10 cycles.

foo2 is logically identical (array access and pointer access are the same
thing), but compiles to:

lds r24, as+1
ldi r30, lo8(bs)
ldi r31, hi8(bs)
.L1:
st Z+, r24
ldi r25, hi8(bs+4)
cpi r30, lo8(bs+4)
cpc r31, r25
brne .L1
ret

Excluding the "ret", this is 9 words and 31 cycles (27 on the XMega).  Hoisting
the "ldi r25, hi8(bs+4)" above the label would make it run once instead of four
times, for a net saving of three cycles.

An implementation that is smaller than both of these, slightly slower than the
foo1 output on the Mega, and slightly faster than it on the XMega, is:

lds r24, as+1
ldi r30, lo8(bs)
ldi r31, hi8(bs)
st Z+, r24
st Z+, r24
st Z+, r24
st Z+, r24
ret

Excluding the "ret", this is 8 words and 12 cycles (8 on the XMega).
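
The word and cycle counts quoted for these three variants can be rechecked with a small host-side tally. This is only a sketch: the per-instruction costs in the table are assumptions chosen to agree with the totals in this report (the only Mega/XMega difference modelled is "st Z+"), and all names (costs, code_words, run_cycles, foo2_loop_cycles) are hypothetical.

```c
#include <stddef.h>

/* Instructions that appear in the listings above. BRNE is split into
 * its taken and fall-through cases because they cost different cycles. */
enum insn { LDS, STS, LDI, ST_ZINC, CPI, CPC, BRNE_TAKEN, BRNE_FALL, INSN_COUNT };

/* Size in 16-bit words and cycles per core. ASSUMED values, picked to
 * match the totals quoted in the report. */
struct cost { unsigned words, mega, xmega; };

static const struct cost costs[INSN_COUNT] = {
    [LDS]        = { 2, 2, 2 },
    [STS]        = { 2, 2, 2 },
    [LDI]        = { 1, 1, 1 },
    [ST_ZINC]    = { 1, 2, 1 },   /* one cycle cheaper on the XMega */
    [CPI]        = { 1, 1, 1 },
    [CPC]        = { 1, 1, 1 },
    [BRNE_TAKEN] = { 1, 2, 2 },
    [BRNE_FALL]  = { 1, 1, 1 },
};

/* Static code size of a listing, in words. */
static unsigned code_words(const enum insn *listing, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += costs[listing[i]].words;
    return total;
}

/* Cycles needed to execute a straight-line sequence once. */
static unsigned run_cycles(const enum insn *seq, size_t n, int xmega)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += xmega ? costs[seq[i]].xmega : costs[seq[i]].mega;
    return total;
}

/* foo1 output, foo2 output (static listing), and the unrolled variant. */
static const enum insn foo1_listing[] = { LDS, STS, STS, STS, STS };
static const enum insn foo2_listing[] = { LDS, LDI, LDI, ST_ZINC, LDI, CPI, CPC, BRNE_TAKEN };
static const enum insn unrolled_listing[] = { LDS, LDI, LDI, ST_ZINC, ST_ZINC, ST_ZINC, ST_ZINC };

/* Cycles for the foo2 loop: prologue, then the body once per iteration,
 * with the branch taken on every iteration except the last. */
static unsigned foo2_loop_cycles(int xmega, unsigned iterations)
{
    static const enum insn prologue[] = { LDS, LDI, LDI };
    static const enum insn body[] = { ST_ZINC, LDI, CPI, CPC };
    unsigned total = run_cycles(prologue, 3, xmega);
    for (unsigned i = 0; i < iterations; i++) {
        enum insn br = (i + 1 < iterations) ? BRNE_TAKEN : BRNE_FALL;
        total += run_cycles(body, 4, xmega);
        total += run_cycles(&br, 1, xmega);
    }
    return total;
}
```

With these assumed costs, foo2_loop_cycles(0, 4) breaks down as 4 cycles of setup plus three iterations of 7 cycles and a final iteration of 6.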


For the code:

static uint32_t al, bl;
static float af;

void foo3(void) {
        al = 0;
}

void foo4(void) {
        af = 0;
}

we get:

foo3:
sts al, __zero_reg__
sts (al)+1, __zero_reg__
sts (al)+2, __zero_reg__
sts (al)+3, __zero_reg__
ret

That's 8 words and 8 cycles (plus "ret").  Using

ldi r30, lo8(al)
ldi r31, hi8(al)
st Z+, __zero_reg__
st Z+, __zero_reg__
st Z+, __zero_reg__
st Z+, __zero_reg__
ret

gives 6 words and 10 cycles, or 6 cycles on the XMega (plus "ret").
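
At source level, the pointer-based sequence suggested for foo3 corresponds roughly to the following sketch. The helper name clear4 is hypothetical, and nothing in this report confirms that avr-gcc 4.3.2 would actually turn it into four "st Z+" instructions; it only shows the intended access pattern.

```c
#include <stdint.h>

/* Hypothetical helper: clear four consecutive bytes through a single
 * post-incremented pointer, mirroring the four "st Z+" stores. */
static inline void clear4(void *dst)
{
    uint8_t *p = (uint8_t *)dst;   /* byte access to any object is permitted */

    *p++ = 0;
    *p++ = 0;
    *p++ = 0;
    *p++ = 0;
}
```

Because the stores are byte-wise, the same helper would serve a 32-bit integer and a 4-byte float whose zero value is all-zero bytes, which is the case foo3 and foo4 are expected to share.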

Function foo4() should of course give the same code, but instead compiles to
the very inefficient:

foo4:
ldi r24, lo8(0x00)
ldi r25, hi8(0x00)
ldi r26, hlo8(0x00)
ldi r27, hhi8(0x00)
sts af, __zero_reg__
sts (af)+1, __zero_reg__
sts (af)+2, __zero_reg__
sts (af)+3, __zero_reg__
ret

That's 12 words and 12 cycles, and uses 4 registers unnecessarily.


Similar code is produced when copying values:

void foo5(void) {
        al = bl;
}

compiles to:

foo5:
lds r24, bl
lds r25, (bl) + 1
lds r26, (bl) + 2
lds r27, (bl) + 3
sts al, r24
sts (al) + 1, r25
sts (al) + 2, r26
sts (al) + 3, r27
ret

Using the Z and either X or Y pointers would make this code slightly smaller
but marginally slower on the Mega (and marginally faster on the XMega).  Even
without that, re-arranging the code would allow a single register to be used
rather than four.
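
The single-register rearrangement mentioned for foo5 can be sketched at source level as an interleaved load/store copy. The helper name copy4 is hypothetical, and whether the compiler keeps to one temporary register for this code is an assumption, not something this report verifies.

```c
#include <stdint.h>

/* Hypothetical helper: copy four bytes one at a time, so that each byte
 * is stored before the next one is loaded and only one temporary is live. */
static inline void copy4(void *dst, const void *src)
{
    uint8_t *d = (uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;
    uint8_t tmp;

    tmp = *s++; *d++ = tmp;
    tmp = *s++; *d++ = tmp;
    tmp = *s++; *d++ = tmp;
    tmp = *s++; *d++ = tmp;
}
```

The load/store pairs follow the pointer-based pattern described above (one pointer for the source, one for the destination), in contrast to the four loads followed by four stores in the compiler output.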

------- Comment #1 From Eric Weddington 2009-08-21 19:28 -------
Confirmed on 4.3.2.
