Bug 34791 - [avr] optimisation of 8-bit logic sometimes fails
Summary: [avr] optimisation of 8-bit logic sometimes fails
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 4.2.2
: P3 enhancement
Target Milestone: 4.7.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2008-01-15 08:56 UTC by David Brown
Modified: 2011-10-09 20:40 UTC (History)
3 users (show)

See Also:
Host:
Target: avr
Build:
Known to work:
Known to fail: 4.6.1
Last reconfirmed: 2011-07-09 11:55:16


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description David Brown 2008-01-15 08:56:38 UTC
As the avr is an 8-bit processor, it is important to handle data as 8-bit when possible, even though the C standards force 8-bit data to be promoted to 16-bit ints in many situations.  Common cases are when doing logic operations on chars - when legally valid (i.e., the results are the same as for 16-bit operations), the generated code should use 8-bit operations.  In particular, logic operations using constants such as 0 and 0xff should be optimised.

Optimisation works well in many cases, but fails when the expressions get slightly more complex:

extern uint8_t data[64];

uint8_t bar(uint8_t x, uint8_t y) {
	return data[y ^ (x & 0x0f)];
}

uint8_t bar2(uint8_t x, uint8_t y) {
	return data[(y ^ x) & 0x0f];
}

  40               	bar:
  41               	/* prologue: frame size=0 */
  42               	/* prologue end (size=0) */
  43 0000 E82F      		mov r30,r24	 ;  x, x
  44 0002 F0E0      		ldi r31,lo8(0)	 ;  x,
  45 0004 EF70      		andi r30,lo8(15)	 ;  x,
  46 0006 F070      		andi r31,hi8(15)	 ;  x,
  47 0008 70E0      		ldi r23,lo8(0)	 ;  y,
  48 000a E627      		eor r30,r22	 ;  x, y
  49 000c F727      		eor r31,r23	 ;  x, y
  50 000e E050      		subi r30,lo8(-(data))	 ;  x,
  51 0010 F040      		sbci r31,hi8(-(data))	 ;  x,
  52 0012 8081      		ld r24,Z	 ;  tmp51, data
  53 0014 90E0      		ldi r25,lo8(0)	 ;  <result>,
  54               	/* epilogue: frame size=0 */
  55 0016 0895      		ret
  56               	/* epilogue end (size=1) */
  57               	/* function bar size 12 (11) */

  61               	bar2:
  62               	/* prologue: frame size=0 */
  63               	/* prologue end (size=0) */
  64 0018 E62F      		mov r30,r22	 ;  y, y
  65 001a E827      		eor r30,r24	 ;  y, x
  66 001c F0E0      		ldi r31,lo8(0)	 ; ,
  67 001e EF70      		andi r30,lo8(15)	 ;  tmp46,
  68 0020 F070      		andi r31,hi8(15)	 ;  tmp46,
  69 0022 E050      		subi r30,lo8(-(data))	 ;  tmp46,
  70 0024 F040      		sbci r31,hi8(-(data))	 ;  tmp46,
  71 0026 8081      		ld r24,Z	 ;  tmp50, data
  72 0028 90E0      		ldi r25,lo8(0)	 ;  <result>,
  73               	/* epilogue: frame size=0 */
  74 002a 0895      		ret
  75               	/* epilogue end (size=1) */
  76               	/* function bar2 size 10 (9) */

The first function "bar" has several clear improvements - it does all the logic operations as 16-bit.  In the second case, the "eor" is nicely handled as 8-bit, but the "& 0x0f" is done as 16-bit - there is an unnecessary "andi r31, hi8(15)" instruction.
Comment 1 David Brown 2010-02-16 10:46:04 UTC
There are many other cases where 8-bit optimisation is missing - C's integer promotion gets in the way.  This is particularly common when dealing with a compile-time constant - there is no way in C to say that "0x3f" is 8-bit rather than a 16-bit int.

Another example of code with this problem is:

void foo(void) {
    static unsigned char count;

    if (++count & 0x3f) {
        PORTC &= ~0x01;
    } else {
        PORTC |= 0x01;
    }
}

Both the "&" and the comparison with zero are done as 16-bit.


One work-around is to use this macro:

#define const_byte(x) ({ static const __attribute__((__progmem__)) \
                         unsigned char v = x; v; })

Then we can write:

#define const_byte(x) ({ static const __attribute__((__progmem__)) \
                         unsigned char v = x; v; })

uint8_t bar3(uint8_t x, uint8_t y) {
        return data[y ^ (x & const_byte(0x0f))];
} 

147               	bar3:
 148               	/* prologue: function */
 149               	/* frame size = 0 */
 150 008c 8F70      		andi r24,lo8(15)	 ;  tmp45,
 151 008e 8627      		eor r24,r22	 ;  tmp45, y
 152 0090 E0E0      		ldi r30,lo8(data)	 ;  tmp48,
 153 0092 F0E0      		ldi r31,hi8(data)	 ;  tmp48,
 154 0094 E80F      		add r30,r24	 ;  tmp48, tmp45
 155 0096 F11D      		adc r31,__zero_reg__	 ;  tmp48
 156 0098 8081      		ld r24,Z	 ; , data
 157               	/* epilogue start */
 158 009a 0895      		ret
 160 


As far as I can see, this generated code is optimal.

The macro works because it forces the value to be 8-bit, rather than a 16-bit compile-time constant.  However, the compiler is still smart enough to see that since it's a "const" with known value, it's value can be used directly.  As a side effect, the static "variable" must be created somewhere - by using __progmen__, we create it in flash rather than wasting ram.  Even that waste could be spared by garbage-collection linking, or by using a dedicated segment rather than .progmem.data.

Comment 2 Georg-Johann Lay 2011-07-09 11:55:16 UTC
I still see this in 4.6.1.

(In reply to comment #1)
> There are many other cases where 8-bit optimisation is missing - C's integer
> promotion gets in the way.  This is particularly common when dealing with a
> compile-time constant - there is no way in C to say that "0x3f" is 8-bit rather
> than a 16-bit int.
> 
> Another example of code with this problem is:
> 
> void foo(void) {
>     static unsigned char count;
> 
>     if (++count & 0x3f) {
>         PORTC &= ~0x01;
>     } else {
>         PORTC |= 0x01;
>     }
> }
> 
> Both the "&" and the comparison with zero are done as 16-bit.

Please, don't use stuff like PORTC that is nor available to the general GCC developer.  They won't be able to reproduce this artifact!  This is not specific to the avr part and related to PR43088.

Write yout test case like that:

typedef unsigned char uint8_t;
#define PORTC (*((uint8_t volatile*) 0x28))

void foo(void) {
    static unsigned char count;

    if (++count & 0x3f) {
        PORTC &= ~0x01;
    } else {
        PORTC |= 0x01;
    }
}

void foo3e(void) {
    static unsigned char count;

    if (++count & 0x3e) {
        PORTC &= ~0x01;
    } else {
        PORTC |= 0x01;
    }
}
Comment 3 Georg-Johann Lay 2011-10-09 20:40:51 UTC
Compiled the following code from commant #0 and commant #1 with avr-gcc 4.7 (SVN 179594)

#define uint8_t unsigned char
#define PORTC (*((uint8_t volatile*) 0x28))

extern uint8_t data[64];

uint8_t bar(uint8_t x, uint8_t y) {
    return data[y ^ (x & 0x0f)];
}

uint8_t bar2(uint8_t x, uint8_t y) {
    return data[(y ^ x) & 0x0f];
}

void foo(void) {
    static unsigned char count;

    if (++count & 0x3f) {
        PORTC &= ~0x01;
    } else {
        PORTC |= 0x01;
    }
}

With -Os -dp yields the following result:

bar:
	andi r24,lo8(15)
	eor r24,r22
	mov r30,r24
	ldi r31,lo8(0)
	subi r30,lo8(-(data))
	sbci r31,hi8(-(data))
	ld r24,Z
	ret

bar2:
	eor r22,r24
	andi r22,lo8(15)
	mov r30,r22
	ldi r31,lo8(0)
	subi r30,lo8(-(data))
	sbci r31,hi8(-(data))
	ld r24,Z
	ret

foo:
	lds r24,count.1232
	subi r24,lo8(-(1))
	sts count.1232,r24
	andi r24,lo8(63)
	breq .L4
	cbi 40-0x20,0
	ret
.L4:
	sbi 40-0x20,0
	ret

Thus, closing this PR as FIXED because the code is optimal and nothing remains to be improved.