Bug 43095

Summary: [avr] GCC produces poor code for 4- and 8-byte values
Product: gcc Reporter: Jacob Potter <jdpotter>
Component: targetAssignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED WORKSFORME    
Severity: normal CC: eric.weddington, gcc-bugs, gjl
Priority: P3 Keywords: missed-optimization
Version: 4.5.0   
Target Milestone: ---   
Host: Target: avr-*-*
Build: Known to work: 4.6.1
Known to fail: Last reconfirmed:

Description Jacob Potter 2010-02-16 16:59:59 UTC
Given the following input:

char greater_than_64(long long x, long long y) {
        return x > y;
}
char greater_than_32(long x, long y) {
        return x > y;
}

And running "avr-gcc -Os -c lol.c -o lol.o", I get the following output

00000064 <greater_than_32>:
  64:   ef 92           push    r14
  66:   ff 92           push    r15
  68:   0f 93           push    r16
  6a:   1f 93           push    r17
  6c:   e2 2e           mov     r14, r18
  6e:   f3 2e           mov     r15, r19
  70:   04 2f           mov     r16, r20
  72:   15 2f           mov     r17, r21
  74:   31 e0           ldi     r19, 0x01       ; 1
  76:   e6 16           cp      r14, r22
  78:   f7 06           cpc     r15, r23
  7a:   08 07           cpc     r16, r24
  7c:   19 07           cpc     r17, r25
  7e:   04 f0           brlt    .+0             ; 0x80 <greater_than_32+0x1c>
  80:   30 e0           ldi     r19, 0x00       ; 0
  82:   83 2f           mov     r24, r19
  84:   1f 91           pop     r17
  86:   0f 91           pop     r16
  88:   ff 90           pop     r15
  8a:   ef 90           pop     r14
  8c:   08 95           ret

This is much longer than necessary; the input values could be left in their original registers, rather than moved before comparison. I'm not sure why GCC does this. The 8-byte version is even worse:

00000000 <greater_than_64>:
   0:	af 92       	push	r10
   2:	bf 92       	push	r11
   4:	cf 92       	push	r12
   6:	df 92       	push	r13
   8:	ef 92       	push	r14
   a:	ff 92       	push	r15
   c:	0f 93       	push	r16
   e:	1f 93       	push	r17
  10:	e1 e0       	ldi	r30, 0x01	; 1
  12:	19 17       	cp	r17, r25
  14:	04 f0       	brlt	.+0      	; 0x16 <greater_than_64+0x16>
  16:	91 17       	cp	r25, r17
  18:	01 f4       	brne	.+0      	; 0x1a <greater_than_64+0x1a>
  1a:	08 17       	cp	r16, r24
  1c:	00 f0       	brcs	.+0      	; 0x1e <greater_than_64+0x1e>
  1e:	80 17       	cp	r24, r16
  20:	01 f4       	brne	.+0      	; 0x22 <greater_than_64+0x22>
  22:	f7 16       	cp	r15, r23
  24:	00 f0       	brcs	.+0      	; 0x26 <greater_than_64+0x26>
  26:	7f 15       	cp	r23, r15
  28:	01 f4       	brne	.+0      	; 0x2a <greater_than_64+0x2a>
  2a:	e6 16       	cp	r14, r22
  2c:	00 f0       	brcs	.+0      	; 0x2e <greater_than_64+0x2e>
  2e:	6e 15       	cp	r22, r14
  30:	01 f4       	brne	.+0      	; 0x32 <greater_than_64+0x32>
  32:	d5 16       	cp	r13, r21
  34:	00 f0       	brcs	.+0      	; 0x36 <greater_than_64+0x36>
  36:	5d 15       	cp	r21, r13
  38:	01 f4       	brne	.+0      	; 0x3a <greater_than_64+0x3a>
  3a:	c4 16       	cp	r12, r20
  3c:	00 f0       	brcs	.+0      	; 0x3e <greater_than_64+0x3e>
  3e:	4c 15       	cp	r20, r12
  40:	01 f4       	brne	.+0      	; 0x42 <greater_than_64+0x42>
  42:	b3 16       	cp	r11, r19
  44:	00 f0       	brcs	.+0      	; 0x46 <greater_than_64+0x46>
  46:	3b 15       	cp	r19, r11
  48:	01 f4       	brne	.+0      	; 0x4a <greater_than_64+0x4a>
  4a:	a2 16       	cp	r10, r18
  4c:	00 f0       	brcs	.+0      	; 0x4e <greater_than_64+0x4e>
  4e:	e0 e0       	ldi	r30, 0x00	; 0
  50:	8e 2f       	mov	r24, r30
  52:	1f 91       	pop	r17
  54:	0f 91       	pop	r16
  56:	ff 90       	pop	r15
  58:	ef 90       	pop	r14
  5a:	df 90       	pop	r13
  5c:	cf 90       	pop	r12
  5e:	bf 90       	pop	r11
  60:	af 90       	pop	r10
  62:	08 95       	ret

Here there are many redundant jumps as well as pushes and pops. It seems like these could be trivially eliminated with a peephole pass.

I'm running a build straight out of trunk as of today:
Using built-in specs.
COLLECT_GCC=avr-gcc
COLLECT_LTO_WRAPPER=/usr/local/avr/libexec/gcc/avr/4.5.0/lto-wrapper
Target: avr
Configured with: ../configure --prefix=/usr/local/avr --target=avr --enable-languages=c --disable-nls --disable-libssp --with-dwarf2 : (reconfigured) ../configure --prefix=/usr/local/avr --target=avr --enable-languages=c --disable-nls --disable-libssp --with-dwarf2
Thread model: single
gcc version 4.5.0 20100216 (experimental) (GCC)
Comment 1 Georg-Johann Lay 2011-07-09 08:26:26 UTC
Closed as WORKS FOR ME

I compiled that with 

$ avr-gcc-4.6.1 -dp -Os -S -mmcu=atmega88

and the result is:

greater_than_32:
	ldi r30,lo8(1)	 ;  7	*movqi/2	[length = 1]
	cp r18,r22	 ;  8	*cmpsi/2	[length = 4]
	cpc r19,r23
	cpc r20,r24
	cpc r21,r25
	brlt .L2	 ;  9	branch	[length = 1]
	ldi r30,lo8(0)	 ;  10	*movqi/1	[length = 1]
.L2:
	mov r24,r30	 ;  16	*movqi/1	[length = 1]
	ret	 ;  28	return	[length = 1]


For the 64-bit case, user must be aware that avr-gcc will produce bulky code.

Because of shortness of contributors to avr-gcc, the avr port maintainers regard missed 64-bit optimizations as WON'T FIX.