Bug 33050

Summary: [avr] unnecessary register save
Product: gcc Reporter: Wouter van Gulik <wvangulik>
Component: target Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED WORKSFORME    
Severity: normal CC: eric.weddington, gcc-bugs, gjl, sascha-web-gcc.gnu.org, wvangulik
Priority: P3 Keywords: missed-optimization
Version: 4.1.2   
Target Milestone: 4.7.0   
Host: Target: avr-*-*
Build: Known to work: 4.7.0
Known to fail: 4.1.2, 4.3.0 Last reconfirmed: 2007-08-24 20:43:28
Attachments: Example
Assembler output with 4.7.0 r173649

Description Wouter van Gulik 2007-08-11 18:13:37 UTC
Using this version/config:

~~~~~~~~~~~~~~~~~
Using built-in specs.
Target: avr
Configured with: ../gcc-4.1.2/configure --prefix=/c/WinAVR --target=avr --enable-languages=c,c++ --with-dwarf2 --enable-win32-registry=WinAVR-20070525 --disable-nls --with-gmp=/usr/local --with-mpfr=/usr/local --enable-doc --disable-libssp
Thread model: single
gcc version 4.1.2 (WinAVR 20070525)

~~~~~~~~~~~~~~~~~~~~~~~~~~
Using this command line to compile:

avr-gcc -S -Os test.c -mmcu=atmega16

~~~~~~~~~~~~~~~~~~~~~~~~~~~

The test case:

extern unsigned char foo(unsigned char in);
unsigned char test2(unsigned char input) {
  
  input += foo(0xA); //use input
  foo(0xA);          //make sure input must be saved over the call
  return input;
}


The assembler output:
/* prologue: frame size=0 */
	push r16
	push r17        <<Useless
/* prologue end (size=2) */
	mov r17,r24
	ldi r24,lo8(10)
	call foo
	mov r16,r24     <<Why?? add r17,r24 is much better 
	ldi r24,lo8(10)
	call foo        
	add r17,r16     <<Could be gone if above statement used
	mov r24,r17    
	clr r25
/* epilogue: frame size=0 */
	pop r17
	pop r16         <<Useless
	ret

The add is delayed until after the last call, which requires saving an extra register.

So delaying the add introduces:
an extra push/pop pair
an extra mov instruction
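
The two orderings can also be written out at the source level. This is a minimal sketch, not compiler output: `foo` is replaced by a stub (the report only declares it `extern`), and the names `test2_delayed`, `test2_early` and `orderings_agree` are hypothetical. The first function mirrors the instruction order in the listing above, the second the order suggested in the annotations. Both return the same value; what differs is how many values have to survive the second call. Whether a particular compiler version actually allocates registers this way is an assumption.

```c
/* Stub standing in for the extern foo() of the report (assumption:
   the real foo is unknown, this one just returns 3 * in, truncated). */
unsigned char foo(unsigned char in) {
    return (unsigned char)(in * 3);
}

/* Mirrors the 4.1.2 listing: the add happens after the second call,
   so both 'input' and 'first' are live across that call. */
unsigned char test2_delayed(unsigned char input) {
    unsigned char first = foo(0xA);
    foo(0xA);
    return (unsigned char)(input + first);
}

/* Mirrors the suggested order: the add happens right after the first
   call, so only 'input' is live across the second call. */
unsigned char test2_early(unsigned char input) {
    input += foo(0xA);
    foo(0xA);
    return input;
}

/* Returns 1 if both orderings agree for every possible 8-bit input. */
int orderings_agree(void) {
    for (unsigned int i = 0; i < 256; i++) {
        if (test2_delayed((unsigned char)i) != test2_early((unsigned char)i))
            return 0;
    }
    return 1;
}
```

With one live value instead of two, a single call-saved register would be enough, which is where the extra push/pop pair and the extra mov would go away.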
Comment 1 Wouter van Gulik 2007-08-11 18:14:54 UTC
Created attachment 14054 [details]
Example

C source showing non-optimal code
Comment 2 Eric Weddington 2007-08-22 17:09:37 UTC
4.3.0 20070817 snapshot generates this for the testcase:

test2:
	push r16
	push r17
/* prologue: function */
/* frame size = 0 */
	mov r16,r24
	ldi r24,lo8(10)
	call foo
	mov r17,r24
	ldi r24,lo8(10)
	call foo
	mov r24,r16
	add r24,r17
/* epilogue start */
	pop r17
	pop r16
	ret
Comment 3 Wouter van Gulik 2007-08-24 19:36:26 UTC
(In reply to comment #2)
> 4.3.0 20070817 snapshot generates this for the testcase:
> 

<snip>

Well at least the extra clr r25 is gone...


I just tried some simpler code:

extern unsigned char foo();
unsigned char test(unsigned char input) {
  return input += foo();
}

The result is:
/* prologue: frame size=0 */
	push r17
/* prologue end (size=1) */
	mov r17,r24
	call foo
	add r17,r24            <<Could do "add r24,r17"
	mov r24,r17            <<This could then be gone
	clr r25                <<This is maybe gone in 4.3.0??
/* epilogue: frame size=0 */
	pop r17
	ret
/* epilogue end (size=2) */

Here the add is also done non-optimally. So maybe fixing this would also prevent the extra register save?
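
The suggested change relies only on addition being commutative: the sum can be formed directly in the return register r24 instead of in r17 followed by a copy. A small sketch that models the two instruction sequences in C, with plain struct fields standing in for r17 and r24 (the struct and function names are hypothetical and exist only for this illustration):

```c
#include <stdint.h>

/* Hypothetical model of the two 8-bit registers involved. */
struct regs { uint8_t r17, r24; };

/* Sequence from the 4.1.2 listing: add r17,r24 ; mov r24,r17 */
static struct regs add_then_mov(struct regs r) {
    r.r17 = (uint8_t)(r.r17 + r.r24);  /* add r17,r24 */
    r.r24 = r.r17;                     /* mov r24,r17 */
    return r;
}

/* Suggested sequence: add r24,r17 */
static struct regs add_in_place(struct regs r) {
    r.r24 = (uint8_t)(r.r24 + r.r17);  /* add r24,r17 */
    return r;
}

/* Returns 1 if both sequences leave the same value in r24
   for every pair of 8-bit register contents. */
int same_return_value(void) {
    for (unsigned int a = 0; a < 256; a++) {
        for (unsigned int b = 0; b < 256; b++) {
            struct regs in = { (uint8_t)a, (uint8_t)b };
            if (add_then_mov(in).r24 != add_in_place(in).r24)
                return 0;
        }
    }
    return 1;
}
```

The two sequences leave different values in r17, but that does not matter here because the epilogue restores r17 with `pop r17` before returning.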
Comment 4 Eric Weddington 2007-08-24 20:41:51 UTC
(In reply to comment #3)

4.3.0 20070817 snapshot produces this for the second test case:

test:
	push r17
/* prologue: function */
/* frame size = 0 */
	mov r17,r24
	call foo
	add r24,r17
/* epilogue start */
	pop r17
	ret


So the second test case is optimized correctly when we get to 4.3.0.
Comment 5 Georg-Johann Lay 2011-05-17 19:28:11 UTC
Created attachment 24271 [details]
Assembler output with 4.7.0 r173649

This code is as you expected.
Comment 6 Georg-Johann Lay 2011-05-17 19:30:52 UTC
Closing this issue as resolved+worksforme. 4.7.0 generates reasonable code without any overhead, see attachment

http://gcc.gnu.org/bugzilla/attachment.cgi?id=24271

generated with -Os (the output with -O1 and -O2 is the same).