Bug 33050 - [avr] unnecessary register save

Status:	RESOLVED WORKSFORME

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	4.1.2

Importance:	P3 normal
Target Milestone:	4.7.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2007-08-11 18:13 UTC by Wouter van Gulik
Modified:	2011-05-17 19:30 UTC
CC List:	5 users

See Also:
Host:
Target:	avr--
Build:
Known to work:	4.7.0
Known to fail:	4.1.2, 4.3.0
Last reconfirmed:	2007-08-24 20:43:28

Attachments
Example (139 bytes, text/plain) 2007-08-11 18:14 UTC, Wouter van Gulik
Assembler output with 4.7.0 r173649 (336 bytes, application/octet-stream) 2011-05-17 19:28 UTC, Georg-Johann Lay

Description Wouter van Gulik 2007-08-11 18:13:37 UTC

Using this version/config:

~~~~~~~~~~~~~~~~~~~~~~~~~~
Using built-in specs.
Target: avr
Configured with: ../gcc-4.1.2/configure --prefix=/c/WinAVR --target=avr --enable-languages=c,c++ --with-dwarf2 --enable-win32-registry=WinAVR-20070525 --disable-nls --with-gmp=/usr/local --with-mpfr=/usr/local --enable-doc --disable-libssp
Thread model: single
gcc version 4.1.2 (WinAVR 20070525)

~~~~~~~~~~~~~~~~~~~~~~~~~~
Using this command line to compile:

avr-gcc -S -Os test.c -mmcu=atmega16

~~~~~~~~~~~~~~~~~~~~~~~~~~~

The test case:

extern unsigned char foo(unsigned char in);
unsigned char test2(unsigned char input) {
  
  input += foo(0xA); //use input
  foo(0xA);          //make sure input must be saved over the call
  return input;
}


The assembler output:
/* prologue: frame size=0 */
	push r16
	push r17        <<Useless
/* prologue end (size=2) */
	mov r17,r24
	ldi r24,lo8(10)
	call foo
	mov r16,r24     <<Why?? add r17,r24 is much better 
	ldi r24,lo8(10)
	call foo        
	add r17,r16     <<Could be gone if above statement used
	mov r24,r17    
	clr r25
/* epilogue: frame size=0 */
	pop r17
	pop r16         <<Useless
	ret

The addition is delayed until after the second call, which requires keeping an extra register live across that call.

So delaying the add introduces:
an extra push/pop pair
an extra mov instruction
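
The difference described above can be sketched at the C level. This is only an illustration, not compiler output: foo is given a hypothetical stub body (the report only declares it extern), and the names test2_eager, test2_delayed and check_variants are made up for this sketch. Both variants return the same value, but the delayed one has to keep two values alive across the second call, which corresponds to the second saved register in the 4.1.2 output.

```c
#include <assert.h>

/* Hypothetical stub: the report only declares foo as extern. */
unsigned char foo(unsigned char in)
{
    return (unsigned char)(in + 1);
}

/* As written in the report: the add happens right after the first call,
   so only `input` has to survive the second call (one saved register). */
unsigned char test2_eager(unsigned char input)
{
    input += foo(0xA);
    foo(0xA);
    return input;
}

/* Roughly what the 4.1.2 assembly does: the first result is kept in its
   own register and added after the second call, so both `input` and
   `first` have to survive that call (two saved registers). */
unsigned char test2_delayed(unsigned char input)
{
    unsigned char first = foo(0xA);
    foo(0xA);
    return (unsigned char)(input + first);
}

/* Both orderings must produce the same result for any input. */
void check_variants(unsigned char input)
{
    assert(test2_eager(input) == test2_delayed(input));
}
```

Since the two orderings are equivalent, the compiler is free to pick the one with fewer values live across the call; the complaint in this report is that 4.1.2 picks the more expensive one.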

Comment 1 Wouter van Gulik 2007-08-11 18:14:54 UTC

Created attachment 14054
Example

C source showing non-optimal code

Comment 2 Eric Weddington 2007-08-22 17:09:37 UTC

4.3.0 20070817 snapshot generates this for the testcase:

test2:
	push r16
	push r17
/* prologue: function */
/* frame size = 0 */
	mov r16,r24
	ldi r24,lo8(10)
	call foo
	mov r17,r24
	ldi r24,lo8(10)
	call foo
	mov r24,r16
	add r24,r17
/* epilogue start */
	pop r17
	pop r16
	ret

Comment 3 Wouter van Gulik 2007-08-24 19:36:26 UTC

(In reply to comment #2)
> 4.3.0 20070817 snapshot generates this for the testcase:
> 

<snip>

Well at least the extra clr r25 is gone...


I just tried some simpler code:

extern unsigned char foo();
unsigned char test(unsigned char input) {
  return input += foo();
}

The result is:
/* prologue: frame size=0 */
	push r17
/* prologue end (size=1) */
	mov r17,r24
	call foo
	add r17,r24            <<Could do "add r24,r17"
	mov r24,r17            <<This could then be gone
	clr r25                <<This is maybe gone in 4.3.0??
/* epilogue: frame size=0 */
	pop r17
	ret
/* epilogue end (size=2) */

Here the add is also done non-optimally. So maybe fixing this would also prevent the extra register save?

Comment 4 Eric Weddington 2007-08-24 20:41:51 UTC

(In reply to comment #3)

4.3.0 20070817 snapshot produces this for the second test case:

test:
	push r17
/* prologue: function */
/* frame size = 0 */
	mov r17,r24
	call foo
	add r24,r17
/* epilogue start */
	pop r17
	ret


So the second test case is optimized correctly as of 4.3.0.

Comment 5 Georg-Johann Lay 2011-05-17 19:28:11 UTC

Created attachment 24271
Assembler output with 4.7.0 r173649

The generated code is as you expected.

Comment 6 Georg-Johann Lay 2011-05-17 19:30:52 UTC

Closing this issue as resolved+worksforme. 4.7.0 generates reasonable code without any overhead, see the attachment

http://gcc.gnu.org/bugzilla/attachment.cgi?id=24271

generated with -Os (the output with -O1 and -O2 is the same).