User account creation filtered due to spam.

Bug 33050 - [avr] unnessary register save
Summary: [avr] unnessary register save
Status: RESOLVED WORKSFORME
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.1.2
: P3 normal
Target Milestone: 4.7.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2007-08-11 18:13 UTC by Wouter van Gulik
Modified: 2011-05-17 19:30 UTC (History)
5 users (show)

See Also:
Host:
Target: avr-*-*
Build:
Known to work: 4.7.0
Known to fail: 4.1.2, 4.3.0
Last reconfirmed: 2007-08-24 20:43:28


Attachments
Example (139 bytes, text/plain)
2007-08-11 18:14 UTC, Wouter van Gulik
Details
Assembler output with 4.7.0 r173649 (336 bytes, application/octet-stream)
2011-05-17 19:28 UTC, Georg-Johann Lay
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Wouter van Gulik 2007-08-11 18:13:37 UTC
Using this version/config:

~~~~~~~~~~~~~~~~~`
Using built-in specs.
Target: avr
Configured with: ../gcc-4.1.2/configure --prefix=/c/WinAVR --target=avr --enable
-languages=c,c++ --with-dwarf2 --enable-win32-registry=WinAVR-20070525 --disable
-nls --with-gmp=/usr/local --with-mpfr=/usr/local --enable-doc --disable-libssp
Thread model: single
gcc version 4.1.2 (WinAVR 20070525)

~~~~~~~~~~~~~~~~~~~~~~~~~~
Using this command line to compile:

avr-gcc -S -Os test.c -mmcu=atmega16

~~~~~~~~~~~~~~~~~~~~~~~~~~~

The test case:

extern unsigned char foo(unsigned char in);
unsigned char test2(unsigned char input) {
  
  input += foo(0xA); //use input
  foo(0xA);          //make sure input must be saved over the call
  return input;
}


The assembler output:
/* prologue: frame size=0 */
	push r16
	push r17        <<Useless
/* prologue end (size=2) */
	mov r17,r24
	ldi r24,lo8(10)
	call foo
	mov r16,r24     <<Why?? add r17,r24 is much better 
	ldi r24,lo8(10)
	call foo        
	add r17,r16     <<Could be gone if above statement used
	mov r24,r17    
	clr r25
/* epilogue: frame size=0 */
	pop r17
	pop r16         <<Useless
	ret

The adding is delayed until after the last call, but this requires saving an extra register.

So delaying introduces:
an extra psh/pop
extra mov instruction
Comment 1 Wouter van Gulik 2007-08-11 18:14:54 UTC
Created attachment 14054 [details]
Example

C source showing non optimal code
Comment 2 Eric Weddington 2007-08-22 17:09:37 UTC
4.3.0 20070817 snapshot generates this for the testcase:

test2:
	push r16
	push r17
/* prologue: function */
/* frame size = 0 */
	mov r16,r24
	ldi r24,lo8(10)
	call foo
	mov r17,r24
	ldi r24,lo8(10)
	call foo
	mov r24,r16
	add r24,r17
/* epilogue start */
	pop r17
	pop r16
	ret
Comment 3 Wouter van Gulik 2007-08-24 19:36:26 UTC
(In reply to comment #2)
> 4.3.0 20070817 snapshot generates this for the testcase:
> 

<snip>

Well at least the extra clr r25 is gone...


I just tried some simpler code:

extern unsigned char foo();
unsigned char test(unsigned char input) {
  return input += foo();
}

The result is:
/* prologue: frame size=0 */
	push r17
/* prologue end (size=1) */
	mov r17,r24
	call foo
	add r17,r24            <<Could do "add r24,r17"
	mov r24,r17            <<This could then be gone
	clr r25                <<This is maybe gone in 4.3.0??
/* epilogue: frame size=0 */
	pop r17
	ret
/* epilogue end (size=2) */

Here the add is also done non-optimal. So maybe solving this prevents the extra register save?
Comment 4 Eric Weddington 2007-08-24 20:41:51 UTC
(In reply to comment #3)

4.3.0 20070817 snapshot produces this for the second test case:

test:
	push r17
/* prologue: function */
/* frame size = 0 */
	mov r17,r24
	call foo
	add r24,r17
/* epilogue start */
	pop r17
	ret


So the second test case is optimized correctly when we get to 4.3.0.
Comment 5 Georg-Johann Lay 2011-05-17 19:28:11 UTC
Created attachment 24271 [details]
Assembler output with 4.7.0 r173649

This code is as you expected.
Comment 6 Georg-Johann Lay 2011-05-17 19:30:52 UTC
Closing this issue as resolved+worksforme. 4.7.0 generates reasonable code without any overhead, see Attachement

http://gcc.gnu.org/bugzilla/attachment.cgi?id=24271

generated with -Os (-O1 and -O2) are same.