Bug 70677 - Suboptimal cond on AVR: unneeded stack frame
Summary: Suboptimal cond on AVR: unneeded stack frame
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 5.3.1
: P5 normal
Target Milestone: 6.2
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
: 52025 (view as bug list)
Depends on:
Blocks:
 
Reported: 2016-04-15 09:08 UTC by night_ghost
Modified: 2017-01-07 12:18 UTC (History)
1 user (show)

See Also:
Host:
Target: avr
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-05-21 00:00:00


Attachments
testcase in attachment (380.92 KB, application/x-gzip)
2016-05-24 11:10 UTC, night_ghost
Details

Note You need to log in before you can comment on or make changes to this bug.
Description night_ghost 2016-04-15 09:08:47 UTC
here a lot of commands to deal with stack frame for only one byte

//(out of scope:
struct Point {
    byte x;
    byte y;
};

static boolean inline is_alt(point p){
    return (p.y & 0x40);
}
//)

static void panVel(point p){
    1e14:       cf 93           push    r28
    1e16:       df 93           push    r29
    1e18:       1f 92           push    r1
    1e1a:       cd b7           in      r28, 0x3d       ; 61
    1e1c:       de b7           in      r29, 0x3e       ; 62
    
    printSpeed(cnvGroundSpeed(),is_alt(p));
    1e1e:       96 fb           bst     r25, 6
    1e20:       22 27           eor     r18, r18
    1e22:       20 f9           bld     r18, 0
    1e24:       29 83           std     Y+1, r18        ; 0x01
    1e26:       0e 94 33 09     call    0x1266; 0x1266 <_ZL14cnvGroundSpeedv>
    1e2a:       ab 01           movw    r20, r22
    1e2c:       bc 01           movw    r22, r24
}

    1e2e:       29 81           ldd     r18, Y+1        ; 0x01
    1e30:       87 e8           ldi     r24, 0x87       ; 135
    1e32:       92 e0           ldi     r25, 0x02       ; 2
    1e34:       0e 94 c6 0e     call    0x1d8c; 0x1d8c <_ZL10printSpeedPKcfh>

    1e38:       0f 90           pop     r0
    1e3a:       df 91           pop     r29
    1e3c:       cf 91           pop     r28
    1e3e:       08 95           ret

R18 can be PUSH'ed, moved to r28 or even calculated after call to x1266
Comment 1 Georg-Johann Lay 2016-05-21 11:12:42 UTC
Would you provide a test case, please?  Cf. https://gcc.gnu.org/bugs/#need
For example there is nothing like "panVel" in your code.
Comment 2 night_ghost 2016-05-23 06:32:17 UTC
all code samples from https://github.com/night-ghost/minimosd-extra/tree/master/MinimOsd_Extra. When I try to isolate a piece of code with a bug in one file, it is compiled differently than in their native place.

Same stack usage in this code:

#include <Arduino.h>

#define SCK_PIN   13
#define MISO_PIN  12
#define MOSI_PIN  11

class SPI
{
  public:
    SPI(void);
     static byte transfer(byte);
};

SPI::SPI()
{
  // initialize the SPI pins
  pinMode(SCK_PIN, OUTPUT);
  pinMode(MOSI_PIN, OUTPUT);
  pinMode(MISO_PIN, INPUT);
}

byte SPI::transfer(byte value){
  SPDR = value;
  while (!(SPSR & (1<<SPIF))) ;
  return SPDR;
}

void MAX_write(byte addr, byte data){
    register byte d=data;
    SPI::transfer(addr);
    SPI::transfer(d);
}

will be 

void MAX_write(byte addr, byte data){
     f06:<----->cf 93       <-->push<-->r28
     f08:<----->df 93       <-->push<-->r29
     f0a:<----->1f 92       <-->push<-->r1
     f0c:<----->cd b7       <-->in<---->r28, 0x3d<----->; 61
     f0e:<----->de b7       <-->in<---->r29, 0x3e<----->; 62
    register byte d=data;
    SPI::transfer(addr);
     f10:<----->69 83       <-->std<--->Y+1, r22<------>; 0x01
     f12:<----->0e 94 e2 2e <-->call<-->0x5dc4<>; 0x5dc4 <_ZN3SPI8transferEh>
    SPI::transfer(d);
     f16:<----->69 81       <-->ldd<--->r22, Y+1<------>; 0x01
     f18:<----->86 2f       <-->mov<--->r24, r22
     f1a:<----->0e 94 e2 2e <-->call<-->0x5dc4<>; 0x5dc4 <_ZN3SPI8transferEh>
}
     f1e:<----->0f 90       <-->pop<--->r0
     f20:<----->df 91       <-->pop<--->r29
     f22:<----->cf 91       <-->pop<--->r28
     f24:<----->08 95       <-->ret

as one can see, "data" variable saved in stack, not in register
Comment 3 Georg-Johann Lay 2016-05-23 14:34:23 UTC
You can follow the bug reporting instructions an provide the preprocessed code and the compiler output as described in https://gcc.gnu.org/bugs/#need

Just add -v -save-temps to the compiler's command line options, re-build the project, and supply the output of the compiler and the generated *.i file (in case of C) resp. *.ii file (in case of C++).

This is needed because you have to resolve the non-standard stuff like "INPUT" or "Ardiuno.h".  There's no need that you change the suorces to accomplish this (if you can manage to find a smalle test case, this is nice but not mandatory).
Comment 4 night_ghost 2016-05-24 11:10:08 UTC
Created attachment 38550 [details]
testcase in attachment

for AVR platform arduino.h is the most that neither is a standard :)
Comment 5 Georg-Johann Lay 2016-05-25 12:47:08 UTC
Maybe -fno-caller-saves is what you are looking for?

Here is a C test case guessed from your first code snipped:

typedef struct
{
    unsigned char x, y;
} point;

extern void printSpeed (long, unsigned char);
extern long cnvGroundSpeed (void);

void panVel (point p)
{
    printSpeed (cnvGroundSpeed(), p.y & 0x40);
}

compiled with avr-gcc 5.x

$ avr-gcc -S -Os

we'll get

panVel:
	push   r28
	push   r29
	push   __zero_reg__
	in r28,__SP_L__
	in r29,__SP_H__

	mov   r20,r25
	andi  r20,lo8(64)
	std   Y+1,r20
	rcall cnvGroundSpeed
	ldd   r20,Y+1

	pop   __tmp_reg__
	pop   r29
	pop   r28
	rjmp  printSpeed


Adding -fno-caller-saves:

panVel:
	push  r28

	mov   r28,r25
	andi  r28,lo8(64)
	rcall cnvGroundSpeed
	mov   r20,r28

	pop   r28
	rjmp  printSpeed


I actually don't know whether this is a flaw in the avr backend (like a cost issue) or wrong assumptions in the middle-end.  Saving / Restoring in the frame is actually not more costly than saving in a call-saved register;  what makes it expensive is the frame setup in prologue and epilogue...
Comment 6 night_ghost 2016-05-25 13:53:27 UTC
Thank you again for such good key - it saves me 150 bytes. Moreover this key is not listed in GCC's possible optimisations for AVR. 


PS. Why it not included by default to -Os? What can be affected by it?
Comment 7 Georg-Johann Lay 2016-08-03 15:46:44 UTC
Author: gjl
Date: Wed Aug  3 15:46:11 2016
New Revision: 239080

URL: https://gcc.gnu.org/viewcvs?rev=239080&root=gcc&view=rev
Log:
	PR 70677
	* common/config/avr/avr-common.c (avr_option_optimization_table)
	[OPT_LEVELS_ALL]: Turn off -fcaller-saves.


Modified:
    branches/gcc-6-branch/gcc/ChangeLog
    branches/gcc-6-branch/gcc/common/config/avr/avr-common.c
Comment 8 Georg-Johann Lay 2016-08-04 07:53:10 UTC
Author: gjl
Date: Thu Aug  4 07:52:38 2016
New Revision: 239117

URL: https://gcc.gnu.org/viewcvs?rev=239117&root=gcc&view=rev
Log:
	PR 70677
	* common/config/avr/avr-common.c (avr_option_optimization_table)
	[OPT_LEVELS_ALL]: Turn off -fcaller-saves.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/common/config/avr/avr-common.c
Comment 9 Georg-Johann Lay 2016-08-04 08:00:27 UTC
For 6.2+, use -fno-caller-saves by default.  It still can be re-enabled by explicit -fcaller-saves.
Comment 10 Georg-Johann Lay 2017-01-07 12:18:48 UTC
*** Bug 52025 has been marked as a duplicate of this bug. ***