31528 – Inlining with -Os increases code size

Bug 31528 - Inlining with -Os increases code size

Summary: Inlining with -Os increases code size

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	middle-end (show other bugs)
Version:	4.1.2

Importance:	P3 enhancement
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:	30908
Blocks:
	Show dependency tree / graph

Reported:	2007-04-10 14:36 UTC by Joerg Wunsch
Modified:	2011-01-11 17:58 UTC (History)
CC List:	5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:	4.1.2, 4.3.0
Last reconfirmed:	2007-11-22 22:00:55

Attachments
Test case for bug 66690. (190 bytes, text/plain) 2007-04-10 14:38 UTC, Joerg Wunsch	Details
Generated assembly code with -Os. (410 bytes, text/plain) 2007-04-10 14:40 UTC, Joerg Wunsch	Details
Generated assembly code with -Os -fno-inline. (391 bytes, text/plain) 2007-04-10 14:40 UTC, Joerg Wunsch	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Joerg Wunsch 2007-04-10 14:36:21 UTC

The inlining performed with -Os has often a tendency to
increase the code size for the AVR target, so it misses the
entire point behind using -Os.

This might be related to bug #30908.

Comment 1 Joerg Wunsch 2007-04-10 14:38:15 UTC

Created attachment 13345 [details]
Test case for bug 66690.

Comment 2 Joerg Wunsch 2007-04-10 14:40:26 UTC

Created attachment 13346 [details]
Generated assembly code with -Os.

Comment 3 Joerg Wunsch 2007-04-10 14:40:56 UTC

Created attachment 13347 [details]
Generated assembly code with -Os -fno-inline.

Comment 4 Joerg Wunsch 2007-04-10 15:38:31 UTC

This code snippet can also be run through the i386 compiler (even though
the generated code will obviously be nonsensical).  I've only got an older
version of that compiler at hand:

gcc41 (GCC) 4.1.2 20061229 (prerelease)

but even that one generates smaller code without the inlining:

j@uriah 197% gcc41 -Os -fno-inline -S bug66690.c
j@uriah 198% as bug66690.s
j@uriah 199% size
   text    data     bss     dec     hex filename
    141       0       0     141      8d a.out
j@uriah 200% gcc41 -Os -S bug66690.c
j@uriah 201% as bug66690.s
j@uriah 202% size
   text    data     bss     dec     hex filename
    182       0       0     182      b6 a.out

Comment 5 Steven Bosscher 2007-04-10 16:38:31 UTC

Inlining decisions are based on heuristics. What works for one target may not work quite as well for another. In this case, it seems that for AVR the heuristics are not the best.  You can tune the heuristics for this target and let the target options override the default heuristics parameters.

Comment 6 Joerg Wunsch 2007-04-10 17:15:27 UTC

(In reply to comment #5)

> Inlining decisions are based on heuristics. What works for one
> target may not work quite as well for another. In this case, it
> seems that for AVR the heuristics are not the best.  You can tune
> the heuristics for this target and let the target options override
> the default heuristics parameters.

But what if *all* targets appear to suffer from pessimization?  I
think it cannot be called a bug of a particular target then.

Just for a completely different test, I compiled a GCC 4.1.1 (since I
had that source code around here) for a sparc64 target.  The results
support those of avr and i386:

% /tmp/sparc64/bin/sparc64-unknown-linux-gcc -Os -c bug66690.c
% /tmp/sparc64/bin/sparc64-unknown-linux-size bug66690.o
   text    data     bss     dec     hex filename
    212       0       0     212      d4 bug66690.o
% /tmp/sparc64/bin/sparc64-unknown-linux-gcc -Os -fno-inline -c bug66690.c
% /tmp/sparc64/bin/sparc64-unknown-linux-size bug66690.o
   text    data     bss     dec     hex filename
    124       0       0     124      7c bug66690.o

So is there a *single* target where inlining on that code would really
save space?  In all cases so far, the -fno-inline code saved a
dramatical amount of space, compared to the default -Os version.


Also, as you mention the target code has a chance to tune this (and I
know there are a lot of complaints from AVR users about pessimizations
caused by GCC 4.x, compared to 3.x), can you give me a hint about
where to look for these knobs?  I might give it a try to see whether I
can find a more optimal set of parameters.

Comment 7 Joerg Wunsch 2007-04-10 21:04:18 UTC

Changed target triplet from avr-*-* to *-*-* as obviously, at least some
of GCC's mainstream targets are affected by that bug as well (perhaps
even *any* target).

Comment 8 Richard Biener 2007-11-22 22:00:53 UTC

Same results on current trunk.  Early inlining is already doing it because we
think putchs size (4 insns) when inlined will reduce the compilation units
size by 4 insns (the out-of-line copy of putch).

putchs IL before inlining looks like

<bb 2>:

<bb 4>:
  D.1182_2 ={v} *43B;
  D.1183_3 = (int) D.1182_2;
  D.1184_4 = D.1183_3 & 32;
  if (D.1184_4 == 0)
    goto <bb 4>;
  else
    goto <bb 3>;

<bb 3>:
  ch.0_7 = (volatile unsigned char) ch_6(D);
  *44B ={v} ch.0_7;
  return;

we count INDIRECT_REFs as having no cost, the only thing what counts
is the BIT_AND_EXPR and the comparison.  Constants and registers also
have no cost.

Considering the (currently) recursive structure of estimate_num_insns_1
it is non-trivial to adjust the container-like reference trees (but
the INDIREC_REF ones).

Comment 9 abnikant 2010-09-07 11:36:38 UTC

gcc version 4.6.0 20100907 (experimental) is not performing the inline in -Os for the test case:

#define UCSRA (*(volatile unsigned char *)0x2B)
#define UDRE 5
#define UDR   (*(volatile unsigned char *)0x2C)

static void putch(char ch);

void putch(char ch)
{
  while (!(UCSRA & (1<<UDRE)));
  UDR = ch;
}

int main(void)
{
  putch(0); 
  putch(1); 
  return 0;
}
The generated assembly with avr-gcc -S -Os test.c is:

putch:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
.L2:
        sbis 43-0x20,5
        rjmp .L2
        out 44-0x20,r24
/* epilogue start */
        ret
        .size   putch, .-putch
.global main
        .type   main, @function
main:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
        ldi r24,lo8(0)
        rcall putch
        ldi r24,lo8(1)
        rcall putch
        ldi r24,lo8(0)
        ldi r25,hi8(0)
/* epilogue start */
        ret

Comment 10 Jan Hubicka 2010-11-10 13:51:24 UTC

Hi,
I just comitted patch supposed to address similar problem as in Comment 9. On x86_64 I now get:

main:
.LFB1:
.L2:
        movb    43, %al
        testb   $32, %al
        je      .L2
        movb    $0, 44
.L3:
        movb    43, %al
        testb   $32, %al
        je      .L3
        movb    $1, 44
        xorl    %eax, %eax
        ret

Can you, please, test if current mainline behave more resonably?

Comment 11 Joerg Wunsch 2010-11-11 10:59:42 UTC

(In reply to comment #10)

> Can you, please, test if current mainline behave more resonably?

Well, I had to fix bug #46426 first ...

Yes, that looks good now.  Compiling the original testcase with just
-Os (no -fno-inline involved) results in:

        .type   putch, @function
putch:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
.L2:
        sbis 43-0x20,5
        rjmp .L2
        out 44-0x20,r24
/* epilogue start */
        ret
        .size   putch, .-putch
.global main
        .type   main, @function
main:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
        ldi r24,lo8(0)
        rcall putch
        ldi r24,lo8(1)
        rcall putch
        ldi r24,lo8(2)
        rcall putch
        ldi r24,lo8(3)
        rcall putch
        ldi r24,lo8(4)
        rcall putch
        ldi r24,lo8(5)
        rcall putch
        ldi r24,lo8(6)
        rcall putch
        ldi r24,lo8(7)
        rcall putch
        ldi r24,lo8(8)
        rcall putch
        ldi r24,lo8(9)
        rcall putch
        ldi r24,lo8(0)
        ldi r25,hi8(0)
/* epilogue start */
        ret
        .size   main, .-main

That's the same what has originally been compiled when applying
-fno-inline.

It does not solve the issue from bug #30908 though.  The version
compiled with -DNOINLINE there is still 4 bytes smaller than the
default version (which has already been the case in GCC 4.3.4).
However, both versions are now already considerably smaller than they
used to be in 4.3.4 (40/44 bytes vs. 52/56 bytes in 4.3.4).

Comment 12 Jan Hubicka 2011-01-11 17:58:21 UTC

OK, fixed;The other problem is already tracked in bug #30908