Bug 29782 - Incorrect inlining failure
Summary: Incorrect inlining failure
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 4.1.2
Importance: P3 minor
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-11-09 16:29 UTC by Panagiotis Issaris
Modified: 2007-04-08 23:46 UTC
CC List: 2 users

See Also:
Host: i486-linux-gnu
Target: i486-linux-gnu
Build: i486-linux-gnu
Known to work:
Known to fail:
Last reconfirmed:


Attachments
Call a function and recommend to inline it (157 bytes, text/plain)
2006-11-09 16:31 UTC, Panagiotis Issaris
Call a function and disallow inlining it (168 bytes, text/plain)
2006-11-09 16:32 UTC, Panagiotis Issaris

Description Panagiotis Issaris 2006-11-09 16:29:45 UTC
GCC sometimes refuses to inline a function, claiming the function has grown too large, even though inlining it would have _decreased_ the code size.

For example, the following block of code will result in read_time being inlined:
#include <stdio.h>
static inline long long read_time(void) {
        long long l;
        asm volatile(   "rdtsc\n\t"
                : "=A" (l)
        );
        return l;
}
int main()
{
    long long l = read_time();
    printf("%Ld\n", l);
}

The following block will not inline read_time:
#include <stdio.h>
static __attribute__ ((noinline)) long long read_time(void) {
        long long l;
        asm volatile(   "rdtsc\n\t"
                : "=A" (l)
        );
        return l;
}
int main() {
    long long l = read_time();
    printf("%Ld\n", l);
}

As read_time is really small, the code will always be smaller if it gets inlined. Nonetheless, in some cases the compiler warns that the code has grown too large and that it will _disable_ inlining because of this:
"warning: inlining failed in call to ‘read_time’: --param large-function-growth limit reached"

This seems wrong to me, as the non-inlined code would be larger than the inlined code.

Compiling it with:
gcc -c -I. -fomit-frame-pointer -g -Wdeclaration-after-statement -Wall
-Wno-switch -Wdisabled-optimization -Wpointer-arith -Wredundant-decls
-Winline -O3  rdtsc.c

Shows that the inlined version is indeed smaller:
size inlinerdtsc.o 
   text    data     bss     dec     hex filename
     51       0       0      51      33 inlinerdtsc.o
size rdtsc.o 
   text    data     bss     dec     hex filename
     68       0       0      68      44 rdtsc.o

I do not think it is specific to this short block of code, as
the generated assembly shows rdtsc being only 2 bytes long, while
the call instruction by itself already occupies 5 bytes:

Not inlined:
00000000 <read_time>:
   0:   0f 31                   rdtsc  
   2:   c3                      ret    
   3:   8d b6 00 00 00 00       lea    0x0(%esi),%esi
   9:   8d bc 27 00 00 00 00    lea    0x0(%edi),%edi

00000010 <main>:
  10:   8d 4c 24 04             lea    0x4(%esp),%ecx
  14:   83 e4 f0                and    $0xfffffff0,%esp
  17:   ff 71 fc                pushl  0xfffffffc(%ecx)
  1a:   51                      push   %ecx
  1b:   83 ec 18                sub    $0x18,%esp
  1e:   e8 dd ff ff ff          call   0 <read_time>
  23:   c7 04 24 00 00 00 00    movl   $0x0,(%esp)
  2a:   89 44 24 04             mov    %eax,0x4(%esp)
  2e:   89 54 24 08             mov    %edx,0x8(%esp)
  32:   e8 fc ff ff ff          call   33 <main+0x23>
  37:   83 c4 18                add    $0x18,%esp
  3a:   31 c0                   xor    %eax,%eax
  3c:   59                      pop    %ecx
  3d:   8d 61 fc                lea    0xfffffffc(%ecx),%esp
  40:   c3                      ret    

Inlined:
00000000 <main>:
   0:   8d 4c 24 04             lea    0x4(%esp),%ecx
   4:   83 e4 f0                and    $0xfffffff0,%esp
   7:   ff 71 fc                pushl  0xfffffffc(%ecx)
   a:   51                      push   %ecx
   b:   83 ec 18                sub    $0x18,%esp
   e:   0f 31                   rdtsc  
  10:   89 44 24 04             mov    %eax,0x4(%esp)
  14:   89 54 24 08             mov    %edx,0x8(%esp)
  18:   c7 04 24 00 00 00 00    movl   $0x0,(%esp)
  1f:   e8 fc ff ff ff          call   20 <main+0x20>
  24:   83 c4 18                add    $0x18,%esp
  27:   31 c0                   xor    %eax,%eax
  29:   59                      pop    %ecx
  2a:   8d 61 fc                lea    0xfffffffc(%ecx),%esp
  2d:   c3                      ret    

Does GCC just disable all inlining as soon as a certain code-size limit is reached? Or does it actually try to determine whether inlining will increase or decrease the code size? If so, is a heuristic used or an exact calculation (if possible)? If a heuristic is used, what is it?

Thanks for any reply! :)

System info:
* Ubuntu Edgy Eft 6.10
* Linux issaris 2.6.17-10-generic #2 SMP Fri Oct 13 18:45:35 UTC 2006 i686 GNU/Linux
* Intel(R) Pentium(R) 4 CPU 3.20GHz
* Compiler:
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --program-suffix=-4.1 --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --enable-checking=release i486-linux-gnu
Thread model: posix
gcc version 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)

With friendly regards,
Takis
Comment 1 Panagiotis Issaris 2006-11-09 16:31:09 UTC
Created attachment 12577
Call a function and recommend to inline it
Comment 2 Panagiotis Issaris 2006-11-09 16:32:14 UTC
Created attachment 12578
Call a function and disallow inlining it
Comment 3 Richard Biener 2006-11-09 17:10:59 UTC
It has a heuristic to tell the result in code-size difference.  Of course no heuristic is perfect - see tree-inline.c:estimate_num_insns().
Comment 4 Panagiotis Issaris 2006-11-09 17:28:36 UTC
(In reply to comment #3)
> It has a heuristic to tell the result in code-size difference.  Of course no
> heuristic is perfect - see tree-inline.c:estimate_num_insns().
Of course! Thanks for your reply!

So, I guess that if I were to move ASM_EXPR to the list of zero cost cases, GCC would always inline my code. I'll see if this works. Thanks again! :)
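
A source-level alternative that does not require patching the compiler may be GCC's always_inline function attribute, which asks for the function to be inlined regardless of the usual growth limits. The following is only a minimal sketch, assuming an x86 target; the names read_time_forced and combine_halves are hypothetical, and the non-x86 branch is a placeholder so the sketch still compiles elsewhere. It reads the two halves through separate "=a" and "=d" outputs instead of "=A" so the same code also behaves on x86-64.

```c
/* Join the two 32-bit halves that rdtsc leaves in EDX:EAX.
   Hypothetical helper, not part of the original testcase. */
static inline unsigned long long combine_halves(unsigned int hi, unsigned int lo)
{
    return ((unsigned long long)hi << 32) | lo;
}

/* always_inline is a GCC-specific attribute: it requests inlining
   independently of the restrictions that normally apply. */
static inline __attribute__((always_inline))
unsigned long long read_time_forced(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int lo, hi;
    /* rdtsc writes the low half to EAX and the high half to EDX. */
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return combine_halves(hi, lo);
#else
    /* Placeholder for non-x86 targets (assumption, no real timer here). */
    return 0;
#endif
}
```

Calling read_time_forced() from main in place of read_time() should then avoid the -Winline warning, at the cost of overriding the compiler's own judgment for every call site.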

Still, I find this behavior odd. To my untrained eyes, it looks as if an inline assembly block is assigned a cost of only 1, while a function call probably costs 4 plus something (I guess from estimate_move_cost(), although that function can also return another value which I am currently unable to determine). This would mean that inlining functions containing only inline assembly blocks would always succeed, right? Hmm... Unless the else branch in estimate_move_cost() can return 0 or 1 in some cases.
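
The reasoning above can be written down as a toy model. This is a sketch of the argument only, not GCC's actual algorithm: the constants are the guesses from this comment (1 for an asm block, 4 for a call), and estimated_growth and would_inline are hypothetical names that do not exist in tree-inline.c.

```c
/* Guessed cost units from this comment, NOT values read from GCC's sources. */
enum { ASM_BLOCK_COST = 1, CALL_COST = 4 };

/* Estimated change in caller size when one call is replaced by the
   callee's body: the body is added, the call itself disappears. */
static int estimated_growth(int callee_body_cost, int call_cost)
{
    return callee_body_cost - call_cost;
}

/* Toy decision rule: inline only if the caller stays within the limit
   after inlining. Returns 1 to inline, 0 to refuse. */
static int would_inline(int caller_size, int callee_body_cost,
                        int call_cost, int limit)
{
    return caller_size + estimated_growth(callee_body_cost, call_cost) <= limit;
}
```

Under these assumed numbers, estimated_growth(ASM_BLOCK_COST, CALL_COST) is negative, so a caller that is within the limit before inlining is still within it afterwards. The warning would then have to come from a different estimate than the one guessed here.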



Comment 5 Andrew Pinski 2006-11-09 17:31:16 UTC
Can you give your full testcase? Right now the above testcases don't show what your code looks like or why we are reaching the large-function-growth limit.
Comment 6 Andrew Pinski 2007-04-08 23:46:20 UTC
No real testcase in over 3 months, so closing.