Bug 54965 - [4.6 Regression] sorry, unimplemented: inlining failed in call to 'foo': function not considered for inlining
Summary: [4.6 Regression] sorry, unimplemented: inlining failed in call to 'foo': func...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.6.4
Importance: P3 normal
Target Milestone: 4.7.0
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-10-18 01:50 UTC by Siarhei Siamashka
Modified: 2013-04-12 16:18 UTC
CC List: 3 users

See Also:
Host:
Target:
Build:
Known to work: 4.5.4, 4.7.2
Known to fail: 4.6.4
Last reconfirmed: 2012-10-18 00:00:00


Attachments
reduced.i (440 bytes, application/octet-stream)
2012-10-18 01:50 UTC, Siarhei Siamashka
pixman-combine-float.i.gz - the original full preprocessed source (19.94 KB, application/octet-stream)
2012-10-18 01:56 UTC, Siarhei Siamashka

Description Siarhei Siamashka 2012-10-18 01:50:14 UTC
Created attachment 28474
reduced.i

GCC 4.6 fails when compiling current git versions of pixman: https://bugs.freedesktop.org/show_bug.cgi?id=55630

Bisecting shows that this problem started occurring in the 4.6 branch after the following commit: http://gcc.gnu.org/git/?p=gcc.git;a=commit;h=3d5f815b529fe4b8b79d4f2a04e6eb670faee04d

3d5f815b529fe4b8b79d4f2a04e6eb670faee04d is the first bad commit
commit 3d5f815b529fe4b8b79d4f2a04e6eb670faee04d
Author: hubicka <hubicka@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Thu Nov 11 22:08:26 2010 +0000

    	PR tree-optimize/40436
    	* gcc.dg/tree-ssa/inline-5.c: New testcase.
    	* gcc.dg/tree-ssa/inline-6.c: New testcase.
    
    	* ipa-inline.c (likely_eliminated_by_inlining_p): Rename to ...
    	(eliminated_by_inlining_prob): ... this one; return 50% probability for
    	SRA.
    	(estimate_function_body_sizes): Update use of eliminated_by_inlining_prob;
    	estimate static function size for 2 instructions.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@166624 138bc75d-0d04-0410-961f-82ee72b054a4


The problem disappears in the 4.7 branch after the following commit: http://gcc.gnu.org/git/?p=gcc.git;a=commit;h=526b36a8a249c8c8698ca48ffeb8bff552f5a6fd

526b36a8a249c8c8698ca48ffeb8bff552f5a6fd is the first bad commit
commit 526b36a8a249c8c8698ca48ffeb8bff552f5a6fd
Author: rguenth <rguenth@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Fri Mar 25 11:59:19 2011 +0000

    2011-03-25  Richard Guenther  <rguenther@suse.de>
    
    	* passes.c (init_optimization_passes): Add FRE pass after
    	early SRA.
    
    	* g++.dg/tree-ssa/pr41186.C: Scan the appropriate FRE dump.
    	* g++.dg/tree-ssa/pr8781.C: Likewise.
    	* gcc.dg/ipa/ipa-pta-13.c: Likewise.
    	* gcc.dg/ipa/ipa-pta-3.c: Likewise.
    	* gcc.dg/ipa/ipa-pta-4.c: Likewise.
    	* gcc.dg/tree-ssa/20041122-1.c: Likewise.
    	* gcc.dg/tree-ssa/alias-18.c: Likewise.
    	* gcc.dg/tree-ssa/foldstring-1.c: Likewise.
    	* gcc.dg/tree-ssa/forwprop-10.c: Likewise.
    	* gcc.dg/tree-ssa/forwprop-9.c: Likewise.
    	* gcc.dg/tree-ssa/fre-vce-1.c: Likewise.
    	* gcc.dg/tree-ssa/loadpre6.c: Likewise.
    	* gcc.dg/tree-ssa/pr21574.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-dom-cse-1.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-fre-1.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-fre-11.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-fre-12.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-fre-13.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-fre-14.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-fre-15.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-fre-16.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-fre-17.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-fre-18.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-fre-19.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-fre-2.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-fre-21.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-fre-22.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-fre-23.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-fre-24.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-fre-25.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-fre-26.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-fre-27.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-fre-3.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-fre-4.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-fre-5.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-fre-6.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-fre-7.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-fre-8.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-fre-9.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-pre-10.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-pre-26.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-pre-7.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-pre-8.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-pre-9.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-sccvn-1.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-sccvn-2.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-sccvn-3.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-sccvn-4.c: Likewise.
    	* gcc.dg/tree-ssa/struct-aliasing-1.c: Likewise.
    	* gcc.dg/tree-ssa/struct-aliasing-2.c: Likewise.
    	* c-c++-common/pr46562-2.c: Likewise.
    	* gfortran.dg/pr42108.f90: Likewise.
    	* gcc.dg/torture/pta-structcopy-1.c: Scan ealias dump, force
    	foo to be inlined even at -O1.
    	* gcc.dg/tree-ssa/ssa-dce-4.c: Disable FRE.
    	* gcc.dg/ipa/ipa-pta-14.c: Likewise.
    	* gcc.dg/tree-ssa/ssa-fre-1.c: Adjust.
    	* gcc.dg/matrix/matrix.exp: Disable FRE.
    
 
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@171450 138bc75d-0d04-0410-961f-82ee72b054a4


Reducing the testcase is not easy. The smallest testcase I could get is the attached 'reduced.i' file. It triggers a closely related failure, reported as '--param large-function-growth limit reached':

$ gcc-4.6.4-20121016 -O1 -c reduced.i 
reduced.i: In function ‘combine_conjoint_xor_ca_float’:
reduced.i:41:50: sorry, unimplemented: inlining failed in call to ‘pd_combine_conjoint_xor’: --param large-function-growth limit reached
reduced.i:54:14: sorry, unimplemented: called from here
Comment 1 Siarhei Siamashka 2012-10-18 01:56:34 UTC
Created attachment 28476
pixman-combine-float.i.gz - the original full preprocessed source

Applying the changes from http://gcc.gnu.org/git/?p=gcc.git;a=commit;h=526b36a8a249c8c8698ca48ffeb8bff552f5a6fd to 'passes.c' in the GCC 4.6 branch "fixes" the reduced testcase. But pixman still cannot be compiled successfully, so I am also attaching the original full preprocessed source.
Comment 2 Richard Biener 2012-10-18 09:52:27 UTC
void combine_conjoint_xor_ca_float ()
{
    combine_channel_t j = pd_combine_conjoint_xor, k = pd_combine_conjoint_xor;
    a[0] = k (0, b, 0, a[0]);
    a[0] = k (0, b, 0, a[0]);
    a[0] = k (0, b, 0, a[0]);
    a[0] = j (0, c[0], 0, a[0]);
    a[0] = k (0, c[0], 0, a[0]);
    a[0] = k (0, c[0], 0, a[0]);
    a[0] = k (0, c[0], 0, a[0]);

you are using indirect function calls here, GCC in 4.6 is not smart enough
to transform them to direct calls before inlining.  Inlining of
always-inline indirect function calls is not going to work reliably.

Don't use always-inline or don't use indirect function calls to always-inline
functions.  It makes always-inline function calls survive until IPA inlining
where we seem to honor limits even though we say we should disregard them.

Honza?

Considering pd_combine_conjoint_xor with 383 size
 to be inlined into combine_conjoint_xor_ca_float in t.i:53
 Estimated growth after inlined into all callees is +748 insns.
 Estimated badness is -2147483648, frequency 1.00.
Processing frequency pd_combine_conjoint_xor
  Called by combine_conjoint_xor_ca_float that is normal or hot
 Inlined into combine_conjoint_xor_ca_float which now has time 1402 and size 2342,net change of +374.

Considering pd_combine_conjoint_xor with 383 size
 to be inlined into combine_conjoint_xor_ca_float in t.i:54
 Estimated growth after inlined into all callees is +374 insns.
 Estimated badness is -2147483648, frequency 1.00.
 Not inlining into combine_conjoint_xor_ca_float:--param large-function-growth limit reached.
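A minimal sketch of the pattern discussed in this comment, assuming GCC or Clang. The names FORCE_INLINE, combine_op, combine_indirect and combine_direct are hypothetical, and combine_op's body is placeholder arithmetic, not pixman's pd_combine_conjoint_xor. combine_indirect mirrors the reduced testcase (every call goes through a local function pointer); combine_direct follows the suggested workaround of calling the always_inline function directly.

```c
/* Hypothetical macro name; GCC/Clang spelling of a forced inline. */
#define FORCE_INLINE __inline__ __attribute__((always_inline))

typedef float (*combine_channel_t)(float sa, float s, float da, float d);

/* Hypothetical stand-in for pd_combine_conjoint_xor; placeholder math. */
static FORCE_INLINE float
combine_op(float sa, float s, float da, float d)
{
    (void)sa;
    (void)da;
    return s * d;
}

/* Same shape as the reduced testcase: calls through a local pointer.
 * Inlining them requires GCC to first turn them into direct calls. */
float
combine_indirect(float a, float b)
{
    combine_channel_t k = combine_op;
    a = k(0, b, 0, a);
    a = k(0, b, 0, a);
    return a;
}

/* Workaround from the comment: call the always_inline function
 * directly, so no indirect call has to be resolved first. */
float
combine_direct(float a, float b)
{
    a = combine_op(0, b, 0, a);
    a = combine_op(0, b, 0, a);
    return a;
}
```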
Comment 3 Siarhei Siamashka 2012-10-18 10:47:51 UTC
(In reply to comment #2)
> void combine_conjoint_xor_ca_float ()
> {
>     combine_channel_t j = pd_combine_conjoint_xor, k = pd_combine_conjoint_xor;
>     a[0] = k (0, b, 0, a[0]);
>     a[0] = k (0, b, 0, a[0]);
>     a[0] = k (0, b, 0, a[0]);
>     a[0] = j (0, c[0], 0, a[0]);
>     a[0] = k (0, c[0], 0, a[0]);
>     a[0] = k (0, c[0], 0, a[0]);
>     a[0] = k (0, c[0], 0, a[0]);
> 
> you are using indirect function calls here, GCC in 4.6 is not smart enough
> to transform them to direct calls before inlining.  Inlining of
> always-inline indirect function calls is not going to work reliably.

Does this only apply to GCC 4.6?

> Don't use always-inline or don't use indirect function calls to always-inline
> functions.

This looks like it might be really inconvenient. Pixman relies on this functionality in a number of places by doing something like this:

void always_inline per_pixel_operation_a(...)
{
    ...
}

void always_inline per_pixel_operation_b(...)
{
    ...
}

void always_inline big_function_template(..., per_pixel_operation_ptr foo)
{
    ...
    /* do some calls to foo() in an inner loop */
    ...
}

void big_function_a(...)
{
    big_function_template(..., per_pixel_operation_a);
}

void big_function_b(...)
{
    big_function_template(..., per_pixel_operation_b);
}

Needless to say, we want to be absolutely sure that per-pixel operations are always inlined. Otherwise performance gets really bad if the compiler ever makes a bad inlining decision.

The same functionality can probably be achieved by replacing always_inline functions with macros, but the code becomes less readable, more error-prone and somewhat harder to maintain.
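The pattern described above can be written as compilable C roughly as follows. This is a sketch assuming GCC or Clang: FORCE_INLINE is a hypothetical macro name, and the two per-pixel operations are placeholders (bitwise OR and XOR), not pixman's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro name for a forced inline (GCC/Clang spelling). */
#define FORCE_INLINE __inline__ __attribute__((always_inline))

typedef uint32_t (*per_pixel_operation_ptr)(uint32_t src, uint32_t dst);

static FORCE_INLINE uint32_t
per_pixel_operation_a(uint32_t src, uint32_t dst)
{
    return src | dst;   /* placeholder operation */
}

static FORCE_INLINE uint32_t
per_pixel_operation_b(uint32_t src, uint32_t dst)
{
    return src ^ dst;   /* placeholder operation */
}

/* The "template": the operation arrives as a function pointer. Only
 * after this function is inlined into a caller passing a known function
 * does `op` become a constant GCC can turn into a direct call. */
static FORCE_INLINE void
big_function_template(uint32_t *dst, const uint32_t *src, size_t n,
                      per_pixel_operation_ptr op)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = op(src[i], dst[i]);
}

void
big_function_a(uint32_t *dst, const uint32_t *src, size_t n)
{
    big_function_template(dst, src, n, per_pixel_operation_a);
}

void
big_function_b(uint32_t *dst, const uint32_t *src, size_t n)
{
    big_function_template(dst, src, n, per_pixel_operation_b);
}
```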
Comment 4 rguenther@suse.de 2012-10-18 10:58:56 UTC
On Thu, 18 Oct 2012, siarhei.siamashka at gmail dot com wrote:

> (In reply to comment #2)
> > void combine_conjoint_xor_ca_float ()
> > {
> >     combine_channel_t j = pd_combine_conjoint_xor, k = pd_combine_conjoint_xor;
> >     a[0] = k (0, b, 0, a[0]);
> >     a[0] = k (0, b, 0, a[0]);
> >     a[0] = k (0, b, 0, a[0]);
> >     a[0] = j (0, c[0], 0, a[0]);
> >     a[0] = k (0, c[0], 0, a[0]);
> >     a[0] = k (0, c[0], 0, a[0]);
> >     a[0] = k (0, c[0], 0, a[0]);
> > 
> > you are using indirect function calls here, GCC in 4.6 is not smart enough
> > to transform them to direct calls before inlining.  Inlining of
> > always-inline indirect function calls is not going to work reliably.
> 
> Does this only apply to GCC 4.6?

No, that applies in general.  If GCC isn't able to figure out which
function is called it cannot make sure always-inline functions are
always inlined.  always-inline is an attribute that should be used
if it is incorrect to not inline, not if that's just good for
optimization.

> > Don't use always-inline or don't use indirect function calls to always-inline
> > functions.
> 
> This looks like it might be really inconvenient. Pixman relies on this
> functionality in a number of places by doing something like this:
> 
> void always_inline per_pixel_operation_a(...)
> {
>     ...
> }
> 
> void always_inline per_pixel_operation_b(...)
> {
>     ...
> }
> 
> void always_inline big_function_template(..., per_pixel_operation_ptr foo)
> {
>     ...
>     /* do some calls to foo() in an inner loop */
>     ...
> }
> 
> void big_function_a(...)
> {
>     big_function_template(..., per_pixel_operation_a);
> }
> 
> void big_function_b(...)
> {
>     big_function_template(..., per_pixel_operation_b);
> }
> 
> Needless to say that we want to be absolutely sure that per-pixel operations
> are always inlined. Otherwise the performance gets really bad if the compiler
> ever makes a bad inlining decision.
> 
> The same functionality can be probably achieved by replacing always_inline
> functions with macros. But the code becomes less readable, more error prone and
> somewhat more difficult to maintain.

In the above case you probably want big_function_a to have all
calls inlined.  You can then conveniently use the flatten attribute:

void __attribute__((flatten)) big_function_b (...)
{
  big_function_template(..., per_pixel_operation_b);
}

GCC will then inline all calls in that function but not ICE
when it fails to inline one case for some weird reason.

Richard.
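A hedged sketch of the flatten suggestion applied to the same shape of code (GCC syntax; all names are hypothetical). The helpers are plain inline functions, and only the exported wrapper carries __attribute__((flatten)). As GCC documents it, flatten asks for every call inside the function, including calls exposed by that inlining, to be inlined where possible; a call that cannot be inlined is left as an ordinary call instead of causing a compile error. Whether the call through `op` is also inlined still depends on GCC resolving it to a direct call.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*pixel_op_fn)(uint32_t src, uint32_t dst);

/* No always_inline here: inlining is requested by flatten below. */
static inline uint32_t
pixel_op_xor(uint32_t src, uint32_t dst)
{
    return src ^ dst;   /* placeholder operation */
}

static inline void
process_span(uint32_t *dst, const uint32_t *src, size_t n, pixel_op_fn op)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = op(src[i], dst[i]);
}

/* GCC tries to inline every call in this body; failures are not errors. */
void __attribute__((flatten))
process_span_xor(uint32_t *dst, const uint32_t *src, size_t n)
{
    process_span(dst, src, n, pixel_op_xor);
}
```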
Comment 5 Siarhei Siamashka 2012-10-19 00:17:13 UTC
(In reply to comment #4)
> In the above case you probably want big_function_a to have all
> calls inlined.  You can then conveniently use the flatten attribute:
> 
> void __attribute__((flatten)) big_function_b (...)
> {
>   big_function_template(..., per_pixel_operation_b);
> }
> 
> GCC will then inline all calls in that function but not ICE
> when it fails to inline one case for some weird reason.

That's nice, but the "flatten" attribute does not seem to be widely supported by other compilers. For example, clang-3.1 does not support it yet, and the enhancement request has been open since 2010 - http://llvm.org/bugs/show_bug.cgi?id=7559

As far as I know, a few different compilers are currently in real use for building pixman for various systems: GCC, Clang, Solaris Studio and MSVC. All of them have some sort of "always_inline" attribute support, which makes it more universal than "flatten".

> Don't use always-inline or don't use indirect function calls to
> always-inline functions.  It makes always-inline function calls
> survive until IPA inlining where we seem to honor limits even
> though we say we should disregard them.

Is it too intrusive to fix GCC so that it would disregard limits in this case? Or maybe introduce one more attribute which would be a strong inlining hint, but still not cause a compilation failure if some function can't really be inlined?
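One way to keep such code building on several compilers is to hide the attribute spellings behind macros. The sketch below is an assumption, not pixman's actual header: FORCE_INLINE, MAYBE_FLATTEN and HAS_ATTRIBUTE are hypothetical names, __forceinline is MSVC's spelling, and Solaris Studio is not special-cased (it falls back to plain `inline`). flatten expands to nothing where it is not detected, so the code still builds, just without the extra inlining request.

```c
/* __has_attribute exists in Clang and GCC 5+; treat it as 0 elsewhere. */
#if defined(__has_attribute)
#  define HAS_ATTRIBUTE(x) __has_attribute(x)
#else
#  define HAS_ATTRIBUTE(x) 0
#endif

#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__)
#  define FORCE_INLINE __inline__ __attribute__((always_inline))
#else
#  define FORCE_INLINE inline   /* assumption: a C99 compiler */
#endif

/* flatten is a GCC extension; newer Clang versions also accept it. */
#if (defined(__GNUC__) && !defined(__clang__)) || HAS_ATTRIBUTE(flatten)
#  define MAYBE_FLATTEN __attribute__((flatten))
#else
#  define MAYBE_FLATTEN
#endif

static FORCE_INLINE int
add_one(int x)
{
    return x + 1;
}

MAYBE_FLATTEN int
add_two(int x)
{
    return add_one(add_one(x));
}
```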
Comment 6 Richard Biener 2012-10-19 08:36:06 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > In the above case you probably want big_function_a to have all
> > calls inlined.  You can then conveniently use the flatten attribute:
> > 
> > void __attribute__((flatten)) big_function_b (...)
> > {
> >   big_function_template(..., per_pixel_operation_b);
> > }
> > 
> > GCC will then inline all calls in that function but not ICE
> > when it fails to inline one case for some weird reason.
> 
> That's nice, but "flatten" attribute does not seem to be widely supported by
> the compilers. For example, clang-3.1 does not support it yet and the
> enhancement request is still open since 2010 -
> http://llvm.org/bugs/show_bug.cgi?id=7559
> 
> As far as I know, a few different compilers are currently in real use for
> building pixman for various systems: GCC, Clang, Solaris Studio and MSVC. All
> of them have some sort of "always_inline" attribute support, which makes it
> more universal than "flatten".
> 
> > Don't use always-inline or don't use indirect function calls to
> > always-inline functions.  It makes always-inline function calls
> > survive until IPA inlining where we seem to honor limits even
> > though we say we should disregard them.
> 
> Is it too intrusive to fix GCC so that it would disregard limits in this case?
> Or maybe introduce one more attribute which would be a strong inlining hint,
> but still not cause compilation failure if some function can't be really
> inlined?

I think the particular case is a bug in GCC, I was just mentioning that
using indirect function calls to always-inline functions is always prone
to this kind of error.
Comment 7 Jan Hubicka 2012-10-23 14:03:06 UTC
> you are using indirect function calls here, GCC in 4.6 is not smart enough
> to transform them to direct calls before inlining.  Inlining of
> always-inline indirect function calls is not going to work reliably.
> 
> Don't use always-inline or don't use indirect function calls to always-inline
> functions.  It makes always-inline function calls survive until IPA inlining
> where we seem to honor limits even though we say we should disregard them.

Yes, we do. It was actually your change to move always_inline function handling
out of the small functions inlining into inline_always_inline_functions.  The
motivation was that when done as part of small function inlining other inlining
may close cycles in the callgraph making always_inline inlining impossible.

In the presence of indirect calls, I do not think it is possible to honor
always_inline completely because of the ordering issue.

Honza
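Given the ordering issue described here, one hedged alternative to passing a function pointer is to pass a constant operation code and call the always_inline helpers directly inside a switch. Every call to an always_inline function is then direct from the start, so it should not depend on an indirect call being resolved first. Once the template is inlined into each wrapper, the switch on a constant argument can usually be folded away. All names and operations below are hypothetical placeholders.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro name (GCC/Clang spelling). */
#define FORCE_INLINE __inline__ __attribute__((always_inline))

enum pixel_op { PIXEL_OP_OR, PIXEL_OP_XOR };

static FORCE_INLINE uint32_t op_or(uint32_t s, uint32_t d)  { return s | d; }
static FORCE_INLINE uint32_t op_xor(uint32_t s, uint32_t d) { return s ^ d; }

/* The operation is selected by a constant enum, not a pointer, so the
 * calls to op_or/op_xor are direct calls in the source. */
static FORCE_INLINE void
span_template(uint32_t *dst, const uint32_t *src, size_t n, enum pixel_op op)
{
    for (size_t i = 0; i < n; i++) {
        switch (op) {
        case PIXEL_OP_OR:  dst[i] = op_or(src[i], dst[i]);  break;
        case PIXEL_OP_XOR: dst[i] = op_xor(src[i], dst[i]); break;
        }
    }
}

void
span_or(uint32_t *dst, const uint32_t *src, size_t n)
{
    span_template(dst, src, n, PIXEL_OP_OR);
}

void
span_xor(uint32_t *dst, const uint32_t *src, size_t n)
{
    span_template(dst, src, n, PIXEL_OP_XOR);
}
```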
Comment 8 Jakub Jelinek 2013-04-12 16:18:03 UTC
The 4.6 branch has been closed; this is fixed in GCC 4.7.0.