This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Early inlining and function references from static const struct (bug?)


On Fri, Feb 5, 2016 at 3:12 PM, Carlos Pita <carlosjosepita@gmail.com> wrote:
> Hi Richard,
>
> I'm not quite following you. I understand the non-cp issue. But are
> you saying it's somehow related to the non-inlining issue (in that
> case I'm unable to see the relationship, because the compound literal
> case always gets inlined)? Or maybe you consider there is no
> non-inlining problem to address here, despite the different outcomes
> for the static const and the compound literal cases, and you're just
> pointing out a different, unrelated, issue.

I was saying that early inlining is not supposed to catch this case
but IPA inlining.
it shouldn't need to inline g early to end up inlining the calls to sq.  IPA CP
should clone g for the case of it calling sq and then inlining should
just do its job.

Richard.

> Cheers
> --
> Carlos
>
> On Fri, Feb 5, 2016 at 8:28 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Thu, Feb 4, 2016 at 9:10 PM, Carlos Pita <carlosjosepita@gmail.com> wrote:
>>> PS 2 (last one, I swear): I've isolated what I think is the root of
>>> the problem. When einline expands g, there is plenty of call sites for
>>> f.call, so the full redundancy elimination pass replaces sum for
>>> f.call, making things easy for the late ipa inliner. But when g is not
>>> early inlined, there is only one call site for the global f.call and
>>> another one for the local/literal f.call, so the fre pass just lets
>>> them be. This is innocuous from the fre point of view, but disables
>>> further inlining as described above.
>>>
>>> On Thu, Feb 4, 2016 at 3:05 PM, Carlos Pita <carlosjosepita@gmail.com> wrote:
>>>> PS: I framed the issue between the inline_param and fixup_cfg passes
>>>> because I was only looking at the tree passes, but the really relevant
>>>> passes are tree-einline (obviously) and ipa-inline, which happens
>>>> between tree-inline_param2 and tree-fixup_cfg2. So, restating the
>>>> problem: if early inline is not happening, late inline will miss the
>>>> chance to inline the reference from the static const struct member.
>>>>
>>>> On Thu, Feb 4, 2016 at 1:08 PM, Carlos Pita <carlosjosepita@gmail.com> wrote:
>>>>> Hi all,
>>>>>
>>>>> I've been trying to understand some bizarre interaction between
>>>>> optimizing passes I've observed while compiling a heavily nested
>>>>> inlined numerical code of mine. I managed to reduce the issue down to
>>>>> this simple code:
>>>>>
>>>>> ``` test.c
>>>>>
>>>>> typedef struct F {
>>>>>   int (*call)(int);
>>>>> } F;
>>>>>
>>>>> static int g(F f, int x) {
>>>>>   x = f.call(x);
>>>>>   x = f.call(x);
>>>>>   x = f.call(x);
>>>>>   x = f.call(x);
>>>>>   x = f.call(x);
>>>>>   x = f.call(x);
>>>>>   x = f.call(x);
>>>>>   x = f.call(x);
>>>>>   return x;
>>>>> }
>>>>>
>>>>> static int sq(int x) {
>>>>>   return x * x;
>>>>> }
>>>>>
>>>>> static const F f = {sq};
>>>>>
>>>>> void dosomething(int);
>>>>>
>>>>> int h(int x) {
>>>>>   dosomething(g(f, x));
>>>>>   dosomething(g((F){sq}, x));
>>>>> }
>>>>>
>>>>> ```
>>>>>
>>>>> Here we have a driver function h calling the workhorse g which
>>>>> delegates some simple task to the inline-wannabe f. The distinctive
>>>>> aspect of the above scheme is that f is referenced from a struct
>>>>> member. The first call to g passes a static const struct while the
>>>>> second call passes a compound literal (alternatively, a local version
>>>>> of the struct will have the same effect regarding what follows).
>>>>>
>>>>> Now, say I compile this code with:
>>>>>
>>>>> gcc -O3 -fdump-tree-all --param early-inlining-insns=10 -c test.c
>>>>>
>>>>> The einline pass will not be able to inline calls to g with such a low
>>>>> value for early-inlining-insns.
>>>>>
>>>>> The inline_param2 pass still shows:
>>>>>
>>>>> ```
>>>>> h (int x)
>>>>> {
>>>>>   struct F D.1847;
>>>>>   int _4;
>>>>>   int _8;
>>>>>
>>>>>   <bb 2>:
>>>>>   _4 = g (f, x_2(D));
>>>>>   dosomething (_4);
>>>>>   D.1847.call = sq;
>>>>>   _8 = g (D.1847, x_2(D));
>>>>>   dosomething (_8);
>>>>>   return;
>>>>>
>>>>> }
>>>>>
>>>>> ```
>>>>>
>>>>> The next tree pass is fixup_cfg4, which does the inline but just for
>>>>> the second all to g:
>>>>>
>>>>> ```
>>>>> h (int x)
>>>>> {
>>>>>   ....
>>>>>
>>>>>   <bb 2>:
>>>>>   f = f;
>>>>>   f$call_7 = MEM[(struct F *)&f];
>>>>>   x_19 = f$call_7 (x_2(D));
>>>>>   x_20 = f$call_7 (x_19);
>>>>>   x_21 = f$call_7 (x_20);
>>>>>   x_22 = f$call_7 (x_21);
>>>>>   x_23 = f$call_7 (x_22);
>>>>>   x_24 = f$call_7 (x_23);
>>>>>   x_25 = f$call_7 (x_24);
>>>>>   x_26 = f$call_7 (x_25);
>>>>>   _43 = x_26;
>>>>>   _4 = _43;
>>>>>   dosomething (_4);
>>>>>   D.1847.call = sq;
>>>>>   f = D.1847;
>>>>>   f$call_10 = MEM[(struct F *)&f];
>>>>>   _33 = x_2(D) * x_2(D);
>>>>>   _45 = _33;
>>>>>   x_11 = _45;
>>>>>   _32 = x_11 * x_11;
>>>>>   _46 = _32;
>>>>>   x_12 = _46;
>>>>>   _31 = x_12 * x_12;
>>>>>   _47 = _31;
>>>>>   x_13 = _47;
>>>>>   _30 = x_13 * x_13;
>>>>>   _48 = _30;
>>>>>   x_14 = _48;
>>>>>   _29 = x_14 * x_14;
>>>>>   _49 = _29;
>>>>>   x_15 = _49;
>>>>>   _28 = x_15 * x_15;
>>>>>   _50 = _28;
>>>>>   x_16 = _50;
>>>>>   _27 = x_16 * x_16;
>>>>>   _51 = _27;
>>>>>   x_17 = _51;
>>>>>   _3 = x_17 * x_17;
>>>>>   _52 = _3;
>>>>>   x_18 = _52;
>>>>>   _53 = x_18;
>>>>>   _8 = _53;
>>>>>   dosomething (_8);
>>>>>   return;
>>>>>
>>>>> }
>>>>> ```
>>>>>
>>>>> Now, say I recompile the code with a larger early-inlining-insns, so
>>>>> that einline is able to early inline both calls to g:
>>>>>
>>>>> gcc -O3 -fdump-tree-all --param early-inlining-insns=50 -c test.c
>>>>>
>>>>> After inline_param2 (that is, before fixup_cfg4), we now have:
>>>>>
>>>>> ```
>>>>> h (int x)
>>>>> {
>>>>>   int x;
>>>>>   int x;
>>>>>
>>>>>   <bb 2>:
>>>>>   x_13 = sq (x_2(D));
>>>>>   x_14 = sq (x_13);
>>>>>   x_15 = sq (x_14);
>>>>>   x_16 = sq (x_15);
>>>>>   x_17 = sq (x_16);
>>>>>   x_18 = sq (x_17);
>>>>>   x_19 = sq (x_18);
>>>>>   x_20 = sq (x_19);
>>>>>   dosomething (x_20);
>>>>>   x_5 = sq (x_2(D));
>>>>>   x_6 = sq (x_5);
>>>>>   x_7 = sq (x_6);
>>>>>   x_8 = sq (x_7);
>>>>>   x_9 = sq (x_8);
>>>>>   x_10 = sq (x_9);
>>>>>   x_11 = sq (x_10);
>>>>>   x_12 = sq (x_11);
>>>>>   dosomething (x_12);
>>>>>   return;
>>>>>
>>>>> }
>>>>> ```
>>>>>
>>>>> And fixup_cfg4 is able to do its job for both calls:
>>>>>
>>>>> ```
>>>>> h (int x)
>>>>> {
>>>>>   ....
>>>>>
>>>>>   <bb 2>:
>>>>>   _36 = x_2(D) * x_2(D);
>>>>>   _37 = _36;
>>>>>   x_13 = _37;
>>>>>   _35 = x_13 * x_13;
>>>>>   _38 = _35;
>>>>>   x_14 = _38;
>>>>>   _34 = x_14 * x_14;
>>>>>   _39 = _34;
>>>>>   x_15 = _39;
>>>>>   _33 = x_15 * x_15;
>>>>>   _40 = _33;
>>>>>   x_16 = _40;
>>>>>   _32 = x_16 * x_16;
>>>>>   _41 = _32;
>>>>>   x_17 = _41;
>>>>>   _31 = x_17 * x_17;
>>>>>   _42 = _31;
>>>>>   x_18 = _42;
>>>>>   _30 = x_18 * x_18;
>>>>>   _43 = _30;
>>>>>   x_19 = _43;
>>>>>   _29 = x_19 * x_19;
>>>>>   _44 = _29;
>>>>>   x_20 = _44;
>>>>>   dosomething (x_20);
>>>>>   _28 = x_2(D) * x_2(D);
>>>>>   _45 = _28;
>>>>>   x_5 = _45;
>>>>>   _27 = x_5 * x_5;
>>>>>   _46 = _27;
>>>>>   x_6 = _46;
>>>>>   _26 = x_6 * x_6;
>>>>>   _47 = _26;
>>>>>   x_7 = _47;
>>>>>   _25 = x_7 * x_7;
>>>>>   _48 = _25;
>>>>>   x_8 = _48;
>>>>>   _24 = x_8 * x_8;
>>>>>   _49 = _24;
>>>>>   x_9 = _49;
>>>>>   _23 = x_9 * x_9;
>>>>>   _50 = _23;
>>>>>   x_10 = _50;
>>>>>   _22 = x_10 * x_10;
>>>>>   _51 = _22;
>>>>>   x_11 = _51;
>>>>>   _21 = x_11 * x_11;
>>>>>   _52 = _21;
>>>>>   x_12 = _52;
>>>>>   dosomething (x_12);
>>>>>   return;
>>>>>
>>>>> }
>>>>> ```
>>>>>
>>>>> The bottom line is that I get full inlining if einline manages to
>>>>> early inline both g calls, but I get incomplete inlining otherwise. I
>>>>> guess the problem is that fixup_cfg4 is not able to infer that
>>>>> f$call_7 is just sq in disguise when f is the global static const
>>>>> struct but it is able to get it when it's a local or literal one. In
>>>>> case einline expands the code early the successive passes will make
>>>>> fixup_cfg4 see just sq in both cases, making inlining of sq a trivial
>>>>> matter. But if einline hits its hard limits, fixup_cfg4 will have to
>>>>> figure out that f$call is sq by itself.
>>>>>
>>>>> I'm not sure whether this should be considered a proper bug or more of
>>>>> a quirk of the inlining system one must learn to live with. In the
>>>>> first case, I'll report it if you ask me to do it. In the second case,
>>>>> I would like to ask for some advice about the best way to cope with
>>>>> this scenario (besides blindly incrementing early-inlining-insns); I
>>>>> can provide more background regarding my real use case if necessary.
>>
>> I think the real missed optimization is that for the passed aggregate D.1772
>>
>>  D.1772.call = sq;
>>   _8 = g (D.1772, x_2(D));
>>
>> IPA CP cannot figure out a jump-function with the sq function as
>> value.  It _can_
>> for the call passing f:
>>
>>     callsite  h/3 -> g/0 :
>>        param 0: UNKNOWN
>>          Aggregate passed by value:
>>            offset: 0, cst: sq
>>          Unknown alignment
>>        param 1: PASS THROUGH: 0, op nop_expr
>>          Unknown alignment
>>     callsite  h/3 -> dosomething/4 :
>>     callsite  h/3 -> g/0 :
>>        param 0: UNKNOWN
>>          Unknown alignment
>>        param 1: PASS THROUGH: 0, op nop_expr
>>          Unknown alignment
>>
>> not sure how hard to fix that is, but it can't be too difficult.  Martin?
>>
>> You may want to open a bugreport.
>>
>> Richard.
>>
>>>>> Cheers
>>>>> --
>>>>> Carlos


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]