This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Early inlining and function references from static const struct (bug?)


Hi Richard,

I'm not quite following you. I understand the non-cp issue. But are
you saying it's somehow related to the non-inlining issue (in that
case I'm unable to see the relationship, because the compound literal
case always gets inlined)? Or maybe you consider there is no
non-inlining problem to address here, despite the different outcomes
for the static const and the compound literal cases, and you're just
pointing out a different, unrelated, issue.

Cheers
--
Carlos

On Fri, Feb 5, 2016 at 8:28 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Thu, Feb 4, 2016 at 9:10 PM, Carlos Pita <carlosjosepita@gmail.com> wrote:
>> PS 2 (last one, I swear): I've isolated what I think is the root of
>> the problem. When einline expands g, there are plenty of call sites for
>> f.call, so the full redundancy elimination pass substitutes sq for
>> f.call, making things easy for the late ipa inliner. But when g is not
>> early inlined, there is only one call site for the global f.call and
>> another one for the local/literal f.call, so the fre pass just lets
>> them be. This is innocuous from the fre point of view, but disables
>> further inlining as described above.
>>
>> On Thu, Feb 4, 2016 at 3:05 PM, Carlos Pita <carlosjosepita@gmail.com> wrote:
>>> PS: I framed the issue between the inline_param and fixup_cfg passes
>>> because I was only looking at the tree passes, but the really relevant
>>> passes are tree-einline (obviously) and ipa-inline, which happens
>>> between tree-inline_param2 and tree-fixup_cfg2. So, restating the
>>> problem: if early inline is not happening, late inline will miss the
>>> chance to inline the reference from the static const struct member.
>>>
>>> On Thu, Feb 4, 2016 at 1:08 PM, Carlos Pita <carlosjosepita@gmail.com> wrote:
>>>> Hi all,
>>>>
>>>> I've been trying to understand some bizarre interaction between
>>>> optimizing passes I've observed while compiling a heavily nested
>>>> inlined numerical code of mine. I managed to reduce the issue down to
>>>> this simple code:
>>>>
>>>> ``` test.c
>>>>
>>>> typedef struct F {
>>>>   int (*call)(int);
>>>> } F;
>>>>
>>>> static int g(F f, int x) {
>>>>   x = f.call(x);
>>>>   x = f.call(x);
>>>>   x = f.call(x);
>>>>   x = f.call(x);
>>>>   x = f.call(x);
>>>>   x = f.call(x);
>>>>   x = f.call(x);
>>>>   x = f.call(x);
>>>>   return x;
>>>> }
>>>>
>>>> static int sq(int x) {
>>>>   return x * x;
>>>> }
>>>>
>>>> static const F f = {sq};
>>>>
>>>> void dosomething(int);
>>>>
>>>> int h(int x) {
>>>>   dosomething(g(f, x));
>>>>   dosomething(g((F){sq}, x));
>>>> }
>>>>
>>>> ```
>>>>
>>>> Here we have a driver function h calling the workhorse g which
>>>> delegates some simple task to the inline-wannabe sq. The distinctive
>>>> aspect of the above scheme is that sq is referenced from a struct
>>>> member. The first call to g passes a static const struct while the
>>>> second call passes a compound literal (alternatively, a local version
>>>> of the struct will have the same effect regarding what follows).
>>>>
>>>> Now, say I compile this code with:
>>>>
>>>> gcc -O3 -fdump-tree-all --param early-inlining-insns=10 -c test.c
>>>>
>>>> The einline pass will not be able to inline calls to g with such a low
>>>> value for early-inlining-insns.
>>>>
>>>> The inline_param2 pass still shows:
>>>>
>>>> ```
>>>> h (int x)
>>>> {
>>>>   struct F D.1847;
>>>>   int _4;
>>>>   int _8;
>>>>
>>>>   <bb 2>:
>>>>   _4 = g (f, x_2(D));
>>>>   dosomething (_4);
>>>>   D.1847.call = sq;
>>>>   _8 = g (D.1847, x_2(D));
>>>>   dosomething (_8);
>>>>   return;
>>>>
>>>> }
>>>>
>>>> ```
>>>>
>>>> The next tree pass is fixup_cfg4, which does the inline but just for
>>>> the second call to g:
>>>>
>>>> ```
>>>> h (int x)
>>>> {
>>>>   ....
>>>>
>>>>   <bb 2>:
>>>>   f = f;
>>>>   f$call_7 = MEM[(struct F *)&f];
>>>>   x_19 = f$call_7 (x_2(D));
>>>>   x_20 = f$call_7 (x_19);
>>>>   x_21 = f$call_7 (x_20);
>>>>   x_22 = f$call_7 (x_21);
>>>>   x_23 = f$call_7 (x_22);
>>>>   x_24 = f$call_7 (x_23);
>>>>   x_25 = f$call_7 (x_24);
>>>>   x_26 = f$call_7 (x_25);
>>>>   _43 = x_26;
>>>>   _4 = _43;
>>>>   dosomething (_4);
>>>>   D.1847.call = sq;
>>>>   f = D.1847;
>>>>   f$call_10 = MEM[(struct F *)&f];
>>>>   _33 = x_2(D) * x_2(D);
>>>>   _45 = _33;
>>>>   x_11 = _45;
>>>>   _32 = x_11 * x_11;
>>>>   _46 = _32;
>>>>   x_12 = _46;
>>>>   _31 = x_12 * x_12;
>>>>   _47 = _31;
>>>>   x_13 = _47;
>>>>   _30 = x_13 * x_13;
>>>>   _48 = _30;
>>>>   x_14 = _48;
>>>>   _29 = x_14 * x_14;
>>>>   _49 = _29;
>>>>   x_15 = _49;
>>>>   _28 = x_15 * x_15;
>>>>   _50 = _28;
>>>>   x_16 = _50;
>>>>   _27 = x_16 * x_16;
>>>>   _51 = _27;
>>>>   x_17 = _51;
>>>>   _3 = x_17 * x_17;
>>>>   _52 = _3;
>>>>   x_18 = _52;
>>>>   _53 = x_18;
>>>>   _8 = _53;
>>>>   dosomething (_8);
>>>>   return;
>>>>
>>>> }
>>>> ```
>>>>
>>>> Now, say I recompile the code with a larger early-inlining-insns, so
>>>> that einline is able to early inline both calls to g:
>>>>
>>>> gcc -O3 -fdump-tree-all --param early-inlining-insns=50 -c test.c
>>>>
>>>> After inline_param2 (that is, before fixup_cfg4), we now have:
>>>>
>>>> ```
>>>> h (int x)
>>>> {
>>>>   int x;
>>>>   int x;
>>>>
>>>>   <bb 2>:
>>>>   x_13 = sq (x_2(D));
>>>>   x_14 = sq (x_13);
>>>>   x_15 = sq (x_14);
>>>>   x_16 = sq (x_15);
>>>>   x_17 = sq (x_16);
>>>>   x_18 = sq (x_17);
>>>>   x_19 = sq (x_18);
>>>>   x_20 = sq (x_19);
>>>>   dosomething (x_20);
>>>>   x_5 = sq (x_2(D));
>>>>   x_6 = sq (x_5);
>>>>   x_7 = sq (x_6);
>>>>   x_8 = sq (x_7);
>>>>   x_9 = sq (x_8);
>>>>   x_10 = sq (x_9);
>>>>   x_11 = sq (x_10);
>>>>   x_12 = sq (x_11);
>>>>   dosomething (x_12);
>>>>   return;
>>>>
>>>> }
>>>> ```
>>>>
>>>> And fixup_cfg4 is able to do its job for both calls:
>>>>
>>>> ```
>>>> h (int x)
>>>> {
>>>>   ....
>>>>
>>>>   <bb 2>:
>>>>   _36 = x_2(D) * x_2(D);
>>>>   _37 = _36;
>>>>   x_13 = _37;
>>>>   _35 = x_13 * x_13;
>>>>   _38 = _35;
>>>>   x_14 = _38;
>>>>   _34 = x_14 * x_14;
>>>>   _39 = _34;
>>>>   x_15 = _39;
>>>>   _33 = x_15 * x_15;
>>>>   _40 = _33;
>>>>   x_16 = _40;
>>>>   _32 = x_16 * x_16;
>>>>   _41 = _32;
>>>>   x_17 = _41;
>>>>   _31 = x_17 * x_17;
>>>>   _42 = _31;
>>>>   x_18 = _42;
>>>>   _30 = x_18 * x_18;
>>>>   _43 = _30;
>>>>   x_19 = _43;
>>>>   _29 = x_19 * x_19;
>>>>   _44 = _29;
>>>>   x_20 = _44;
>>>>   dosomething (x_20);
>>>>   _28 = x_2(D) * x_2(D);
>>>>   _45 = _28;
>>>>   x_5 = _45;
>>>>   _27 = x_5 * x_5;
>>>>   _46 = _27;
>>>>   x_6 = _46;
>>>>   _26 = x_6 * x_6;
>>>>   _47 = _26;
>>>>   x_7 = _47;
>>>>   _25 = x_7 * x_7;
>>>>   _48 = _25;
>>>>   x_8 = _48;
>>>>   _24 = x_8 * x_8;
>>>>   _49 = _24;
>>>>   x_9 = _49;
>>>>   _23 = x_9 * x_9;
>>>>   _50 = _23;
>>>>   x_10 = _50;
>>>>   _22 = x_10 * x_10;
>>>>   _51 = _22;
>>>>   x_11 = _51;
>>>>   _21 = x_11 * x_11;
>>>>   _52 = _21;
>>>>   x_12 = _52;
>>>>   dosomething (x_12);
>>>>   return;
>>>>
>>>> }
>>>> ```
>>>>
>>>> The bottom line is that I get full inlining if einline manages to
>>>> early inline both g calls, but I get incomplete inlining otherwise. I
>>>> guess the problem is that fixup_cfg4 is not able to infer that
>>>> f$call_7 is just sq in disguise when f is the global static const
>>>> struct but it is able to get it when it's a local or literal one. In
>>>> case einline expands the code early the successive passes will make
>>>> fixup_cfg4 see just sq in both cases, making inlining of sq a trivial
>>>> matter. But if einline hits its hard limits, fixup_cfg4 will have to
>>>> figure out that f$call is sq by itself.
>>>>
>>>> I'm not sure whether this should be considered a proper bug or more of
>>>> a quirk of the inlining system one must learn to live with. In the
>>>> first case, I'll report it if you ask me to do it. In the second case,
>>>> I would like to ask for some advice about the best way to cope with
>>>> this scenario (besides blindly incrementing early-inlining-insns); I
>>>> can provide more background regarding my real use case if necessary.
>
> I think the real missed optimization is that for the passed aggregate D.1772
>
>   D.1772.call = sq;
>   _8 = g (D.1772, x_2(D));
>
> IPA CP cannot figure out a jump-function with the sq function as
> value.  It _can_ for the call passing f:
>
>     callsite  h/3 -> g/0 :
>        param 0: UNKNOWN
>          Aggregate passed by value:
>            offset: 0, cst: sq
>          Unknown alignment
>        param 1: PASS THROUGH: 0, op nop_expr
>          Unknown alignment
>     callsite  h/3 -> dosomething/4 :
>     callsite  h/3 -> g/0 :
>        param 0: UNKNOWN
>          Unknown alignment
>        param 1: PASS THROUGH: 0, op nop_expr
>          Unknown alignment
>
> not sure how hard to fix that is, but it can't be too difficult.  Martin?
>
> You may want to open a bugreport.
>
> Richard.
>
>>>> Cheers
>>>> --
>>>> Carlos
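
On the question of coping with this without raising early-inlining-insns:
one option that might help is to force g to be inlined, so that f.call is
visible in h whatever the early inliner's size limit is. The following is
only a minimal sketch under assumptions, not something verified against the
GCC version discussed in this thread. always_inline is a documented GCC
function attribute; the names h_static, h_literal and last_seen are
hypothetical, dosomething is replaced by a local stand-in so the file links
on its own, and g is shortened to three calls so the result fits in an int.

```c
typedef struct F {
  int (*call)(int);
} F;

/* Assumption: always_inline asks GCC to expand g at every call site,
   independent of the early-inlining-insns limit.  The original g has
   eight calls; three are used here to stay within int range. */
static inline __attribute__((always_inline)) int g(F f, int x) {
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  return x;
}

static int sq(int x) {
  return x * x;
}

static const F f = {sq};

/* Hypothetical stand-in for the external dosomething() of test.c.
   It records its argument; noinline keeps it an opaque call. */
static int last_seen;
static __attribute__((noinline)) void dosomething(int v) {
  last_seen = v;
}

/* First call form from test.c: static const struct. */
int h_static(int x) {
  dosomething(g(f, x));
  return last_seen;
}

/* Second call form from test.c: compound literal. */
int h_literal(int x) {
  dosomething(g((F){sq}, x));
  return last_seen;
}
```

Both call forms compute the same value, so any difference should show up
only in the dumps. Whether sq then gets inlined for the static const case
still has to be checked there, for instance by comparing the output of
-fdump-tree-all with and without the attribute.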

