This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Early inlining and function references from static const struct (bug?)


PS 2 (last one, I swear): I've isolated what I think is the root of
the problem. When einline expands g, there are plenty of call sites for
f.call, so the full redundancy elimination pass substitutes sq for
f.call, making things easy for the late ipa inliner. But when g is not
early inlined, there is only one call site for the global f.call and
another one for the local/literal f.call, so the fre pass just lets
them be. This is innocuous from the fre point of view, but it disables
further inlining as described in the quoted messages below.
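
To make the hypothesis above easier to check, here is a minimal probe sketch. It is a variation of the quoted test.c, not something taken from the thread: the file name probe.c, the names f_global, via_global, via_literal, via_local and self_check are all hypothetical. Each way of passing the struct gets its own caller, so the per-function dumps can be compared side by side. Whether GCC treats the three callers differently is exactly the open question, so the code makes no claim about that.

```c
/* probe.c (hypothetical name): one caller per way of passing the struct. */
#include <assert.h>

typedef struct F {
  int (*call)(int);
} F;

static int sq(int x) {
  return x * x;
}

/* Same shape as g in the quoted test case: eight indirect calls. */
static int g(F f, int x) {
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  return x;
}

static const F f_global = {sq};

/* Form 1: the global static const struct. */
int via_global(int x) {
  return g(f_global, x);
}

/* Form 2: a compound literal. */
int via_literal(int x) {
  return g((F){sq}, x);
}

/* Form 3: a local struct initialized directly with sq. */
int via_local(int x) {
  F local = {sq};
  return g(local, x);
}

/* The three forms must agree on the result, whatever gets inlined.
   Inputs are kept to 0, 1 and -1 because repeated squaring of anything
   larger overflows int. */
void self_check(void) {
  assert(via_global(1) == via_literal(1));
  assert(via_literal(-1) == via_local(-1));
  assert(via_global(0) == via_local(0));
}
```

Compiling this with the same command line used in the quoted message (with probe.c in place of test.c) and reading the dumps for each caller should show which forms still contain an indirect call through f$call after ipa-inline.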

On Thu, Feb 4, 2016 at 3:05 PM, Carlos Pita <carlosjosepita@gmail.com> wrote:
> PS: I framed the issue between the inline_param and fixup_cfg passes
> because I was only looking at the tree passes, but the really relevant
> passes are tree-einline (obviously) and ipa-inline, which happens
> between tree-inline_param2 and tree-fixup_cfg2. So, restating the
> problem: if early inline is not happening, late inline will miss the
> chance to inline the reference from the static const struct member.
>
> On Thu, Feb 4, 2016 at 1:08 PM, Carlos Pita <carlosjosepita@gmail.com> wrote:
>> Hi all,
>>
>> I've been trying to understand some bizarre interaction between
>> optimizing passes I've observed while compiling a heavily nested
>> inlined numerical code of mine. I managed to reduce the issue down to
>> this simple code:
>>
>> ``` test.c
>>
>> typedef struct F {
>>   int (*call)(int);
>> } F;
>>
>> static int g(F f, int x) {
>>   x = f.call(x);
>>   x = f.call(x);
>>   x = f.call(x);
>>   x = f.call(x);
>>   x = f.call(x);
>>   x = f.call(x);
>>   x = f.call(x);
>>   x = f.call(x);
>>   return x;
>> }
>>
>> static int sq(int x) {
>>   return x * x;
>> }
>>
>> static const F f = {sq};
>>
>> void dosomething(int);
>>
>> void h(int x) {
>>   dosomething(g(f, x));
>>   dosomething(g((F){sq}, x));
>> }
>>
>> ```
>>
>> Here we have a driver function h calling the workhorse g, which
>> delegates some simple task to the inline-wannabe sq. The distinctive
>> aspect of the above scheme is that sq is referenced from a struct
>> member. The first call to g passes a static const struct while the
>> second call passes a compound literal (alternatively, a local version
>> of the struct will have the same effect regarding what follows).
>>
>> Now, say I compile this code with:
>>
>> gcc -O3 -fdump-tree-all --param early-inlining-insns=10 -c test.c
>>
>> The einline pass will not be able to inline calls to g with such a low
>> value for early-inlining-insns.
>>
>> The inline_param2 pass still shows:
>>
>> ```
>> h (int x)
>> {
>>   struct F D.1847;
>>   int _4;
>>   int _8;
>>
>>   <bb 2>:
>>   _4 = g (f, x_2(D));
>>   dosomething (_4);
>>   D.1847.call = sq;
>>   _8 = g (D.1847, x_2(D));
>>   dosomething (_8);
>>   return;
>>
>> }
>>
>> ```
>>
>> The next tree pass is fixup_cfg4, which does the inline but just for
>> the second call to g:
>>
>> ```
>> h (int x)
>> {
>>   ....
>>
>>   <bb 2>:
>>   f = f;
>>   f$call_7 = MEM[(struct F *)&f];
>>   x_19 = f$call_7 (x_2(D));
>>   x_20 = f$call_7 (x_19);
>>   x_21 = f$call_7 (x_20);
>>   x_22 = f$call_7 (x_21);
>>   x_23 = f$call_7 (x_22);
>>   x_24 = f$call_7 (x_23);
>>   x_25 = f$call_7 (x_24);
>>   x_26 = f$call_7 (x_25);
>>   _43 = x_26;
>>   _4 = _43;
>>   dosomething (_4);
>>   D.1847.call = sq;
>>   f = D.1847;
>>   f$call_10 = MEM[(struct F *)&f];
>>   _33 = x_2(D) * x_2(D);
>>   _45 = _33;
>>   x_11 = _45;
>>   _32 = x_11 * x_11;
>>   _46 = _32;
>>   x_12 = _46;
>>   _31 = x_12 * x_12;
>>   _47 = _31;
>>   x_13 = _47;
>>   _30 = x_13 * x_13;
>>   _48 = _30;
>>   x_14 = _48;
>>   _29 = x_14 * x_14;
>>   _49 = _29;
>>   x_15 = _49;
>>   _28 = x_15 * x_15;
>>   _50 = _28;
>>   x_16 = _50;
>>   _27 = x_16 * x_16;
>>   _51 = _27;
>>   x_17 = _51;
>>   _3 = x_17 * x_17;
>>   _52 = _3;
>>   x_18 = _52;
>>   _53 = x_18;
>>   _8 = _53;
>>   dosomething (_8);
>>   return;
>>
>> }
>> ```
>>
>> Now, say I recompile the code with a larger early-inlining-insns, so
>> that einline is able to early inline both calls to g:
>>
>> gcc -O3 -fdump-tree-all --param early-inlining-insns=50 -c test.c
>>
>> After inline_param2 (that is, before fixup_cfg4), we now have:
>>
>> ```
>> h (int x)
>> {
>>   int x;
>>   int x;
>>
>>   <bb 2>:
>>   x_13 = sq (x_2(D));
>>   x_14 = sq (x_13);
>>   x_15 = sq (x_14);
>>   x_16 = sq (x_15);
>>   x_17 = sq (x_16);
>>   x_18 = sq (x_17);
>>   x_19 = sq (x_18);
>>   x_20 = sq (x_19);
>>   dosomething (x_20);
>>   x_5 = sq (x_2(D));
>>   x_6 = sq (x_5);
>>   x_7 = sq (x_6);
>>   x_8 = sq (x_7);
>>   x_9 = sq (x_8);
>>   x_10 = sq (x_9);
>>   x_11 = sq (x_10);
>>   x_12 = sq (x_11);
>>   dosomething (x_12);
>>   return;
>>
>> }
>> ```
>>
>> And fixup_cfg4 is able to do its job for both calls:
>>
>> ```
>> h (int x)
>> {
>>   ....
>>
>>   <bb 2>:
>>   _36 = x_2(D) * x_2(D);
>>   _37 = _36;
>>   x_13 = _37;
>>   _35 = x_13 * x_13;
>>   _38 = _35;
>>   x_14 = _38;
>>   _34 = x_14 * x_14;
>>   _39 = _34;
>>   x_15 = _39;
>>   _33 = x_15 * x_15;
>>   _40 = _33;
>>   x_16 = _40;
>>   _32 = x_16 * x_16;
>>   _41 = _32;
>>   x_17 = _41;
>>   _31 = x_17 * x_17;
>>   _42 = _31;
>>   x_18 = _42;
>>   _30 = x_18 * x_18;
>>   _43 = _30;
>>   x_19 = _43;
>>   _29 = x_19 * x_19;
>>   _44 = _29;
>>   x_20 = _44;
>>   dosomething (x_20);
>>   _28 = x_2(D) * x_2(D);
>>   _45 = _28;
>>   x_5 = _45;
>>   _27 = x_5 * x_5;
>>   _46 = _27;
>>   x_6 = _46;
>>   _26 = x_6 * x_6;
>>   _47 = _26;
>>   x_7 = _47;
>>   _25 = x_7 * x_7;
>>   _48 = _25;
>>   x_8 = _48;
>>   _24 = x_8 * x_8;
>>   _49 = _24;
>>   x_9 = _49;
>>   _23 = x_9 * x_9;
>>   _50 = _23;
>>   x_10 = _50;
>>   _22 = x_10 * x_10;
>>   _51 = _22;
>>   x_11 = _51;
>>   _21 = x_11 * x_11;
>>   _52 = _21;
>>   x_12 = _52;
>>   dosomething (x_12);
>>   return;
>>
>> }
>> ```
>>
>> The bottom line is that I get full inlining if einline manages to
>> early inline both g calls, but I get incomplete inlining otherwise. I
>> guess the problem is that fixup_cfg4 is not able to infer that
>> f$call_7 is just sq in disguise when f is the global static const
>> struct but it is able to get it when it's a local or literal one. In
>> case einline expands the code early the successive passes will make
>> fixup_cfg4 see just sq in both cases, making inlining of sq a trivial
>> matter. But if einline hits its hard limits, fixup_cfg4 will have to
>> figure out that f$call is sq by itself.
>>
>> I'm not sure whether this should be considered a proper bug or more of
>> a quirk of the inlining system one must learn to live with. In the
>> first case, I'll report it if you ask me to do it. In the second case,
>> I would like to ask for some advice about the best way to cope with
>> this scenario (besides blindly incrementing early-inlining-insns); I
>> can provide more background regarding my real use case if necessary.
>>
>> Cheers
>> --
>> Carlos
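
On the question of coping with this without raising early-inlining-insns, one option that may be worth trying is GCC's always_inline function attribute on g. The GCC documentation says that, for functions declared inline, it inlines the function independently of the restrictions that otherwise apply. The sketch below is an assumption-laden illustration, not a verified fix: it has not been checked against the dumps in this thread, and the names h_global, h_literal and self_check are hypothetical.

```c
/* Workaround sketch: force g to be inlined regardless of size limits. */
#include <assert.h>

typedef struct F {
  int (*call)(int);
} F;

static int sq(int x) {
  return x * x;
}

/* always_inline is a GCC attribute; other compilers may ignore or reject it.
   Assumption: once g is expanded in its callers, later passes can see that
   f.call is sq, as in the early-inlining-insns=50 case. */
static inline __attribute__((always_inline)) int g(F f, int x) {
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  return x;
}

static const F f = {sq};

/* Call through the global static const struct. */
int h_global(int x) {
  return g(f, x);
}

/* Call through a compound literal. */
int h_literal(int x) {
  return g((F){sq}, x);
}

/* Both callers must compute the same value; inputs stay in {-1, 0, 1}
   so that repeated squaring cannot overflow int. */
void self_check(void) {
  assert(h_global(1) == 1);
  assert(h_literal(-1) == 1);
  assert(h_global(0) == h_literal(0));
}
```

The trade-off is that always_inline removes the compiler's freedom to decline the inline when g grows, so it fits best when g is known to stay small relative to its callers.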

