This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Early inlining and function references from static const struct (bug?)
- From: Carlos Pita <carlosjosepita at gmail dot com>
- To: gcc at gcc dot gnu dot org
- Date: Thu, 4 Feb 2016 13:08:25 -0300
Hi all,
I've been trying to understand some bizarre interaction between
optimizing passes I've observed while compiling a heavily nested
inlined numerical code of mine. I managed to reduce the issue down to
this simple code:
``` test.c
typedef struct F {
    int (*call)(int);
} F;

static int g(F f, int x) {
    x = f.call(x);
    x = f.call(x);
    x = f.call(x);
    x = f.call(x);
    x = f.call(x);
    x = f.call(x);
    x = f.call(x);
    x = f.call(x);
    return x;
}

static int sq(int x) {
    return x * x;
}

static const F f = {sq};

void dosomething(int);

int h(int x) {
    dosomething(g(f, x));
    dosomething(g((F){sq}, x));
}
```
Here we have a driver function h calling the workhorse g, which
delegates some simple task to the inline-wannabe sq (reached through
f.call). The distinctive aspect of the above scheme is that sq is
referenced from a struct member. The first call to g passes a static
const struct, while the second call passes a compound literal
(alternatively, a local version of the struct has the same effect
regarding what follows).
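To make the three ways of passing the struct concrete, here is a minimal sketch. The names g2, f_static, via_static, via_literal, via_local and check_forms_agree are hypothetical and only serve this illustration; g2 is a shortened stand-in for g with two calls instead of eight. All three forms compute the same value, so any difference shows up only in how far the compiler manages to inline.

```c
#include <assert.h>

typedef struct F {
    int (*call)(int);
} F;

static int sq(int x) {
    return x * x;
}

/* Hypothetical shortened stand-in for g: two indirect calls instead of eight. */
static int g2(F f, int x) {
    x = f.call(x);
    x = f.call(x);
    return x;
}

/* Form 1: file-scope static const struct, as in the first call in h. */
static const F f_static = {sq};

int via_static(int x) {
    return g2(f_static, x);
}

/* Form 2: compound literal, as in the second call in h. */
int via_literal(int x) {
    return g2((F){sq}, x);
}

/* Form 3: local struct, the alternative mentioned in the text. */
int via_local(int x) {
    F f_local = {sq};
    return g2(f_local, x);
}

/* All three forms must agree; keep x small so x*x does not overflow int. */
void check_forms_agree(int x) {
    assert(via_static(x) == via_literal(x));
    assert(via_literal(x) == via_local(x));
}
```

Compiling such a file with the same -fdump-tree-all options should allow the three call sites to be compared side by side in the dumps.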
Now, say I compile this code with:
gcc -O3 -fdump-tree-all --param early-inlining-insns=10 -c test.c
The einline pass will not be able to inline calls to g with such a low
value for early-inlining-insns.
The inline_param2 pass still shows:
```
h (int x)
{
struct F D.1847;
int _4;
int _8;
<bb 2>:
_4 = g (f, x_2(D));
dosomething (_4);
D.1847.call = sq;
_8 = g (D.1847, x_2(D));
dosomething (_8);
return;
}
```
The next tree dump, fixup_cfg4, shows g inlined at both call sites,
but sq inlined only for the second call to g:
```
h (int x)
{
....
<bb 2>:
f = f;
f$call_7 = MEM[(struct F *)&f];
x_19 = f$call_7 (x_2(D));
x_20 = f$call_7 (x_19);
x_21 = f$call_7 (x_20);
x_22 = f$call_7 (x_21);
x_23 = f$call_7 (x_22);
x_24 = f$call_7 (x_23);
x_25 = f$call_7 (x_24);
x_26 = f$call_7 (x_25);
_43 = x_26;
_4 = _43;
dosomething (_4);
D.1847.call = sq;
f = D.1847;
f$call_10 = MEM[(struct F *)&f];
_33 = x_2(D) * x_2(D);
_45 = _33;
x_11 = _45;
_32 = x_11 * x_11;
_46 = _32;
x_12 = _46;
_31 = x_12 * x_12;
_47 = _31;
x_13 = _47;
_30 = x_13 * x_13;
_48 = _30;
x_14 = _48;
_29 = x_14 * x_14;
_49 = _29;
x_15 = _49;
_28 = x_15 * x_15;
_50 = _28;
x_16 = _50;
_27 = x_16 * x_16;
_51 = _27;
x_17 = _51;
_3 = x_17 * x_17;
_52 = _3;
x_18 = _52;
_53 = x_18;
_8 = _53;
dosomething (_8);
return;
}
```
Now, say I recompile the code with a larger early-inlining-insns, so
that einline is able to early inline both calls to g:
gcc -O3 -fdump-tree-all --param early-inlining-insns=50 -c test.c
After inline_param2 (that is, before fixup_cfg4), we now have:
```
h (int x)
{
int x;
int x;
<bb 2>:
x_13 = sq (x_2(D));
x_14 = sq (x_13);
x_15 = sq (x_14);
x_16 = sq (x_15);
x_17 = sq (x_16);
x_18 = sq (x_17);
x_19 = sq (x_18);
x_20 = sq (x_19);
dosomething (x_20);
x_5 = sq (x_2(D));
x_6 = sq (x_5);
x_7 = sq (x_6);
x_8 = sq (x_7);
x_9 = sq (x_8);
x_10 = sq (x_9);
x_11 = sq (x_10);
x_12 = sq (x_11);
dosomething (x_12);
return;
}
```
And fixup_cfg4 is able to do its job for both calls:
```
h (int x)
{
....
<bb 2>:
_36 = x_2(D) * x_2(D);
_37 = _36;
x_13 = _37;
_35 = x_13 * x_13;
_38 = _35;
x_14 = _38;
_34 = x_14 * x_14;
_39 = _34;
x_15 = _39;
_33 = x_15 * x_15;
_40 = _33;
x_16 = _40;
_32 = x_16 * x_16;
_41 = _32;
x_17 = _41;
_31 = x_17 * x_17;
_42 = _31;
x_18 = _42;
_30 = x_18 * x_18;
_43 = _30;
x_19 = _43;
_29 = x_19 * x_19;
_44 = _29;
x_20 = _44;
dosomething (x_20);
_28 = x_2(D) * x_2(D);
_45 = _28;
x_5 = _45;
_27 = x_5 * x_5;
_46 = _27;
x_6 = _46;
_26 = x_6 * x_6;
_47 = _26;
x_7 = _47;
_25 = x_7 * x_7;
_48 = _25;
x_8 = _48;
_24 = x_8 * x_8;
_49 = _24;
x_9 = _49;
_23 = x_9 * x_9;
_50 = _23;
x_10 = _50;
_22 = x_10 * x_10;
_51 = _22;
x_11 = _51;
_21 = x_11 * x_11;
_52 = _21;
x_12 = _52;
dosomething (x_12);
return;
}
```
The bottom line is that I get full inlining if einline manages to
early inline both g calls, but incomplete inlining otherwise. I
guess the problem is that fixup_cfg4 is not able to infer that
f$call_7 is just sq in disguise when f is the global static const
struct, although it is able to do so when the struct is a local or a
compound literal. If einline expands the code early, the successive
passes make fixup_cfg4 see just sq in both cases, making inlining of
sq a trivial matter. But if einline hits its hard limits, fixup_cfg4
has to figure out by itself that f$call is sq.
I'm not sure whether this should be considered a proper bug or more of
a quirk of the inlining system one must learn to live with. In the
first case, I'll report it if you ask me to do it. In the second case,
I would like to ask for some advice about the best way to cope with
this scenario (besides blindly incrementing early-inlining-insns); I
can provide more background regarding my real use case if necessary.
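For illustration only, one candidate way of coping that does not touch early-inlining-insns would be to mark the workhorse with GCC's always_inline function attribute, so that inlining it does not depend on the size limits. This is a sketch under that assumption and has not been checked against the dumps above; whether sq then gets inlined in the static const case is exactly what would need to be verified. The names g_forced, h_forced and check_h_forced are hypothetical, and g_forced is shortened to two calls.

```c
#include <assert.h>

typedef struct F {
    int (*call)(int);
} F;

static int sq(int x) {
    return x * x;
}

/* Hypothetical variant of g: always_inline asks GCC to inline this
   function regardless of the usual inlining limits. */
static inline __attribute__((always_inline)) int g_forced(F f, int x) {
    x = f.call(x);
    x = f.call(x);
    return x;
}

static const F f = {sq};

/* Driver using the static const struct, the problematic case above. */
int h_forced(int x) {
    return g_forced(f, x);
}

/* Sanity check that the attribute does not change the result. */
void check_h_forced(void) {
    assert(h_forced(3) == 81);
}
```

The attribute only guarantees that g_forced itself is inlined; looking at the fixup_cfg4 dump again would show whether the f.call loads are then resolved to sq.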
Cheers
--
Carlos