This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Early inlining and function references from static const struct (bug?)


Hi all,

I've been trying to understand a bizarre interaction between
optimization passes that I observed while compiling some heavily
nested, inlining-dependent numerical code of mine. I managed to reduce
the issue to this simple code:

``` test.c

typedef struct F {
  int (*call)(int);
} F;

static int g(F f, int x) {
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  return x;
}

static int sq(int x) {
  return x * x;
}

static const F f = {sq};

void dosomething(int);

void h(int x) {
  dosomething(g(f, x));
  dosomething(g((F){sq}, x));
}

```

Here we have a driver function h calling the workhorse g, which
delegates a simple task to sq, the function I would like to see
inlined. The distinctive aspect of this scheme is that sq is reached
through a function pointer stored in a struct member. The first call
to g passes a static const struct, while the second call passes a
compound literal (a local variable of the struct type has the same
effect as the literal regarding what follows).
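For reference, the three ways of handing sq to g mentioned above can be
written side by side. This is only a sketch: the names f_static,
via_static, via_literal and via_local are made up for illustration, and g
is shortened to two calls so that the result stays small.

```c
typedef struct F {
  int (*call)(int);
} F;

/* Shortened workhorse: two indirect calls instead of the original eight. */
static int g(F f, int x) {
  x = f.call(x);
  x = f.call(x);
  return x;
}

static int sq(int x) {
  return x * x;
}

/* File-scope static const struct, as in the first call in test.c. */
static const F f_static = {sq};

/* Variant 1: pass the static const struct. */
int via_static(int x) {
  return g(f_static, x);
}

/* Variant 2: pass a compound literal, as in the second call in test.c. */
int via_literal(int x) {
  return g((F){sq}, x);
}

/* Variant 3: pass a local struct (hypothetical name "local"). */
int via_local(int x) {
  F local = {sq};
  return g(local, x);
}
```

All three variants compute the same value (for example, 81 for x = 3);
they differ only in where the compiler has to look to discover that the
call member is sq.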

Now, say I compile this code with:

gcc -O3 -fdump-tree-all --param early-inlining-insns=10 -c test.c

The einline pass will not be able to inline calls to g with such a low
value for early-inlining-insns.

The inline_param2 pass still shows:

```
h (int x)
{
  struct F D.1847;
  int _4;
  int _8;

  <bb 2>:
  _4 = g (f, x_2(D));
  dosomething (_4);
  D.1847.call = sq;
  _8 = g (D.1847, x_2(D));
  dosomething (_8);
  return;

}

```

The next tree dump is fixup_cfg4. There g has been inlined at both call
sites, but the calls to sq have been inlined only for the second call
to g:

```
h (int x)
{
  ....

  <bb 2>:
  f = f;
  f$call_7 = MEM[(struct F *)&f];
  x_19 = f$call_7 (x_2(D));
  x_20 = f$call_7 (x_19);
  x_21 = f$call_7 (x_20);
  x_22 = f$call_7 (x_21);
  x_23 = f$call_7 (x_22);
  x_24 = f$call_7 (x_23);
  x_25 = f$call_7 (x_24);
  x_26 = f$call_7 (x_25);
  _43 = x_26;
  _4 = _43;
  dosomething (_4);
  D.1847.call = sq;
  f = D.1847;
  f$call_10 = MEM[(struct F *)&f];
  _33 = x_2(D) * x_2(D);
  _45 = _33;
  x_11 = _45;
  _32 = x_11 * x_11;
  _46 = _32;
  x_12 = _46;
  _31 = x_12 * x_12;
  _47 = _31;
  x_13 = _47;
  _30 = x_13 * x_13;
  _48 = _30;
  x_14 = _48;
  _29 = x_14 * x_14;
  _49 = _29;
  x_15 = _49;
  _28 = x_15 * x_15;
  _50 = _28;
  x_16 = _50;
  _27 = x_16 * x_16;
  _51 = _27;
  x_17 = _51;
  _3 = x_17 * x_17;
  _52 = _3;
  x_18 = _52;
  _53 = x_18;
  _8 = _53;
  dosomething (_8);
  return;

}
```

Now, say I recompile the code with a larger early-inlining-insns, so
that einline is able to early inline both calls to g:

gcc -O3 -fdump-tree-all --param early-inlining-insns=50 -c test.c

After inline_param2 (that is, before fixup_cfg4), we now have:

```
h (int x)
{
  int x;
  int x;

  <bb 2>:
  x_13 = sq (x_2(D));
  x_14 = sq (x_13);
  x_15 = sq (x_14);
  x_16 = sq (x_15);
  x_17 = sq (x_16);
  x_18 = sq (x_17);
  x_19 = sq (x_18);
  x_20 = sq (x_19);
  dosomething (x_20);
  x_5 = sq (x_2(D));
  x_6 = sq (x_5);
  x_7 = sq (x_6);
  x_8 = sq (x_7);
  x_9 = sq (x_8);
  x_10 = sq (x_9);
  x_11 = sq (x_10);
  x_12 = sq (x_11);
  dosomething (x_12);
  return;

}
```

And fixup_cfg4 is able to do its job for both calls:

```
h (int x)
{
  ....

  <bb 2>:
  _36 = x_2(D) * x_2(D);
  _37 = _36;
  x_13 = _37;
  _35 = x_13 * x_13;
  _38 = _35;
  x_14 = _38;
  _34 = x_14 * x_14;
  _39 = _34;
  x_15 = _39;
  _33 = x_15 * x_15;
  _40 = _33;
  x_16 = _40;
  _32 = x_16 * x_16;
  _41 = _32;
  x_17 = _41;
  _31 = x_17 * x_17;
  _42 = _31;
  x_18 = _42;
  _30 = x_18 * x_18;
  _43 = _30;
  x_19 = _43;
  _29 = x_19 * x_19;
  _44 = _29;
  x_20 = _44;
  dosomething (x_20);
  _28 = x_2(D) * x_2(D);
  _45 = _28;
  x_5 = _45;
  _27 = x_5 * x_5;
  _46 = _27;
  x_6 = _46;
  _26 = x_6 * x_6;
  _47 = _26;
  x_7 = _47;
  _25 = x_7 * x_7;
  _48 = _25;
  x_8 = _48;
  _24 = x_8 * x_8;
  _49 = _24;
  x_9 = _49;
  _23 = x_9 * x_9;
  _50 = _23;
  x_10 = _50;
  _22 = x_10 * x_10;
  _51 = _22;
  x_11 = _51;
  _21 = x_11 * x_11;
  _52 = _21;
  x_12 = _52;
  dosomething (x_12);
  return;

}
```

The bottom line is that I get full inlining if einline manages to
early inline both g calls, but I get incomplete inlining otherwise. I
guess the problem is that, by the time of fixup_cfg4, the compiler has
not been able to infer that f$call_7 is just sq in disguise when f is
the global static const struct, whereas it does figure it out when the
struct is a local or a literal. If einline expands the code early, the
subsequent passes reduce both cases to plain calls to sq before
fixup_cfg4, which makes inlining sq a trivial matter. But if einline
hits its hard limits, the later inlining stage has to figure out by
itself that f$call is sq.
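To make the guess above concrete, here is a hand-written approximation
(not compiler output) of the two shapes involved. The function names
indirect_from_global and direct are invented for this sketch, and the
chain is cut to two calls to keep the values small.

```c
typedef struct F {
  int (*call)(int);
} F;

static int sq(int x) {
  return x * x;
}

static const F f = {sq};

/* Roughly what is left when g is inlined late: the pointer is loaded
   from the global const struct and then called indirectly, much like
   f$call_7 = MEM[(struct F *)&f] in the dump. */
int indirect_from_global(int x) {
  int (*call)(int) = f.call;
  x = call(x);
  x = call(x);
  return x;
}

/* What the code looks like once the pointer is known to be sq: direct
   calls, which are then easy candidates for inlining. */
int direct(int x) {
  x = sq(x);
  x = sq(x);
  return x;
}
```

Both functions return the same value for any input that does not
overflow; the open question is only whether the loaded pointer gets
folded to sq before the inlining decisions are made.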

I'm not sure whether this should be considered a proper bug or more of
a quirk of the inlining system one must learn to live with. In the
first case, I'll report it if you ask me to do it. In the second case,
I would like to ask for some advice about the best way to cope with
this scenario (besides blindly increasing early-inlining-insns); I
can provide more background regarding my real use case if necessary.

Cheers
--
Carlos

