This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Early inlining and function references from static const struct (bug?)
- From: Carlos Pita <carlosjosepita at gmail dot com>
- To: gcc at gcc dot gnu dot org
- Date: Thu, 4 Feb 2016 15:05:02 -0300
- References: <CAELgYhepY2Vnjt8-ExmY-9jtw=4Q6E6=SrFYgX85QmT5Aqnc=g at mail dot gmail dot com>
PS: I framed the issue as lying between the inline_param and fixup_cfg
passes because I was only looking at the tree dumps, but the passes that
really matter are tree-einline (obviously) and ipa-inline, which runs
between tree-inline_param2 and tree-fixup_cfg2. So, restating the
problem: if early inlining does not happen, late inlining misses the
chance to inline the function referenced from the static const struct member.
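
To make the restated problem concrete, here is a reduced sketch of the two call shapes involved. It is not taken from the original report: the names apply3, f_static, via_static, via_literal and self_check are illustrative only, and the chain is shortened to three calls. Both entry points compute the same value; the difference discussed in this thread is only whether the compiler manages to inline sq in each of them. Like test.c in the quoted message, it is meant to be compiled with -c and has no main.

```c
#include <assert.h>

/* Function pointer wrapped in a struct, as in the quoted test.c. */
typedef struct F {
    int (*call)(int);
} F;

static int sq(int x) {
    return x * x;
}

/* Hypothetical shortened workhorse: three indirect calls instead of eight. */
static int apply3(F f, int x) {
    x = f.call(x);
    x = f.call(x);
    x = f.call(x);
    return x;
}

/* Shape 1: the struct lives in a static const global. */
static const F f_static = {sq};

int via_static(int x) {
    return apply3(f_static, x);
}

/* Shape 2: the struct is a compound literal built at the call site. */
int via_literal(int x) {
    return apply3((F){sq}, x);
}

/* Both shapes must agree on the result; only the generated code may differ. */
void self_check(void) {
    assert(via_static(2) == via_literal(2));
}
```

Comparing the dumps for via_static and via_literal under different early-inlining-insns values should show whether a given GCC version treats the two shapes differently.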
On Thu, Feb 4, 2016 at 1:08 PM, Carlos Pita <carlosjosepita@gmail.com> wrote:
> Hi all,
>
> I've been trying to understand some bizarre interaction between
> optimizing passes I've observed while compiling a heavily nested
> inlined numerical code of mine. I managed to reduce the issue down to
> this simple code:
>
> ``` test.c
>
> typedef struct F {
> int (*call)(int);
> } F;
>
> static int g(F f, int x) {
> x = f.call(x);
> x = f.call(x);
> x = f.call(x);
> x = f.call(x);
> x = f.call(x);
> x = f.call(x);
> x = f.call(x);
> x = f.call(x);
> return x;
> }
>
> static int sq(int x) {
> return x * x;
> }
>
> static const F f = {sq};
>
> void dosomething(int);
>
> void h(int x) {
> dosomething(g(f, x));
> dosomething(g((F){sq}, x));
> }
>
> ```
>
> Here we have a driver function h calling the workhorse g, which
> delegates some simple task to the inline-wannabe sq. The distinctive
> aspect of the above scheme is that sq is referenced from a struct
> member. The first call to g passes a static const struct, while the
> second call passes a compound literal (alternatively, a local version
> of the struct will have the same effect regarding what follows).
>
> Now, say I compile this code with:
>
> gcc -O3 -fdump-tree-all --param early-inlining-insns=10 -c test.c
>
> The einline pass will not be able to inline calls to g with such a low
> value for early-inlining-insns.
>
> The inline_param2 pass still shows:
>
> ```
> h (int x)
> {
> struct F D.1847;
> int _4;
> int _8;
>
> <bb 2>:
> _4 = g (f, x_2(D));
> dosomething (_4);
> D.1847.call = sq;
> _8 = g (D.1847, x_2(D));
> dosomething (_8);
> return;
>
> }
>
> ```
>
> The next tree pass is fixup_cfg4, which shows g inlined at both call
> sites but sq inlined only for the second call to g:
>
> ```
> h (int x)
> {
> ....
>
> <bb 2>:
> f = f;
> f$call_7 = MEM[(struct F *)&f];
> x_19 = f$call_7 (x_2(D));
> x_20 = f$call_7 (x_19);
> x_21 = f$call_7 (x_20);
> x_22 = f$call_7 (x_21);
> x_23 = f$call_7 (x_22);
> x_24 = f$call_7 (x_23);
> x_25 = f$call_7 (x_24);
> x_26 = f$call_7 (x_25);
> _43 = x_26;
> _4 = _43;
> dosomething (_4);
> D.1847.call = sq;
> f = D.1847;
> f$call_10 = MEM[(struct F *)&f];
> _33 = x_2(D) * x_2(D);
> _45 = _33;
> x_11 = _45;
> _32 = x_11 * x_11;
> _46 = _32;
> x_12 = _46;
> _31 = x_12 * x_12;
> _47 = _31;
> x_13 = _47;
> _30 = x_13 * x_13;
> _48 = _30;
> x_14 = _48;
> _29 = x_14 * x_14;
> _49 = _29;
> x_15 = _49;
> _28 = x_15 * x_15;
> _50 = _28;
> x_16 = _50;
> _27 = x_16 * x_16;
> _51 = _27;
> x_17 = _51;
> _3 = x_17 * x_17;
> _52 = _3;
> x_18 = _52;
> _53 = x_18;
> _8 = _53;
> dosomething (_8);
> return;
>
> }
> ```
>
> Now, say I recompile the code with a larger early-inlining-insns, so
> that einline is able to early inline both calls to g:
>
> gcc -O3 -fdump-tree-all --param early-inlining-insns=50 -c test.c
>
> After inline_param2 (that is, before fixup_cfg4), we now have:
>
> ```
> h (int x)
> {
> int x;
> int x;
>
> <bb 2>:
> x_13 = sq (x_2(D));
> x_14 = sq (x_13);
> x_15 = sq (x_14);
> x_16 = sq (x_15);
> x_17 = sq (x_16);
> x_18 = sq (x_17);
> x_19 = sq (x_18);
> x_20 = sq (x_19);
> dosomething (x_20);
> x_5 = sq (x_2(D));
> x_6 = sq (x_5);
> x_7 = sq (x_6);
> x_8 = sq (x_7);
> x_9 = sq (x_8);
> x_10 = sq (x_9);
> x_11 = sq (x_10);
> x_12 = sq (x_11);
> dosomething (x_12);
> return;
>
> }
> ```
>
> And fixup_cfg4 is able to do its job for both calls:
>
> ```
> h (int x)
> {
> ....
>
> <bb 2>:
> _36 = x_2(D) * x_2(D);
> _37 = _36;
> x_13 = _37;
> _35 = x_13 * x_13;
> _38 = _35;
> x_14 = _38;
> _34 = x_14 * x_14;
> _39 = _34;
> x_15 = _39;
> _33 = x_15 * x_15;
> _40 = _33;
> x_16 = _40;
> _32 = x_16 * x_16;
> _41 = _32;
> x_17 = _41;
> _31 = x_17 * x_17;
> _42 = _31;
> x_18 = _42;
> _30 = x_18 * x_18;
> _43 = _30;
> x_19 = _43;
> _29 = x_19 * x_19;
> _44 = _29;
> x_20 = _44;
> dosomething (x_20);
> _28 = x_2(D) * x_2(D);
> _45 = _28;
> x_5 = _45;
> _27 = x_5 * x_5;
> _46 = _27;
> x_6 = _46;
> _26 = x_6 * x_6;
> _47 = _26;
> x_7 = _47;
> _25 = x_7 * x_7;
> _48 = _25;
> x_8 = _48;
> _24 = x_8 * x_8;
> _49 = _24;
> x_9 = _49;
> _23 = x_9 * x_9;
> _50 = _23;
> x_10 = _50;
> _22 = x_10 * x_10;
> _51 = _22;
> x_11 = _51;
> _21 = x_11 * x_11;
> _52 = _21;
> x_12 = _52;
> dosomething (x_12);
> return;
>
> }
> ```
>
> The bottom line is that I get full inlining if einline manages to
> early inline both g calls, but I get incomplete inlining otherwise. I
> guess the problem is that fixup_cfg4 is not able to infer that
> f$call_7 is just sq in disguise when f is the global static const
> struct but it is able to get it when it's a local or literal one. In
> case einline expands the code early the successive passes will make
> fixup_cfg4 see just sq in both cases, making inlining of sq a trivial
> matter. But if einline hits its hard limits, fixup_cfg4 will have to
> figure out that f$call is sq by itself.
>
> I'm not sure whether this should be considered a proper bug or more of
> a quirk of the inlining system one must learn to live with. In the
> first case, I'll report it if you ask me to do it. In the second case,
> I would like to ask for some advice about the best way to cope with
> this scenario (besides blindly incrementing early-inlining-insns); I
> can provide more background regarding my real use case if necessary.
>
> Cheers
> --
> Carlos
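
On the question of coping with this without raising early-inlining-insns, the following is a sketch of two options, not verified against the GCC version discussed in this thread. The first, passing a local struct, is the variant the quoted message reports as behaving like the compound literal. The second, marking g with GCC's always_inline function attribute so that it is inlined regardless of the size limits, is an assumption here: whether sq then gets inlined through the static const struct would still need to be confirmed in the dumps. The names h_static, h_local and self_check are hypothetical, and g is shortened to two calls.

```c
#include <assert.h>

typedef struct F {
    int (*call)(int);
} F;

static int sq(int x) {
    return x * x;
}

/* Assumption: forcing g inline with the GCC always_inline attribute might
 * let later passes see the direct reference to sq. Not verified here. */
__attribute__((always_inline))
static inline int g(F f, int x) {
    x = f.call(x);
    x = f.call(x);
    return x;
}

static const F f = {sq};

/* Still passes the static const struct; relies on g being forced inline. */
int h_static(int x) {
    return g(f, x);
}

/* Passes a local struct, the variant reported to behave like the literal. */
int h_local(int x) {
    F local = {sq};
    return g(local, x);
}

/* Both variants compute the same value, whatever the inliner decides. */
void self_check(void) {
    assert(h_static(3) == h_local(3));
}
```

Compiling this with -O3 -fdump-tree-all and inspecting the dump for h_static would show whether the indirect call through f has been replaced by the multiplication.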