100220 – missed optimization for dead code elimination at -O3 (vs. -O1, -Os, -O2) (inlining differences)

Bug 100220 - missed optimization for dead code elimination at -O3 (vs. -O1, -Os, -O2) (inlining differences)

Summary: missed optimization for dead code elimination at -O3 (vs. -O1, -Os, -O2) (inl...

Status:	NEW

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	ipa (show other bugs)
Version:	unknown

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization, needs-bisection

Depends on:	100314
Blocks:
	Show dependency tree / graph

Reported:	2021-04-22 19:25 UTC by Zhendong Su
Modified:	2023-08-18 02:30 UTC (History)
CC List:	3 users (show)

See Also:	100314
Host:
Target:
Build:
Known to work:	13.1.0
Known to fail:
Last reconfirmed:	2021-09-25 00:00:00

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Zhendong Su 2021-04-22 19:25:30 UTC

[638] % gcctk -v
Using built-in specs.
COLLECT_GCC=gcctk
COLLECT_LTO_WRAPPER=/local/suz-local/software/local/gcc-trunk/libexec/gcc/x86_64-pc-linux-gnu/12.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-trunk/configure --disable-bootstrap --prefix=/local/suz-local/software/local/gcc-trunk --enable-languages=c,c++ --disable-werror --enable-multilib --with-system-zlib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.0.0 20210422 (experimental) [master revision 3cf04d1afa8:0e51007a40c:d42088e453042f4f8ba9190a7e29efd937ea2181] (GCC) 
[639] % 
[639] % gcctk -O1 -S -o O1.s small.c
[640] % gcctk -O3 -S -o O3.s small.c
[641] % 
[641] % wc O1.s O3.s
  62  135  857 O1.s
  93  200 1337 O3.s
 155  335 2194 total
[642] % 
[642] % grep foo O1.s
[643] % grep foo O3.s
        call    foo
[644] % 
[644] % cat small.c
extern void foo(void);
int b, c, d, e, *h;
static int *f = &e;
static int a() { return 1; }
static void g() {
  if (!*f)
    for (; 1; d++)
      ;
  foo();
}
static void i() {
  int j, l = 0, k[24] = {0}, *m[2] = {&k[4], &l}, n[27];
  h = n;
  if (a() & n[0])
    for (; c; c--)
      ;
  int p[8];
  h = p;
  p[0] && (h = &j);
  e = 0;
}
static void o() {
  int *q, **r = &q, ***s[1];
  s[0] = &r;
  i();
  g();
}
int main() {
  if (b)
    o();
  return 0;
}

Comment 1 Andrew Pinski 2021-09-25 09:49:15 UTC

-O3 changes inlining just so slightly and not inlining as much any more;
    /* -O3 parameters.  */
    { OPT_LEVELS_3_PLUS, OPT__param_max_inline_insns_auto_, NULL, 30 },
    { OPT_LEVELS_3_PLUS, OPT__param_early_inlining_insns_, NULL, 14 },
    { OPT_LEVELS_3_PLUS, OPT__param_inline_heuristics_hint_percent_, NULL, 600 },
    { OPT_LEVELS_3_PLUS, OPT__param_inline_min_speedup_, NULL, 15 },
    { OPT_LEVELS_3_PLUS, OPT__param_max_inline_insns_single_, NULL, 200 },

Comment 2 Richard Biener 2022-01-11 09:57:38 UTC

Indeed similar interaction between inlining, static var const promotion and IPA CP / inline heuristics

Comment 3 Jan Hubicka 2022-01-11 10:54:44 UTC

Here the stack frame size of i is stimated to 244 bytes
void i ()
{
  int p[8];
  int n[27];
  int k[24];
  int l;
  int j;
  int _1;
  int _2;
  int _3;
  int c.2_4;
  int _5;

  <bb 2> [local count: 236223200]:
  l = 0;
  k = {};
  h = &n;
  _1 = n[0];
  _2 = _1 & 1;
  if (_2 != 0)
    goto <bb 8>; [50.00%]
  else
    goto <bb 5>; [50.00%]

  <bb 8> [local count: 118111600]:
  goto <bb 4>; [100.00%]

  <bb 3> [local count: 955630225]:
  _3 = c.2_4 + -1;
  c = _3;

  <bb 4> [local count: 1073741824]:
  c.2_4 = c;
  if (c.2_4 != 0)
    goto <bb 3>; [89.00%]
  else
    goto <bb 5>; [11.00%]

  <bb 5> [local count: 236223200]:
  h = &p;
  _5 = p[0];
  if (_5 != 0)
    goto <bb 6>; [50.00%]
  else
    goto <bb 7>; [50.00%]

  <bb 6> [local count: 118111600]:
  h = &j;

  <bb 7> [local count: 236223200]:
  e = 0;
  j ={v} {CLOBBER};
  l ={v} {CLOBBER};
  k ={v} {CLOBBER};
  n ={v} {CLOBBER};
  p ={v} {CLOBBER};
  return;

}

so it indeed has larger arrays. k is initialized but never used (so it is missed DSE). n is used in stupid way:

  h = &n;                                                                       
  _1 = n[0];                                                                    

where h is write only static var, but we do not know that during early opts (we could try our luck and schedule one extra writeonly detection before early optimization passes, but I am not sure it is worth).

I would say that main issue is also missed DSE

Comment 4 Andrew Pinski 2023-08-18 02:30:53 UTC

Seems to be fixed in GCC 13.1.0.