Bug 77580 - Improve devirtualization
Summary: Improve devirtualization
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 7.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: spec
  Show dependency treegraph
 
Reported: 2016-09-13 14:47 UTC by Wilco
Modified: 2024-08-14 19:27 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-09-13 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Wilco 2016-09-13 14:47:39 UTC
A commonly used benchmark contains a hot loop which calls one of 2 virtual functions via a static variable which is set just before. A reduced example is:

int f1(int x) { return x + 1; }
int f2(int x) { return x + 2; }

static int (*virt)(int);

int f(int *p, int n, int x)
{
  int i;
  if (x > 10)
    virt = f1;
  else
    virt = f2;
  for (i = 0; i < n; i++)
    p[i] = virt (i);
}

This is obviously very stupid code, however GCC could do better. virt is not address-taken, neither f1 nor f2 make a call, so virt is effectively a local variable. So the loop could be transformed to p[i] = (x > 10) ? f1 (i) : f2 (i); to enable inlining after which the if can be lifted:

int f_opt(int *p, int n, int x)
{
  int i;
  if (x > 10)
    virt = f1;
  else
    virt = f2;
  if (x > 10)
    for (i = 0; i < n; i++)
      p[i] = f1 (i);
  else
    for (i = 0; i < n; i++)
      p[i] = f2 (i);
}
Comment 1 Andrew Pinski 2016-09-13 14:54:10 UTC
To some extend this is loop versioning.  There is another bug where we don't need the versioning and GCC does not devirtualization the loop but I can't find it right now; this bug should depend on that bug.

Anyways confirmed.
Comment 2 Richard Biener 2016-09-14 10:51:00 UTC
Note that for inlining to kick in this all has to be done during early optimization which is somewhat against its intent (perform transforms
reducing code-size).  Putting x > 10 ? f1 : f2 into the loop is also
against any reasonable heuristics (move invariant stuff outside).

Thus it's going to be tricky overall to teach GCC to handle this situation.

Can you share preprocessed source of that "commonly used benchmark" (or name
it at least?)
Comment 3 Andrew Pinski 2016-09-14 10:53:47 UTC
I think this shows up in h264 code in spec2006.