Bug 46554 - Less inlining leads to CSiBE regression
Summary: Less inlining leads to CSiBE regression
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 4.6.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-11-19 08:24 UTC by Jan Hubicka
Modified: 2010-11-19 10:57 UTC
0 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2010-11-19 10:52:53


Attachments
testcase flex-2.5.31/regex.c (17.34 KB, application/octet-stream)
2010-11-19 08:24 UTC, Jan Hubicka

Description Jan Hubicka 2010-11-19 08:24:34 UTC
Created attachment 22451
testcase flex-2.5.31/regex.c

The loss here comes from not inlining regmatch_len. The catch is that the test
if (m == ((void *)0) || m->rm_so < 0) is performed before every use of
regmatch_len, so once the function is inlined its copy of the test is optimized
out. The inlined body thus simplifies into the m->rm_so < 0 test plus
arithmetic, which ends up being cheaper than the call.

int regmatch_len (regmatch_t * m)
{
 if (m == ((void *)0) || m->rm_so < 0) {
  return 0;
 }

 return m->rm_eo - m->rm_so;
}

It is used as:

 if (m == ((void *)0) || m->rm_so < 0)
  return 0;

 if (regmatch_len (m) < 20)
  s = regmatch_cpy (m, buf, src);
 else
  s = regmatch_dup (m, src);
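To make the simplification concrete, here is a hand-written sketch in C of what the caller reduces to once regmatch_len is inlined after the caller's own guard. It assumes the POSIX regmatch_t from <regex.h>; classify_match and classify_match_inlined are hypothetical names, and their return values 1 and 2 merely stand in for the regmatch_cpy and regmatch_dup branches. It illustrates the reasoning above and is not actual GCC output.

```c
#include <assert.h>
#include <regex.h>   /* POSIX regmatch_t with rm_so / rm_eo */

/* The helper from the testcase.  */
int regmatch_len (regmatch_t * m)
{
  if (m == ((void *)0) || m->rm_so < 0)
    return 0;
  return m->rm_eo - m->rm_so;
}

/* Hypothetical caller mirroring the usage above; 1 and 2 stand in for
   the regmatch_cpy and regmatch_dup branches, 0 for "no match".  */
static int classify_match (regmatch_t * m)
{
  if (m == ((void *)0) || m->rm_so < 0)
    return 0;
  return regmatch_len (m) < 20 ? 1 : 2;
}

/* Hand-written version of the same caller after inlining regmatch_len:
   the guard has already run, so both tests in the inlined copy are known
   to be false and only the subtraction is left.  */
static int classify_match_inlined (regmatch_t * m)
{
  if (m == ((void *)0) || m->rm_so < 0)
    return 0;
  /* Facts established by the guard above; the inlined tests fold away.  */
  assert (m != ((void *)0) && m->rm_so >= 0);
  return (m->rm_eo - m->rm_so) < 20 ? 1 : 2;
}
```

Both callers return the same result for every input; the only difference is that the second no longer contains a call, just two loads and a subtraction.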

Tricky. The inliner sees it as:

Analyzing function body size: regmatch_len
  freq:  1000 size:  2 time:  2 if (m_2(D) == 0B)
  freq:   898 size:  1 time:  1 D.7268_3 = m_2(D)->rm_so;
    50% will be eliminated by inlining
  freq:   898 size:  2 time:  2 if (D.7268_3 < 0)
  freq:   726 size:  1 time:  1 D.7270_4 = m_2(D)->rm_eo;
    50% will be eliminated by inlining
  freq:   726 size:  1 time:  1 D.7268_5 = m_2(D)->rm_so;
    50% will be eliminated by inlining
  freq:   726 size:  1 time:  1 D.7269_6 = D.7270_4 - D.7268_5;
  freq:  1000 size:  1 time:  2 return D.7269_1;
    will eliminated by inlining
Overall function body time: 9-3 size: 11-5
With function call overhead time: 9-15 size: 11-8

I can imagine we could try to base the summary on value ranges instead of known
constants, i.e. run early VRP and resolve the first test properly.
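A toy model of the value-range idea, in C. This is not GCC's inline summary code: range_t, fold_eq, fold_lt, as_constant_only and dead_guard_tests are hypothetical names, and the pointer m is modelled as an integer address. It shows why a summary keyed on known constants cannot drop the guard tests (m is not a compile-time constant at the call site), while ranges of the kind the caller's guard establishes (m in [1, LONG_MAX], rm_so in [0, LONG_MAX]) decide both. The toy simply takes the range of m->rm_so as given; deriving a range for a value loaded from memory is a separate problem.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

typedef struct { long lo, hi; } range_t;   /* inclusive [lo, hi] */
typedef enum { FOLD_UNKNOWN, FOLD_TRUE, FOLD_FALSE } fold_t;

/* Decide "x == c" from the range of x, if possible.  */
static fold_t fold_eq (range_t x, long c)
{
  assert (x.lo <= x.hi);
  if (x.lo == c && x.hi == c)
    return FOLD_TRUE;
  if (c < x.lo || c > x.hi)
    return FOLD_FALSE;
  return FOLD_UNKNOWN;
}

/* Decide "x < c" from the range of x, if possible.  */
static fold_t fold_lt (range_t x, long c)
{
  assert (x.lo <= x.hi);
  if (x.hi < c)
    return FOLD_TRUE;
  if (x.lo >= c)
    return FOLD_FALSE;
  return FOLD_UNKNOWN;
}

/* A summary keyed on known constants only benefits from singleton
   ranges; anything else is treated as "could be any value".  */
static range_t as_constant_only (range_t x)
{
  if (x.lo == x.hi)
    return x;
  range_t any = { LONG_MIN, LONG_MAX };
  return any;
}

/* How many of regmatch_len's two guard tests ("m == 0" and
   "m->rm_so < 0") are known to be false at a given call site.  */
static int dead_guard_tests (range_t m, range_t rm_so, bool use_ranges)
{
  if (!use_ranges)
    {
      m = as_constant_only (m);
      rm_so = as_constant_only (rm_so);
    }
  int dead = 0;
  if (fold_eq (m, 0) == FOLD_FALSE)
    dead++;
  if (fold_lt (rm_so, 0) == FOLD_FALSE)
    dead++;
  return dead;
}
```

With the ranges implied by the caller's guard, dead_guard_tests reports both tests dead when use_ranges is true and none when it is false; only a call with a literally constant pointer and offset would let the constants-only variant fold anything.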

Even optimizing the first conditional away won't get it inlined; it will still
be considered to have size 9, so code will be expected to grow by 1 byte.
Optimizing the second conditional away is even trickier.

The code can be optimized away by interprocedural value range propagation
(IP-VRP), which would be an interesting optimization to have...
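Because the guard precedes every use of regmatch_len, IP-VRP could narrow the callee's parameters from all call sites at once and delete the tests from regmatch_len itself. A minimal toy sketch of that step in C, with hypothetical names (range_t, range_hull, param_range, eq_zero_is_dead, lt_zero_is_dead) and pointers again modelled as integer addresses: the range of an argument inside the callee is the hull of its ranges at all call sites, and a test can only be removed if no call site can make it true.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef struct { long lo, hi; } range_t;   /* inclusive [lo, hi] */

/* Smallest range containing both a and b.  */
static range_t range_hull (range_t a, range_t b)
{
  range_t r;
  r.lo = a.lo < b.lo ? a.lo : b.lo;
  r.hi = a.hi > b.hi ? a.hi : b.hi;
  return r;
}

/* Range of one parameter inside the callee, given its range at each
   of the n call sites.  */
static range_t param_range (const range_t *sites, size_t n)
{
  assert (n > 0);
  range_t r = sites[0];
  for (size_t i = 1; i < n; i++)
    r = range_hull (r, sites[i]);
  return r;
}

/* "x == 0" is dead in the callee when 0 lies outside x's range.  */
static int eq_zero_is_dead (range_t x)
{
  return x.lo > 0 || x.hi < 0;
}

/* "x < 0" is dead in the callee when x can never be negative.  */
static int lt_zero_is_dead (range_t x)
{
  return x.lo >= 0;
}
```

A single unguarded call site widens the hull back to the full range and keeps the test alive, which is why the analysis has to see every caller.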
Comment 1 Richard Biener 2010-11-19 10:52:53 UTC
I thought partial inlining would maybe fix this?  Otherwise it's really a
case that needs IP analysis.
Comment 2 Jan Hubicka 2010-11-19 10:57:57 UTC
> I thought partial inlining would maybe fix this?  Otherwise it's really a
> case that needs IP analysis.

Not with -Os; we really know that it will optimize away.

Honza