Bug 55478 - [devirt] trunk fails inline-devirt test #4
Summary: [devirt] trunk fails inline-devirt test #4
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.8.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-11-26 22:34 UTC by Matt Hargett
Modified: 2014-01-28 01:16 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2012-11-26 00:00:00


Attachments
test case that fails in trunk but passed with Maxim's devirt patches (545 bytes, application/octet-stream)
2012-11-26 22:34 UTC, Matt Hargett
Details
patch (1.06 KB, patch)
2012-11-27 10:55 UTC, Jan Hubicka
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Matt Hargett 2012-11-26 22:34:51 UTC
Created attachment 28783 [details]
test case that fails in trunk but passed with Maxim's devirt patches

trunk currently fails test #4 here (also attached):
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg02589.html

$ g++-4.8 -O2 inline-devirt-4.C && objdump -d -Mintel a.out | less -p main\>

0000000000400630 <main>:
  400630:       48 83 ec 18             sub    rsp,0x18
  400634:       48 89 e7                mov    rdi,rsp
  400637:       48 c7 04 24 d0 09 40    mov    QWORD PTR [rsp],0x4009d0
  40063e:       00 
  40063f:       c6 44 24 0a 00          mov    BYTE PTR [rsp+0xa],0x0
  400644:       c6 44 24 08 61          mov    BYTE PTR [rsp+0x8],0x61
  400649:       c6 44 24 09 62          mov    BYTE PTR [rsp+0x9],0x62
  40064e:       e8 2d 01 00 00          call   400780 <_Z12print_lengthRK6String>
  400653:       31 c0                   xor    eax,eax
  400655:       48 83 c4 18             add    rsp,0x18
  400659:       c3                      ret    
  40065a:       66 0f 1f 44 00 00       nop    WORD PTR [rax+rax*1+0x0]

It does inline the FixedString ctor, but not print_length(const String&), nor does it even inline FixedString::length() into print_length(const String&). I tried -O3 and -Ofast along with various inline --params, but to no avail.
Comment 1 Jan Hubicka 2012-11-26 23:03:25 UTC
Hmm, for some reason inliner do not see the devirt_benefit path at all in this testcase....
Comment 2 Jan Hubicka 2012-11-27 00:00:52 UTC
OK,
the inliner recognizes that the call is going to be inlined.

At -O2 this is however ignored because the caller is too large and not declared inline. I am not sure if we really want to enforce inlining in this case

At -O3 this is ignored because the function is called just once and thus call is cold.  This is also sort of correct, but a testcase that will do the same in a large loop will suffer from the same problem.  I guess we should introduce the notion of heavy calls like open64 does and inline in such case (i.e. inline into functions called once when some of inline hints are met).  I will look into this.
Comment 3 Jan Hubicka 2012-11-27 10:55:04 UTC
Created attachment 28792 [details]
patch

This is patch I am going to test.  It copies open64' notion of heavy calls - i.e. the calls that are not hot, but inlining is still importnat because callee takes considerable time. We had few examples where this is profitable thing, so I guess it is good idea to implement this.  The heaviness should be propagated by ipa-profile that is the bit I did not implement for simplicity, so we now consider pretty much every nonleaf function or function with loop as possibly heavy.
Comment 4 Jan Hubicka 2012-11-27 10:57:04 UTC
With the patch attached (or with main() renamed to something else) I now get at -O3

int main() ()
{
  struct FixedString empty;
  int _16;
  const char pretmp_18;
  int _20;
  char _24;
  char _25;
  const char pretmp_26;

  <bb 2>:
  MEM[(char &)&empty + 8] = 97;
  MEM[(char &)&empty + 9] = 98;
  _24 = empty.contents[0];
  _16 = (int) _24;
  printf ("%d\n", _16);
  _25 = empty.contents[1];
  _20 = (int) _25;
  printf ("%d\n", _20);
  empty ={v} {CLOBBER};
  return 0;

}

seems like inconsistency in SRA is causing missing constant porpagation here.
97 and 98 should be propagated into the load.
Richard/Martin, can you take a look, please?
Also notice the stale pretmp declarations.
Comment 5 Richard Biener 2012-11-27 13:39:57 UTC
Well, the loop isn't unrolled until late cunroll after which there is no
SRA / FRE to fix things up.  No, DOM doesn't get it by implementation
limitation.

As for the PRE issue - we indeed seem to not release SSA temporaries
created by PRE.  I'll look into that.
Comment 6 Jan Hubicka 2012-11-27 14:35:39 UTC
Hmm, OK.  I wonder how google's branch handles this correctly...
Comment 7 Matt Hargett 2012-11-27 17:32:01 UTC
I'll rewrite the test to add a loop that hopefully triggers it as hot at -O3 (and gets unrolled).  shouldn't it inline at -O2 since DCE would eliminate the function body and make it an overall win?
Comment 8 Richard Biener 2012-12-13 10:46:00 UTC
(In reply to comment #5)
> As for the PRE issue - we indeed seem to not release SSA temporaries
> created by PRE.  I'll look into that.

Fixed with

2012-12-13  Richard Biener  <rguenther@suse.de>

        * tree-ssa-pre.c (get_representative_for): Adjust dumping.
        Mark created SSA names for release.
        (eliminate_bb): Insert only when expr is not NULL.
Comment 9 Matt Hargett 2013-02-11 02:02:46 UTC
Just tested with latest trunk and it passes at -O3. With Maxim's previous devirt patches, it passed at -O2. That being said, I'm happy and this can be marked as RESOLVED.
Comment 10 Richard Biener 2013-02-11 08:45:31 UTC
Note a testcase should still be added to the testsuite.
Comment 11 Matt Hargett 2013-02-14 21:27:54 UTC
Attached is an updated version of the tests Maxim committed to the google/4_7 branch. The only difference is that more of the tests are xfail'd than in the older google branch.

We could unset the xfail for inline-devirt-4.C if we set it to -O3, but I'd rather track the fact the test hasn't passed at -O2 since the patches Maxim proposed ~2 years ago due to various (possibly nested) regressions in other areas.
Comment 12 Jan Hubicka 2014-01-17 00:42:24 UTC
print_length is not really a trivial wrapper:
void print_length (const String& string)
{
  for (uint64_t i = 0; i < string.length(); i++)
    {
      printf("%d\n", string.get(i));
    }
}

and the fact that it disappears depends on
 1) ability to inline it
 2) ability to devirtualize length and get calls
 3) ability to eliminate the loop since length is small.

Inlined can anticiapte 1-2 to happen, but not 3 and with -O2 it decides that code size would grow. 

For 3 one would need return functions to start with to determine the constant bound. So we can get it better only once Martin gets to it ;)
Comment 13 Matt Hargett 2014-01-28 01:16:23 UTC
passes on current trunk, modulo the lack of DCE as tracked by another bug.