Created attachment 28783 [details] test case that fails in trunk but passed with Maxim's devirt patches trunk currently fails test #4 here (also attached): http://gcc.gnu.org/ml/gcc-patches/2011-10/msg02589.html $ g++-4.8 -O2 inline-devirt-4.C && objdump -d -Mintel a.out | less -p main\> 0000000000400630 <main>: 400630: 48 83 ec 18 sub rsp,0x18 400634: 48 89 e7 mov rdi,rsp 400637: 48 c7 04 24 d0 09 40 mov QWORD PTR [rsp],0x4009d0 40063e: 00 40063f: c6 44 24 0a 00 mov BYTE PTR [rsp+0xa],0x0 400644: c6 44 24 08 61 mov BYTE PTR [rsp+0x8],0x61 400649: c6 44 24 09 62 mov BYTE PTR [rsp+0x9],0x62 40064e: e8 2d 01 00 00 call 400780 <_Z12print_lengthRK6String> 400653: 31 c0 xor eax,eax 400655: 48 83 c4 18 add rsp,0x18 400659: c3 ret 40065a: 66 0f 1f 44 00 00 nop WORD PTR [rax+rax*1+0x0] It does inline the FixedString ctor, but not print_length(const String&), nor does it even inline FixedString::length() into print_length(const String&). I tried -O3 and -Ofast along with various inline --params, but to no avail.
Hmm, for some reason inliner do not see the devirt_benefit path at all in this testcase....
OK, the inliner recognizes that the call is going to be inlined. At -O2 this is however ignored because the caller is too large and not declared inline. I am not sure if we really want to enforce inlining in this case At -O3 this is ignored because the function is called just once and thus call is cold. This is also sort of correct, but a testcase that will do the same in a large loop will suffer from the same problem. I guess we should introduce the notion of heavy calls like open64 does and inline in such case (i.e. inline into functions called once when some of inline hints are met). I will look into this.
Created attachment 28792 [details] patch This is patch I am going to test. It copies open64' notion of heavy calls - i.e. the calls that are not hot, but inlining is still importnat because callee takes considerable time. We had few examples where this is profitable thing, so I guess it is good idea to implement this. The heaviness should be propagated by ipa-profile that is the bit I did not implement for simplicity, so we now consider pretty much every nonleaf function or function with loop as possibly heavy.
With the patch attached (or with main() renamed to something else) I now get at -O3 int main() () { struct FixedString empty; int _16; const char pretmp_18; int _20; char _24; char _25; const char pretmp_26; <bb 2>: MEM[(char &)&empty + 8] = 97; MEM[(char &)&empty + 9] = 98; _24 = empty.contents[0]; _16 = (int) _24; printf ("%d\n", _16); _25 = empty.contents[1]; _20 = (int) _25; printf ("%d\n", _20); empty ={v} {CLOBBER}; return 0; } seems like inconsistency in SRA is causing missing constant porpagation here. 97 and 98 should be propagated into the load. Richard/Martin, can you take a look, please? Also notice the stale pretmp declarations.
Well, the loop isn't unrolled until late cunroll after which there is no SRA / FRE to fix things up. No, DOM doesn't get it by implementation limitation. As for the PRE issue - we indeed seem to not release SSA temporaries created by PRE. I'll look into that.
Hmm, OK. I wonder how google's branch handles this correctly...
I'll rewrite the test to add a loop that hopefully triggers it as hot at -O3 (and gets unrolled). shouldn't it inline at -O2 since DCE would eliminate the function body and make it an overall win?
(In reply to comment #5) > As for the PRE issue - we indeed seem to not release SSA temporaries > created by PRE. I'll look into that. Fixed with 2012-12-13 Richard Biener <rguenther@suse.de> * tree-ssa-pre.c (get_representative_for): Adjust dumping. Mark created SSA names for release. (eliminate_bb): Insert only when expr is not NULL.
Just tested with latest trunk and it passes at -O3. With Maxim's previous devirt patches, it passed at -O2. That being said, I'm happy and this can be marked as RESOLVED.
Note a testcase should still be added to the testsuite.
Attached is an updated version of the tests Maxim committed to the google/4_7 branch. The only difference is that more of the tests are xfail'd than in the older google branch. We could unset the xfail for inline-devirt-4.C if we set it to -O3, but I'd rather track the fact the test hasn't passed at -O2 since the patches Maxim proposed ~2 years ago due to various (possibly nested) regressions in other areas.
print_length is not really a trivial wrapper: void print_length (const String& string) { for (uint64_t i = 0; i < string.length(); i++) { printf("%d\n", string.get(i)); } } and the fact that it disappears depends on 1) ability to inline it 2) ability to devirtualize length and get calls 3) ability to eliminate the loop since length is small. Inlined can anticiapte 1-2 to happen, but not 3 and with -O2 it decides that code size would grow. For 3 one would need return functions to start with to determine the constant bound. So we can get it better only once Martin gets to it ;)
passes on current trunk, modulo the lack of DCE as tracked by another bug.