[Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
hubicka at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Sat Mar 21 10:25:00 GMT 2015
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076
--- Comment #10 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
I can re-confirm the 16% compile-time regression. I did some comparing of the dumps.
$ wc -l *.ssa
299231 tramp3d-v4.ii.015t.ssa
$ wc -l ../5/*.ssa
331115 ../5/tramp3d-v4.ii.018t.ssa
so, as a rough comparison, we already have about 10% more statements to start with.
Now einline:
$ wc -l *.einline
692812 tramp3d-v4.ii.018t.einline
$ wc -l ../5/*.einline
724090 ../5/tramp3d-v4.ii.026t.einline
so after einline we seem to have about 4% more statements, while doing about the same
number of inlinings:
$ grep Inlining tramp3d-v4.ii.*einline | wc -l
28003
$ grep Inlining ../5/tramp3d-v4.ii.*einline | wc -l
28685
but at release_ssa we still have about 4% more.
$ wc -l *release_ssa*
348378 tramp3d-v4.ii.036t.release_ssa
$ wc -l ../5/*release_ssa*
365689 ../5/tramp3d-v4.ii.043t.release_ssa
There is no difference in number of functions in ssa and release_ssa dumps.
What makes the functions bigger in GCC 5?
$ grep "^ .* = " *.release_ssa | wc -l
65028
$ grep "^ .* = " ../5/*.release_ssa | wc -l
72636
So GCC 5 has roughly 12% more assignment statements at release_ssa.
During the actual inlining GCC 4.9 reports:
Unit growth for small function inlining: 88536->114049 (28%)
while GCC 5 reports:
Unit growth for small function inlining: 87943->97699 (11%)
The statement count difference seems to remain around 7% in the .optimized dumps. So
perhaps the slowdown is not so much caused by the IPA passes; rather, we somehow
manage to produce more code out of the C++ FE.
I looked for interesting differences in the SSA dump. Here are a few:
-;; Function int __gthread_active_p() (_ZL18__gthread_active_pv,
funcdef_no=312, decl_uid=8436, symbol_order=127)
+;; Function int __gthread_active_p() (_ZL18__gthread_active_pv,
funcdef_no=312, decl_uid=8537, cgraph_uid=127, symbol_order=127)
int __gthread_active_p() ()
{
- bool _1;
- int _2;
+ static void * const __gthread_active_ptr = (void *) __gthrw_pthread_cancel;
+ void * __gthread_active_ptr.111_2;
+ bool _3;
+ int _4;
<bb 2>:
- _1 = __gthrw_pthread_cancel != 0B;
- _2 = (int) _1;
- return _2;
+ __gthread_active_ptr.111_2 = __gthread_active_ptr;
+ _3 = __gthread_active_ptr.111_2 != 0B;
+ _4 = (int) _3;
+ return _4;
}
... this looks like a header change, perhaps ...
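For context, the source-level pattern behind this diff looks roughly like the
following sketch (hypothetical: `pthread_cancel_stub` stands in for the weak
`__gthrw_pthread_cancel` reference, and both variants are reconstructed from the
dump above, not copied from gthr-posix.h):

```cpp
#include <cassert>

// Hypothetical stand-in for the weak __gthrw_pthread_cancel reference
// (the real header uses a weak alias to pthread_cancel).
extern "C" int pthread_cancel_stub() { return 0; }

// GCC 4.9-era shape: test the symbol's address directly; this
// gimplifies to a single compare against 0B, as in the "-" lines.
static int gthread_active_old()
{
  return pthread_cancel_stub != 0;
}

// GCC 5-era shape: hide the address behind a static const pointer, so
// the .ssa dump shows an extra load of __gthread_active_ptr before the
// compare, as in the "+" lines.
static int gthread_active_new()
{
  static void *const active_ptr = (void *) pthread_cancel_stub;
  return active_ptr != 0;
}
```

Either way both variants return nonzero here; the difference is only the extra
load the second one leaves in the early SSA dump.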
ObserverEvent::~ObserverEvent() (struct ObserverEvent * const this)
{
- int _6;
+ int (*__vtbl_ptr_type) () * _2;
+ int _7;
<bb 2>:
- this_3(D)->_vptr.ObserverEvent = &MEM[(void *)&_ZTV13ObserverEvent + 16B];
- *this_3(D) ={v} {CLOBBER};
- _6 = 0;
- if (_6 != 0)
+ _2 = &_ZTV13ObserverEvent + 16;
+ this_4(D)->_vptr.ObserverEvent = _2;
+ MEM[(struct &)this_4(D)] ={v} {CLOBBER};
+ _7 = 0;
+ if (_7 != 0)
... an extra temporary initializing the vtbl pointer. This is repeated many times ...
-;; Function static Unique::Value_t Unique::get() (_ZN6Unique3getEv,
funcdef_no=3030, decl_uid=51649, symbol_order=884)
+;; Function static Unique::Value_t Unique::get() (_ZN6Unique3getEv,
funcdef_no=3030, decl_uid=51730, cgraph_uid=883, symbol_order=884)
static Unique::Value_t Unique::get() ()
{
Value_t retval;
- long int next_s.83_2;
- long int next_s.84_3;
- long int next_s.85_4;
- Value_t _7;
+ long int next_s.83_3;
+ long int next_s.84_4;
+ long int next_s.85_5;
+ Value_t _9;
<bb 2>:
- Pooma::DummyMutex::_ZN5Pooma10DummyMutex4lockEv.isra.26 ();
- next_s.83_2 = next_s;
- next_s.84_3 = next_s.83_2;
- next_s.85_4 = next_s.84_3 + 1;
- next_s = next_s.85_4;
- retval_6 = next_s.84_3;
- Pooma::DummyMutex::_ZN5Pooma10DummyMutex6unlockEv.isra.27 ();
- _7 = retval_6;
- return _7;
+ Pooma::DummyMutex::lock (&mutex_s);
+ next_s.83_3 = next_s;
+ next_s.84_4 = next_s.83_3;
+ next_s.85_5 = next_s.84_4 + 1;
+ next_s = next_s.85_5;
+ retval_7 = next_s.84_4;
+ Pooma::DummyMutex::unlock (&mutex_s);
+ _9 = retval_7;
+ return _9;
}
... here we give up on ISRA....
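For reference, a minimal sketch of the POOMA pattern above, reconstructed from the
dump (the class layout and member names are assumptions, not the actual tramp3d
source). With the GCC 4.9 pipeline, the empty `lock()`/`unlock()` bodies get
parameter-less `.isra` clones; in the GCC 5 dump the direct calls taking
`&mutex_s` remain:

```cpp
#include <cassert>

namespace Pooma {
// Empty-bodied mutex used by the serial configuration; its this
// pointer is dead, which is what IPA-SRA used to exploit.
struct DummyMutex {
  void lock() {}
  void unlock() {}
};
}

class Unique {
public:
  typedef long Value_t;
  static Value_t get()
  {
    Value_t retval;
    mutex_s.lock();
    retval = next_s;        // next_s.84 in the dump
    next_s = retval + 1;    // next_s.85 in the dump
    mutex_s.unlock();
    return retval;
  }
private:
  static Pooma::DummyMutex mutex_s;
  static Value_t next_s;
};

Pooma::DummyMutex Unique::mutex_s;
Unique::Value_t Unique::next_s = 0;
```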
and we have about twice as much EH:
$ grep "resx " tramp3d-v4.ii.*\.ssa | wc -l
4816
$ grep "resx " ../5/tramp3d-v4.ii.*\.ssa | wc -l
8671
which, however, is optimized out by the time of release_ssa.
Another thing we may consider cleaning up in the next stage1 is getting rid of
dead stores:
- MEM[(struct new_allocator *)&D.561702] ={v} {CLOBBER};
- D.561702 ={v} {CLOBBER};
- D.561702 ={v} {CLOBBER};
- MEM[(struct new_allocator *)_2] ={v} {CLOBBER};
- MEM[(struct allocator *)_2] ={v} {CLOBBER};
- MEM[(struct _Alloc_hider *)_2] ={v} {CLOBBER};
- MEM[(struct basic_string *)_2] ={v} {CLOBBER};
- *_2 ={v} {CLOBBER};
- *this_1(D) ={v} {CLOBBER};
+ MEM[(struct &)&D.570046] ={v} {CLOBBER};
+ MEM[(struct &)&D.570046] ={v} {CLOBBER};
+ D.570046 ={v} {CLOBBER};
+ MEM[(struct &)_2] ={v} {CLOBBER};
+ MEM[(struct &)_2] ={v} {CLOBBER};
+ MEM[(struct &)_2] ={v} {CLOBBER};
+ MEM[(struct &)_2] ={v} {CLOBBER};
+ MEM[(struct &)_2] ={v} {CLOBBER};
+ MEM[(struct &)this_1(D)] ={v} {CLOBBER};
Clobbers are dangerously common. There are 18K clobbers in the release_ssa dump out
of 65K assignments, making them 29% of all the code. The number of clobbers only
seems to go down in the tramp3d-v4.ii.166t.ehcleanup dump, and we still get a lot
of redundancies:
<bb 32>:
D.581063 ={v} {CLOBBER};
D.581063 ={v} {CLOBBER};
D.164155 ={v} {CLOBBER};
D.164155 ={v} {CLOBBER};
operator delete [] (begbuf_18);
Why are those not considered dead stores and DCEd out earlier?
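The back-to-back clobbers come from nested destructor chains: each subobject
destructor emits its own {CLOBBER} for storage that the enclosing destructor (or
end of scope) clobbers again. A minimal illustration with hypothetical types (the
counter is only there to make the destructor chain observable; the clobbers
themselves are visible only in the GIMPLE dumps):

```cpp
#include <cassert>

static int dtor_runs = 0;

// Each level of the chain ends with its own clobber of (part of) the
// same storage, e.g. MEM[(struct Base *)&d] followed by a clobber of
// d itself, the same stacked pattern as in bb 32 above.
struct Base {
  ~Base() { ++dtor_runs; }     // clobbers MEM[(struct Base *)this]
  int pad;
};

struct Derived : Base {
  ~Derived() { ++dtor_runs; }  // clobbers *this, then ~Base clobbers again
};

void make_and_drop()
{
  Derived d;  // end of scope: clobbers from ~Derived, ~Base, and the
              // automatic variable all target &d
}
```

Since the clobbers all hit the same (dead) storage with no intervening reads, they
look like textbook dead stores once EH cleanup has run.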