[Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
rguenth at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Tue Mar 24 14:56:00 GMT 2015
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076
--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #10)
> I can re-confirm the 16% compile time regression. I went through some
> compare.
>
> $ wc -l *.ssa
> 299231 tramp3d-v4.ii.015t.ssa
> $ wc -l ../5/*.ssa
> 331115 ../5/tramp3d-v4.ii.018t.ssa
>
> so as a lame compare, we already have 10% more statements to start with.
> Now einline
>
> $ wc -l *.einline
> 692812 tramp3d-v4.ii.018t.einline
> $ wc -l ../5/*.einline
> 724090 ../5/tramp3d-v4.ii.026t.einline
>
> so after einline we seem to have 4% statements more, we do about the same
> number of inlining:
>
> $ grep Inlining tramp3d-v4.ii.*einline | wc -l
> 28003
> $ grep Inlining ../5/tramp3d-v4.ii.*einline | wc -l
> 28685
>
> but at release_ssa we still have about 4% more.
>
> $ wc -l *release_ssa*
> 348378 tramp3d-v4.ii.036t.release_ssa
> $ wc -l ../5/*release_ssa*
> 365689 ../5/tramp3d-v4.ii.043t.release_ssa
>
> There is no difference in number of functions in ssa and release_ssa dumps.
> What makes the functions bigger in GCC 5?
>
> $ grep "^ .* = " *.release_ssa | wc -l
> 65028
> $ grep "^ .* = " ../5/*.release_ssa | wc -l
> 72636
>
> The number of statements is about the same.
>
> During the actual inlining GCC 4.9 reports:
> Unit growth for small function inlining: 88536->114049 (28%)
> and
> Unit growth for small function inlining: 87943->97699 (11%)
>
> Statement count seems to remain 7% in .optimized dumps. So perhaps the
> slowdown is not really that much caused by IPA passes as we somehow manage
> to produce more code out of C++ FE.
>
> I looked for interesting differences in SSA dump. Here are a few:
>
> -;; Function int __gthread_active_p() (_ZL18__gthread_active_pv, funcdef_no=312, decl_uid=8436, symbol_order=127)
> +;; Function int __gthread_active_p() (_ZL18__gthread_active_pv, funcdef_no=312, decl_uid=8537, cgraph_uid=127, symbol_order=127)
>
> int __gthread_active_p() ()
> {
> - bool _1;
> - int _2;
> + static void * const __gthread_active_ptr = (void *) __gthrw_pthread_cancel;
> + void * __gthread_active_ptr.111_2;
> + bool _3;
> + int _4;
>
> <bb 2>:
> - _1 = __gthrw_pthread_cancel != 0B;
> - _2 = (int) _1;
> - return _2;
> + __gthread_active_ptr.111_2 = __gthread_active_ptr;
> + _3 = __gthread_active_ptr.111_2 != 0B;
> + _4 = (int) _3;
> + return _4;
>
> }
>
> ... this looks like header change, perhaps ...
Yep. __gthrw_pthread_cancel is a function pointer (thus constant)
while __gthread_active_ptr is a global variable.
> ObserverEvent::~ObserverEvent() (struct ObserverEvent * const this)
> {
> - int _6;
> + int (*__vtbl_ptr_type) () * _2;
> + int _7;
>
> <bb 2>:
> - this_3(D)->_vptr.ObserverEvent = &MEM[(void *)&_ZTV13ObserverEvent + 16B];
> - *this_3(D) ={v} {CLOBBER};
> - _6 = 0;
> - if (_6 != 0)
> + _2 = &_ZTV13ObserverEvent + 16;
> + this_4(D)->_vptr.ObserverEvent = _2;
> + MEM[(struct &)this_4(D)] ={v} {CLOBBER};
> + _7 = 0;
> + if (_7 != 0)
>
> ... extra temporary initializing vtbl pointer. This is repeated many times
> ...
This is because of

2015-03-20  Richard Biener  <rguenther@suse.de>

    PR middle-end/64715
    ...
    * gimplify.c (gimplify_expr): Remove premature folding of
    &X + CST to &MEM[&X, CST].

thus relatively recent. It will be fixed up by ccp1.
> -;; Function static Unique::Value_t Unique::get() (_ZN6Unique3getEv, funcdef_no=3030, decl_uid=51649, symbol_order=884)
> +;; Function static Unique::Value_t Unique::get() (_ZN6Unique3getEv, funcdef_no=3030, decl_uid=51730, cgraph_uid=883, symbol_order=884)
>
> static Unique::Value_t Unique::get() ()
> {
> Value_t retval;
> - long int next_s.83_2;
> - long int next_s.84_3;
> - long int next_s.85_4;
> - Value_t _7;
> + long int next_s.83_3;
> + long int next_s.84_4;
> + long int next_s.85_5;
> + Value_t _9;
>
> <bb 2>:
> - Pooma::DummyMutex::_ZN5Pooma10DummyMutex4lockEv.isra.26 ();
> - next_s.83_2 = next_s;
> - next_s.84_3 = next_s.83_2;
> - next_s.85_4 = next_s.84_3 + 1;
> - next_s = next_s.85_4;
> - retval_6 = next_s.84_3;
> - Pooma::DummyMutex::_ZN5Pooma10DummyMutex6unlockEv.isra.27 ();
> - _7 = retval_6;
> - return _7;
> + Pooma::DummyMutex::lock (&mutex_s);
> + next_s.83_3 = next_s;
> + next_s.84_4 = next_s.83_3;
> + next_s.85_5 = next_s.84_4 + 1;
> + next_s = next_s.85_5;
> + retval_7 = next_s.84_4;
> + Pooma::DummyMutex::unlock (&mutex_s);
> + _9 = retval_7;
> + return _9;
>
> }
>
> ... here we give up on ISRA....
I believe this is because of

2015-02-13  Ilya Enkovich  <ilya.enkovich@intel.com>

    PR tree-optimization/65002
    * tree-cfg.c (pass_data_fixup_cfg): Don't update
    SSA on start.
    * tree-sra.c (some_callers_have_no_vuse_p): New.
    (ipa_early_sra): Reject functions whose callers
    assume function is read only.

or related changes.
> and we have about twice as much EH:
>
> $ grep "resx " tramp3d-v4.ii.*\.ssa | wc -l
> 4816
> $ grep "resx " ../5/tramp3d-v4.ii.*\.ssa | wc -l
> 8671
>
> which however is optimized out at a time of release_ssa.
That's maybe because we emit more CLOBBERs initially (do we?)
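
One way to check that guess (a rough sketch, not something from the original
thread) is to count `{CLOBBER}` and `resx` statements in the two .ssa dumps
and compare the totals. The helper names `count_matches` and `pct_growth` are
made up for illustration, and the sample text stands in for a real dump file,
which you would read from disk instead.

```python
import re

# Patterns for the two statement kinds discussed above, as they appear
# in GIMPLE dumps quoted in this thread.
CLOBBER_RE = re.compile(r"=\{v\} \{CLOBBER\};")
RESX_RE = re.compile(r"\bresx ")


def count_matches(dump_text, pattern):
    """Count lines of a dump that match the given compiled regex."""
    return sum(1 for line in dump_text.splitlines() if pattern.search(line))


def pct_growth(old, new):
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100.0


# Tiny stand-in for a real dump; replace with open(path).read().
sample = "  D.1 ={v} {CLOBBER};\n  resx 1\n  _2 = 0;\n"
print(count_matches(sample, CLOBBER_RE), count_matches(sample, RESX_RE))

# The resx counts quoted above: 4816 (4.9) vs 8671 (5).
print(f"{pct_growth(4816, 8671):.0f}% more resx statements")
```

Running the same two counts on both .ssa dumps would show whether the clobber
count grows in step with the resx count.
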
> Another thing that we may consider to cleanup in next stage1 is to get rid
> of dead stores:
>
> - MEM[(struct new_allocator *)&D.561702] ={v} {CLOBBER};
> - D.561702 ={v} {CLOBBER};
> - D.561702 ={v} {CLOBBER};
> - MEM[(struct new_allocator *)_2] ={v} {CLOBBER};
> - MEM[(struct allocator *)_2] ={v} {CLOBBER};
> - MEM[(struct _Alloc_hider *)_2] ={v} {CLOBBER};
> - MEM[(struct basic_string *)_2] ={v} {CLOBBER};
> - *_2 ={v} {CLOBBER};
> - *this_1(D) ={v} {CLOBBER};
> + MEM[(struct &)&D.570046] ={v} {CLOBBER};
> + MEM[(struct &)&D.570046] ={v} {CLOBBER};
> + D.570046 ={v} {CLOBBER};
> + MEM[(struct &)_2] ={v} {CLOBBER};
> + MEM[(struct &)_2] ={v} {CLOBBER};
> + MEM[(struct &)_2] ={v} {CLOBBER};
> + MEM[(struct &)_2] ={v} {CLOBBER};
> + MEM[(struct &)_2] ={v} {CLOBBER};
> + MEM[(struct &)this_1(D)] ={v} {CLOBBER};
>
> Clobbers are dangerously common. There are 18K clobbers in release_ssa dump
> out of 65K assignments, that makes them to be 29% of all the code. The
> number of clobbers seems to go down only in tramp3d-v4.ii.166t.ehcleanup
> dump and we still get a lot of redundancies:
Yeah, well ... :/ I've already taught DCE to get rid of the really useless
ones...
> <bb 32>:
>
> D.581063 ={v} {CLOBBER};
>
> D.581063 ={v} {CLOBBER};
>
> D.164155 ={v} {CLOBBER};
>
> D.164155 ={v} {CLOBBER};
>
> operator delete [] (begbuf_18);
>
>
> Why are those not considered dead stores and DCEd out earlier?
Dead clobbers, you mean? Well, they are only "dead" if there are no
uses/defs of their LHS dominating them.
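
The <bb 32> example is a much narrower case than the dominance condition
above: back-to-back clobbers of the same variable with nothing in between.
As a minimal illustration of just that local case (this is not how GCC's DCE
works, and `drop_repeated_clobbers` is a hypothetical helper operating on
dump text rather than on GIMPLE), one could filter a single basic block like
this:

```python
import re

# Matches a clobber statement as printed in the dumps and captures its LHS.
CLOBBER_STMT = re.compile(r"^\s*(.+?) =\{v\} \{CLOBBER\};\s*$")


def drop_repeated_clobbers(stmts):
    """Drop a clobber whose LHS was already clobbered earlier in the same
    run of clobbers. Any non-clobber statement resets the state, so this
    is conservative and strictly block-local."""
    seen = set()  # LHS strings clobbered since the last non-clobber stmt
    kept = []
    for stmt in stmts:
        m = CLOBBER_STMT.match(stmt)
        if m is None:
            # Might use or define anything: forget what we have seen.
            seen.clear()
            kept.append(stmt)
        elif m.group(1) not in seen:
            seen.add(m.group(1))
            kept.append(stmt)
    return kept


# The statements of <bb 32> quoted above.
bb32 = [
    "D.581063 ={v} {CLOBBER};",
    "D.581063 ={v} {CLOBBER};",
    "D.164155 ={v} {CLOBBER};",
    "D.164155 ={v} {CLOBBER};",
    "operator delete [] (begbuf_18);",
]
for s in drop_repeated_clobbers(bb32):
    print(s)
```

On the quoted block this keeps one clobber per variable plus the call. It
says nothing about clobbers separated by other statements or spread across
blocks, which is where the dominance question matters.
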