This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] var-tracking.c speedups


Hi!

Still on the http://bugzilla.redhat.com/503816 testcase I've run callgrind
(valgrind --tool=callgrind --collect-atstart=no --toggle-collect=variable_tracking_main ./cc1plus ...
callgrind_annotate --auto=yes --inclusive=yes callgrind*.out*) and the
following patch fixes some low hanging fruit discovered by it.
Overall the patch (still on 4.4 branch, x86_64-linux -> s390x-linux cross)
results in (real/user/sys):
cc1plus.vanilla -g -O2 -fno-var-tracking	3m30.618s	3m29.638s	0m0.947s
cc1plus.vanilla -g -O2				6m49.722s	6m47.254s	0m2.374s
cc1plus.patched -g -O2				5m50.255s	5m48.398s	0m1.787s
(middle from 3 runs).  cc1plus.vanilla here is branches/gcc-4_4-branch with
r148760 backported to it, and cc1plus.patched is the same with the following
patch on top of it.  So this patch saves approx. 15% of runtime, or 30% of
vartrack pass runtime.

The fixes are:
1) variable_union has been totally unnecessarily calling 178999434x xcalloc,
   qsort and free.  This is in the loc_chain sorting code.  I've
   instrumented trunk bootstrap (x86_64-linux and i686-linux) to track how
   frequent different values of dst_l and src_l are (x,y) stands for
   dst_l == x, src_l == y, sum stands for dst_l + src_l.  The first
   line is gathered from the trunk bootstraps, the second one from the
   above mentioned s390x testcase:
(1,1)	 (2,2)	 sum==3	sum<=4 sum<=8 sum<=16 sum<=32 sum>32
10072567 2698962 706337 27977  382251 2492    0       0
89673364 89268534  1400    89    2397 53650   0       0
   Turns out that the most often combinations can be handled without
   qsort, fairly cheaply.  For dst_l == 1 we can just put in the
   dst single entry followed by the src entries in order, except possibly
   if some src entry is the same as dst single entry we just skip it.
   dst_l == 2 can be handled easily as well.  For dst_l == 1 we
   can avoid any kind of temporary buffer, for other combinations
   we can just use a fixed size static buffer with a dynamically realloced
   buffer just to be sure (and free it only at the end of the pass).
2) I doubt anybody uses -fvar-tracking-uninit, it certainly uses a very
   weird non-DWARFy DW_GNU_uninit extension which isn't clearly defined
   and GDB only barely supports it, just in a few places.  Still, this
   code was causing a quadratic loop in variable_union which was showing up
   very high in the profile.  Fixed by making sure ->init is always
   *_INITIALIZED unless -fvar-tracking-uninit and avoiding that loop.
3) dataflow_set_different_2 was just wasting cycles, we've checked at the
   beginning that both hashtables have the same number of elements, and
   if traversal of one of the hashtables found all decls from that table
   in the other table (and their variable objects were considered equal),
   then the same is necessarily true for the other hash table.
4) in sel-sched merge, rtx_equal_p was moved to very infrequently used
   rtx_equal_p_cb with added callback and rtx_equal_p implemented on top
   of that new function.  Given that rtx_equal_p was showing very high
   in the callgrind profile and is very frequently used throughout gcc
   rtl passes (400+ different callers), isn't too long to be a maintainance
   burden and is recursive, so we can't get rid of the overhead of passing
   cb around and checking at each level, I'm proposing returning to the
   old rtx_equal_p function and have rtx_equal_p_cb as completely separate
   function.

Patch has been bootstrapped/regtested on the trunk on x86_64-linux and
i686-linux, on 4.4 branch and above mentioned testcase has been verified
to generate bitwise identical assembly.

Attached is the patch, then callgrind_annotate dump before this
patch and after it, for anyone interested to suggest further low hanging
fruit.

Ok for trunk?

	Jakub

Attachment: Y342
Description: Text document

Attachment: cg.rh09
Description: Text document

Attachment: cg.rh15
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]