This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
Hi! Still on the http://bugzilla.redhat.com/503816 testcase I've run callgrind (valgrind --tool=callgrind --collect-atstart=no --toggle-collect=variable_tracking_main ./cc1plus ... callgrind_annotate --auto=yes --inclusive=yes callgrind*.out*) and the following patch fixes some low hanging fruit discovered by it. Overall the patch (still on 4.4 branch, x86_64-linux -> s390x-linux cross) results in (real/user/sys): cc1plus.vanilla -g -O2 -fno-var-tracking 3m30.618s 3m29.638s 0m0.947s cc1plus.vanilla -g -O2 6m49.722s 6m47.254s 0m2.374s cc1plus.patched -g -O2 5m50.255s 5m48.398s 0m1.787s (middle from 3 runs). cc1plus.vanilla here is branches/gcc-4_4-branch with r148760 backported to it, and cc1plus.patched is the same with the following patch on top of it. So this patch saves approx. 15% of runtime, or 30% of vartrack pass runtime. The fixes are: 1) variable_union has been totally unnecessarily calling 178999434x xcalloc, qsort and free. This is in the loc_chain sorting code. I've instrumented trunk bootstrap (x86_64-linux and i686-linux) to track how frequent different values of dst_l and src_l are (x,y) stands for dst_l == x, src_l == y, sum stands for dst_l + src_l. The first line is gathered from the trunk bootstraps, the second one from the above mentioned s390x testcase: (1,1) (2,2) sum==3 sum<=4 sum<=8 sum<=16 sum<=32 sum>32 10072567 2698962 706337 27977 382251 2492 0 0 89673364 89268534 1400 89 2397 53650 0 0 Turns out that the most often combinations can be handled without qsort, fairly cheaply. For dst_l == 1 we can just put in the dst single entry followed by the src entries in order, except possibly if some src entry is the same as dst single entry we just skip it. dst_l == 2 can be handled easily as well. For dst_l == 1 we can avoid any kind of temporary buffer, for other combinations we can just use a fixed size static buffer with a dynamically realloced buffer just to be sure (and free it only at the end of the pass). 2) I doubt anybody uses -fvar-tracking-uninit, it certainly uses a very weird non-DWARFy DW_GNU_uninit extension which isn't clearly defined and GDB only barely supports it, just in a few places. Still, this code was causing a quadratic loop in variable_union which was showing up very high in the profile. Fixed by making sure ->init is always *_INITIALIZED unless -fvar-tracking-uninit and avoiding that loop. 3) dataflow_set_different_2 was just wasting cycles, we've checked at the beginning that both hashtables have the same number of elements, and if traversal of one of the hashtables found all decls from that table in the other table (and their variable objects were considered equal), then the same is necessarily true for the other hash table. 4) in sel-sched merge, rtx_equal_p was moved to very infrequently used rtx_equal_p_cb with added callback and rtx_equal_p implemented on top of that new function. Given that rtx_equal_p was showing very high in the callgrind profile and is very frequently used throughout gcc rtl passes (400+ different callers), isn't too long to be a maintainance burden and is recursive, so we can't get rid of the overhead of passing cb around and checking at each level, I'm proposing returning to the old rtx_equal_p function and have rtx_equal_p_cb as completely separate function. Patch has been bootstrapped/regtested on the trunk on x86_64-linux and i686-linux, on 4.4 branch and above mentioned testcase has been verified to generate bitwise identical assembly. Attached is the patch, then callgrind_annotate dump before this patch and after it, for anyone interested to suggest further low hanging fruit. Ok for trunk? Jakub
Attachment:
Y342
Description: Text document
Attachment:
cg.rh09
Description: Text document
Attachment:
cg.rh15
Description: Text document
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |