For PR 15855 we see that tree alias analysis is slow:

  tree alias analysis   :  24.22 ( 6%) usr   0.75 ( 2%) sys  26.57 ( 5%) wall

9% of the total time is spent checking whether something is already in either build_v_may_defs or build_vuses.
That 9% should have been a 90%.
90% of what? The time spent in alias analysis? How long is the average operand list we are looking up? Is it because the lists are long, or because there are a bazillion lookups? If the lists are longish, then we have a good argument for introducing the sorted-by-DECL_UID operand lists that have been discussed at various times in the past. That would decrease the lookup time significantly.
Oh, this has nothing to do with tree alias analysis at all, but rather:

  tree operand scan     :  26.65 ( 7%) usr   3.78 (10%) sys  31.64 ( 6%) wall

With a hash table for these two we get:

  tree operand scan     :  14.89 ( 9%) usr   2.57 (11%) sys  17.61 ( 8%) wall

(alias analysis also goes down):

  tree alias analysis   :  22.82 (14%) usr   0.16 ( 1%) sys  31.26 (13%) wall
I have no idea what you are talking about. What are you doing, adding a hash value to the stmt for the v_may_def and vuse lists? And in the new numbers, why does tree aliasing go down if it's not affected? And by the way, it goes down in time, but increases from 6% of compile time to 14%? At the same time, the operand scan goes from 26 sec (7% of compile time) down to 14 sec, but now consumes 9% of compile time? Something doesn't add up. Are you mixing checking-enabled and checking-disabled compiles?
No, I changed the options so that -fPIC would not be used and the scheduler did not take up 200 seconds, which is why the percentages are different.
Created attachment 7609 [details]
patch to resolve the problem

First, we eliminate the linear search for duplicates when building VUSEs and V_MAY_DEFs. Instead, two bits are added to the var annotation, and we use those bits to determine whether a given var is already in the list or not.

Second, we speed up the code which adds virtual operands for the call-clobbered bit vector at each call site. Every time we had a call, we iterated through the bit vector and built up an operand vector for those variables. Those variables don't change very often, so now we keep a cache of the operand vector that is built. The next time we need the clobbers for a call, we simply use the cache if it hasn't been invalidated by adding or removing global variables.

The combined speedups from these two patches are pretty decent. Average of two runs, USER time for the operand scanning phase on a 1.8GHz P4:

                          orig        patched
  cplusplus-grammer.ii    22.58 sec   10.35 sec
  tramp3d.ii               6.14 sec    4.03 sec
  generate3.4.ii           1.69 sec    1.23 sec
  all gcc .i files         6.72 sec    5.83 sec
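The duplicate-elimination idea can be sketched roughly like this (a simplified illustration with made-up names, not the real tree-ssa-operands.c code: a boolean flag stands in for the annotation bit, and a plain array stands in for the build list):

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_OPS 128

/* Stand-in for a variable plus its annotation bit. */
struct var {
    int uid;
    bool in_vuse_list;   /* set while this var is in the VUSE build list */
};

static struct var *build_vuses[MAX_OPS];
static size_t num_vuses;

/* O(1) duplicate rejection: test the bit instead of scanning the list. */
static void append_vuse(struct var *v)
{
    if (v->in_vuse_list)
        return;                       /* already present, skip */
    v->in_vuse_list = true;
    build_vuses[num_vuses++] = v;
}

/* After finalizing a statement's operands, clear the bits so the
   annotations are ready for the next statement. */
static void cleanup_vuses(void)
{
    for (size_t i = 0; i < num_vuses; i++)
        build_vuses[i]->in_vuse_list = false;
    num_vuses = 0;
}
```

The key property is that clearing the bits costs only one pass over the (short) build list, while each append drops from O(n) to O(1).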
Patch posted here: <http://gcc.gnu.org/ml/gcc-patches/2004-11/msg02175.html>.
Subject: Bug 18587

CVSROOT:        /cvs/gcc
Module name:    gcc
Changes by:     amacleod@gcc.gnu.org    2004-11-25 20:24:59

Modified files:
        gcc: ChangeLog tree-flow-inline.h tree-flow.h tree-ssa-operands.c
             tree-ssa-operands.h

Log message:
2004-11-25  Andrew Macleod  <amacleod@redhat.com>

        PR tree-optimization/18587
        * tree-flow-inline.h (mark_call_clobbered, mark_non_addressable):
        Flag call clobbered caches as invalid.
        * tree-ssa-operands.c (ssa_call_clobbered_cache_valid): New.
        Flag indicating whether the call clobbered operand cache is valid.
        (ssa_ro_call_cache_valid): New. Flag indicating whether the
        pure/const call operand cache is valid.
        (clobbered_v_may_defs, clobbered_vuses, ro_call_vuses): New.
        Cached lists of operands for cached call virtual operands.
        (clobbered_aliased_loads, clobbered_aliased_stores,
        ro_call_aliased_load): New. Flags caching whether alias bits are
        to be set in call stmts.
        (fini_ssa_operands): Remove call operand caches if present.
        (get_expr_operands, get_asm_expr_operands,
        get_indirect_ref_operands): Pass stmt annotation to
        add_stmt_operand.
        (get_call_expr_operands): Add call clobbered variables first.
        (add_stmt_operand): Take stmt annotation rather than stmt as a
        param.
        (add_call_clobber_ops, add_call_read_ops): Use the call operand
        cache if it is valid, otherwise fill the cache.
        * tree-ssa-operands.h (ssa_clobbered_cache_valid): Declare extern.
        * tree-flow.h (struct var_ann_d): Add in_vuse_list and
        in_v_may_def_list bits.
        * tree-ssa-operands.c (cleanup_v_may_defs): New. Clear the
        in_list bits for the v_may_def elements and empty the operand
        build array.
        (finalize_ssa_vuses): Use cleanup_v_may_defs and remove redundant
        VUSEs by checking the in_v_may_def_list bit.
        (append_v_may_def, append_vuse): Use the in_list bit rather than
        scanning the array for duplicates.
Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6552&r2=2.6553
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-flow-inline.h.diff?cvsroot=gcc&r1=2.26&r2=2.27
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-flow.h.diff?cvsroot=gcc&r1=2.71&r2=2.72
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-operands.c.diff?cvsroot=gcc&r1=2.56&r2=2.57
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-operands.h.diff?cvsroot=gcc&r1=2.8&r2=2.9
Fixed.
It looks like operand scan times have snuck up again when building virtual operand lists. 3.0GHz P4:

  tree operand scan     :  18.97 (27%) usr
  TOTAL                 :  71.11
Created attachment 9868 [details]
patch to replace the operand build vectors

Virtual operands are maintained in a sorted order, and the scanner was keeping them sorted as they were inserted. When there were a lot of operands, this could result in a lot of time searching for the right place to insert them. Part of the original reason for sorting them was to be able to detect duplicates and reject them. The original patch for this bug detects duplicates faster, so there is no longer a pressing need to have them sorted early. They still need to be sorted eventually, as there are passes which depend on the order being predictable.

This patch removes all the opbuild_* routines that maintained the operand list as sorted, and simply replaces them with a VEC of operands. This VEC is then sorted, if necessary, by a call to qsort. This results in some good improvements in cases where there are a lot of virtual operands.

On an x86-64 machine:

                        mainline    patched
  tree operand scan     13.67 sec    4.20 sec
  total time            65.00 sec   50.09 sec

Same testcase on a 3.0GHz P4:

                        mainline    patched
  tree operand scan     18.61 sec    3.85 sec
  total time            69.19 sec   49.86 sec

The overall effect on the cc1-i collection of files appears to be pretty neutral: tree optimizers and operand scan appear slightly faster, overall time is similar or a hair slower (1 second on 204 seconds). Couldn't tell you why; maybe measurement noise. C tends not to have nearly as many virtual operands, so I don't expect the same kind of speedups there.

Bootstrapped with no new test failures on i686-pc-linux-gnu.
Subject: Bug 18587

CVSROOT:        /cvs/gcc
Module name:    gcc
Changes by:     amacleod@gcc.gnu.org    2005-10-05 15:16:42

Modified files:
        gcc: ChangeLog tree-ssa-opfinalize.h tree-ssa-operands.c

Log message:
2005-10-05  Andrew MacLeod  <amacleod@redhat.com>

        PR tree-optimization/18587
        * tree-ssa-operands.c (struct opbuild_list_d, OPBUILD_LAST): Delete.
        (build_defs, build_uses, build_v_may_defs, build_v_must_defs,
        build_vuses): Change to VEC type.
        (opbuild_initialize_virtual, opbuild_initialize_real, opbuild_free,
        opbuild_num_elems, opbuild_append_real, opbuild_append_virtual,
        opbuild_first, opbuild_next, opbuild_elem_real, opbuild_elem_virtual,
        opbuild_elem_uid, opbuild_clear, opbuild_remove_elem): Delete.
        (get_name_decl): New. Return DECL_UID of base variable.
        (operand_build_cmp): New. qsort comparison routine.
        (operand_build_sort_virtual): New. Sort virtual build vector.
        (init_ssa_operands, fini_ssa_operands): Use VEC routines.
        (FINALIZE_OPBUILD_BASE, FINALIZE_OPBUILD_ELEM): Use VEC_index.
        (FINALIZE_BASE): Use get_name_decl.
        (finalize_ssa_defs, finalize_ssa_uses, cleanup_v_may_defs,
        finalize_ssa_v_may_defs, finalize_ssa_vuses,
        finalize_ssa_v_must_defs, start_ssa_stmt_operands, append_def,
        append_use, append_vuse, append_v_may_def, append_v_must_def):
        Replace opbuild_* routines with direct VEC_* manipulations.
        (build_ssa_operands): Call operand_build_sort_virtual.
        (copy_virtual_operand, create_ssa_artficial_load_stmt,
        add_call_clobber_ops, add_call_read_ops): Replace opbuild_*
        routines with direct VEC_* manipulations.
        * tree-ssa-opfinalize.h (FINALIZE_FUNC): Replace opbuild_* routines
        with direct VEC manipulations.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.10088&r2=2.10089
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-opfinalize.h.diff?cvsroot=gcc&r1=2.4&r2=2.5
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-operands.c.diff?cvsroot=gcc&r1=2.102&r2=2.103
Fixed.