Bug 18587 - [4.1 Regression] build_v_may_defs and build_vuses can be improved when adding
Summary: [4.1 Regression] build_v_may_defs and build_vuses can be improved when adding
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.0.0
Importance: P2 minor
Target Milestone: 4.1.0
Assignee: Not yet assigned to anyone
URL:
Keywords: compile-time-hog
Depends on:
Blocks: 15855
 
Reported: 2004-11-21 05:32 UTC by Andrew Pinski
Modified: 2005-10-05 15:22 UTC
CC: 2 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2004-11-25 19:48:40


Attachments
patch to resolve the problem (5.89 KB, patch)
2004-11-25 19:39 UTC, Andrew Macleod
patch to replace the operand build vectors (7.18 KB, patch)
2005-10-04 12:46 UTC, Andrew Macleod

Description Andrew Pinski 2004-11-21 05:32:41 UTC
For PR 15855 we see that the tree aliasing analysis is slow:
 tree alias analysis   :  24.22 ( 6%) usr   0.75 ( 2%) sys  26.57 ( 5%) wall

9% of the total time is spent checking whether something is already in either build_v_may_defs or build_vuses.
Comment 1 Andrew Pinski 2004-11-22 17:24:25 UTC
That 9% should have been 90%.
Comment 2 Andrew Macleod 2004-11-22 17:39:56 UTC
90% of what?  The time spent in alias analysis?  How long is the average operand
list we are looking up, then?  Is it because the lists are long, or is it because
there are a bazillion lookups?

If the lists are longish, then we have a good argument for introducing operand
lists sorted by DECL_UID, which have been discussed at various times in the past.
The lookup time could be decreased significantly.

Comment 3 Andrew Pinski 2004-11-22 23:49:48 UTC
Oh, this has nothing to do with tree alias analysis at all; it is the tree operand scan:
 tree operand scan     :  26.65 ( 7%) usr   3.78 (10%) sys  31.64 ( 6%) wall

With a hash table for these two we get:
 tree operand scan     :  14.89 ( 9%) usr   2.57 (11%) sys  17.61 ( 8%) wall

(alias analysis goes down too):
 tree alias analysis   :  22.82 (14%) usr   0.16 ( 1%) sys  31.26 (13%) wall
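
The hash-table experiment above is not described in detail; presumably it
replaces the linear membership scan over the build lists with a hash lookup
keyed on the variable. A toy sketch of that idea in plain C (the fixed-size
open-addressing table and the seen_before helper are illustrative only, not
the experimental patch):

#include <stdbool.h>
#include <stddef.h>

/* Toy open-addressing pointer set standing in for whatever hash table the
   experiment used; no resizing or deletion, for brevity.  */
#define SET_SIZE 4096                     /* power of two */
static const void *seen[SET_SIZE];

/* Return true if VAR was already recorded, otherwise record it and return
   false.  One such lookup replaces a scan of the whole build_vuses or
   build_v_may_defs array.  */
static bool
seen_before (const void *var)
{
  size_t h = ((size_t) var >> 3) & (SET_SIZE - 1);
  while (seen[h])
    {
      if (seen[h] == var)
        return true;
      h = (h + 1) & (SET_SIZE - 1);       /* linear probing */
    }
  seen[h] = var;
  return false;
}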
Comment 4 Andrew Macleod 2004-11-23 15:05:31 UTC
I have no idea what you are talking about.  What are you doing, adding a hash
value to the stmt for the vmaydef and vuse lists?

And in the new numbers, why does tree aliasing go down if it's not affected? And
by the way, it goes down in time, but increases from 6% of compile time to 14%? And at
the same time the operand scan goes from 26 sec., or 7% of compile time, down to
14 sec., but now consumes 9% of compile time?

Something doesn't add up. Are you mixing checking-enabled and checking-disabled
compiles?
Comment 5 Andrew Pinski 2004-11-23 15:20:40 UTC
No, I changed the options so that -fPIC would not be used and the scheduler did not take up 200
seconds, which is why the percentages are different.
Comment 6 Andrew Macleod 2004-11-25 19:39:00 UTC
Created attachment 7609 [details]
patch to resolve the problem

First, we eliminate the linear search for duplicates when building VUSES
and VMAYDEFs. Instead, two bits are added to the var annotation and we
use those bits to determine whether a given var is already in the list
or not. 
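
In outline, the duplicate check looks like the following simplified C sketch
(plain C rather than the actual GCC sources; the in_vuse_list and
in_v_may_def_list bits mirror the patch, but the surrounding types and array
are made up for illustration):

#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for GCC's variable annotation and build list.  */
struct var_ann { bool in_vuse_list; bool in_v_may_def_list; };
struct var     { struct var_ann ann; };

#define MAX_BUILD 1024
static struct var *build_vuses[MAX_BUILD];
static size_t num_vuses;

/* Append VAR to the VUSE build list unless it is already there.  The
   membership test is now a single bit check on the annotation instead
   of a linear scan of the list.  */
static void
append_vuse (struct var *var)
{
  if (var->ann.in_vuse_list)
    return;                              /* duplicate, skip it */
  var->ann.in_vuse_list = true;
  build_vuses[num_vuses++] = var;
}

/* When the statement's operands are finalized, clear the bits again so
   the annotations are clean for the next statement.  */
static void
finalize_vuses (void)
{
  size_t i;
  for (i = 0; i < num_vuses; i++)
    build_vuses[i]->ann.in_vuse_list = false;
  num_vuses = 0;
}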

Second, we speed up the code which adds virtual operands for the
call-clobbered bit vector at each call site. Every time we had a call,
we iterated through the bit vector and built up an operand vector for
those variables.  Those variables don't change very often, so now we keep
a cache of the operand vector that is built. The next time we need the
clobbers for a call, we simply use the cache if it hasn't been
invalidated by adding or removing global variables.
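
A minimal sketch of the caching scheme, again in plain C with hypothetical
helper names (collect_call_clobbers and append_v_may_def here stand in for
the real operand scanner, which this is not):

#include <stdbool.h>
#include <stddef.h>

struct var;                                  /* opaque stand-in */

/* Hypothetical hooks standing in for the real operand scanner.  */
extern size_t collect_call_clobbers (struct var **out, size_t max);
extern void append_v_may_def (struct var *);

/* Cached operand vector for call-clobbered variables plus a validity
   flag.  Anything that changes the call-clobbered set (marking a
   variable call clobbered, adding or removing a global) must clear
   the flag so the cache is rebuilt at the next call site.  */
static struct var *clobbered_vops[1024];
static size_t n_clobbered_vops;
static bool clobber_cache_valid;

static void
add_call_clobber_ops (void)
{
  size_t i;

  if (!clobber_cache_valid)
    {
      /* Slow path: walk the call-clobbered set once and remember it.  */
      n_clobbered_vops = collect_call_clobbers (clobbered_vops, 1024);
      clobber_cache_valid = true;
    }

  /* Fast path for every later call site: replay the cached vector.  */
  for (i = 0; i < n_clobbered_vops; i++)
    append_v_may_def (clobbered_vops[i]);
}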

The combined speedups from these two changes are pretty decent.

Average USER time over two runs for the operand scanning phase on a 1.8 GHz P4:

                          orig          patched
cplusplus-grammer.ii      22.58 sec     10.35 sec
tramp3d.ii                 6.14 sec      4.03 sec
generate3.4.ii             1.69 sec      1.23 sec
all gcc .i files           6.72 sec      5.83 sec
Comment 7 Andrew Pinski 2004-11-25 19:48:40 UTC
Patch posted here: <http://gcc.gnu.org/ml/gcc-patches/2004-11/msg02175.html>.
Comment 8 GCC Commits 2004-11-25 20:25:24 UTC
Subject: Bug 18587

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	amacleod@gcc.gnu.org	2004-11-25 20:24:59

Modified files:
	gcc            : ChangeLog tree-flow-inline.h tree-flow.h 
	                 tree-ssa-operands.c tree-ssa-operands.h 

Log message:
	2004-11-25  Andrew Macleod  <amacleod@redhat.com>
	
	PR tree-optimization/18587
	* tree-flow-inline.h (mark_call_clobbered, mark_non_addressable): Flag
	call clobbered caches as invalid.
	* tree-ssa-operands.c (ssa_call_clobbered_cache_valid): New.  Flag
	indicating whether the call clobbered operand cache is valid.
	(ssa_ro_call_cache_valid): New.  Flag indicating whether the pure/const
	call operand cache is valid.
	(clobbered_v_may_defs, clobbered_vuses, ro_call_vuses): New.  Cached
	operand lists for call virtual operands.
	(clobbered_aliased_loads, clobbered_aliased_stores,
	ro_call_aliased_load): New.  Flags caching whether alias bits are to be
	set in call stmts.
	(fini_ssa_operands): Remove call operand caches if present.
	(get_expr_operands, get_asm_expr_operands, get_indirect_ref_operands):
	Pass stmt annotation to add_stmt_operand.
	(get_call_expr_operands): Add call clobbered variables first.
	(add_stmt_operand): Take stmt annotation rather than stmt as a param.
	(add_call_clobber_ops, add_call_read_ops): Use the call operand cache
	if it is valid, otherwise fill the cache.
	* tree-ssa-operands.h (ssa_clobbered_cache_valid): Declare extern.
	
	* tree-flow.h (struct var_ann_d): Add in_vuse_list and in_v_may_def_list
	bits.
	* tree-ssa-operands.c (cleanup_v_may_defs): New.  Clear the in_list bits
	for the v_may_def elements and empty the operand build array.
	(finalize_ssa_vuses): Use cleanup_v_may_defs and remove redundant VUSES
	by checking the in_v_may_def_list bit.
	(append_v_may_def, append_vuse): Use the in_list bit rather than
	scanning the array for duplicates.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6552&r2=2.6553
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-flow-inline.h.diff?cvsroot=gcc&r1=2.26&r2=2.27
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-flow.h.diff?cvsroot=gcc&r1=2.71&r2=2.72
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-operands.c.diff?cvsroot=gcc&r1=2.56&r2=2.57
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-operands.h.diff?cvsroot=gcc&r1=2.8&r2=2.9

Comment 9 Andrew Pinski 2004-11-25 20:28:53 UTC
Fixed.
Comment 10 Andrew Macleod 2005-10-04 12:36:17 UTC
It looks like operand scan times have crept up again when building virtual operand lists.

3.0 GHz P4:
tree operand scan     :  18.97 (27%) usr
TOTAL                 :  71.11 
Comment 11 Andrew Macleod 2005-10-04 12:46:19 UTC
Created attachment 9868 [details]
patch to replace the operand build vectors

Virtual operands are maintained in sorted order, and the scanner was
keeping them sorted as they were inserted. When there were a lot of
operands, this could result in a lot of time spent searching for the
right place to insert each one.

Part of the original reason for sorting them was to be able to detect
duplicates and reject them. The original patch for this bug detects
duplicates faster, so there is no longer a pressing need to keep them
sorted as they are built.  They still need to be sorted eventually, as
there are passes which depend on the order being predictable.

This patch removes all the opbuild_* routines that maintained the operand
list in sorted order and simply replaces the list with a VEC of operands.
The VEC is then sorted, if necessary, by a call to qsort.
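
In the spirit of the patch's operand_build_cmp and operand_build_sort_virtual,
the idea reduces to an ordinary qsort by DECL_UID; a simplified, self-contained
C sketch (the struct vop type here is invented for illustration):

#include <stdlib.h>

/* Simplified stand-in: a virtual operand identified by the DECL_UID of
   its base variable.  */
struct vop { unsigned uid; };

/* qsort comparison: order by the base variable's UID so the finalized
   operand list stays predictable for later passes.  */
static int
operand_build_cmp (const void *p, const void *q)
{
  unsigned u1 = ((const struct vop *) p)->uid;
  unsigned u2 = ((const struct vop *) q)->uid;
  return (u1 > u2) - (u1 < u2);
}

/* Instead of searching for the right slot on every insertion, append
   unconditionally and sort the whole vector once per statement.  */
static void
operand_build_sort_virtual (struct vop *vec, size_t n)
{
  qsort (vec, n, sizeof (struct vop), operand_build_cmp);
}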

This results in some good improvements in cases where there are a
lot of virtual operands.

On an x86-64 machine:

mainline  tree operand scan    13.67 secs
          total time           65.00 secs

patched   tree operand scan     4.20 secs
          total time           50.09 secs

Same testcase on a 3.0 GHz P4:

mainline  tree operand scan    18.61 secs
          total time           69.19 secs

patched   tree operand scan     3.85 secs
          total time           49.86 secs

The overall effect on the cc1-i collection of files appears to be pretty
neutral: the tree optimizers and operand scan appear slightly faster, and
overall time is similar or a hair slower (1 second out of 204). I couldn't
tell you why; maybe measurement noise.  C tends not to have nearly as
many virtual operands, so I don't expect the same kind of speedups
there.

Bootstrapped with no new test failures on i686-pc-linux-gnu.
Comment 12 GCC Commits 2005-10-05 15:16:55 UTC
Subject: Bug 18587

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	amacleod@gcc.gnu.org	2005-10-05 15:16:42

Modified files:
	gcc            : ChangeLog tree-ssa-opfinalize.h 
	                 tree-ssa-operands.c 

Log message:
	2005-10-05  Andrew MacLeod  <amacleod@redhat.com>
	
	PR tree-optimization/18587
	
	* tree-ssa-operands.c (struct opbuild_list_d, OPBUILD_LAST): Delete.
	(build_defs, build_uses, build_v_may_defs, build_v_must_defs,
	build_vuses): Change to VEC type.
	(opbuild_initialize_virtual, opbuild_initialize_real, opbuild_free,
	opbuild_num_elems, opbuild_append_real, opbuild_append_virtual,
	opbuild_first, opbuild_next, opbuild_elem_real, opbuild_elem_virtual,
	opbuild_elem_uid, opbuild_clear, opbuild_remove_elem): Delete.
	(get_name_decl): New.  Return DECL_UID of base variable.
	(operand_build_cmp): New.  qsort comparison routine.
	(operand_build_sort_virtual): New.  Sort virtual build vector.
	(init_ssa_operands, fini_ssa_operands): Use VEC routines.
	(FINALIZE_OPBUILD_BASE, FINALIZE_OPBUILD_ELEM): Use VEC_index.
	(FINALIZE_BASE): Use get_name_decl.
	(finalize_ssa_defs, finalize_ssa_uses, cleanup_v_may_defs,
	finalize_ssa_v_may_defs, finalize_ssa_vuses, finalize_ssa_v_must_defs,
	start_ssa_stmt_operands, append_def, append_use, append_vuse,
	append_v_may_def, append_v_must_def): Replace opbuild_* routines with
	direct VEC_* manipulations.
	(build_ssa_operands): Call operand_build_sort_virtual.
	(copy_virtual_operand, create_ssa_artficial_load_stmt,
	add_call_clobber_ops, add_call_read_ops): Replace opbuild_* routines
	with direct VEC_* manipulations.
	* tree-ssa-opfinalize.h (FINALIZE_FUNC): Replace opbuild_* routines
	with direct VEC manipulations.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.10088&r2=2.10089
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-opfinalize.h.diff?cvsroot=gcc&r1=2.4&r2=2.5
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-operands.c.diff?cvsroot=gcc&r1=2.102&r2=2.103

Comment 13 Andrew Macleod 2005-10-05 15:22:39 UTC
Fixed.