This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Disable accumulate-outgoing-args for Generic and Buldozers
- From: Jakub Jelinek <jakub at redhat dot com>
- To: Jan Hubicka <hubicka at ucw dot cz>
- Cc: Alexandre Oliva <aoliva at redhat dot com>, gcc-patches at gcc dot gnu dot org
- Date: Wed, 29 Jan 2014 11:02:05 +0100
- Subject: Re: Disable accumulate-outgoing-args for Generic and Buldozers
- Authentication-results: sourceware.org; auth=none
- References: <20140101143004 dot GD26209 at kam dot mff dot cuni dot cz> <20140124211119 dot GX892 at tucnak dot redhat dot com> <20140127221939 dot GH892 at tucnak dot redhat dot com> <20140127222546 dot GI892 at tucnak dot redhat dot com> <20140127235628 dot GC12944 at kam dot mff dot cuni dot cz> <20140128062647 dot GJ892 at tucnak dot redhat dot com>
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
On Tue, Jan 28, 2014 at 07:26:47AM +0100, Jakub Jelinek wrote:
> > I wonder if this is just some of --enable-checking tests in dwarf2out going wild
> > or if it is just expensive sanity checking code?
> > I used to have chroot environment for 32bit builds, I will need to re-install it now.
>
> variable tracking     :2914.85 (83%) usr   1.88 ( 7%) sys 2931.22 (82%) wall   80844 kB ( 3%) ggc
> var-tracking dataflow :  18.19 ( 1%) usr   0.19 ( 1%) sys   18.49 ( 1%) wall   10899 kB ( 0%) ggc
> var-tracking emit     :  29.41 ( 1%) usr   0.11 ( 0%) sys   29.65 ( 1%) wall  148128 kB ( 6%) ggc
> TOTAL                 :3525.97            25.73           3570.33           2321043 kB
>
> So, strangely, both vt_find_locations and vt_emit_notes, typically the most expensive ones,
> are quite inexpensive and most of the time is spent elsewhere, in vt_initialize?
So, most of the time seems to be spent in cselib.c remove_useless_values
(seen both by interrupting with Ctrl-C under gdb and in callgrind). For
callgrind I've actually built a 64-bit cc1plus with --enable-checking=release,
and still compiled the same --enable-checking=yes,rtl -m32 -O2 -g
insn-recog.c. That build took just 14 minutes instead of 60 minutes, and in
that case only about 30% of compile time was spent in var-tracking, with 20%
of compile time in remove_useless_values in particular.
The problem with remove_useless_values is that we quickly end up with a very
big cselib hash table (over 200000 entries) and a very large number of very
small basic blocks (over 34000), and at the end of every basic block we call
cselib_preserve_only_values, which walks the whole hash table. E.g.
references_value_p is called 845114869 times from discard_useless_locs.
A micro-optimization could be e.g. to turn references_value_p into a
template where only_useless would be a template parameter rather than an
actual parameter (because of the recursion, inlining doesn't help here), or
just to split it into two functions.
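
A minimal sketch of the template variant, to make the idea concrete. It uses
a toy node type rather than rtx: Node, is_value, useless and references_value
are illustrative names, not the cselib.c code, and the "useless" flag stands
in for the real "no locations and not preserved" test.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for an rtx: a node is either a VALUE or an operator with
// operands.  All names here are hypothetical, not GCC's.
struct Node {
  bool is_value = false;
  bool useless = false;            // only meaningful for VALUE nodes
  std::vector<const Node *> ops;   // operands, empty for leaves
};

// OnlyUseless is a compile-time constant, so each instantiation contains
// only one variant of the test, and the recursive call stays inside the
// same instantiation instead of passing the flag along at run time.
template <bool OnlyUseless>
bool references_value(const Node *n) {
  if (n->is_value && (!OnlyUseless || n->useless))
    return true;
  for (const Node *op : n->ops)
    if (references_value<OnlyUseless>(op))
      return true;
  return false;
}
```

A caller that used to pass only_useless = 1 would call
references_value<true> (x) instead. Whether this buys anything measurable
would have to be benchmarked.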
Also, for RTL checking, I wonder whether functions like references_value_p
and similar ones that use GET_RTX_FORMAT/GET_RTX_LENGTH to walk the elements
couldn't use special non-checking macros when doing so, in case the compiler
can't figure out that the checking is redundant there because the function
already performs it by hand (haven't verified). And perhaps also an approach
similar to http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00140.html for
GET_RTX_FORMAT and/or GET_RTX_LENGTH, so that at least in the cases where
the compiler knows which rtx code it is (if some code is guarded with a
specific GET_CODE () == test), it could avoid loading from the const
arrays.
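
To illustrate the first point with a toy example (Expr, length_of,
op_checked and op_unchecked are made-up names, and real RTL checking also
validates the format letter, which is omitted here): when the walker derives
its loop bound from the same length table the checked accessor consults, the
per-access check cannot fail, so an unchecked accessor is safe inside that
loop.

```cpp
#include <cassert>

// Hypothetical toy expression node, not GCC's rtx layout.
enum Code { CONST_LEAF, UNARY, BINARY, NUM_CODES };
static const unsigned char length_of[NUM_CODES] = {0, 1, 2};

struct Expr {
  Code code;
  const Expr *ops[2];
};

// Checked accessor: validates the index on every call, the way a
// checking-enabled build would.
inline const Expr *op_checked(const Expr *e, int i) {
  assert(i >= 0 && i < length_of[e->code]);
  return e->ops[i];
}

// Unchecked accessor: the caller guarantees that i is in range.
inline const Expr *op_unchecked(const Expr *e, int i) {
  return e->ops[i];
}

// The loop bound comes from length_of[], so every index used below is
// already known to be valid and the unchecked accessor is sufficient.
int count_nodes(const Expr *e) {
  int total = 1;
  for (int i = length_of[e->code] - 1; i >= 0; --i)
    total += count_nodes(op_unchecked(e, i));
  return total;
}
```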
Anyway, I guess the more important question is whether all the values in the
cselib_hash_table will ever be useful for some lookup, or whether there are
e.g. values that only reference preserved values where none of those
referenced values have any locations other than preserved values.
Or whether we can somehow quickly find out which VALUEs have changed during
processing of the last bb and only process that subset instead of all the
hash table entries every time.
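
A rough sketch of that last idea, only to show the shape of it. Value,
ValueTable and their members are hypothetical, not cselib's data structures.
The sketch also ignores that a VALUE becoming useless can invalidate
locations of other VALUEs that reference it, so a real implementation would
additionally have to propagate "changed" to the referrers.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical toy table entry; not cselib's cselib_val.
struct Value {
  int n_locs = 0;
  bool preserved = false;
  bool useless = false;
};

class ValueTable {
 public:
  explicit ValueTable(std::size_t n) : values_(n) {}

  // Every mutation goes through these setters, so each changed VALUE is
  // recorded in the dirty set.
  void set_locs(std::size_t id, int n_locs) {
    values_[id].n_locs = n_locs;
    dirty_.insert(id);
  }
  void preserve(std::size_t id) {
    values_[id].preserved = true;
    dirty_.insert(id);
  }

  bool is_useless(std::size_t id) const { return values_[id].useless; }

  // End-of-block cleanup that visits only the VALUEs changed since the
  // previous call, instead of walking the whole table.  Returns the
  // number of entries visited.
  std::size_t end_of_block() {
    std::size_t visited = 0;
    for (std::size_t id : dirty_) {
      Value &v = values_[id];
      v.useless = (v.n_locs == 0 && !v.preserved);
      ++visited;
    }
    dirty_.clear();
    return visited;
  }

 private:
  std::vector<Value> values_;
  std::unordered_set<std::size_t> dirty_;
};
```

With very small basic blocks, the per-block cost would then be proportional
to the number of VALUEs the block touched rather than to the size of the
table.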
Alex? Your thoughts?
Jakub