Disable accumulate-outgoing-args for Generic and Bulldozers
Wed Jan 29 10:02:00 GMT 2014
On Tue, Jan 28, 2014 at 07:26:47AM +0100, Jakub Jelinek wrote:
> > I wonder if this is just some of --enable-checking tests in dwarf2out going wild
> > or if it is just expensive sanity checking code?
> > I used to have chroot environment for 32bit builds, I will need to re-install it now.
> variable tracking     : 2914.85 (83%) usr   1.88 ( 7%) sys 2931.22 (82%) wall   80844 kB ( 3%) ggc
> var-tracking dataflow :   18.19 ( 1%) usr   0.19 ( 1%) sys   18.49 ( 1%) wall   10899 kB ( 0%) ggc
> var-tracking emit     :   29.41 ( 1%) usr   0.11 ( 0%) sys   29.65 ( 1%) wall  148128 kB ( 6%) ggc
> TOTAL                 : 3525.97            25.73           3570.33            2321043 kB
> So, strangely both vt_find_locations and vt_emit_notes, typically the most expensive ones,
> are quite inexpensive and most of the time is spent elsewhere, in vt_initialize?
So, most of the time seems to be spent in cselib.c remove_useless_values
(both from Ctrl-C in gdb profiling, and callgrind). For callgrind I've
actually built 64-bit cc1plus with --enable-checking=release, and still compiled
the same --enable-checking=yes,rtl -m32 -O2 -g insn-recog.c, the build then
took just 14 minutes instead of 60 minutes, and in that case only about 30%
of compile time has been spent in var-tracking and 20% of compile time
in remove_useless_values in particular.
The problem with remove_useless_values is that the cselib hash table
quickly becomes very big (over 200000 entries) and there is a very large
number of very small basic blocks (over 34000). At the end of every basic
block we call cselib_preserve_only_values, which walks the whole hash table;
e.g. references_value_p is called 845114869 times from discard_useless_locs.
A micro-optimization could be e.g. to turn references_value_p into a
template where only_useless would be a template parameter rather than actual
parameter (due to recursion inlining doesn't help here) or just two
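To make the template idea concrete, here is a minimal sketch on a toy tree
rather than on real rtx. The Node type, its fields, and the names
references_value and references_value_dyn are all hypothetical stand-ins
and not the cselib.c code. The point is only that each instantiation
recurses into itself, so the test on the flag folds away at compile time:

```cpp
#include <vector>

// Toy stand-in for an rtx (hypothetical, not GCC's representation):
// a node is either a VALUE leaf or an operator with operands.
struct Node {
  bool is_value = false;
  bool useless = false;            // only meaningful for VALUE leaves
  std::vector<const Node *> ops;   // operands of a non-leaf node
};

// only_useless is a template parameter, so the recursive call stays
// within one instantiation and "!OnlyUseless" is a compile-time constant.
template <bool OnlyUseless>
bool references_value(const Node *x) {
  if (x->is_value)
    return !OnlyUseless || x->useless;
  for (const Node *op : x->ops)
    if (references_value<OnlyUseless>(op))
      return true;
  return false;
}

// Thin wrapper for callers that only know the flag at run time.
inline bool references_value_dyn(const Node *x, bool only_useless) {
  return only_useless ? references_value<true>(x)
                      : references_value<false>(x);
}
```

Callers that pass a literal flag would call references_value<true> or
references_value<false> directly; whether this buys anything measurable
on the real function would still need to be profiled.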
Also, for RTL checking, I wonder if functions like references_value_p
and similar ones that use GET_RTX_FORMAT/GET_RTX_LENGTH to walk the elements
couldn't use special non-checking macros in doing so if the compiler can't
figure out checking is redundant there because it is being performed by hand
by the function (haven't verified). And, perhaps also an approach similar
to http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00140.html for
GET_RTX_FORMAT and/or GET_RTX_LENGTH (so that at least for the cases where
the compiler knows which rtx code it is (if some code is guarded with
specific GET_CODE () == test), it could avoid loading from the const
Anyway, I guess the more important question is whether all the values in the
cselib_hash_table will ever be useful for some lookup, or if there are
e.g. values that only reference preserved values where none of those
referenced values have any locations other than preserved values.
Or if we can somehow quickly find out what VALUEs have changed during
processing of the last bb and only process that subset instead of all the
hash table entries all the time.
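
As a rough illustration of the "only process what changed in the last bb"
idea, here is a toy sketch. ValueTable and all of its members are
hypothetical and not cselib data structures. It also deliberately ignores
the hard part: a value untouched in this block can still become useless
because a value it references was discarded, so real code would have to
propagate dirtiness along such references.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy value table (hypothetical): each value id has a location count,
// and every modification records the id in a per-block dirty set.
struct ValueTable {
  std::vector<int> loc_count;              // locations per value id
  std::unordered_set<std::size_t> dirty;   // ids touched in current block

  std::size_t add_value(int locs) {
    loc_count.push_back(locs);
    std::size_t id = loc_count.size() - 1;
    dirty.insert(id);
    return id;
  }

  void set_locs(std::size_t id, int locs) {
    loc_count[id] = locs;
    dirty.insert(id);
  }

  // End of block: visit only the dirty ids instead of the whole table.
  // Appends ids with no locations to `useless`; returns entries visited.
  std::size_t end_of_block(std::vector<std::size_t> &useless) {
    std::size_t visited = 0;
    for (std::size_t id : dirty) {
      ++visited;
      if (loc_count[id] == 0)
        useless.push_back(id);
    }
    dirty.clear();
    return visited;
  }
};
```

With many tiny basic blocks, the per-block cost would then scale with the
number of values touched in that block rather than with the table size,
provided the dependency issue above can be handled cheaply.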
Alex? Your thoughts?