Re: [RFC, PATCH] LTO: IPA inline speed up for large apps (Chrome)
- From: Martin Liška <mliska at suse dot cz>
- To: Jan Hubicka <hubicka at ucw dot cz>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 18 Feb 2015 11:28:37 +0100
- Subject: Re: [RFC, PATCH] LTO: IPA inline speed up for large apps (Chrome)
- References: <54E376FC dot 9080709 at suse dot cz> <20150217183839 dot GA18175 at atrey dot karlin dot mff dot cuni dot cz>
On 02/17/2015 07:38 PM, Jan Hubicka wrote:
thanks for working on it. There are basically 3 independent changes in the patch:
- The patch to make checking in lto_streamer_init ENABLE_CHECKING only, which I
think can be committed as obvious.
The following email contains the fix for that, which I'm going to install.
- Templates for call_for_symbol_and_aliases
I do not think these should be strictly necessary for performance, because once we
spend too much time in these we are a bit screwed.
I however see it also makes things a bit nicer by not needing typecasts on the data pointer.
Perhaps that could be further cleaned up?
An alternative would be to implement a FOR_EACH_ALIAS macro with a tree-walking iterator.
You have all the structure needed to not require a stack. The iterator will contain a
root node, current node and index to ref.
This may be even easier to use and will probably wind up generating about the same code,
given that the for-each template needs to produce a self-recursive function anyway.
I would not care about for_symbol_thunk_and_aliases. That function is heavy by walking
all callers anyway and should not be used in hot code.
I have a patch that removes its use from the inliner - it is more or less a leftover from the time
we represented thunks as special aliases instead of functions w/o gimple body.
Yes, I was also thinking about a flat iterator that will be capable of iterating over thunks/aliases, and
I prefer that approach to recursive functions. I think we can prepare it for the next release;
as you said, it does not bring much of a performance gain.
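
To make the flat-iterator idea above concrete, here is a minimal standalone sketch of a stackless pre-order walk over an alias tree. It is not GCC code: Node, add_alias, AliasIterator and collect_aliases are hypothetical names, and the sketch assumes each alias records its target and its own position in the target's alias list, which is what lets the walk climb back up without a stack.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a symbol-table node; this is not GCC's symtab_node.
struct Node {
  std::string name;
  Node *target = nullptr;           // node this alias refers to (null for a non-alias)
  std::size_t index_in_target = 0;  // this alias's position in target->aliases
  std::vector<Node *> aliases;      // aliases that refer directly to this node

  explicit Node(std::string n) : name(std::move(n)) {}
};

// Register ALIAS as a direct alias of TARGET and remember its index.
void add_alias(Node &target, Node &alias) {
  alias.target = &target;
  alias.index_in_target = target.aliases.size();
  target.aliases.push_back(&alias);
}

// Visits every direct and indirect alias of ROOT in pre-order.
// The only state is the root and the current node; no stack or recursion.
class AliasIterator {
public:
  explicit AliasIterator(Node *root) : root_(root), current_(root) { advance(); }

  bool done() const { return current_ == nullptr; }
  Node *get() const { return current_; }

  void advance() {
    if (!current_)
      return;
    // Descend first: the first alias of the current node comes next.
    if (!current_->aliases.empty()) {
      current_ = current_->aliases[0];
      return;
    }
    // Otherwise climb towards the root, looking for an unvisited sibling.
    while (current_ != root_) {
      Node *parent = current_->target;
      std::size_t next = current_->index_in_target + 1;
      if (next < parent->aliases.size()) {
        current_ = parent->aliases[next];
        return;
      }
      current_ = parent;
    }
    current_ = nullptr;  // climbed back to the root: walk finished
  }

private:
  Node *root_;
  Node *current_;
};

// Convenience helper: names of all aliases of ROOT in visiting order.
std::vector<std::string> collect_aliases(Node &root) {
  std::vector<std::string> out;
  for (AliasIterator it(&root); !it.done(); it.advance())
    out.push_back(it.get()->name);
  return out;
}
```

Because each node stores its own index, the iterator here does not need a separate index field; a variant that keeps the index in the iterator, as suggested in the quoted text, would work the same way.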
- the caching itself.
I will look into the caching in detail. I am not quite sure I like the idea of exposing an
inline-only cache in cgraph.h. You could just keep the predicates as they are, but have inline_ variants
in ipa-inline.h that do the caching for you.
Allocating the bits directly in cgraph_node is probably OK, we don't really have a shortage there
and it can be revisited easily later...
Please take a look at the caching; it would be a crucial part of the speed improvement.
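
As a rough illustration of the layout suggested above (the predicate stays unchanged, and a separate inline_ wrapper owns the cache), here is a standalone sketch. It is not taken from the patch: FakeNode, expensive_predicate, inline_cached_predicate and invalidate_cache are hypothetical names, and the two bit-fields stand in for bits allocated directly in the node.

```cpp
// Hypothetical stand-in for a call-graph node; this is not GCC's cgraph_node.
struct FakeNode {
  int body_size;
  unsigned cache_valid : 1;  // set once the cached answer has been computed
  unsigned cache_value : 1;  // the cached answer itself

  FakeNode() : body_size(0), cache_valid(0), cache_value(0) {}
};

// Counts how often the uncached predicate runs, for demonstration only.
static int slow_calls = 0;

// The original predicate, left as it is. The size test is a placeholder
// for whatever expensive walk the real predicate performs.
bool expensive_predicate(const FakeNode &node) {
  ++slow_calls;
  return node.body_size < 100;
}

// The inline_ variant: computes the answer once and reuses it afterwards.
bool inline_cached_predicate(FakeNode &node) {
  if (!node.cache_valid) {
    node.cache_value = expensive_predicate(node);
    node.cache_valid = 1;
  }
  return node.cache_value;
}

// Must be called whenever something the predicate depends on changes.
void invalidate_cache(FakeNode &node) {
  node.cache_valid = 0;
}
```

The trade-off in this arrangement is that every code path that changes the predicate's inputs has to invalidate the cache, which is easier to audit when the cache is used by only one pass.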
From eb9d34244c43ae1d0576b2ae1002f5267c6cd547 Mon Sep 17 00:00:00 2001
From: mliska <firstname.lastname@example.org>
Date: Wed, 18 Feb 2015 11:18:47 +0100
Subject: [PATCH] Add checking macro within lto_streamer_init.
2015-02-18 Martin Liska <email@example.com>
* lto-streamer.c (lto_streamer_init): Encapsulate
streamer_check_handled_ts_structures with checking macro.
gcc/lto-streamer.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/gcc/lto-streamer.c b/gcc/lto-streamer.c
index 836dce9..542a813 100644
@@ -319,11 +319,13 @@ static hash_table<tree_hash_entry> *tree_htab;
/* Check that all the TS_* handled by the reader and writer routines
match exactly the structures defined in treestruct.def. When a
new TS_* astructure is added, the streamer should be updated to
handle it. */
tree_htab = new hash_table<tree_hash_entry> (31);