This is the mail archive of the mailing list for the GCC project.

Re: [RFC, PATCH] LTO: IPA inline speed up for large apps (Chrome)

On 02/17/2015 07:38 PM, Jan Hubicka wrote:
Thanks for working on it.  There are 3 basically independent changes in the patch:
  - The change making the check in lto_streamer_init ENABLE_CHECKING-only, which I
    think can be committed as obvious.


The following email contains the fix for that, which I'm going to install.

  - Templates for call_for_symbol_and_aliases.
    I do not think these should be strictly necessary for performance, because if we
    spend too much time in these, we are a bit screwed anyway.
    However, I see it also makes things a bit nicer by not needing typecasts on the data pointer.
    Perhaps that could be cleaned up further?

    An alternative would be to implement a FOR_EACH_ALIAS macro with a tree-walking iterator.
    You have all the structure needed to not require a stack.  The iterator will contain the
    root node, the current node and an index into the refs.
    This may be even easier to use and would probably wind up generating about the same code,
    given that the for-each template needs to produce a self-recursive function anyway.

    I would not care about for_symbol_thunk_and_aliases.  That function is heavy, since it walks
    all callers anyway, and should not be used in hot code.
    I have a patch that removes its use from the inliner; it is more or less a leftover from the time
    we represented thunks as special aliases instead of functions without a gimple body.
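To illustrate the typecast point, here is a minimal C++ sketch contrasting a walker whose callback receives an untyped void * data pointer with a template walker that accepts any callable. Everything in it is hypothetical and simplified: symbol_node stands in for a symbol table node, aliases live in a plain std::vector, and walk_aliases_void / walk_aliases are made-up names, not the signatures in cgraph.h.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for a symbol table node.
struct symbol_node {
  std::vector<symbol_node *> aliases;  // direct aliases of this symbol
};

// Old style: the callback gets an untyped data pointer and must cast it.
// Returns true as soon as the callback returns true (walk is stopped).
bool walk_aliases_void (symbol_node *n,
                        bool (*callback) (symbol_node *, void *),
                        void *data)
{
  assert (n);
  if (callback (n, data))
    return true;
  for (symbol_node *a : n->aliases)
    if (walk_aliases_void (a, callback, data))
      return true;
  return false;
}

// Template style: any callable works and its state stays typed, so no casts.
// Same early-exit semantics as above; still a self-recursive function.
template <typename Callback>
bool walk_aliases (symbol_node *n, const Callback &callback)
{
  assert (n);
  if (callback (n))
    return true;
  for (symbol_node *a : n->aliases)
    if (walk_aliases (a, callback))
      return true;
  return false;
}
```

With the template, a caller writes `walk_aliases (node, [&count] (symbol_node *) { ++count; return false; })` instead of smuggling `&count` through void * and casting it back. Both walkers recurse, so the benefit is type safety at the call site rather than speed.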

Yes, I was also thinking about a flat iterator capable of iterating over thunks/aliases, and
I prefer that approach to recursive functions.  I think we can prepare it for the next release,
since, as you said, it does not bring much performance gain.
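A minimal sketch of the stackless FOR_EACH_ALIAS idea, assuming each node records its alias target and its position in that target's alias list. The sym type, its fields, add_alias, alias_iterator and the macro are all hypothetical; in GCC the equivalent information would have to come from the reference lists. The iterator holds only the root, the current node and an index, and visits aliases in pre-order. Thunks are not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified symbol node: besides its own aliases, each node
// remembers which node it aliases and where it sits in that node's list.
struct sym {
  sym *alias_target = nullptr;      // parent in the alias tree
  std::size_t index_in_target = 0;  // position in alias_target->aliases
  std::vector<sym *> aliases;       // children: direct aliases of this node
};

// Hypothetical helper that keeps the back-pointer and index consistent.
void add_alias (sym *target, sym *alias)
{
  alias->alias_target = target;
  alias->index_in_target = target->aliases.size ();
  target->aliases.push_back (alias);
}

// Pre-order walk over all transitive aliases of ROOT (ROOT itself excluded)
// with constant state: root node, current node and next alias index.
class alias_iterator {
public:
  explicit alias_iterator (sym *root) : root_ (root), cur_ (root), idx_ (0)
  {
    assert (root);
  }

  // Return the next alias, or nullptr once every alias has been visited.
  sym *next ()
  {
    for (;;)
      {
        if (idx_ < cur_->aliases.size ())
          {
            cur_ = cur_->aliases[idx_];  // descend into the next child
            idx_ = 0;
            return cur_;
          }
        if (cur_ == root_)
          return nullptr;                  // the whole tree is done
        idx_ = cur_->index_in_target + 1;  // resume after cur_ in its parent
        cur_ = cur_->alias_target;
      }
  }

private:
  sym *root_;
  sym *cur_;
  std::size_t idx_;
};

// Hypothetical FOR_EACH_ALIAS built on the iterator above.
#define FOR_EACH_ALIAS(ROOT, IT, ALIAS) \
  for (alias_iterator IT (ROOT); ((ALIAS) = IT.next ()) != nullptr;)
```

Each alias edge is crossed once going down and once coming back up, so the walk stays linear in the number of aliases without recursion or an explicit stack, and starting from a node that is itself an alias never climbs above that node.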

  - The caching itself.

I will look into the caching in detail.  I am not quite sure I like the idea of exposing an
inline-only cache in cgraph.h.  You could just keep the predicates as they are, but have inline_
variants in ipa-inline.h that do the caching for you.

Allocating the bits directly in cgraph_node is probably OK; we don't really have a shortage there,
and it can easily be revisited later...
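One way to read this suggestion, sketched under assumptions: the original predicate stays untouched, while an inline_ wrapper (of the kind that might live in ipa-inline.h) memoizes its result in two bit-fields on the node. The node type, its fields, slow_predicate_p, inline_slow_predicate_p and inline_reset_cache are all hypothetical, and the sketch assumes the inliner knows exactly when to invalidate the cached bit.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for cgraph_node with two spare bits
// used as an inliner-only cache.
struct node {
  bool property = false;         // what the expensive predicate computes
  int slow_predicate_calls = 0;  // instrumentation for this example only
  unsigned cached_valid : 1;     // is cached_value up to date?
  unsigned cached_value : 1;     // memoized predicate result
  node () : cached_valid (0), cached_value (0) {}
};

// The existing predicate stays as it is; imagine it walking aliases/callers.
bool slow_predicate_p (node *n)
{
  assert (n);
  ++n->slow_predicate_calls;
  return n->property;
}

// Inliner-only caching variant: compute once, then answer from the bits.
inline bool inline_slow_predicate_p (node *n)
{
  if (!n->cached_valid)
    {
      n->cached_value = slow_predicate_p (n);
      n->cached_valid = 1;
    }
  return n->cached_value;
}

// Must be called whenever the inliner changes what the predicate depends on.
inline void inline_reset_cache (node *n)
{
  n->cached_valid = 0;
}
```

Keeping the cache behind inline_ wrappers means other users of cgraph.h keep calling the uncached predicate and never see stale bits; only the inliner has to get invalidation right.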


Please take a look at the caching; it would be a crucial part of the speed improvement.

>From eb9d34244c43ae1d0576b2ae1002f5267c6cd547 Mon Sep 17 00:00:00 2001
From: mliska <>
Date: Wed, 18 Feb 2015 11:18:47 +0100
Subject: [PATCH] Add checking macro within lto_streamer_init.


2015-02-18  Martin Liska  <>

	* lto-streamer.c (lto_streamer_init): Encapsulate
	streamer_check_handled_ts_structures with checking macro.
 gcc/lto-streamer.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/lto-streamer.c b/gcc/lto-streamer.c
index 836dce9..542a813 100644
--- a/gcc/lto-streamer.c
+++ b/gcc/lto-streamer.c
@@ -319,11 +319,13 @@ static hash_table<tree_hash_entry> *tree_htab;
 void
 lto_streamer_init (void)
 {
+#ifdef ENABLE_CHECKING
   /* Check that all the TS_* handled by the reader and writer routines
      match exactly the structures defined in treestruct.def.  When a
      new TS_* astructure is added, the streamer should be updated to
      handle it.  */
   streamer_check_handled_ts_structures ();
+#endif
 
 #ifdef LTO_STREAMER_DEBUG
   tree_htab = new hash_table<tree_hash_entry> (31);
