This is the mail archive of the
mailing list for the GCC project.
Re: Patch ping...
> > http://gcc.gnu.org/ml/gcc-patches/2006-10/msg01371.html
> > Pass info about hotness of instruction to RTL expanders
> Here's where I've got stuck whilst reviewing the memcpy/memset
> patches previously, for which I apologise. I've been investigating
> attempts to use more profile information during RTL expansion myself.
> Firstly, I think it's probably preferable to provide a new global
> variable current_bb during RTL expansion, rather than just your
> new maybe_hot_insn_p. This can be set in expand_gimple_basic_block,
The reason I didn't go this route is that, at the time RTL expanders
are called, the notion of "current basic block" is not always very well
defined. In particular, during RTL expansion the basic blocks are
still partly in tree form and they are actually SEME regions of the
future RTL CFG (so it might make sense to flip the hotness when expanding
the cold path of the currently expanded higher-level construct).
Also, some optimizers emit instructions on edges, so there is no
convenient BB to deal with. And once the interface is pushed to the
splitters, we do splitting without a CFG on some targets. Since
the INSN chain at that time contains a lot of weird stuff, like constant
pools, it seems to me that it is probably saner to consider the insn
chain as a flat chain at this stage and just keep the hotness indicators
somewhere in the INSNs themselves rather than trying to force the CFG code
to deal with even more inter-BB noise than it is dealing with now.
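
To make the "flat chain" idea above concrete, here is a minimal sketch in C.
It is only an illustration under stated assumptions: toy_insn, toy_emit,
toy_emitting_hot and toy_count_cold are hypothetical names, not GCC's real
rtx representation or any function from the patch. Each insn is stamped
with the hotness in effect when it is emitted, so a later pass can consult
the flag without any CFG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified insn; NOT GCC's real rtx representation.  */
struct toy_insn
{
  int uid;
  bool maybe_hot;          /* hotness recorded at emission time */
  struct toy_insn *next;   /* flat chain, no basic-block links */
};

/* Hotness of the code currently being emitted; defaults to hot.  */
static bool toy_emitting_hot = true;

/* Append INSN after TAIL (which may be NULL for the first insn),
   stamping it with the current hotness.  Returns the new tail.  */
static struct toy_insn *
toy_emit (struct toy_insn *tail, struct toy_insn *insn, int uid)
{
  assert (insn != NULL);
  insn->uid = uid;
  insn->maybe_hot = toy_emitting_hot;
  insn->next = NULL;
  if (tail != NULL)
    tail->next = insn;
  return insn;
}

/* A later pass (think of a splitter running without a CFG) walks the
   flat chain and counts the insns it would optimize for size.  */
static int
toy_count_cold (const struct toy_insn *head)
{
  int n = 0;
  for (const struct toy_insn *i = head; i != NULL; i = i->next)
    if (!i->maybe_hot)
      n++;
  return n;
}
```

The point of the design is that toy_count_cold needs nothing but the chain
itself, so it keeps working at stages where basic-block information is
missing or not trustworthy.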
My overall plan is to kill all the inter-BB stuff. I am trying to get
the number of INSN_NOTES down, because once the PROLOGUE_* and
FUNCTION_* notes are dropped, we can change the CFG definition to start
considering NOTEs to be instructions and thus part of the BB, killing a good
portion of the cfglayout ugliness. (All the other notes either can be
removed at delete_basic_block time or are short-lived, so we don't
necessarily need to cleanup_cfg at that time.)
I am not too opposed to the current_bb scheme; it just seems to me that it
is better to give less info to the interfaces so we won't end up assuming
that other parts of the CFG structures are necessarily sane at the
moment, or end up doing nasty tricks like temporarily modifying the CFG.
> and cleared when it's done, so that expansion routines can check
> "current_bb && maybe_hot_bb_p (current_bb)" as appropriate. This
> also avoids issues with your change to standard_80387_constant_p
> which is called during many passes of the compiler, not just during
> RTL expansion, at which point it may have bogus values. This issue
This is also solved by the patch: the idea is that the compiler should
optimize for speed unless it is given -Os or has good reason to believe
that the current code is cold. This is also the reason for "maybe" in the
name of the hot predicate and "probably_" in the name of the cold predicate.
The BB expansion code takes care to always reset the hotness indicator
to true, so in all the other places where we enter the RTL-emitting
machinery (it is not only the standard.*_p macro), we will conservatively
optimize for speed.
As I've mentioned, I intend to push hotness information into more parts
of the compiler (such as cost tables, so that multiply expansion and
friends are optimized for size in cold blocks too).
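
As a rough sketch of how the conservative default and hotness-aware cost
tables could fit together, consider the C below. Everything in it is an
assumption made for illustration: the toy_* names, the two-entry cost
struct and the numbers in the tables are hypothetical and are not taken
from the patch or from any real backend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool toy_optimize_size = false;  /* stands in for -Os */
static bool toy_insn_hot = true;        /* conservative default: hot */

/* Optimize for speed unless told otherwise or the code is known cold.  */
static bool
toy_maybe_hot_insn_p (void)
{
  return !toy_optimize_size && toy_insn_hot;
}

/* Hypothetical cost tables; the numbers are purely illustrative.  */
struct toy_costs { int mult; int shift_add; };
static const struct toy_costs toy_speed_costs = { 3, 4 };
static const struct toy_costs toy_size_costs = { 3, 6 };

/* Pick the cost table matching the hotness of the current insn.  */
static const struct toy_costs *
toy_current_costs (void)
{
  return toy_maybe_hot_insn_p () ? &toy_speed_costs : &toy_size_costs;
}

/* Expand one block: set the indicator, run the expander, and always
   reset to "hot" so code emitted outside expansion sees the default.  */
static void
toy_expand_block (bool block_maybe_hot, void (*expand) (void))
{
  assert (expand != NULL);
  toy_insn_hot = block_maybe_hot;
  expand ();
  toy_insn_hot = true;
}

/* Example expander: remembers which table it was given.  */
static const struct toy_costs *toy_last_costs;

static void
toy_expand_multiply (void)
{
  toy_last_costs = toy_current_costs ();
}
```

Calling toy_expand_block (false, toy_expand_multiply) selects the size
table for that block only; any query made afterwards, outside block
expansion, falls back to the speed table again.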
> > http://gcc.gnu.org/ml/gcc-patches/2006-10/msg01358.html
> > Infrastructure for passing memcpy/memset profiles to backend
> If it's OK with you, I'd like to give some more opportunity for folks
> to comment on this histogram infrastructure change, whilst the dust
> settles on the above patches. It'll also give me a chance to see
> what x86 code we actually generate with all of your above changes.
> Please ping this again in a while, and if you could include some
> expected performance numbers that would be great.
Hehe, I guess it is time for me to finish the profile passing code.
I've put some expected benefits at the end of the x86 patch mail; I will
re-benchmark once I have everything in place again.
> Many thanks,