Re: gcc compile-time performance

In message <>, "David S. Miller" writes:
 > The biggest offender on x86 for some of my profiles seems to be
 > for_each_rtx(), which actually stems a tiny bit from stupidity in
 > for_each_rtx (some recursion can be eliminated) and a lot of stupidity
 > in CSE.
 > CSE's approx_reg_cost() is the true problem...  That thing gets called
 > on every expression CSE inserts into its tables, every address that
 > the cost is computed for, etc. (which is "a lot").  What's more it is
 > implemented stupidly (construct a reg set just to check for bits set,
 > ummm why not just do the counter bumping in the for_each_rtx helper
 > function, duh...)
 > Furthermore, when approx_reg_cost does get a REG, it should just
 > return -1 to for_each_rtx so it doesn't look into subexpressions
 > of the REG (translating into 2 or 1 useless recursive call to
 > for_each_rtx depending upon whether simple recursion has been
 > eliminated from for_each_rtx).
 > Finally, the whole hardreg counting thing is just to see if there
 > is ONE hard reg present in the expression.  This also only matters
 > on SMALL_REGISTER_CLASSES machines, so we can just return '1' from
 > approx_reg_cost_1() if we see a hard reg on a S_R_C target and check
 > that in the top-level return from for_each_rtx().
 > I've begun to hack up most of this...  patch below in case anyone
 > wants to play along at home.
 > One big disappointment is that, because approx_reg_cost runs on
 > just about any RTX, we can't use note_uses() just like gcse.c
 > does to find REGs.  note_uses only works on the toplevel pattern
 > of an INSN.
 > Even with my fixes below, for_each_rtx still is the third function
 > listed in the x86 profiles (right under memset() and cse_insn()).
 > I started to look into the memset() issues, but got distracted
 > when I noticed this approx_reg_cost() business...
 > Basically, I discovered that the compiler spends most of its time
 > computing a heuristic... I think that speaks for itself :-)
[ ... ]
Whoops.  It is this change that gets me 1-2%.  

The change to cselib causes my PAs to hang in cselib_invalidate_regno.
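
For anyone following the quoted suggestions about approx_reg_cost, here is a
rough, self-contained sketch of the idea: bump a counter directly in the
helper, return -1 on a REG so the walker does not descend into it, and return
1 on a hard reg so the walk stops early. This is not the actual patch and not
GCC code. The toy_rtx type, the toy_* functions, TOY_FIRST_PSEUDO and
TOY_MAX_COST are all made up for illustration; only the callback contract
(0 = continue, -1 = skip subexpressions, any other nonzero = stop and return
that value) is modelled on for_each_rtx. Note that a plain counter counts
every REG occurrence, whereas a reg set would count each register once.

```c
#include <stddef.h>

/* Toy stand-in for an RTL expression; NOT GCC's real rtx.  */
enum toy_code { TOY_REG, TOY_PLUS, TOY_MEM, TOY_CONST };

struct toy_rtx
{
  enum toy_code code;
  int regno;                  /* Meaningful only for TOY_REG.  */
  struct toy_rtx *op[2];      /* Operands; NULL when unused.  */
};

/* Hypothetical: register numbers below this are hard registers.  */
#define TOY_FIRST_PSEUDO 16
/* Hypothetical cost reported when a hard register is seen.  */
#define TOY_MAX_COST 1000

typedef int (*toy_fn) (struct toy_rtx *, void *);

/* Depth-first walk.  F returning 0 continues, -1 skips the operands
   of X, and any other nonzero value aborts the walk and is returned.  */
static int
toy_for_each (struct toy_rtx *x, toy_fn f, void *data)
{
  int r, i;

  if (x == NULL)
    return 0;

  r = f (x, data);
  if (r == -1)
    return 0;                 /* Skip operands, keep walking siblings.  */
  if (r != 0)
    return r;                 /* Stop the whole traversal.  */

  for (i = 0; i < 2; i++)
    {
      r = toy_for_each (x->op[i], f, data);
      if (r != 0)
        return r;
    }
  return 0;
}

/* Helper: count REG occurrences directly, no register set needed.  */
static int
toy_reg_cost_1 (struct toy_rtx *x, void *data)
{
  int *n_regs = data;

  if (x->code != TOY_REG)
    return 0;
  if (x->regno < TOY_FIRST_PSEUDO)
    return 1;                 /* Hard reg: one is enough, stop here.  */
  (*n_regs)++;
  return -1;                  /* Nothing useful below a REG.  */
}

/* Top level: check the walker's return value for the hard-reg case.  */
static int
toy_reg_cost (struct toy_rtx *x)
{
  int n_regs = 0;

  if (toy_for_each (x, toy_reg_cost_1, &n_regs))
    return TOY_MAX_COST;
  return n_regs;
}
```

With this shape the hard-reg test is a single check on the walker's return
value, and whether it applies only on SMALL_REGISTER_CLASSES targets would be
a condition inside the helper.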


