This is the mail archive of the
mailing list for the GCC project.
Re: gcc compile-time performance
> The biggest offender on x86 for some of my profiles seems to be
> for_each_rtx(), which actually stems a tiny bit from stupidity in
> for_each_rtx (some recursion can be eliminated) and a lot of stupidity
> in CSE.
I sent a patch making for_each_rtx non-recursive some time ago.
It didn't make a big difference on i386, but I would guess that on SPARC it
could be different because of the register windows.
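Here is a minimal sketch of what a non-recursive walker can look like, using an explicit stack. It is not the patch mentioned above, whose details aren't given here. It runs on a hypothetical toy RTL (`struct toy_rtx`, `toy_new` and `toy_for_each_rtx_iter` are made-up names). Only the callback protocol is modeled on for_each_rtx: 0 means continue, -1 means skip subexpressions, and any other value stops the walk and is returned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy RTL, far simpler than GCC's rtx: every node has
   two operand slots, and unused slots are NULL. */
enum toy_code { TOY_REG, TOY_PLUS, TOY_SET };

struct toy_rtx {
  enum toy_code code;
  int regno;               /* only meaningful for TOY_REG */
  struct toy_rtx *op[2];
};

struct toy_rtx *toy_new(enum toy_code code, int regno,
                        struct toy_rtx *a, struct toy_rtx *b)
{
  struct toy_rtx *x = malloc(sizeof *x);
  assert(x != NULL);
  x->code = code;
  x->regno = regno;
  x->op[0] = a;
  x->op[1] = b;
  return x;
}

typedef int (*toy_fn)(struct toy_rtx **, void *);

#define TOY_STACK_MAX 256

/* Preorder walk using an explicit stack instead of recursion.
   Callback protocol modeled on for_each_rtx: 0 = continue,
   -1 = skip this node's operands, anything else = stop and return it. */
int toy_for_each_rtx_iter(struct toy_rtx **x, toy_fn f, void *data)
{
  struct toy_rtx **stack[TOY_STACK_MAX];
  size_t sp = 0;

  stack[sp++] = x;
  while (sp > 0) {
    struct toy_rtx **loc = stack[--sp];
    int r = f(loc, data);

    if (r == -1)
      continue;            /* callback asked us not to descend */
    if (r != 0)
      return r;            /* callback asked us to stop */
    if (*loc == NULL)
      continue;            /* empty slot: nothing below it */

    /* Push the right operand first so the left one is visited first. */
    assert(sp + 2 <= TOY_STACK_MAX);
    stack[sp++] = &(*loc)->op[1];
    stack[sp++] = &(*loc)->op[0];
  }
  return 0;
}

/* Example callback: count REG nodes and never stop early. */
int count_regs_cb(struct toy_rtx **loc, void *data)
{
  if (*loc != NULL && (*loc)->code == TOY_REG)
    ++*(int *) data;
  return 0;
}

/* Example callback: stop at the first REG and return its (positive) regno. */
int first_reg_cb(struct toy_rtx **loc, void *data)
{
  (void) data;
  if (*loc != NULL && (*loc)->code == TOY_REG)
    return (*loc)->regno;
  return 0;
}
```

Pushing the right operand before the left keeps the same left-to-right preorder that a recursive walk gives. The fixed-size stack is a simplification for the sketch.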
> Furthermore, when approx_reg_cost does get a REG, it should just
> return -1 to for_each_rtx so it doesn't look into subexpressions
> of the REG (translating into 2 or 1 useless recursive calls to
> for_each_rtx depending upon whether simple recursion has been
> eliminated from for_each_rtx).
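The following sketch shows what returning -1 saves. It uses a hypothetical toy RTL in which every node has two operand slots and the walker passes even empty (NULL) slots to the callback. In that model, a callback that returns 0 on a REG pays for two useless calls per REG, and returning -1 avoids them. The names (`toy_for_each_rtx`, `approx_cost_cb`, `struct cost_state`) are made up. The "cost" is just a REG count, not CSE's real heuristic.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy RTL: every node has two operand slots (NULL if unused). */
enum toy_code { TOY_REG, TOY_PLUS, TOY_SET };

struct toy_rtx {
  enum toy_code code;
  int regno;               /* only meaningful for TOY_REG */
  struct toy_rtx *op[2];
};

struct toy_rtx *toy_new(enum toy_code code, int regno,
                        struct toy_rtx *a, struct toy_rtx *b)
{
  struct toy_rtx *x = malloc(sizeof *x);
  assert(x != NULL);
  x->code = code;
  x->regno = regno;
  x->op[0] = a;
  x->op[1] = b;
  return x;
}

typedef int (*toy_fn)(struct toy_rtx **, void *);

/* Recursive walker with the for_each_rtx callback protocol:
   0 = continue, -1 = don't visit this node's operands,
   anything else = stop and return it. */
int toy_for_each_rtx(struct toy_rtx **x, toy_fn f, void *data)
{
  int r = f(x, data);
  if (r == -1)
    return 0;
  if (r != 0)
    return r;
  if (*x == NULL)
    return 0;
  for (int i = 0; i < 2; i++) {
    r = toy_for_each_rtx(&(*x)->op[i], f, data);
    if (r != 0)
      return r;
  }
  return 0;
}

struct cost_state {
  int cost;                /* number of REGs seen (stand-in for a cost) */
  int calls;               /* how many times the callback ran */
  int prune;               /* nonzero: return -1 on a REG */
};

int approx_cost_cb(struct toy_rtx **loc, void *data)
{
  struct cost_state *st = data;
  st->calls++;
  if (*loc != NULL && (*loc)->code == TOY_REG) {
    st->cost++;
    return st->prune ? -1 : 0;   /* -1: don't look inside the REG */
  }
  return 0;
}
```

On (set (reg 1) (plus (reg 2) (reg 3))) both variants count 3 REGs, but the pruning one makes 5 callback calls instead of 11.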
> Finally, the whole hardreg counting thing is just to see if there
> is ONE hard reg present in the expression. This also only matters
> on SMALL_REGISTER_CLASSES machines, so we can just return '1' from
> approx_reg_cost_1() if we see a hard reg on a S_R_C target and check
> that in the top-level return from for_each_rtx().
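The next sketch shows the early-exit idea, again on a hypothetical toy RTL. The callback returns 1 as soon as it sees a hard reg on a small-register-classes target, and the top-level function checks the walker's return value. `TOY_FIRST_PSEUDO`, `TOY_MAX_COST` and the flag standing in for SMALL_REGISTER_CLASSES are assumptions. So is the choice to report a maximal cost when a hard reg is found, since the text above only says that the result is checked at the top level.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy RTL and constants; GCC's real ones differ. */
#define TOY_FIRST_PSEUDO 8     /* regnos below this count as hard regs */
#define TOY_MAX_COST 10000     /* assumed "too expensive" result */

enum toy_code { TOY_REG, TOY_PLUS, TOY_SET };

struct toy_rtx {
  enum toy_code code;
  int regno;               /* only meaningful for TOY_REG */
  struct toy_rtx *op[2];   /* unused slots are NULL */
};

struct toy_rtx *toy_new(enum toy_code code, int regno,
                        struct toy_rtx *a, struct toy_rtx *b)
{
  struct toy_rtx *x = malloc(sizeof *x);
  assert(x != NULL);
  x->code = code;
  x->regno = regno;
  x->op[0] = a;
  x->op[1] = b;
  return x;
}

typedef int (*toy_fn)(struct toy_rtx **, void *);

/* Same callback protocol as for_each_rtx: 0 = continue,
   -1 = skip operands, anything else = stop and return it. */
int toy_for_each_rtx(struct toy_rtx **x, toy_fn f, void *data)
{
  int r = f(x, data);
  if (r == -1)
    return 0;
  if (r != 0)
    return r;
  if (*x == NULL)
    return 0;
  for (int i = 0; i < 2; i++) {
    r = toy_for_each_rtx(&(*x)->op[i], f, data);
    if (r != 0)
      return r;
  }
  return 0;
}

/* Stand-in for the SMALL_REGISTER_CLASSES target macro. */
int toy_small_register_classes = 1;

int approx_reg_cost_1_cb(struct toy_rtx **loc, void *data)
{
  struct toy_rtx *x = *loc;
  int *cost = data;

  if (x == NULL || x->code != TOY_REG)
    return 0;
  if (x->regno < TOY_FIRST_PSEUDO && toy_small_register_classes)
    return 1;              /* one hard reg is all we need to know: stop */
  ++*cost;                 /* otherwise just count the REG */
  return -1;               /* and don't look inside it */
}

int toy_approx_reg_cost(struct toy_rtx *x)
{
  int cost = 0;

  /* A nonzero result from the walk means a hard reg was found. */
  if (toy_for_each_rtx(&x, approx_reg_cost_1_cb, &cost) != 0)
    return TOY_MAX_COST;
  return cost;
}
```

With the flag set, (plus (reg 3) (reg 101)) stops at (reg 3) and yields `TOY_MAX_COST`. With the flag cleared, both REGs are simply counted.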
> I've begun to hack up most of this... patch below in case anyone
> wants to play along at home.
> One big disappointment is that, because approx_reg_cost runs on
> just about any RTX, we can't use note_uses() the way gcse.c
> does to find REGs. note_uses only works on the toplevel pattern
note_uses won't give you the REGs inside a complex expression.
For (set (reg A) (plus (reg B) (reg C))),
the callback is called only for (plus (reg B) (reg C)) and needs to do
its own walk.
If GCSE assumes otherwise, it is buggy.
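Here is a toy illustration of that point. It assumes a hypothetical, much-simplified `toy_note_uses`. The real note_uses also handles PARALLEL, MEM destinations and other cases. For the SET above, the callback runs on the whole (plus ...) expression and must recurse on its own to find (reg B) and (reg C). In the test, A, B and C are numbered 1, 2 and 3.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy RTL, far simpler than GCC's rtx. */
enum toy_code { TOY_REG, TOY_PLUS, TOY_SET };

struct toy_rtx {
  enum toy_code code;
  int regno;               /* only meaningful for TOY_REG */
  struct toy_rtx *op[2];   /* for TOY_SET: op[0] = dest, op[1] = src */
};

struct toy_rtx *toy_new(enum toy_code code, int regno,
                        struct toy_rtx *a, struct toy_rtx *b)
{
  struct toy_rtx *x = malloc(sizeof *x);
  assert(x != NULL);
  x->code = code;
  x->regno = regno;
  x->op[0] = a;
  x->op[1] = b;
  return x;
}

typedef void (*toy_use_fn)(struct toy_rtx **, void *);

/* Much-simplified stand-in for note_uses: the callback gets each used
   expression as a whole.  For a SET with a REG destination that is
   just the source; nested REGs are NOT reported individually. */
void toy_note_uses(struct toy_rtx **body, toy_use_fn fun, void *data)
{
  if ((*body)->code == TOY_SET)
    fun(&(*body)->op[1], data);
  else
    fun(body, data);
}

struct use_info {
  int regnos[16];
  int n;
  int calls;               /* how many times toy_note_uses invoked us */
};

/* The callback has to walk the expression itself to reach the REGs. */
static void collect_regs(const struct toy_rtx *x, struct use_info *info)
{
  if (x == NULL)
    return;
  if (x->code == TOY_REG) {
    assert(info->n < 16);
    info->regnos[info->n++] = x->regno;
    return;
  }
  collect_regs(x->op[0], info);
  collect_regs(x->op[1], info);
}

void note_use_cb(struct toy_rtx **loc, void *data)
{
  struct use_info *info = data;
  info->calls++;
  collect_regs(*loc, info);
}
```

On (set (reg 1) (plus (reg 2) (reg 3))) this gives one callback call and collects regnos 2 and 3. The destination (reg 1) is not a use and is not reported.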
> Basically, I discovered that the compiler spends most of its time
> computing a heuristic... I think that speaks for itself :-)
Yes. I hope I will live to see the day the CSE pass dies.