This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: CPROP and spec2000 slowdown in mid february
- To: jh at suse dot cz
- Subject: Re: CPROP and spec2000 slowdown in mid february
- From: kenner at vlsi1 dot ultra dot nyu dot edu (Richard Kenner)
- Date: Sat, 28 Apr 01 22:33:30 EDT
- Cc: gcc at gcc dot gnu dot org
The problem is that try_replace_reg believes that it did the replacement,
but due to missing bits in simplify_replace_rtx and forgetting about
set destinations, it didn't.
Ah!
Yes, I guessed that this was probably your goal. Could you explain
the main rationale for having two completely different pieces of
infrastructure, one designed to substitute a register for a register
and the other to substitute a constant for a register?
My idea was that sometimes you'd want to just do a "lexical"
substitution without doing any arithmetic simplifications and I'd
prefer the existing routine keep doing that. It may well be that all
the users of that routine will benefit from the simplifications in
which case they'll all call the new routine and we'll delete the old one.
This way each caller can be changed as we're sure we want to allow it to
make simplifications rather than changing the routine and having all of
those places "automatically" change: I think it's the more conservative
approach.
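
To illustrate the distinction between a "lexical" substitution and a
simplifying one, here is a minimal sketch on a toy expression tree. It is
not GCC code: struct expr, mk, replace_lexical and replace_simplify are
hypothetical names invented for this example, the replacement is assumed
to be a leaf (a register or a constant), and nodes are never freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy expression node; a stand-in for an rtx, not GCC's real layout. */
enum kind { K_REG, K_CONST, K_PLUS };

struct expr {
    enum kind kind;
    int value;              /* register number or constant value */
    struct expr *lhs, *rhs; /* operands, used only by K_PLUS */
};

static struct expr *mk(enum kind k, int v, struct expr *l, struct expr *r)
{
    struct expr *e = malloc(sizeof *e);
    assert(e != NULL);
    e->kind = k;
    e->value = v;
    e->lhs = l;
    e->rhs = r;
    return e;
}

/* "Lexical" substitution: put a copy of the leaf WITH wherever register
   REGNO appears, and do no arithmetic on the result. */
static struct expr *replace_lexical(struct expr *e, int regno,
                                    const struct expr *with)
{
    if (e->kind == K_REG)
        return e->value == regno ? mk(with->kind, with->value, NULL, NULL) : e;
    if (e->kind == K_PLUS)
        return mk(K_PLUS, 0,
                  replace_lexical(e->lhs, regno, with),
                  replace_lexical(e->rhs, regno, with));
    return e; /* constants are left alone */
}

/* Simplifying substitution: same replacement, but fold a PLUS whose
   operands have both become constants. */
static struct expr *replace_simplify(struct expr *e, int regno,
                                     const struct expr *with)
{
    if (e->kind != K_PLUS)
        return replace_lexical(e, regno, with);
    struct expr *l = replace_simplify(e->lhs, regno, with);
    struct expr *r = replace_simplify(e->rhs, regno, with);
    if (l->kind == K_CONST && r->kind == K_CONST)
        return mk(K_CONST, l->value + r->value, NULL, NULL);
    return mk(K_PLUS, 0, l, r);
}
```

In this toy, replacing register 1 with the constant 3 in (plus (reg 1) 4)
gives (plus 3 4) from the lexical routine and the single constant 7 from
the simplifying one. Keeping both lets each caller opt in to the second
behavior separately.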
Your implementation IMO does have two problems. Most importantly, it
works way too hard, making an extra copy of the whole rtx even if there
is not a single change (by routing everything through simplify_*_rtx).
That's easy to fix if it becomes an issue, but unclear if it's worth
it now that we have GC.
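
As a rough idea of what such a fix could look like, the sketch below
returns the original node whenever no operand changed, so an untouched
tree costs no allocation. Again this is a toy, not GCC code: struct expr,
mk, replace_shared and the allocations counter are hypothetical, and the
replacement node is shared rather than copied, which is an assumption of
the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy expression node; a stand-in for an rtx, not GCC's real layout. */
enum kind { K_REG, K_CONST, K_PLUS };

struct expr {
    enum kind kind;
    int value;              /* register number or constant value */
    struct expr *lhs, *rhs; /* operands, used only by K_PLUS */
};

static int allocations; /* counts nodes created, to make the saving visible */

static struct expr *mk(enum kind k, int v, struct expr *l, struct expr *r)
{
    struct expr *e = malloc(sizeof *e);
    assert(e != NULL);
    e->kind = k;
    e->value = v;
    e->lhs = l;
    e->rhs = r;
    allocations++;
    return e;
}

/* Replace register REGNO with WITH, allocating a new PLUS node only when
   at least one of its operands actually changed. */
static struct expr *replace_shared(struct expr *e, int regno,
                                   struct expr *with)
{
    if (e->kind == K_REG)
        return e->value == regno ? with : e;
    if (e->kind == K_PLUS) {
        struct expr *l = replace_shared(e->lhs, regno, with);
        struct expr *r = replace_shared(e->rhs, regno, with);
        if (l == e->lhs && r == e->rhs)
            return e; /* nothing changed below: reuse the node */
        return mk(K_PLUS, 0, l, r);
    }
    return e;
}
```

A caller can then compare the result with its argument by pointer to learn
whether any replacement happened at all.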
The second problem is that, as currently implemented, it stops recursing
at the first unknown element. The one I caught was a memory expression, but
there are many more. Getting it to recurse over generic rtxes is
somewhat tricky.
I don't follow.
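
For readers unfamiliar with the issue, one possible reading of the
recursion concern is sketched below: a walker driven by a per-code operand
format string descends into every sub-expression, including codes it has
no special case for, such as a memory reference. The idea loosely mirrors
GCC's GET_RTX_FORMAT strings, but everything here (struct node, the format
table, rename_reg) is a hypothetical toy, and it renames a register in
place rather than building a new expression.

```c
#include <assert.h>
#include <stddef.h>

enum code { C_REG, C_CONST, C_PLUS, C_MEM, C_NCODES };

/* One character per operand: 'e' is a sub-expression, 'i' an integer. */
static const char *const format[C_NCODES] = { "i", "i", "ee", "e" };

struct node {
    enum code code;
    int ival;           /* the 'i' operand of C_REG and C_CONST */
    struct node *op[2]; /* the 'e' operands of C_PLUS and C_MEM */
};

/* Rename register FROM to TO in place, visiting every 'e' operand
   whatever the enclosing code is. Returns the number of renames. */
static int rename_reg(struct node *n, int from, int to)
{
    int count = 0;

    assert(n != NULL);
    if (n->code == C_REG && n->ival == from) {
        n->ival = to;
        return 1;
    }
    const char *fmt = format[n->code];
    for (size_t i = 0; fmt[i] != '\0'; i++)
        if (fmt[i] == 'e')
            count += rename_reg(n->op[i], from, to);
    return count;
}
```

Because the loop consults only the format string, the register inside a
C_MEM address is reached without C_MEM being mentioned in the walker.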
The only thing I am afraid of is a performance regression in
combine. Currently combine spends most of its time IMO doing substitutions,
and these will get much more costly by going through memory
allocation, memset, re-filling of members, etc.
I don't see that. Pretty much exactly the same code will be executed, except
it will live in simplify-rtx.c instead of combine.c.