This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: An issue for the SC: horrible documentation quality of GCC
- From: Jan Hubicka <jh at suse dot cz>
- To: Richard Kenner <kenner at vlsi1 dot ultra dot nyu dot edu>
- Cc: jh at suse dot cz, gcc at gcc dot gnu dot org
- Date: Sat, 10 May 2003 00:59:30 +0200
- Subject: Re: An issue for the SC: horrible documentation quality of GCC
- References: <10305092256.AA20158@vlsi1.ultra.nyu.edu>
> For each use of the register (reg c) you look for the chain of
> operations preceeding it like:
> (set (reg b) (reg a)
> (set (reg c) (reg b)
> and you replace the use by the oldest copy of the value (reg a) in this
> case.
>
> Why? That seems, at best, to accomplish nothing and, at worst, to
> pessimize the code. I'm not familiar with the new register allocator,
> but in the original one, all of those registers, assuming they are local,
> will be tied together and all allocated the same hard register.
They are tied in allocator, the point is to tie them earlier, so
expression
(plus (reg a) (reg c))
will look equivalent to
(plus (reg a) (reg a))
This is what PRE needs to work correctly.
>
> It would seem to me that when you do this optimization, you run the
> real risk of increasing the lifetime of a register over what it was
> before, especially if you consider the cases where some of the
> registers in question are hard regs and some are global. Don't we already
> have code that knows how to handle these cases efficiently?
Depends on the definition of efficiently. The problem is dificult to
solve in the full generality. You can construct cases where copy
propagation increases live ranges of registers (when the ealiest
incarantion of the value is killed so the later incarantions are needed)
I seem to remember that even this subset of optimization (looking for
minimal amount of copies with minimal number of simultatelously live
registers) is NP complette.
We do have heuristics to deal with this but they don't work well either
and I believe that they work worse than the copy propagation mechanizm
(at least that is how do I explain the speedups I measured after
introducing special purpose copy propagation pass)
> It is not perfect, as in the case user already wrote code initializing
> variable to 100 many times, we won't cache the copy of 100 in a
> register.
>
> But if the constant of 100 can fit into one insn, why would you *want to*
> cache it in a register? It's more efficient to use it each time rather
> than waste a register.
In the case we can use it each time, the constant propagation as
implemented will do that (replace every occurence of the pseudo by
constant). I tought you are speaking about situation where it is not
good idea to replace the reigsters by constant?
>
> This is also interesting. How well does the heuristic work? In fact
> I would expect that it will often fail to elliminate the else block
> and result in moving NOP to the less frequently executed place in the
> superblock.
>
> I'm not sure what you mean here.
I man extended basic block like:
A
/ \
B C
Now CSE is deciding whether NOP will be in A or B. A is put before B
(this is usual) and B constains some other instructions. Now it will
place the NOP in B and keep copy in A and it can be the case that A is
executed 1000000 times while B not at all.
Honza