This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: regstack

To: law at cygnus dot com
Subject: Re: regstack
From: Jan Hubicka <hubicka at atrey dot karlin dot mff dot cuni dot cz>
Date: Thu, 5 Nov 1998 15:29:49 +0100
Cc: egcs-patches at cygnus dot com
References: <19981104110941.37266@atrey.karlin.mff.cuni.cz> <10524.910242706@upchuck>
> 
>   In message <19981104110941.37266@atrey.karlin.mff.cuni.cz>you write:
>   > To distinguish those two, regstack removes all REG_DEAD notes and adds
>   > REG_DEAD note with FIRST_STACK_REG to insns where it expects to pop the
>   > operand.
> So, what we really need is a way to mark the ambiguous case instead of abusing
> REG_DEAD notes.  Right?
Yes..
> 
> Then presumably we could keep the death and unused notes if we had another
> way to mark the ambiguous insns.  If that's the case we can create a new
> note.  The only question would be if we leave the REG_DEAD and REG_UNUSED
> notes attached to the insns, are they still accurate?
As I understand it, reg-stack goes through the stack and generates those pop
insns wherever a register dies (a REG_DEAD note is present). So the new
REG_DEAD notes largely correspond to the old REG_DEAD notes, but their handling
is completely different, and that's why I think they should be a different
kind of note. (Also, reg-stack sometimes expects a register to die implicitly,
so some notes are lost, and sometimes new notes are generated.)
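
To illustrate how a death note can select the popping form of an insn, here
is a minimal sketch of a toy model. It is not reg-stack's code: the function
name `emit_stores`, the tuple layout, and the restriction to plain stores are
all assumptions made for the example. It relies only on the fact that `fst`
stores %st(0) and `fstp` stores it and pops the stack.

```python
def emit_stores(stores, stack):
    """Toy model (hypothetical, not GCC code).

    stores: list of (reg, mem, dies); `dies` plays the role of a REG_DEAD note.
    stack:  list of virtual registers, stack[0] being %st(0).
    Returns the emitted instructions and the resulting stack layout.
    """
    stack = list(stack)
    out = []
    for reg, mem, dies in stores:
        i = stack.index(reg)
        if i != 0:
            # The value to store must be on top of the stack first.
            out.append(f"fxch %st({i})")
            stack[0], stack[i] = stack[i], stack[0]
        if dies:
            # Register dies here: use the popping store.
            out.append(f"fstp {mem}")
            stack.pop(0)
        else:
            out.append(f"fst {mem}")
    return out, stack


insns, layout = emit_stores([("a", "x", False), ("b", "y", True)], ["a", "b"])
```

In this example `a` stays live after its store, so it gets `fst`, while `b`
dies and gets `fxch` followed by `fstp`, leaving only `a` on the stack.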

Another problem is that after reg-stack is done, the RTL is in a quite
inconsistent state (all FP insns do something a bit different from what their
RTL representation says, etc.). A comment in reg-stack says that no FP insn
should be modified after reg-stack. (That's why reg-stack is the last pass of
compilation.)

The places where a register dies are the same, so the old REG_DEAD notes
should be kept IMO (except in the rare cases where a register is popped by a
separate insn, and then the notes need to be moved).
I really have no idea why REG_UNUSED notes are removed, because
they don't play any important role in reg-stack.

I will have to read reg-stack even more closely.
>   > 
>   > This should be fixed either by swapping the loads or, because the i387 has
>   > versions with swapped operands for almost all non-commutative opcodes,
>   > just by emitting the other instruction. I want to make reg-stack handle
>   > this soon, because on non-Pentium architectures (swap is _VERY_ cheap on
>   > Pentium) it should help a lot. (I think pgcc already has the code, but
>   > disabled, because it triggered the FP comparison bug I've (I hope) fixed.)
> As I mentioned before, I think rewriting the whole swapping code to use
> lazy code motion is a much better way to model the problem.

Yes. I was thinking of this as a temporary change (I expect that LCM
will take some time to get in, and rewriting reg-stack is a relatively huge task).
> 
>   > (after LCM framework is in). But I don't know if LCM is applicable in all
>   > cases, because regstack as it is does a perfect job on Intel's
>   > CPUs, where the register stack is just virtual and done only by renaming
>   > (that's why swap is less expensive than load). So scheduling is better
>   > done before reg-stack.... this is not true for AMD, [34]86 and Cyrix.
> I don't see that we can't model this problem with lcm and still include the
> differences between the various cpus.
> 
> When all the weirdness is stripped away, all regstack does is copy values
> from one register to another.  That screams "copy motion based on lazy code
> motion" to me.  If we need cost parameters or whatever to drive it, fine.

Well, I have no knowledge of LCM. I *think* that LCM is expected to reorder
code to eliminate unnecessary moves. So, because all FPU operations operate
on the "top of stack" and an arbitrary register, it will reorder the code into
something like:
add register 5 to top of stack
multiply top of stack by 10
add four to top of stack
and so on. The top of stack will always be used, so the natural parallelism of
the code is lost and scheduling is impossible.
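
To make the serialization concrete, here is a minimal sketch of a toy timing
model. Everything in it is an assumption for illustration: the function name
`finish_time`, the 3-cycle latency, and the one-insn-per-cycle, fully
pipelined FPU are not taken from the Intel manual or from GCC.

```python
def finish_time(insns, latency=3):
    """Cycles to finish `insns` on a toy in-order, fully pipelined FPU.

    Each insn is (dst, srcs). One insn issues per cycle, and an insn
    waits until every source it reads has been produced.
    The latency of 3 is an assumed figure, not a measured one.
    """
    ready = {}   # register -> cycle at which its value is available
    issue = 0    # earliest cycle at which the next insn may issue
    done = 0
    for dst, srcs in insns:
        start = max([issue] + [ready.get(r, 0) for r in srcs])
        ready[dst] = start + latency
        issue = start + 1
        done = max(done, ready[dst])
    return done


# Every operation goes through the top of stack: each waits for the last.
chain = [("st0", ["st0", "r5"]), ("st0", ["st0"]), ("st0", ["st0"])]

# The same amount of work spread over independent registers.
independent = [("a", ["a", "r5"]), ("b", ["b"]), ("c", ["c"])]

chain_cycles = finish_time(chain)              # 9 in this toy model
independent_cycles = finish_time(independent)  # 5 in this toy model
```

With these assumed numbers the chained version cannot overlap anything, while
the independent version keeps the pipeline busy.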

The Intel manual calls this the "top of stack bottleneck".
Because FP instructions have large latencies, the Pentium has a workaround for
it and makes the stack purely virtual. Internally the FPU has normal
registers, and stack operations are done by renaming the registers.

The manual says that the best approach is to write code for normal registers
(not a stack), schedule it, and then rewrite it into the stack representation
using fxch. This instruction exchanges two registers. It just renames them,
so it can be done even when the values in the registers have not been
calculated yet, which should eliminate the bottleneck. Between two FP
operations it takes 0 cycles.

This is exactly what the current reg-stack implementation does, and that's IMO
why GCC generates excellent FP code for Pentium, PentiumII and PentiumPro CPUs.
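
The rewrite from flat registers into stack form can be sketched with a toy
model. This is only an illustration under stated assumptions, not how
reg-stack is written: `RegStack`, `bring_to_top`, `emit_binary` and
`to_stack_code` are hypothetical names, and the model handles only binary
operations whose destination must be %st(0).

```python
class RegStack:
    """Tracks which virtual FP register lives in which x87 stack slot.

    Hypothetical toy model, not GCC code.
    """

    def __init__(self, regs):
        self.slots = list(regs)   # slots[0] is the top of stack, %st(0)
        self.out = []             # emitted instructions, AT&T syntax

    def bring_to_top(self, reg):
        """Emit fxch if `reg` is not already in %st(0)."""
        i = self.slots.index(reg)
        if i != 0:
            self.out.append(f"fxch %st({i})")
            self.slots[0], self.slots[i] = self.slots[i], self.slots[0]

    def emit_binary(self, op, dst, src):
        """Emit `dst = dst op src` with dst forced to the top of stack."""
        self.bring_to_top(dst)
        j = self.slots.index(src)
        self.out.append(f"{op} %st({j}), %st")


def to_stack_code(insns, initial):
    """Translate already-scheduled (op, dst, src) insns into stack code."""
    rs = RegStack(initial)
    for op, dst, src in insns:
        rs.emit_binary(op, dst, src)
    return rs.out


# Two independent operations, as a scheduler might leave them.
scheduled = [("fadd", "a", "c"), ("fmul", "b", "d")]
stack_code = to_stack_code(scheduled, ["a", "b", "c", "d"])
# stack_code == ["fadd %st(2), %st", "fxch %st(1)", "fmul %st(3), %st"]
```

The two operations stay independent after translation; the only thing added
is the fxch that renames `b` to the top of stack.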

I *think* that LCM run before the scheduler will destroy the parallelism
(you can't schedule code written for stack registers). And run after the
scheduler, it will undo its changes, resulting in code with many flow
dependencies. This is great for K6 and Cyrix, where the FPU units are not
pipelined, but catastrophic for Intel's CPUs.

The formulation from the Intel manual is as follows:

Most floating-point operations require that one operand and the result use
the top of stack. This makes each instruction dependent on the previous
instruction and inhibits overlapping the instructions.

One obvious way to get around this is to change the architecture and
have floating-point registers, rather than a stack. Unfortunately,
upward and downward compatibility would be lost.  Instead, the fxch
instruction was made "fast". This provides us another way to
avoid the top of stack dependencies.  The fxch instructions can be paired
with the common floating-point operations, so there is no penalty on the
Pentium processor.  On the Intel486 processor, each fxch takes 4 clocks.
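
As a rough illustration of why the same stack code is cheap on one CPU and
expensive on another, here is a minimal sketch that adds up the fxch cost
using the two figures quoted above. The function name `fxch_overhead`, the
table name and the sample instruction list are assumptions for the example,
and the Pentium figure assumes every fxch pairs with an FP operation.

```python
# Clocks per fxch, as quoted above. The Pentium value assumes the fxch
# is paired with a floating-point operation.
FXCH_CLOCKS = {"pentium": 0, "i486": 4}


def fxch_overhead(asm, cpu):
    """Total clocks spent on fxch instructions in `asm` for `cpu`."""
    return sum(
        FXCH_CLOCKS[cpu]
        for line in asm
        if line.split()[:1] == ["fxch"]
    )


# Hypothetical instruction sequence, for illustration only.
sample = [
    "fadd %st(2), %st",
    "fxch %st(1)",
    "fmul %st(3), %st",
    "fxch %st(2)",
]

pentium_cost = fxch_overhead(sample, "pentium")  # 0
i486_cost = fxch_overhead(sample, "i486")        # 8
```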

> Well, I don't expect that we'll be doing the final splitting before regstack,
> but we'll keep the option open if we can't fix the problems in a cleaner
> manner.

OK.
>   > So I've added an expand that allocates a stack slot and uses this insn.
>   > This solved some of the bad effects of the original solution, but it is
>   > still very bad.
>   > I believe that right place to handle this storing is reload, that should
>   > then handle something like
>   > (match_operand "general_operand" "m") and spill second register to memory.
>   > It should take advantage of this fact and should avoid possible spilling
>   > of other register and so on.
> It'll do that for certain SUBREGs, but in general I don't think reload is
> good enough to say "ahh, that's a memory operand so I'll stuff it in memory."

OK. Do you have any other suggestions to handle this problem?

Honza
References:
- Re: regstack
  - From: Jeffrey A Law