This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: DFA for PPro, P2, P3


> Now we have I1, which uses one instance of P01 and P0.  Those are precisely
> the resources we have available, so we go ahead and fire I1.  [ Note this
> is based on haifa's notion of cycles and issue rates, meaning the 4:1:1
> uop decoder template is not modeled. ]
> 
> 
> Clearly this is not good as we actually over-subscribed the P0 unit with
> two uops in a single cycle.  We over-subscribed the P0 unit because we
> have not accurately described the pipeline.   And yes, this actually happens.

No, what we do is model a p0 instruction as using both the p0 unit and the
p01 unit, so the p01 unit does not get over-subscribed.  Only one p01
instruction can be issued then.  The description actually is accurate, just
unnatural.
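
To make that encoding concrete, here is a minimal Python sketch (not GCC code)
of the counting scheme.  It assumes a p01 pool with two slots and a p0 pool
with one; the names CAPACITY, COST and can_issue_together, and the three
instruction classes, are invented for illustration only.

```python
# Hypothetical model of the old function-unit scheme: each unit is a counter.
CAPACITY = {"p0": 1, "p01": 2}  # assumed multiplicities, not taken from i386.md

# Per-class reservations for one cycle (illustrative only).
COST = {
    "p0_only": {"p0": 1, "p01": 1},      # a p0 insn also takes one p01 slot
    "p0_or_p1": {"p01": 1},              # a single uop that can use either port
    "p01_plus_p0": {"p0": 1, "p01": 2},  # two uops: one on either port, one on p0
}


def can_issue_together(classes):
    """Return True if every instruction class in `classes` fits in one cycle."""
    used = {unit: 0 for unit in CAPACITY}
    for cls in classes:
        for unit, count in COST[cls].items():
            used[unit] += count
            if used[unit] > CAPACITY[unit]:
                return False  # this unit would be over-subscribed
    return True
```

Under these assumptions can_issue_together(["p0_only", "p0_or_p1"]) is True,
while pairing "p0_only" with "p01_plus_p0" is rejected, because the extra p01
slot charged to every p0 uop exhausts the pool.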

What you can describe well with the DFA, and not with the old scheme, is the
decoders: the PentiumPro has 3 decoders, where decoder 0 is able to decode
more than the others, so you need to order the triples of instructions issued
in the same cycle accordingly.  The features made for VLIW CPUs (exclusion
set/presence set) can be nicely used to describe this asymmetry, as I already
do for Pentium and plan to do for K6 sometime later.  It kills the
MD_SCHED_REORDER code in i386.c, which is a really good thing!
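
As a rough illustration of why the order within a triple matters, the Python
sketch below checks a group of instructions against the 4:1:1 template
mentioned in the quoted text.  It assumes decoder 0 accepts up to 4 uops and
decoders 1 and 2 accept one uop each; DECODER_LIMITS, fits_template and
best_order are hypothetical names, and this is not how the DFA description
itself expresses the constraint.

```python
from itertools import permutations

# Assumed 4:1:1 template: maximum uops each decoder can emit per cycle.
DECODER_LIMITS = (4, 1, 1)


def fits_template(uop_counts):
    """True if the insns, taken in this order, decode in a single cycle."""
    if len(uop_counts) > len(DECODER_LIMITS):
        return False
    return all(n <= limit for n, limit in zip(uop_counts, DECODER_LIMITS))


def best_order(uop_counts):
    """Return some ordering that fits the template, or None if none does."""
    for order in permutations(uop_counts):
        if fits_template(order):
            return order
    return None
```

With these assumptions a group with uop counts (1, 2, 1) does not fit as
written, but reordering it to (2, 1, 1) does; a group containing two
multi-uop instructions cannot be decoded in one cycle in any order.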

Honza
> 
> With the DFA model this is handled trivially.
> 
> First, we model two cpu units.  P0 and P1.
> 
> Second, instructions which use P0 are marked as reserving P0.
> 
> Third, instructions which may use P0 or P1 are marked as reserving P0 or P1.
> 
> Voila!  We now have a more accurate model of the pipeline and as a result
> we know that I2 really shouldn't fire in the same cycle as I1 as it will
> over-subscribe the P0 unit.
> 
> When I say this is trivial to describe with the DFA scheduler I'm not kidding:
> 
> (define_insn_reservation <latency>
>   <attribute test to select the insn type for I1>
>   "p0")
> 
> (define_insn_reservation <latency>
>   <attribute test to select the insn type for I2>
>   "(p0|p1)+p0")
> 
> 
> Anyway, I'm getting excellent results with my DFA description for PPro/P2/P3.
> As of right now there are only 3 cases where the DFA scheduler generates
> different code from the old scheduler.
> 
>   1. There's a slight difference in how we handle ASMs.  Basically the DFA
>      scheduler treats them as starting a new cycle, whereas the old scheduler
>      would actually issue them to "no-unit" without starting a new cycle.
> 
>      I believe the DFA's handling of this case is better, so I won't be
>      contributing my hack to haifa-sched to make the two schedulers handle
>      ASMs in an identical manner.
> 
>   2. There's a slight difference in how we handle USE/CLOBBER insns when
>      one or more insns have already been issued in the current cycle.
> 
>      Basically if the DFA appears to be deadlocked, then we start a
>      new cycle -- even if the next instruction to fire consumes no 
>      resources.  I think the DFA behavior in this case is reasonable,
>      so like #1 above, I won't be contributing my hack to make the two
>      schedulers behave identically.
> 
>   3. There is a performance problem in building some of the tables in
>      genautomata/genattrtab that is preventing me from modeling 
>      fdiv/fsqrt like the old description.  Vlad is working on fixing
>      the performance issue with an algorithmic change in how the
>      particular table is built.
> 
> 
> jeff

