This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Reservation unit as FIFO


> Jan Hubicka wrote:
> > 
> > Hi,
> > I am just running over testcases where scheduling seems to be important
> > for Athlon as it is simply getting to the limits of decoding.
> > My problem is that I have loop (matrix multiplication, so common case)
> > that can run at speed of almost 3 instructions per cycle.  Athlon can
> > decode these at speed of 3 per cycle with exception of loads that
> > consume 2 decoders.
> > 
> > So the problem is that Athlon when runs at full speed keeps reorder
> > buffers empty and requires GCC to do all the job.
> > GCC works fairly poorly as it's scheduling gets limited by the fact that
> > it believes that two loads can't be started at same cycle.  They can,
> > only can't be decoded.
> > 
> > I think I can get accorss this shortcomming by simulating simple FIFO
> > between decoding and executing.  Since I am limited by fact that I can't
> > vary instruction latencies and I can't make execution unit dependent on
> > the decoder so I can keep them in spearate automatas, I think I can
> > model decoder as 4 stage pipelined unit with allocations like this:
> > 
> > (decode, nothing, nothing, nothing)
> > | (nothing, decode, nothing, nothing)
> > | (nothing, nothign, decode, nothing)
> > | (nothing, nothign, nothing, decode)
> > where decode stands for (decode0 | decode1 | decode2)
> > 
> > And define issue rate to 6 (this is limit of micro operations issued per
> > cycle).  I expect that this will make scheduler to understand that
> > decoding can be done in advance to the actual execution.
> > 
> > Of course the decoders will need to be slightly more complicated to
> > allow vector decoding to happen and some presence set thus will be
> > needed to explain that cycles are allocated in order.  I hope the
> > automata to be small then as the state can be encoded by integer 0-4*3
> > and I have only 3 types of instructions.
> > 
> > What I need is to make the decoder occupied at the start of block, so we
> > start with something like
> > (decodeall, decodeall, decodeall, nothing)
> > so GCC won't believe that many instructions can be started in first
> > cycle.
> > 
> > I suppose this to be easy to add (if not existing already).  What do you
> > think about this?
> > To me this look like funny enought idea to try.  Especially I would be
> > curious what multipass scheduling will do then.
> 
>   Yes, it is interesting.  In general, to model FIFO we need more
> powerful model than DFA and regular expressions.  But regular expression
> could be used in some special cases.  I see the following analogy.  In
> general CF language can be described by CFG but not regular grammar.  
> But we could still use regular grammar to describe any CF language if we
> constrain the nesting level of recursive constructions.  Long ago I made
> an experiment to describe typical expressions by lex with the maximal
> nesting level of parentheses equal to 10.  I don't remember the result

My FIFO is even limited in architecture.  It has many stages, but
modeling few should be enought I believe.

> but I remember impression that it will not work because of the automata
> size.  That time what I had was PDP-11.  Probably the automaton size for
> your case will be not big.
I believe I can store the state by remembering amount of allocated
entries in FIFO, so automaton should simplify, however I am unsure how
big will be before minimizing.
> 
>   I think it is important to use right latency time too.  Actually, we
> could modify latency time by querying units in automaton state. 
> Unfortunately, the current interface implements querying units only for
> the current cycle.  But we could implement querying units for the

Fortunately I think I can avoid need for this.  If I model the time of
issuing the insn to be the time when it leaves FIFO minus some fixed
number of cycles.  This will result in same schedule because FIFO is
FIFO and I can model the actual state of FIFO by hidding it inside the
automaton.

I think if I go other way around (time 0 is time when the instruction is
put into FIFO), I will run into problem by not being able to split FIFO
automata fromt he others.

> subsequent cycles too.  Another thing should be implemented is
> possibility to modify latency times and insn priorities during insn
> scheduling.  Currently the scheduler has no such infrastructure.
> 
>   As for the initial state.  You could add a hook to the scheduler at
> the beginning of schedule_block or modify md_init to change the state to
> the necessary one (e.g. by issuing a pseudo instruction).
OK, I will give it a try.
I see I can simply define some named instruction and use her resources,
but perhaps some cleaner syntax would be nice.  Something like
(initial_relservation "reservation")
however I am not that familiar on how this can be implemented...

Honza
> 
>   In any case, it would be interesting to make such experiment and look
> at the result.  Good luck.
> 
> Vlad


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]