This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Reservation unit as FIFO
- From: Jan Hubicka <jh at suse dot cz>
- To: vmakarov at redhat dot com, gcc at gcc dot gnu dot org
- Date: Thu, 28 Nov 2002 01:05:39 +0100
- Subject: Reservation unit as FIFO
Hi,
I am currently going through testcases where scheduling seems to matter
for Athlon, because the code is simply hitting the limits of decoding.
My problem is a loop (matrix multiplication, so a common case) that can
run at almost 3 instructions per cycle. Athlon can decode these at 3 per
cycle, except for loads, which consume 2 decoders.
So the problem is that when Athlon runs at full speed, its reorder
buffers stay empty and GCC has to do all the work.
GCC does fairly poorly here, as its scheduling is limited by the belief
that two loads can't be started in the same cycle. They can; they just
can't both be decoded in that cycle.
I think I can get around this shortcoming by simulating a simple FIFO
between decoding and execution. Since I can't vary instruction
latencies, and I must not make the execution units depend on the decoder
so that I can keep them in separate automata, I think I can model the
decoder as a 4-stage pipelined unit with allocations like this:
(decode, nothing, nothing, nothing)
| (nothing, decode, nothing, nothing)
| (nothing, nothing, decode, nothing)
| (nothing, nothing, nothing, decode)
where decode stands for (decode0 | decode1 | decode2)
I would also set the issue rate to 6 (the limit on micro-operations
issued per cycle). I expect this to make the scheduler understand that
decoding can happen in advance of the actual execution.
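A rough way to see the effect of the alternatives above is a small simulation. This is only a sketch under my own assumptions, not GCC's DFA generator: the names `DecoderModel`, `DECODE_COST` and `issued_in_first_cycle` are hypothetical, "alu" stands for any single-decoder insn, and the issue limit of 6 is applied per insn as a stand-in for micro-ops. With `window=1` it behaves like the current description (decode only in the issue cycle); with `window=4` each insn may take its decoders in any of the 4 alternatives, so decode bandwidth is accounted over a 4-cycle window.

```python
# Hypothetical model of the proposal: 3 decoders per cycle, and each insn may
# take its decoder slots in any of the next `window` cycles (the alternatives
# (decode, nothing, ...) | (nothing, decode, ...) | ...).
DECODERS_PER_CYCLE = 3
ISSUE_RATE = 6  # issue limit per cycle; counted per insn here (simplification)

# Decoder slots per insn class: 2 for loads (per the mail), 1 otherwise.
DECODE_COST = {"alu": 1, "load": 2}


class DecoderModel:
    def __init__(self, window):
        # free[k] = free decoders k cycles from now (k = 0 is the issue cycle)
        self.window = window
        self.free = [DECODERS_PER_CYCLE] * window
        self.issued = 0  # insns issued in the current cycle

    def try_issue(self, kind):
        """Issue one insn in the current cycle if some alternative fits."""
        if self.issued >= ISSUE_RATE:
            return False
        need = DECODE_COST[kind]
        # Try the alternatives in order: decode at offset 0, then 1, ...
        for k in range(self.window):
            if self.free[k] >= need:
                self.free[k] -= need
                self.issued += 1
                return True
        return False

    def advance_cycle(self):
        # Shift the window by one cycle; a cycle with all decoders free enters.
        self.free = self.free[1:] + [DECODERS_PER_CYCLE]
        self.issued = 0


def issued_in_first_cycle(window, insns):
    """Count how many of `insns` can be issued in a single cycle."""
    model = DecoderModel(window)
    return sum(1 for kind in insns if model.try_issue(kind))
```

For example, `issued_in_first_cycle(1, ["load", "load"])` gives 1, reproducing the "two loads can't start in the same cycle" limitation, while `issued_in_first_cycle(4, ["load", "load"])` gives 2.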
Of course the decoder description will need to be slightly more
complicated to allow vector decoding, and some presence sets will be
needed to express that cycles are allocated in order. I hope the
automaton will still be small, as the state can be encoded by an integer
from 0 to 4*3 and I have only 3 types of instructions.
What I need is a way to mark the decoder as occupied at the start of a
block, so that we start with something like
(decodeall, decodeall, decodeall, nothing)
and GCC won't believe that many instructions can be started in the
first cycle.
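The integer encoding and the block-start state can be sketched as below, again as a hypothetical model rather than anything GCC provides. Because decoding is in order, the state is just the index of the next free decoder slot in the 4-cycle window (0..12); slots skipped in a partially used cycle can never be filled later, so they count as consumed. The three insn classes and their costs (direct = 1, load = 2, vector = 3 decoders) are my assumption; only the load cost comes from the mail. The issue-rate limit is left out, since it would live in a separate automaton.

```python
# Hypothetical integer-state model of the in-order decoder FIFO.
DECODERS_PER_CYCLE = 3
WINDOW = 4
SLOTS = DECODERS_PER_CYCLE * WINDOW  # state ranges over 0..12

# Assumed decode costs: direct path 1, load 2 (per the mail), vector path 3.
DECODE_COST = {"direct": 1, "load": 2, "vector": 3}

# Block start: (decodeall, decodeall, decodeall, nothing) -> 9 slots in use.
BLOCK_START_STATE = 3 * DECODERS_PER_CYCLE


def issue(state, kind):
    """Return the state after decoding `kind` in order, or None if it does not fit."""
    need = DECODE_COST[kind]
    used_in_cycle = state % DECODERS_PER_CYCLE
    if DECODERS_PER_CYCLE - used_in_cycle < need:
        # Skip the rest of this cycle: in-order decoding can't come back to it.
        state += DECODERS_PER_CYCLE - used_in_cycle
    if state + need > SLOTS:
        return None
    return state + need


def advance_cycle(state):
    """One cycle passes: the oldest cycle of the window leaves the FIFO."""
    return max(0, state - DECODERS_PER_CYCLE)
```

Starting from an empty window (state 0), four loads fit before `issue` returns None; starting from `BLOCK_START_STATE`, only one load fits (state 11), which is the "don't start many insns in the first cycle" behaviour asked for above.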
I suppose this is easy to add (if it doesn't exist already). What do you
think?
To me this looks like a fun enough idea to try. In particular, I am
curious what multipass scheduling will do with it.
Honza