This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


DFA scheduler for i386.md


Hi,
Yesterday I started rewriting the scheduler descriptions in i386.md for the
DFA scheduler, in the hope that the new code will be smaller and easier to
understand.  So far I am about 70% done with the Pentium description, and it
appears to work well and to be much cleaner than the older code.  I hope to fix
all the MD_SCHED macro usage soonish so that we will be able to kill it later.
Also, the Pentium description is by far the largest of the i386 ones, which
are about the largest in GCC, so I want to give the DFA some testing.

I am really impressed.  It appears to work smoothly, and you even added the
define_reservation feature I was asking for some time ago.  The resulting
automaton still appears to be smaller than the original one.

I've run into a few small problems I would like to mention:

1) It took me quite a while to realize that I need to define
TARGET_SCHED_USE_DFA_PIPELINE_INTERFACE :).  Perhaps it should be mentioned
in md.texi, where the rest of the interface is documented.

2) The define_insn_reservation description mentions result_name in the
grammar, but I didn't find any description of what it means.

3) Pentium has AGI stalls.  The address generation unit runs one cycle
before the rest of the operands are needed, so when the registers used in
addresses are not ready one cycle earlier, the instruction stalls.

I think define_bypass is the proper way to describe it.  I would like to use
the existing agi_dependent function as the guard, but if I understand it
properly, I need a define_bypass for each instruction name, listing all the
other instruction names in out_instruction_names and specifying a latency one
cycle longer than the default one.

Since I already have 12 types and I guess I will end up with about 20, this
is impractical.  Would it be possible to leave the out_instruction_names
string empty and rely on agi_dependent completely?
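
For illustration, a minimal sketch of the per-type form described above.  The
reservation names (pent_alu, pent_load, pent_store), the default latencies in
the comments, and the guard name pent_agi_guard are all hypothetical and not
taken from the actual i386.md; the guard is assumed to be a thin wrapper
around agi_dependent that receives the producer and consumer insns and
returns nonzero when the bypass should apply.

```
;; Hypothetical names throughout; not the real i386.md description.
;; Assumed default latency of pent_alu is 1: an AGI-dependent consumer
;; sees the result one cycle later.
(define_bypass 2 "pent_alu"
                 "pent_alu,pent_load,pent_store"
                 "pent_agi_guard")

;; Assumed default latency of pent_load is 2: the same one-cycle penalty
;; needs a separate bypass because the latency value differs.
(define_bypass 3 "pent_load"
                 "pent_alu,pent_load,pent_store"
                 "pent_agi_guard")
```

With about 20 types, every list above grows to about 20 names and has to be
kept in sync by hand whenever a type is added, which is the maintenance
burden in question.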

4) Even for the Pentium, the exact instruction latency depends on the
reservation it gets.  This is more important for later CPUs, where the on-chip
scheduler may delay instruction execution by quite a few cycles.  I can
express the on-chip scheduler in the instruction's unit reservation, but if I
understand it properly, the scheduler will think that the results are ready
after the latency given by the default_latency value.

Is there any way to express this?  (In the Pentium case, when a
load-execute-store instruction is executed together with a load-execute
instruction, the latency of both increases to 4 cycles instead of the
original 3 and 2.)

Is there any way to describe that particular operands are needed later in
the execution of the instruction?  This is common for i386-ish CPUs.

5) Is there any way to dump the reservation of units at each cycle, like the
old scheduler did, or at least when each particular instruction was scheduled?
I am getting somewhat confused by the new format of the dumps.

Thanks for all the work!
Honza
