This is the mail archive of the
mailing list for the GCC project.
The new scheduler and x86 CPUs
- To: gcc at gcc dot gnu dot org, vmakarov at tooth dot toronto dot redhat dot com
- Subject: The new scheduler and x86 CPUs
- From: Jan Hubicka <jh at suse dot cz>
- Date: Tue, 28 Aug 2001 17:12:18 +0200
I am looking at your work and trying to figure out what benefits converting
the current i386.md descriptions to the new syntax would bring. First of all,
thank you for the huge effort of implementing all this. At the moment I don't
understand the automaton implementation very well, so my questions are
probably somewhat naive. I hope that will change soon :)
The i386 CPUs differ from the RISC/VLIW CPUs your patch targets in that they
have a reorder buffer. It would be nice to be able to model this explicitly.
I think it could be done by adding "variable sized" repetitions to your
syntax. So, for instance, a typical operation could look like:
"decode, none*x, execute, none*x, retire"
I am not familiar with the automaton design, but is there any chance of making
such a thing possible? Is there some alternative?
Another problem with i386.md is the huge number of variants of the various
instructions. An instruction may or may not load from memory, execute, and
write to memory, so it would be wonderful to be able to write
(define_reservation "name" "str" condition), where the condition must match
for this particular reservation to be used.
For instance, I could then define "operand_fetch" like this:
(define_reservation "operand_fetch" "address*2, load*2" (eq_attr "memory" "load,both"))
(define_reservation "operand_fetch" "" (eq_attr "memory" "store,none"))
Then, when the define_unit is used, all the variants are actually generated.
This would hide the complexity, at least in the .md file.
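For comparison, without such a conditional form the two cases would have to be
spelled out by hand, roughly as in the sketch below. It assumes the
define_automaton, define_cpu_unit and define_insn_reservation forms as they are
documented for the automaton-based descriptions. The automaton name, the unit
names, the reservation names, the "execute" step and the latencies are all
made up for illustration; only the "memory" attribute and the
"address*2, load*2" string come from the example above.

```lisp
;; Standalone sketch with hypothetical names, not taken from i386.md.
(define_automaton "athlon")
(define_cpu_unit "address,load,execute" "athlon")

;; Variant with a memory operand to fetch: the operand_fetch part is
;; written out in front of the execute step.
;; Latency 4 is an assumption: 3 (load) + 1 (operation).
(define_insn_reservation "athlon_alu_fetch" 4
  (and (eq_attr "cpu" "athlon")
       (eq_attr "memory" "load,both"))
  "address*2, load*2, execute")

;; Variant with nothing to fetch: the operand_fetch part is empty.
(define_insn_reservation "athlon_alu_nofetch" 1
  (and (eq_attr "cpu" "athlon")
       (eq_attr "memory" "store,none"))
  "execute")
```

Every instruction class needs such a pair (or more, once stores are modelled
too), which is the duplication the conditional define_reservation is meant to
hide.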
The last problem is the latencies. On Athlon, the usual operation has a
latency of 1 and a load has a latency of 3. For an instruction to start
executing, its operands don't all need to be ready. It can be decoded at any
time, and then, for instance, a memory load can start as soon as the address
is ready, while the other argument needed for execution is still being
computed.
So in order to describe this, I probably need one define_insn_reservation
for each possible latency (the cases being: neither load nor store, load only,
and both load and store), and I also need a define_bypass for each
instruction to avoid the load latency. Or is there a better way to define this?
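To make the question concrete, the approach described above might look roughly
like the sketch below. It assumes the define_insn_reservation and
define_bypass forms as documented for the automaton-based descriptions,
including the optional guard argument of define_bypass. The unit names,
reservation names, reservation strings and the guard function
hypothetical_dep_not_via_address_p are invented for illustration, and the
latency of 4 assumes the load and operation latencies simply add up.

```lisp
;; Standalone sketch with hypothetical names, not taken from i386.md.
(define_automaton "athlon")
(define_cpu_unit "decode,address,load,execute" "athlon")

;; No memory operand: latency 1.
(define_insn_reservation "athlon_alu" 1
  (and (eq_attr "cpu" "athlon")
       (eq_attr "memory" "none"))
  "decode, execute")

;; Load plus operation: assumed latency 3 (load) + 1 (operation) = 4.
(define_insn_reservation "athlon_alu_load" 4
  (and (eq_attr "cpu" "athlon")
       (eq_attr "memory" "load"))
  "decode, address*2, load*2, execute")

;; If the consumer needs the producer's result only as the non-address
;; operand, the load can proceed in parallel, so the producer's latency
;; towards that consumer shrinks by the load latency (not below 0).
;; The guard is a hypothetical C function that returns nonzero when the
;; dependence does not go through the address.
(define_bypass 0 "athlon_alu" "athlon_alu_load"
  "hypothetical_dep_not_via_address_p")
(define_bypass 1 "athlon_alu_load" "athlon_alu_load"
  "hypothetical_dep_not_via_address_p")
```

The load-and-store variant would follow the same pattern, which means the
number of reservations and bypasses grows with every producer/consumer
combination.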