This is the mail archive of the
mailing list for the GCC project.
Scheduling x86 dispatch windows
- From: reza yazdani <yazdani_reza at yahoo dot com>
- To: gcc at gcc dot gnu dot org
- Date: Thu, 10 Jun 2010 10:20:57 -0700 (PDT)
- Subject: Scheduling x86 dispatch windows
We are in the process of adding a feature to GCC to take advantage of a new hardware feature in the latest AMD micro processor. This feature requires a certain mix, ordering and alignments in instruction sequences to obtain the expected hardware performance.
I am asking the community to review this high level implementation design and give me direction or advice.
The new hardware issues two windows of the size N bytes of instructions in every cycle. It goes into accelerate mode if the windows have the right combination of instructions or alignments. Our goal is to maximize the IPC by proper instruction scheduling and alignments.
Here is a summary of the most important requirements:
a) Maximum of N instructions per window.
b) An instruction may cross the first window.
c) Each window can have maximum of x memory loads and y memory stores .
d) The total number of immediate constants in the instructions of a window should not exceed k.
e) The first window must be aligned on 16 byte boundary.
f) A Window set terminates when a branch exists in a window.
g) The number of allowed prefixes varies for instructions.
h) A window set needs to be padded by prefixes in instructions or terminated by nops to ensure adherence to the rules.
We have the following implementation plan for GCC:
1) Modify the Haifa scheduler to make the desired arrangement of instructions for the two dispatch windows. The scheduler is called once before and once after register allocation as usual. In both cases it performs dispatch scheduling along with its normal job of instruction scheduling.
The advantage of doing it before register allocation is avoiding extra dependencies caused by register allocation which may become an obstacle to movement of instructions.? The advantage of doing it after register allocation is a consideration for spilling code which may be generated by the register allocator.
The algorithm we use is:
a) Considering the current dispatch window set, choose the first instruction from ready queue that does not violate dispatch rules.
b) When an instruction is selected and scheduled, inform the dispatcher code about the instruction. This step keeps track of the instruction content of windows for future evaluation. It also manages the window set by closing and opening new virtual dispatch windows.
2) Insertion of alignment code.
In x86 alignment is done by inserting prefixes or by generating nops. As the object code is generated by the assembler in GCC, some information such as sizes of branches are unknown until assembly or link time. To do alignments related to dispatch correctly in GCC, we need to iteratively compute prefixes and branch sizes until its convergence. This pass currently does not exist in GCC, but it exists in the assembler.
There are two possible approaches to solve alignment problem.
a) ?Let the assembler performs the alignments and padding needed to adhere with the new machine dispatching rules and avoid an extra pass in GCC.
b) ?Add a new pass to mimic what assembler does before generating the assembly listing in GCC and insert the required alignments.
I appreciate your comments on the proposed implementation procedure and the choices a or b above.