Instruction scheduling is the reordering of instructions such that the available resources are used in the most efficient way possible. Resources in this context are the CPU function units and registers. Most instruction scheduling algorithms concentrate on optimizing the function unit usage, only taking register pressure into account when it may become a serious problem.
In GCC, the scheduler framework has the following major components:
- Function unit usage descriptions for each instruction
- A module to compute instruction dependencies
A list scheduler (called the Haifa scheduler because it was donated by IBM Haifa in the late '90s)
- Some scheduler drivers implementing different scheduling algorithms
The main GCC scheduler is a region scheduler implemented in [gccsource:sched-rgn.c]. It can do interblock scheduling within regions such as inner loops. This scheduling algorithm usually does not give the most efficient scheduling, so on targets where scheduling is really critical, for example IA-64, a form of trace scheduling is implemented in [gccsource:sched-ebb.c]. This trace scheduler does not produce optimal compensation code and may really bloat the code, so unless profile information is available, this scheduler actually only schedules the most likely paths through extended basic blocks. The region scheduler and the trace scheduler also use the same basic list scheduler implemented in [gccsource:haifa-sched.c].
Finally, there is a file that implements modulo scheduling, which is new in GCC 4.0 and not yet fully finished, see [gccsource:modulo-sched.c]. All three different scheduling algorithms share the dependency module, which is implemented in [gccsource:sched-deps.c].
The function unit descriptions are based on regular expressions to compute finite automata that can be used to efficiently describe the state of the function units and transitions between states. For more information, see the announcement on the GCC news page.
All these schedulers, but especially the region and trace scheduler, share one problem: They stem from an age when GCC somehow managed to work without a proper control flow graph, which is pretty much essential for interblock scheduling. In fact, the existing CFG infrastructure for GCC was derived from the Haifa scheduler. Anyway, the black magic involved in the scheduler to cope with interblock scheduling decisions is both difficult to maintain and suboptimal from a scheduling point of view.
What is needed is a set of new scheduler drivers that are CFG aware. This probably requires updating the Haifa list scheduler in a few places, but I would guess no major surgery is necessary there. For real trace scheduling we also need reverse automata, that is, automata that describe the transition from some state to possible previous states. Such reverse automata are required to optimally insert compensation code and to pad empty slots with NOPs at join points of traces.
On the other hand, it is not clear how much gain there would be from a new scheduler. Most cores nowadays have out-of-order execution, so for example GCC does not schedule at all for the Pentium 4 because it just does not make a difference at all. For most RISCy architectures we seem to do reasonably well, so improvements of many percentage-points are very unlikely. And IA-64, well, first of all pre-reload is not COND_EXEC aware, so the scheduler already loses there. To overcome some of this, the IA-64 backend even has its own scheduler. It's unlikely anyone would care enough about IA-64 to redo scheduling from scratch.