This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Register Pressure in Instruction Level Parallelism


Hello GCC developers,

I am going to write a Master's Thesis about instruction scheduling based on the technique presented in [1]. This will also include a implementation.

The basic idea is to have an additional pass on the data dependency graph before instruction scheduling and register allocation takes place. It estimates the minimal (register sufficiency) and maximal number of registers (register saturation) required by schedules based on that graph.

If the register sufficiency is higher than the physical number of registers, spill code is added to the graph.

If the register saturation is higher than the physical number of registers, artificial dependencies are added to the graph, such that the instruction scheduler is forced to generate a schedule with less registers.

The aim is that the instruction scheduler can focus on the optimal arrangement of instructions to exploit a maximal amount of instruction level parallelism. [1] also suggests heuristics for all these problems. According to the author, these are "nearly optimal" in practice. The heuristics for estimating register sufficiency and saturation are both optimistic, meaning that it might still be necessary to add more spill code to the final code. Hence, this this technique is just an optimization pass and doesn't replace existing register allocation or instruction scheduling passes.

[1] also has a part about loop scheduling, but my thesis will be about basic blocks only. See [2] for an presentation of this technique.

So, now my questions: How much do you think could this could improve compiled code speed? Would the current GCC/YARA benefit from such an optimization pass at all? What are the chances that this could get into the main GCC tree if it shows up to be an improvement?

There has been a short discussion on this mailing list already [3] some years ago (Note if you read this: intLP has been used to compare the heuristic to the optimal result only). To my knowledge, there hasn't been any more on this topic yet (anywhere).

I'd prefer to implement this for the gcc, but my advisor wants me to do it for the university's own compiler. Therefore I could also need arguments why to do it for the GCC.

Regards,
Michael Kruse

[1] http://www.prism.uvsq.fr/~touati/thesis.html
[2] http://tel.archives-ouvertes.fr/docs/00/04/72/88/ANNEX/tel-00007405.ppt
[3] http://gcc.gnu.org/ml/gcc/2002-07/msg00565.html

--
Tardyzentrismus verboten!

Attachment: smime.p7s
Description: S/MIME Cryptographic Signature


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]