This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Register Pressure in Instruction Level Parallelism
- From: Michael Kruse <meinert at uni-paderborn dot de>
- To: gcc at gcc dot gnu dot org
- Date: Sun, 28 Jun 2009 12:30:06 +0200
- Subject: Register Pressure in Instruction Level Parallelism
- Reply-to: reply at meinersbur dot de
Hello GCC developers,
I am going to write a Master's Thesis about instruction scheduling based
on the technique presented in [1]. This will also include a implementation.
The basic idea is to have an additional pass on the data dependency
graph before instruction scheduling and register allocation takes place.
It estimates the minimal (register sufficiency) and maximal number of
registers (register saturation) required by schedules based on that graph.
If the register sufficiency is higher than the physical number of
registers, spill code is added to the graph.
If the register saturation is higher than the physical number of
registers, artificial dependencies are added to the graph, such that the
instruction scheduler is forced to generate a schedule with less registers.
The aim is that the instruction scheduler can focus on the optimal
arrangement of instructions to exploit a maximal amount of instruction
level parallelism. [1] also suggests heuristics for all these problems.
According to the author, these are "nearly optimal" in practice. The
heuristics for estimating register sufficiency and saturation are both
optimistic, meaning that it might still be necessary to add more spill
code to the final code. Hence, this this technique is just an
optimization pass and doesn't replace existing register allocation or
instruction scheduling passes.
[1] also has a part about loop scheduling, but my thesis will be about
basic blocks only. See [2] for an presentation of this technique.
So, now my questions: How much do you think could this could improve
compiled code speed? Would the current GCC/YARA benefit from such an
optimization pass at all? What are the chances that this could get into
the main GCC tree if it shows up to be an improvement?
There has been a short discussion on this mailing list already [3] some
years ago (Note if you read this: intLP has been used to compare the
heuristic to the optimal result only). To my knowledge, there hasn't
been any more on this topic yet (anywhere).
I'd prefer to implement this for the gcc, but my advisor wants me to do
it for the university's own compiler. Therefore I could also need
arguments why to do it for the GCC.
Regards,
Michael Kruse
[1] http://www.prism.uvsq.fr/~touati/thesis.html
[2] http://tel.archives-ouvertes.fr/docs/00/04/72/88/ANNEX/tel-00007405.ppt
[3] http://gcc.gnu.org/ml/gcc/2002-07/msg00565.html
--
Tardyzentrismus verboten!
Attachment:
smime.p7s
Description: S/MIME Cryptographic Signature