This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Register Pressure in Instruction Level Parallelism
- From: Vladimir Makarov <vmakarov at redhat dot com>
- To: reply at meinersbur dot de
- Cc: gcc at gcc dot gnu dot org
- Date: Mon, 29 Jun 2009 12:25:43 -0400
- Subject: Re: Register Pressure in Instruction Level Parallelism
- References: <4A47462E.1080402@uni-paderborn.de>
Michael Kruse wrote:
Hello GCC developers,
I am going to write a Master's Thesis about instruction scheduling
based on the technique presented in [1]. This will also include an
implementation.
The basic idea is to have an additional pass on the data-dependence
graph before instruction scheduling and register allocation take
place. It estimates the minimal (register sufficiency) and maximal
(register saturation) number of registers required by schedules based
on that graph.
If the register sufficiency is higher than the physical number of
registers, spill code is added to the graph.
For me, that is the most interesting part; unfortunately, Touati (as I
remember) says practically nothing about this in his thesis.
If the register saturation is higher than the physical number of
registers, artificial dependencies are added to the graph, such that
the instruction scheduler is forced to generate a schedule with fewer
registers.
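To make the saturation idea concrete, here is a small hypothetical sketch (Python, not GCC code, and not Touati's actual algorithm): each insn defines one value that lives until its last consumer, and a single artificial, order-only arc between two chains forces any scheduler into a lower-pressure schedule without extending any live range.

```python
from collections import defaultdict

def topo_order(deps):
    """One topological order of the dependence graph (Kahn's algorithm)."""
    indeg = defaultdict(int)
    for n in deps:
        for s in deps[n]:
            indeg[s] += 1
    ready = sorted(n for n in deps if indeg[n] == 0)
    order = []
    while ready:
        n = ready.pop(0)
        order.append(n)
        for s in deps[n]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
        ready.sort()
    return order

def max_pressure(order, data):
    """Peak number of simultaneously live values. Each insn defines one
    value, live from its issue until its last consumer in `data`."""
    pos = {n: i for i, n in enumerate(order)}
    peak = 0
    for i in range(len(order)):
        live = sum(1 for d, uses in data.items()
                   if uses and pos[d] <= i <= max(pos[u] for u in uses))
        peak = max(peak, live)
    return peak

# Two independent chains of true data dependences.
data = {'a1': ['a2'], 'a2': ['a3'], 'a3': [],
        'b1': ['b2'], 'b2': ['b3'], 'b3': []}

# An ILP-friendly interleaving keeps both chains live at once.
print(max_pressure(['a1', 'b1', 'a2', 'b2', 'a3', 'b3'], data))  # 3

# Artificial arc a3 -> b1: order-only, no value flows along it, so it
# constrains the scheduler without lengthening any live range.
deps = {n: list(s) for n, s in data.items()}
deps['a3'].append('b1')
print(max_pressure(topo_order(deps), data))  # 2
```

The point of keeping `deps` and `data` separate is that artificial arcs constrain ordering only; they must not be treated as uses when computing liveness.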
The aim is that the instruction scheduler can focus on the optimal
arrangement of instructions to exploit a maximal amount of instruction
level parallelism. [1] also suggests heuristics for all these
problems. According to the author, these are "nearly optimal" in
practice. The heuristics for estimating register sufficiency and
saturation are both optimistic, meaning that it might still be
necessary to add more spill code to the final code. Hence, this
technique is just an optimization pass and doesn't replace existing
register allocation or instruction scheduling passes.
[1] also has a part about loop scheduling, but my thesis will be about
basic blocks only. See [2] for a presentation of this technique.
So, now my questions: How much do you think this could improve the
speed of compiled code? Would the current GCC/YARA benefit from such an
optimization pass at all?
I think nobody can answer the questions until it is implemented.
I have been working on register-pressure-sensitive insn scheduling too
for the last few months. I started with the simple heuristic described
in Muchnick's book: when the current register pressure is high, choose
an insn that decreases the pressure, and when the pressure is low, use
the usual scheduling heuristic. This did not work at all (it increased
code size by about 3-5% on x86). The problem is that even if the current
register pressure is low at a given point, issuing an insn at that point
can still increase register pressure at later points. So I am now
working on evaluating register pressure at later points too (some form
of look-ahead). This approach is similar to what Touati proposed (but
without actually adding additional DD arcs).
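A minimal sketch of the Muchnick-style pick rule described above (hypothetical Python, not the actual GCC scheduler; and note the paragraph above reports that this rule alone did not work well): each ready insn gets a pressure delta, and the choice switches between the usual critical-path priority and a pressure-reducing one.

```python
PRESSURE_LIMIT = 4  # assumed number of allocatable registers

def pick(ready, live, height, delta):
    """Choose the next insn from the ready list.

    height[i]: critical-path length from i (the usual priority).
    delta[i]:  net change in live values if i issues now
               (+1 for a new def, -1 per operand it is the last use of).
    """
    if live >= PRESSURE_LIMIT:
        # High pressure: issue the insn that shrinks the live set most,
        # breaking ties in favor of the longer critical path.
        return min(ready, key=lambda i: (delta[i], -height[i]))
    # Low pressure: the usual scheduling heuristic (critical path).
    return max(ready, key=lambda i: height[i])

# Toy ready list: two loads create values, one add consumes its operands.
height = {'load_a': 3, 'load_b': 3, 'add': 1}
delta = {'load_a': +1, 'load_b': +1, 'add': -1}

print(pick(['load_a', 'load_b', 'add'], live=5, height=height, delta=delta))  # add
print(pick(['load_a', 'load_b', 'add'], live=2, height=height, delta=delta))
```

The look-ahead variant described above would replace the `live >= PRESSURE_LIMIT` test with an estimate of pressure at later points, not just the current one.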
If you are going to work on this project, some small advice about
evaluating register sufficiency. I found that register pressure is
already practically minimized before insn scheduling (I suspect that
this is mostly done in TER). By the way, this also means that tree
balancing (which is actually a much more complicated issue because it is
not a tree but a DAG) is not necessary in the backend, as it was done in
Preston Briggs' project (and mentioned in Ken Zadeck's proposed pass
stack).
What are the chances that this could get into the main GCC tree if it
turns out to be an improvement?
I don't see any problem with getting the code into the main GCC tree if
you get even a 1-2% improvement. There are some technical questions
(like the code conforming to GCC coding standards) and the commitment to
maintain the code, but these problems can definitely be overcome.
There was already a short discussion on this mailing list [3] some
years ago (note, if you read it: integer LP was used only to compare
the heuristic against the optimal result). To my knowledge, there
hasn't been any further work on this topic (anywhere).
I'd prefer to implement this for GCC, but my advisor wants me to do it
for the university's own compiler. Therefore I could also use arguments
for doing it in GCC.
Personally, I'd love to see this work done in GCC. I believe research
work in the compiler field should be done in real industrial compilers,
because that is the final target of the research. I have seen many times
that researchers report big improvements for their algorithms because
they are using toy compilers, but when the same algorithms are
implemented in GCC they yield practically nothing. For me, the criterion
for a toy compiler is how well it does on widely used compiler
benchmarks such as SPEC (the better the existing compiler's performance,
the harder it is to get the same percentage improvement). So if your
university compiler's performance is close, for example on SPECFP2000,
I think you could use it to see the real impact of the optimization.
On the other hand, you will definitely get better numbers (and spend
less time) using a toy compiler rather than GCC, and your research would
probably look more successful from an academic point of view.
[1] http://www.prism.uvsq.fr/~touati/thesis.html
[2]
http://tel.archives-ouvertes.fr/docs/00/04/72/88/ANNEX/tel-00007405.ppt
[3] http://gcc.gnu.org/ml/gcc/2002-07/msg00565.html