This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: A Question About LRA/reload

On 12/09/2014 04:37 AM, lin zuojian wrote:
> Hi,
>     I have read ira/lra code for a while, but still fails to understand
>     their relationship. The main question is why ira do color so early?
>     lra pass will do the assignment anyway. Sorry if I mess up coloring
>     and hard register assignment, but I think it's better to get job
>     done after lra elimiation, inheriation, ...
  There are two major approaches in RA implementation.  One is division
of it on global and local passes and another one is to use iterative
approach with the same coloring algorithm as it is described in most
books and research articles.  Historically GCC used separate passes
approach (even more complicated regmove, local RA, then global RA, and
then reload).  Some industrial compilers to which I had access to the
sources use this approach too (e.g. pathscale compiler uses global and
local RA).

  I believe changing RA in GCC is very challenging task and only
evolutionary approach could work.   Changing old global/local/regmove by
IRA took 4-5 years.  Changing reload by LRA took 3 years and is still in
progress. This is even more challenging task than IRA.  That is why we
have what we have: IRA (global RA) and LRA (local RA).  The division
also solves compilation time problem: one expensive global RA IRA (which
builds conflict graph and dealing with irregular register file
architectures by dynamically created register classes which is further
development of Chaitin-Briggs coloring algorithm) and LRA which uses
simplified coloring without building expensive conflict graph but making
it several times (in average about 2-4 coloring passes for function). 
On my estimation, using the same algorithm as in IRA iteratively would
add 5-10% more to all GCC compilation time.  It would be hard to
convince people to such approach as GCC compilation speed is a sensitive
area (especially when we have a strong competitor of GCC as LLVM).  The
articles about RA never say about RA cycling problem (although Peter
Bergner who implemented an extension of Chaitin-Brigs allocator
mentioned this to me).  This problem happens in LRA even with simpler
coloring algorithm.  It would be worse if we used more complicated one
from IRA.
  Saying all that, I would also add that using iteratively the same
algorithm is appealing idea to me too.  I try to use this approach in
YARA project (predecessor of IRA) in which I tried to remove all old GCC
RA (including reload) at once but it was too slow and did not even
generate the correct code in many cases even for x86.  Jeff Law tried
IRA coloring reusage too for reload but whole RA became slower (although
he achieved performance improvements on x86).  As I know, Preston Briggs
(and his small team) from google tried to implement his iterative
coloring algorithm only for x86/x86-64 but performance results were not
better than IRA+reload that time.

  So many things in IRA/LRA have historical roots.  That is what we
worked out.  Probably some approaches could work and used too if the
same performance and compiler speed is achieved for major architectures
and they generates correct code for all other architectures (supporting
all of them by RA is complicated task by itself).  Personally, I like
classical iterative Chaitin-Briggs allocator approach but I failed to
implement this and probably will avoid to returning to its
implementation but I encourage anyone to try other approaches and help
to include them into GCC if the results will be promising.

  I hope I answer your question.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]