This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [RFA] The Integrated Register Allocator

From: Vladimir Makarov <vmakarov at redhat dot com>
To: Pat Haugen <pthaugen at us dot ibm dot com>
Cc: gcc-patches <gcc-patches at gcc dot gnu dot org>
Date: Tue, 08 Apr 2008 18:41:51 -0400
Subject: Re: [RFA] The Integrated Register Allocator
References: <OFFDD3FA2A.57E1AEBE-ON86257425.000F874A-86257425.0011A2FF@us.ibm.com>

Pat Haugen wrote:

Vladimir Makarov <vmakarov@redhat.com> wrote on 04/04/2008 02:20:15 PM:

I tried your patch on ppc64 (a Power6 box). For the most part results were close, but 187.facerec degraded about 24% with -fira. It looks like the hot procedure, graphRoutines.f90:LocalMove(), ended up with a lot more spill code.

Thanks for testing.

I know about the problem; it occurred as a result of the latest
patches.  IRA mistakenly assigns FLOAT_REGS instead of GENERAL_REGS
for some pseudos.  The wrong assignment also causes code size
degradation on ppc64 and Itanium.

I have the patch for fixing this. I'll commit it to the branch today.

I've just committed a patch fixing the problem.  Sorry for the
inconvenience.  I should have done it earlier.


I tried the latest ira branch which includes your fix, but am still seeing
the degradation, still due to extra spill in the mentioned procedure.

Following are static instruction counts from the annotated oprofile report
for LocalMove(), both stack references and LR/CTR moves (which can also be
used for spill).  Granted, not all of these references are for spill, but
they give a general feel for the increase.

Old/existing RA:

run/00000014> grep "(r1)" oprof.localmove.annot | wc -l
393
run/00000014> grep mflr oprof.localmove.annot | wc -l
1
run/00000014> grep mfctr oprof.localmove.annot | wc -l
0
run/00000014> grep mtlr oprof.localmove.annot | wc -l
1
run/00000014> grep mtctr oprof.localmove.annot | wc -l
2

-fira:

run/00000016> grep "(r1)" oprof.localmove.annot | wc -l
686
run/00000016> grep mflr oprof.localmove.annot | wc -l
69
run/00000016> grep mfctr oprof.localmove.annot | wc -l
61
run/00000016> grep mtlr oprof.localmove.annot | wc -l
51
run/00000016> grep mtctr oprof.localmove.annot | wc -l
46
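
The per-pattern counts above can also be collected in one pass. The following is only a sketch: the function name count_spill_hints is made up for illustration, and it assumes an annotated file like the oprof.localmove.annot used in the commands above. It relies on grep -c, which counts matching lines just as grep | wc -l does, and on -F so that "(r1)" is matched literally.

```shell
# Hypothetical helper: print "<pattern> <matching-line count>" for each
# pattern that was grepped for by hand above.
count_spill_hints() {
  file=$1
  for pat in '(r1)' mflr mfctr mtlr mtctr; do
    # grep -c still prints 0 when nothing matches.
    printf '%s %s\n' "$pat" "$(grep -c -F -- "$pat" "$file")"
  done
}
```

Running count_spill_hints oprof.localmove.annot in each run directory gives both sets of counts in a form that is easy to diff.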

Those are static numbers. I've checked facerec again on a G5 machine and got:

187.facerec 1900 234 812* 1900 234 811
187.facerec 1900 234 811  1900 234 813*
187.facerec 1900 231 823  1900 224 849

The base is for -O2 -mtune=G5, the peak is for -O2 -fira -mtune=G5. I do remember that before the patch facerec was much worse with IRA.
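
To put the base and peak figures side by side, the relative change in the ratio can be computed per run. This is a sketch under one assumption: that each group of three numbers is reference time, run time, and ratio (1900/234 is about 8.12, which agrees with a ratio of 812). The function name ratio_change is made up for illustration.

```shell
# Hypothetical helper: $1 = base ratio, $2 = peak ratio.
# Prints the change of peak relative to base, in percent.
ratio_change() {
  awk -v base="$1" -v peak="$2" \
    'BEGIN { printf "%+.2f%%\n", (peak / base - 1) * 100 }'
}

ratio_change 812 811   # first run
ratio_change 811 813   # second run
ratio_change 823 849   # third run
```

A positive result means the -fira build scored higher than the base build on that run.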

Indeed, IRA spills more pseudos for LocalMove() (I see the same problem even when non-regional RA is used with -fira-algorithm=CB) and therefore generates bigger code. I've analyzed this and did not find anything wrong with the coloring algorithm (on this program the Chaitin-Briggs algorithm simply behaves very differently from Chow's algorithm used in the old allocator). I could modify how the Chaitin-Briggs algorithm chooses potentially spilled pseudos so that it takes the number of pseudo references into account. That could decrease the static number of spilled pseudos, but I think it is the wrong idea.

Unfortunately, I don't have a Power6 machine available. Are the SPEC numbers for facerec the same?

