This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: RFC: Code size improvement for global alloc


On Tue, Jul 06, 2004 at 12:55:44PM -0400, Vladimir Makarov wrote:
> Daniel Jacobowitz wrote:
> 
> >Testing: I tested this patch with bootstrap / make check (all languages
> >except Ada) on x86_64-pc-linux-gnu.  I also tested an equivalent patch for
> >arm-none-elf using csl-arm-branch.  There were no regressions in either
> >case.  Bootstrap got 5.5 seconds faster out of 81 minutes, which is in the
> >noise.  On x86_64 there was a size improvement of 269 bytes out of 590K in
> >some random files from cc1-i-files; on ARM there was a size improvement of
> >about 0.05% in CSiBE.
> >
> >OK?  Comments?
> >
> > 
> >
> I think that at least you need to introduce a new flag for this.  You 
> cannot enable it by default until you prove that there is a performance 
> improvement on a credible benchmark (preferably SPEC95 or SPEC2000) on 
> major platforms (x86, x86-64, ppc).  Although the optimization removes 
> move insns, the final result might be worse after reload, because reload 
> might evict both coalesced pseudo-registers from a hard register, whereas 
> without coalescing only one pseudo-register would have been evicted.
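
The concern above is that a coalesced pseudo-register carries the union of both live ranges and both sets of conflicts, so reload may have to spill more than it otherwise would. In a graph-coloring allocator, one standard guard against this is Briggs' conservative coalescing test, sketched below. It is a swapped-in textbook technique, not what either patch does (both coalesce after assignment), and the dict-of-sets conflict graph with a single uniform register count `k` is a simplifying assumption.

```python
def briggs_safe(interference, a, b, k):
    """Briggs' conservative test: merging pseudos a and b is considered safe
    if the merged node has fewer than k neighbors of significant degree
    (degree >= k), so the merge cannot stop simplification from removing it."""
    if b in interference[a]:
        return False  # live at the same time: they can never share a register
    merged_neighbors = (interference[a] | interference[b]) - {a, b}
    significant = 0
    for n in merged_neighbors:
        degree = len(interference[n])
        # A neighbor of both a and b loses one edge once they are merged.
        if a in interference[n] and b in interference[n]:
            degree -= 1
        if degree >= k:
            significant += 1
    return significant < k


# Example conflict graph (symmetric); a and b are joined by a move insn.
graph = {
    "a": {"c"},
    "b": {"d"},
    "c": {"a", "d", "e"},
    "d": {"b", "c", "e"},
    "e": {"c", "d"},
}
print(briggs_safe(graph, "a", "b", k=2))  # False: c and d both have degree 3 >= 2
print(briggs_safe(graph, "a", "b", k=3))  # True: 2 significant neighbors < 3
```

With only two registers the merged node would have two high-degree neighbors, so the test refuses the merge and keeps the move rather than risk a spill.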

I ran SPECINT 2000 on x86 (using x86-64 hardware but -m32); the patch
was a 0.7% improvement.  There was a lot of noise; I don't think the
results are significant.

##############################################################################
#   INVALID RUN INVALID RUN INVALID RUN INVALID RUN INVALID RUN INVALID RUN  #
#
# 'reportable' flag not set during run
# Error 252.eon: Output miscompare
# Error 252.eon: Output miscompare
# Error 252.eon: Output miscompare
#
#   INVALID RUN INVALID RUN INVALID RUN INVALID RUN INVALID RUN INVALID RUN  #
##############################################################################

Base:
   Benchmark       Ref(s)  Run(s)       Ratio
   164.gzip          1400     208         673*
   175.vpr           1400     175         802*
   176.gcc           1100     105        1048*
   181.mcf           1800     242         743*
   186.crafty        1000      93.8      1067*
   197.parser        1800     210         858*
   252.eon                                   X
   253.perlbmk       1800     167        1080*
   254.gap           1100     115         960*
   255.vortex        1900     180        1058*
   256.bzip2         1500     178         841*
   300.twolf         3000     279        1077*
   Est. SPECint_base2000                  916

Patched:
   Benchmark       Ref(s)  Run(s)       Ratio
   164.gzip          1400     209         670*
   175.vpr           1400     173         810*
   176.gcc           1100     105        1048*
   181.mcf           1800     243         739*
   186.crafty        1000      92.3      1083*
   197.parser        1800     211         853*
   252.eon                                   X
   253.perlbmk       1800     168        1074*
   254.gap           1100     120         920*
   255.vortex        1900     191         996*
   256.bzip2         1500     179         838*
   300.twolf         3000     278        1080*
   Est. SPECint_base2000                  908
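
The "Est. SPECint_base2000" line is the geometric mean of the per-benchmark ratios, where each ratio is 100 x reference time / run time (for example 1400 / 208 x 100 = 673 for 164.gzip). A small sketch that recomputes both estimates from the Ratio column, leaving out 252.eon as the runs did, together with the relative difference between them:

```python
import math

# Ratio column from the Base and Patched tables (252.eon excluded: its
# output miscompared, so it has no ratio in either run).
base = [673, 802, 1048, 743, 1067, 858, 1080, 960, 1058, 841, 1077]
patched = [670, 810, 1048, 739, 1083, 853, 1074, 920, 996, 838, 1080]


def spec_estimate(ratios):
    """Geometric mean of the per-benchmark SPEC ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


base_est = spec_estimate(base)
patched_est = spec_estimate(patched)
change = 100.0 * (patched_est - base_est) / base_est
# The rounded estimates reproduce the Est. lines above (916 and 908).
print(f"base {base_est:.0f}  patched {patched_est:.0f}  change {change:+.2f}%")
```

Because it is a geometric mean, single outliers such as 254.gap and 255.vortex move the estimate much less than they would move an arithmetic mean of the ratios.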

I can set up SPECINT/SPECFP on more platforms if necessary.

> Also, I have code for a more general form of coalescing in global (please 
> see the GCC Summit proceedings) which coalesces all registers, not only 
> global and local ones (as in your patch), and tries both hard registers 
> (not only the global one's).  It even coalesces two registers when one 
> or both of them were assigned to memory, if that is profitable.  It also 
> coalesces pseudo-registers in order of the frequencies of the move insns 
> to get better results.

This sounds great.  I look forward to it.  However, it sounds like it
won't be Stage 2 material, so I would like to have this patch reviewed
in the meantime.
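
The frequency-driven coalescing described above can be pictured as a greedy pass over move insns, most frequent first, that tries to give both operands the same hard register after assignment has already happened. This is a hypothetical illustration of the idea only, not code from global.c or from either patch: the `(src, dst, freq)` move tuples, the `assignment` map (None meaning the pseudo lives in memory) and the `conflicts`/`allowed` sets are assumed data structures. It tries each operand's current register in turn, so it covers the case where one operand was assigned to memory; the case where both are in memory is left out for brevity.

```python
def coalesce_after_assignment(moves, assignment, conflicts, allowed):
    """Greedy post-assignment coalescing (hypothetical data model).

    moves:      list of (src, dst, freq) move insns between pseudos
    assignment: pseudo -> hard register number, or None if in memory
    conflicts:  pseudo -> set of pseudos live at the same time
    allowed:    pseudo -> set of hard registers its class permits
    Returns the total frequency of moves that are now no-ops.
    """
    pinned = set()  # pseudos whose register an earlier coalesce depends on

    def reg_is_free_for(pseudo, reg):
        # reg must be legal for pseudo and unused by any conflicting pseudo.
        if reg not in allowed[pseudo]:
            return False
        return all(assignment[q] != reg for q in conflicts[pseudo])

    saved = 0
    # Most frequently executed moves first, so the hottest copies win.
    for src, dst, freq in sorted(moves, key=lambda m: m[2], reverse=True):
        if assignment[src] is not None and assignment[src] == assignment[dst]:
            saved += freq  # already share a register
            pinned.update((src, dst))
            continue
        if dst in conflicts[src]:
            continue  # simultaneously live: cannot share a register
        # Try each operand's current hard register (memory is skipped).
        for reg in (assignment[src], assignment[dst]):
            if reg is None:
                continue
            movers = [p for p in (src, dst) if assignment[p] != reg]
            if all(p not in pinned and reg_is_free_for(p, reg) for p in movers):
                for p in movers:
                    assignment[p] = reg
                pinned.update((src, dst))
                saved += freq
                break
    return saved
```

Pinning operands after a successful coalesce keeps a later, colder move from pulling a pseudo out of a register that a hotter move already relies on.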

> Besides the features mentioned above, I see the following possible 
> improvements in your patch:
> 
>  1. trying hard registers from the alternative register class too.
>  2. trying call-used hard registers if it is profitable.

Yes, definitely.  I took a conservative approach on both of these,
because I was not familiar with the details; the port I did the
primary development on does not have substantial alternative register
classes, and I don't have any idea how to judge profitability for
call-used registers.
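
One way to phrase the two suggested improvements is as an ordering of candidate hard registers for a pseudo being coalesced: the preferred class first, then the alternate class, with call-used registers admitted only when a rough cost estimate says so. `candidate_regs` is a hypothetical helper, and its cost model (each removed move saves one unit, each call the pseudo is live across costs `save_restore_cost` units) is an assumption for illustration, not GCC's actual costing.

```python
def candidate_regs(preferred, alternate, call_used, move_freq,
                   call_crossings, save_restore_cost=2):
    """Hard registers to try when coalescing a pseudo (hypothetical helper).

    preferred/alternate: registers of the preferred and alternate classes
    call_used:           registers clobbered by calls
    move_freq:           frequency of the move insns coalescing would remove
    call_crossings:      frequency-weighted number of calls the pseudo is
                         live across
    The cost model is an assumption made for this sketch only.
    """
    # Call-used registers are harmless if no call is crossed; otherwise they
    # are allowed only when the removed moves outweigh the save/restore cost.
    call_used_ok = (call_crossings == 0
                    or move_freq > call_crossings * save_restore_cost)
    result = []
    for reg in list(preferred) + list(alternate):
        if reg in result:
            continue  # a register may appear in both classes
        if reg in call_used and not call_used_ok:
            continue  # would need a save/restore around each call
        result.append(reg)
    return result


print(candidate_regs([0, 1, 2], [2, 3, 8], {0, 1, 2},
                     move_freq=10, call_crossings=10))  # [3, 8]
```

Keeping the preferred class first means the alternate class is only reached when no preferred register is free, which mirrors the conservative default while still widening the search.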

> Still, my patch (and yours) is a constrained form of coalescing because 
> it is done after register allocation (assignment).  A more general form 
> of coalescing would require an iterative approach to register 
> allocation (and implementing spilling of register live ranges to undo 
> coalescing when necessary).  That is one more way to improve register 
> allocation.

Yes; I spent a little time investigating and decided that I could get
most of the improvement I needed in a cleanup, but it certainly isn't
optimal.

> I am going to submit my patch in a few weeks.  It would be interesting 
> to compare your patch with mine.  I will probably do that.

Thanks.

-- 
Daniel Jacobowitz

