This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: RFC: Code size improvement for global alloc
- From: Daniel Jacobowitz <drow at false dot org>
- To: Vladimir Makarov <vmakarov at redhat dot com>
- Cc: gcc-patches at gcc dot gnu dot org, Richard Earnshaw <rearnsha at gcc dot gnu dot org>, Mark Mitchell <mark at codesourcery dot com>
- Date: Thu, 8 Jul 2004 14:13:12 -0400
- Subject: Re: RFC: Code size improvement for global alloc
- References: <20040706150716.GA10639@nevyn.them.org> <40EAD990.8020202@redhat.com>
On Tue, Jul 06, 2004 at 12:55:44PM -0400, Vladimir Makarov wrote:
> Daniel Jacobowitz wrote:
>
> >Testing: I tested this patch with bootstrap / make check (all languages
> >except Ada) on x86_64-pc-linux-gnu. I also tested an equivalent patch for
> >arm-none-elf using csl-arm-branch. There were no regressions in either
> >case. Bootstrap got 5.5 seconds faster out of 81 minutes, which is in the
> >noise. On x86_64 there was a size improvement of 269 bytes out of 590K in
> >some random files from cc1-i-files; on ARM there was a size improvement of
> >about 0.05% in CSiBE.
> >
> >OK? Comments?
> >
> >
> >
> I think that at least you need to introduce a new flag for this. You
> cannot enable it by default until you show that there is a performance
> improvement on a credible benchmark (preferably SPEC95 or SPEC2000) on
> major platforms (x86, x86-64, ppc). Although the optimization removes
> move insns, the final result might be worse after reload, because
> reload might evict all of the coalesced pseudo-registers from a hard
> register, where without coalescing it would evict only one.
I ran SPECINT 2000 on x86 (using x86-64 hardware but -m32); the estimated
score was 916 for the base compiler and 908 with the patch, a difference
of under 1%. There was a lot of noise; I don't think the results are
significant.
##############################################################################
# INVALID RUN INVALID RUN INVALID RUN INVALID RUN INVALID RUN INVALID RUN #
#
# 'reportable' flag not set during run
# Error 252.eon: Output miscompare
# Error 252.eon: Output miscompare
# Error 252.eon: Output miscompare
#
# INVALID RUN INVALID RUN INVALID RUN INVALID RUN INVALID RUN INVALID RUN #
##############################################################################
Base:
Benchmark      Ref Time  Run Time  Ratio
164.gzip           1400       208    673*
175.vpr            1400       175    802*
176.gcc            1100       105   1048*
181.mcf            1800       242    743*
186.crafty         1000      93.8   1067*
197.parser         1800       210    858*
252.eon               X
253.perlbmk        1800       167   1080*
254.gap            1100       115    960*
255.vortex         1900       180   1058*
256.bzip2          1500       178    841*
300.twolf          3000       279   1077*
Est. SPECint_base2000                916

Patched:
Benchmark      Ref Time  Run Time  Ratio
164.gzip           1400       209    670*
175.vpr            1400       173    810*
176.gcc            1100       105   1048*
181.mcf            1800       243    739*
186.crafty         1000      92.3   1083*
197.parser         1800       211    853*
252.eon               X
253.perlbmk        1800       168   1074*
254.gap            1100       120    920*
255.vortex         1900       191    996*
256.bzip2          1500       179    838*
300.twolf          3000       278   1080*
Est. SPECint_base2000                908
I can set up SPECINT/SPECFP on more platforms if necessary.
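
For anyone who wants to recheck the two estimates, here is a minimal sketch that recomputes them from the ratio column of the tables above. It assumes the estimate is the geometric mean of the eleven benchmarks that validated (252.eon is left out); the variable and function names are made up for this example and are not part of the SPEC tools.

```python
from math import exp, log

# Ratio column copied from the Base and Patched tables above.
# 252.eon is left out because it failed validation in both runs.
BENCHMARKS = [
    "164.gzip", "175.vpr", "176.gcc", "181.mcf", "186.crafty", "197.parser",
    "253.perlbmk", "254.gap", "255.vortex", "256.bzip2", "300.twolf",
]
base = [673, 802, 1048, 743, 1067, 858, 1080, 960, 1058, 841, 1077]
patched = [670, 810, 1048, 739, 1083, 853, 1074, 920, 996, 838, 1080]


def geomean(values):
    """Geometric mean, the averaging SPEC uses for its overall score."""
    return exp(sum(log(v) for v in values) / len(values))


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


base_score = geomean(base)
patched_score = geomean(patched)
overall_change = percent_change(base_score, patched_score)

# Per-benchmark change, to see which programs account for the difference.
per_benchmark = {
    name: percent_change(b, p) for name, b, p in zip(BENCHMARKS, base, patched)
}
largest_swing = max(per_benchmark, key=lambda name: abs(per_benchmark[name]))

print(f"base {base_score:.0f}, patched {patched_score:.0f}, "
      f"change {overall_change:+.2f}%")
print(f"largest single-benchmark swing: {largest_swing} "
      f"({per_benchmark[largest_swing]:+.1f}%)")
```

Comparing the per-benchmark changes with the overall change is one rough way to judge how much of the difference could be run-to-run noise; repeated runs would be needed to say anything firmer.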
> Also, I have code for a more general form of coalescing in global (please
> see the GCC Summit proceedings), which coalesces all registers, not only
> global and local ones (as in your patch), and tries both hard registers
> (not only that of the global). It even coalesces two registers when one
> or both of them got memory, if it is profitable. It also coalesces
> pseudo-registers according to the frequencies of the move insns to get
> better results.
This sounds great. I look forward to it. However, it sounds like it
won't be Stage 2 material, so I would like to have this patch reviewed
in the meantime.
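
To make the idea of coalescing "according to the frequencies of the move insns" concrete, here is a toy sketch. It is not code from either patch and not how GCC's global allocator is structured: it ignores hard registers, register classes, and reload entirely. The `coalesce` function, the `(dst, src, frequency)` move tuples, and the interference set are all hypothetical names chosen for this example. It visits moves from most to least frequent and merges the two pseudo-registers whenever no member of one group conflicts with a member of the other.

```python
def coalesce(moves, interference):
    """Greedy move coalescing over pseudo-registers (illustrative only).

    moves:        iterable of (dst, src, frequency) tuples
    interference: set of frozenset({a, b}) pairs that are live at the same
                  time and therefore must not share a register
    Returns (representative for each register, list of moves made redundant).
    """
    parent = {}   # union-find forest: register -> parent register
    members = {}  # group representative -> set of registers in the group

    def find(reg):
        parent.setdefault(reg, reg)
        while parent[reg] != reg:
            parent[reg] = parent[parent[reg]]  # path halving
            reg = parent[reg]
        return reg

    removed = []
    # Most frequently executed moves first: removing them pays off the most.
    for dst, src, _freq in sorted(moves, key=lambda m: -m[2]):
        a, b = find(dst), find(src)
        if a == b:
            removed.append((dst, src))  # already share a group
            continue
        group_a = members.setdefault(a, {a})
        group_b = members.setdefault(b, {b})
        conflict = any(
            frozenset((x, y)) in interference
            for x in group_a
            for y in group_b
        )
        if conflict:
            continue  # merging would put conflicting registers together
        parent[b] = a
        group_a |= group_b
        del members[b]
        removed.append((dst, src))

    return {reg: find(reg) for reg in parent}, removed


# Hypothetical example: p1 and p3 are live at the same time.
moves = [("p1", "p2", 100), ("p2", "p3", 10), ("p3", "p4", 50)]
interference = {frozenset(("p1", "p3"))}
groups, removed = coalesce(moves, interference)
print(groups)
print(removed)
```

In this example the two most frequent moves are coalesced, and the least frequent one is kept because merging its operands would put p1 and p3 in the same group. Ordering by frequency means that when a conflict forces a choice, the cheaper move is the one left behind.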
> Besides features mentioned above I see the following possible
> improvements in your patch:
>
> 1. trying hard registers from alternative register class too.
> 2. trying call used hard registers if it is profitable.
Yes, definitely. I took a conservative approach on both of these,
because I was not familiar with the details; the port I did the
primary development on does not have substantial alternative register
classes, and I don't have any idea how to judge profitability for call
used registers.
> Still, my patch (and yours) is a constrained form of coalescing, because
> it is done after register allocation (assignment). A more general form
> of coalescing would require an iterative approach to register
> allocation (and implementing register live range spilling to undo
> coalescing if necessary). That is one more way to improve register
> allocation.
Yes; I spent a little time investigating and decided that I could get
most of the improvement I needed in a cleanup, but it certainly isn't
optimal.
> I am going to submit my patch in a few weeks. It would be interesting
> to compare your patch with mine. I will probably do that.
Thanks.
--
Daniel Jacobowitz