This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: Fix PR 30735 and 31090: Change static memory partition heuristic
On Wed, Apr 11, 2007 at 11:23:21PM +0200, Richard Guenther wrote:
> On 4/11/07, Diego Novillo <dnovillo@redhat.com> wrote:
> >
> >This patch improves the static memory partitioning heuristic so that
> >symbols that are used in hot paths of the code are less likely to be
> >grouped inside a partition.
> >
> >The heuristic will now count the total number of memory references in
> >the function. With that, it will estimate the number of virtual
> >operators needed for stores and loads and compare them against two
> >thresholds:
> >
> >- A maximum number of virtual operators allowed for the whole function
> >(max-aliased-vops)
> >
> >- An average number of virtual operators allowed per statement
> >(avg-aliased-vops).
> >
> >If both values are below the threshold, nothing is done. Otherwise, the
> >heuristic in compute_memory_partitions triggers and symbols are added to
> >a work list. Each element in the list is given a partitioning score
> >(pscore), which is a weighted sum given by the following formula:
> >
> >    frequency_writes * 64     -> Aggregate frequency of all stores
> >  + frequency_reads * 32      -> Aggregate frequency of all reads
> >  + num_direct_writes * 16
> >  + num_direct_reads * 8
> >  + num_indirect_writes * 4
> >  + num_indirect_reads * 2
> >  + noalias_state             -> Given by -fargument-noalias-*
> >
> >For a given symbol V, the higher this score, the less likely that V will
> >be added to a partition.
> >
> >This makes the partitioning better, particularly in small functions
> >(where we just don't care about how many virtual operators are needed)
> >and allows a much smoother control over the partitioning behaviour.
> >
> >I have also preset 3 different values for max-aliased-vops and
> >avg-aliased-vops. One for each of -O1, -O2 and -O3. At -O1 the idea is
> >to make compilation time very quick. The current values give us about
> >1-2% memory savings at -O1 and 3-5% compile-time savings.
> >
> >For -O2, compile time and memory utilization are roughly the same
> >(though in some cases, you'll notice an increase, so I may have to
> >adjust further).
> >
> >At -O3, the settings should be such that we very rarely partition.
> >
> >In terms of runtime, I have not noticed slowdowns in SPEC2000 nor
> >Polyhedron. I did notice a 6-7% performance improvement on tramp3d at
> >-O2. However, being a static heuristic, I presume we'll go over this a
> >few more rounds.
> >
> >Bootstrapped and tested x86_64, ia64, ppc64 and i686.
>
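
For readers following the quoted description, here is a minimal C sketch of the threshold check and the pscore formula. It is an illustration under stated assumptions, not the code from the patch: the struct, its field layout, and the function names needs_partitioning and compare_pscore are hypothetical, the handling of values exactly equal to a threshold is a guess, and sorting candidates by ascending score is only one plausible way to make high-scoring symbols "less likely" to be partitioned.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-symbol statistics. The field names mirror the terms of
   the quoted formula; they are not GCC's actual data structures. */
struct mem_sym_stats {
    long frequency_writes;     /* aggregate frequency of all stores */
    long frequency_reads;      /* aggregate frequency of all reads */
    long num_direct_writes;
    long num_direct_reads;
    long num_indirect_writes;
    long num_indirect_reads;
    long noalias_state;        /* derived from -fargument-noalias-* */
};

/* Weighted sum from the quoted formula. */
long pscore(const struct mem_sym_stats *s)
{
    return s->frequency_writes * 64
         + s->frequency_reads * 32
         + s->num_direct_writes * 16
         + s->num_direct_reads * 8
         + s->num_indirect_writes * 4
         + s->num_indirect_reads * 2
         + s->noalias_state;
}

/* "If both values are below the threshold, nothing is done."
   Assumption: a value equal to its threshold counts as exceeding it. */
bool needs_partitioning(long total_vops, long num_stmts,
                        long max_aliased_vops, long avg_aliased_vops)
{
    if (num_stmts <= 0)
        return false;
    bool total_ok = total_vops < max_aliased_vops;
    /* Compare the average without dividing: total / stmts < avg. */
    bool avg_ok = total_vops < avg_aliased_vops * num_stmts;
    return !(total_ok && avg_ok);
}

/* qsort comparator: ascending pscore, so that the lowest-scoring symbols
   come first in the work list (an assumed ordering). */
int compare_pscore(const void *a, const void *b)
{
    long pa = pscore((const struct mem_sym_stats *)a);
    long pb = pscore((const struct mem_sym_stats *)b);
    return (pa > pb) - (pa < pb);
}
```

With this sketch, a symbol with two units of store frequency, one of read frequency, one direct write and one direct read scores 2*64 + 32 + 16 + 8 = 184, while a symbol touched only through one indirect write and one indirect read scores 6 and would be considered for partitioning first.
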
I got SPEC CPU 2006 numbers on x86-64. I tried:

1. Revision 123724
2. Revision 123718 +
   http://gcc.gnu.org/ml/gcc-patches/2007-01/msg02335.html

I found:

Benchmark         #2 vs. #1
410.bwaves         2.67857%
416.gamess        -1.14286%
433.milc                 0%
434.zeusmp       -0.653595%
435.gromacs       0.107527%
436.cactusADM     0.943396%
437.leslie3d       15.8009%
444.namd         -0.671141%
447.dealII        -3.89105%
450.soplex               0%
453.povray               0%
454.calculix       6.57216%
459.GemsFDTD            -6%
465.tonto                0%
470.lbm          -0.699301%
481.wrf            1.92308%
482.sphinx3              0%
SPECfp_base2006    0.75188%

and there is no difference in the INT numbers. This shows that the simple
change of not putting function-local variables into a memory partition
is quite effective on some SPEC CPU 2006 FP benchmarks. Unfortunately,
Richard's simple patch no longer applies now that Diego's change has been
checked in. Richard, do you have a newer patch I can try?
Thanks.
H.J.