This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: WHOPR partitioning algorithm

From: Jan Hubicka <hubicka at ucw dot cz>
To: Richard Guenther <rguenther at suse dot de>
Cc: Jan Hubicka <hubicka at ucw dot cz>, gcc-patches at gcc dot gnu dot org, dnovillo at google dot com
Date: Mon, 6 Sep 2010 14:52:06 +0200
Subject: Re: WHOPR partitioning algorithm
References: <20100903222954.GQ1664@kam.mff.cuni.cz> <alpine.LNX.2.00.1009041100490.28912@zhemvz.fhfr.qr> <20100904133049.GB31380@kam.mff.cuni.cz> <alpine.LNX.2.00.1009061418560.28912@zhemvz.fhfr.qr>

> 
> Well, if with 32 partitions all partitions exceed max partition size
> (well, or reasonable partition size), we should create more partitions.
> 
> The question is really if we should create sane partition sizes
> by default or if we should create N partitions by default.  I'd vote
> for sane partition sizes which makes the lto-partitions parameter
> useless (I don't expect easy reproducible testcases anyway with
> dynamic partitioning).

Well, even if I tended to do the same originally, i concluded that this won't
work well.

With 32 partitions for Mozilla, we get about 4GB of WPA memory use, and about 2
minutes of streaming.  Then each partition is about 200MB of memory, but they
takes about 3 minutes to compile.

Partition size sort of define ltrans memory use and time of compilation. Since
memory use in ltrans is not a problem (our WPA stage is way too big of memory
hog, we save only about half of the memory), we can set hard limit on time.
Now lets decide that we will tune partition size to compile in about 1 minute.
This would be hard per se, since sizes of partitions are pretty rough estimates
by anyway.

Then we would get about 3*32 partitions for mozilla that would just increase
streaming time (32partitions is 1.5GB, 2000 partitions is 7GB) On the other
hand on project that is small enough to compile in minute (lets say GCC) we
would get no partitioning at all.

Since too much partitions has expenses too, I believe number of partitions
should be large enough so ltrans does not dominate memory use for single CPU
compilation (this is about 2-4 in current implementation I guess).

In parallel compilation the number of partitions should be higher than the
number of CPUs available, but not terribly much higher since streaming size
increase with too many parttions.

32 seemed resonable default - not that high so streaming cost would terribly
increase and still more than number of CPUs on average developer workstation
today.
> > 
> > The other algorithm I tried was the usual graph clustering algorithm - just
> > start with partition per node and then start concatenating partitions based on
> > priority of edges until their size reach the expected size of partition.  But
> > this does not play well with my plan for function reordering and since function
> > reordering algorithm is essentially same thing, just producing one linear order
> > instead of buckets, i expect that it will combine well with the linear
> > partitioning one. (and the differences are minimal anyway)
> 
> Hm, I don't see the issue with function reordering - it should exactly
> behave the same way (and thus is unneeded?)

Well, partitioning algorithm clusters graphs minimizing the cross cluster edges,
function reordering algorithm orders the functions so most calls are short distance
and forwarding.

So when you at WPA stage decide in what order the functions are, you need to partition
them in linear segments of this order and keep partition in order as they get to linker
or assign each function section and tell linker order of the sections.
The second is more expensive. So I basicaly want
1) decide on linear order of functions
2) partition program linearly in this segments
3) in each partition arrange function order in the final .s files (either via gas section fragments
or by just breaking our current topological ocomplation order0
4) link the ltrans partition in the original order

Honza
> 
> > > > +/* WHOPR partitioning configuration.  */
> > > > +
> > > > +DEFPARAM (PARAM_LTO_PARTITIONS,
> > > > +	  "lto-partitions",
> > > > +	  "Number of paritions program should be split to",
> > > 
> > > As I understand this is really lto-max-partitions, right?
> > 
> > At the moment yet, until we get maximal partition size ;)
> 
> I'd prefer to not have this parameter, at least not for this
> partitioning algorithm.
> 
> > OK, once we agree at the details, I will update patch and add comments as
> > requested.
> 
> Thanks,
> Richard.

Follow-Ups:
- Re: WHOPR partitioning algorithm
  - From: Richard Guenther

References:
- WHOPR partitioning algorithm
  - From: Jan Hubicka
- Re: WHOPR partitioning algorithm
  - From: Richard Guenther
- Re: WHOPR partitioning algorithm
  - From: Jan Hubicka
- Re: WHOPR partitioning algorithm
  - From: Richard Guenther

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]