WHOPR partitioning algorithm

Richard Guenther rguenther@suse.de
Mon Sep 6 12:47:00 GMT 2010


On Sat, 4 Sep 2010, Jan Hubicka wrote:

> > 
> > Can you add -flto-partition=single so we can eventually get rid of
> > -flto (or make it an alias for -fwhopr -flto-partition=single)?
> > 
> > That would avoid the above and eventually make the implementation
> > simpler.
> 
> well, we still need to test whether the user specified one algorithm or many.  But yes, I
> can add single.  The problem is that we still create the tmp file then; I have
> plans to merge the WHOPR/LTO queues up to the point where we can decide on the fly
> which variant we want.  It is not that hard, we just need to make the LTO queue
> read function bodies on demand instead of reading them all at once, which I hope
> is possible (I did not look into merging issues with this idea yet)

Hm.  The question is whether we care - if a project is small enough then
the tmp file won't be too big, and if it isn't, one wouldn't use single anyway.

But yes, instead of writing the big tmp files it would be nice to
store references to the original files there and only store the WPA
ltrans info in them ...

> > > +/* Return true if NODE should be partitioned.  */
> > 
> > What does "partitioned" mean?  Duplicating the node in more than
> > 1 partition?  Or does !partitioned mean that it isn't needed at all?
> > Please clarify.
> 
> Partitioned nodes are those interesting to the partitioning algorithm; nonpartitioned
> nodes are dragged in on demand when computing the boundary.  I.e. COMDAT is ignored
> by the partitioning algorithm and brought only into the partitions that need it.

Ok, please update the comment.

> > > +  partition_size = total_size / PARAM_VALUE (PARAM_LTO_PARTITIONS);
> > > +  if (partition_size < PARAM_VALUE (MIN_PARTITION_SIZE))
> > > +    partition_size = PARAM_VALUE (MIN_PARTITION_SIZE);
> > 
> > Hm, so there is no max-partition-size?
> 
> well, I plan to do this, to avoid excessive memory use in the ltrans stage.
> The problem with this is that at the moment the WPA stage does not scale that well,
> so when we do the 32 partitions the ltrans stages consume a fraction of
> WPA memory that seems acceptable, so I skipped it from the initial patch.
> So the parameter buys you nothing, since when you can't compile with the default
> configuration, you either exploded in WPA or hit some memory usage bug in the
> compiler.
> 
> I can add the parameter and tune it based on what I get in Mozilla.

Ok.

> > I expected that we try to partition the program by trying to
> > minimize reference edges between partitions while keeping
> > partition size in a min/max bound.  Is that the graph algorithm
> 
> Yes, it is what the algorithm does; the min/max bound is computed
> by dividing the program size by the number of partitions, rounding it up to the
> minimum partition size, and then partitions up to 1/4th smaller or 1/4th bigger
> are accepted.
> 
> > The above suggests that users have to adjust --param lto-partitions
> > for very large programs to be able to compile them, which sounds
> > bad.
> 
> The only place I expect users will actually need to tune lto-partitions is when
> they want more than 32 partitions (have more CPUs).  I personally don't
> much like the idea of making this depend on a WHOPR parameter, since this
> would imply that the resulting binary would depend on the -j configuration of
> the Makefile (and we probably don't have the number handy with -fwhopr=jobserver);
> that would make reproducing the compilation harder.
> 
> Don't know here, really.

Well, if with 32 partitions all partitions exceed max partition size
(well, or reasonable partition size), we should create more partitions.

The question is really if we should create sane partition sizes
by default or if we should create N partitions by default.  I'd vote
for sane partition sizes which makes the lto-partitions parameter
useless (I don't expect easy reproducible testcases anyway with
dynamic partitioning).

> > 
> > Again, when trying to do partitioning it would be really helpful
> > if cgraph nodes and varpool nodes would share a common symtab
> > base and references would just reference those ...
> > 
> > > +  npartitions = 1;
> > > +  partition = new_partition ("");
> > > +  if (cgraph_dump_file)
> > > +    fprintf (cgraph_dump_file, "Total unit size: %i, partition size: %i\n",
> > > +	     total_size, partition_size);
> > 
> > The following needs a _big_ comment describing what it is doing.
> > What's its complexity?  What timevar does it get attributed to
> > (maybe add a new one?)
> 
> There is the WPA output partition timevar.  Every node is added at most twice, and every edge
> is visited twice too, so complexity should not be a problem.
> 
> As I tried to explain in the mail, the algorithm is simple.

Please add the algorithm description as a comment then.

> We keep adding nodes (in a predefined order that is currently reverse postorder)
> to the current partition and compute the number of internal edges and external
> edges on the go.  As soon as the partition is bigger than 3/4th of the expected size
> we start looking for the point where the external/internal edge fraction reaches its
> minimum, searching up to 5/4th of the expected size.  Then we undo the process back to
> the place where we reached the minimum and start a new partition there.
> 
> Since the program size estimate is not really good - it does not include the size
> of nodes that get duplicated - we always update the expected partition size
> after each run.  Otherwise the algorithm tended to make a bit more partitions than
> requested (like 40 instead of 32).
> 
> So it is a very primitive greedy algorithm minimizing the boundary size of every
> partition.
> 
> The other algorithm I tried was the usual graph clustering algorithm - just
> start with a partition per node and then keep concatenating partitions based on the
> priority of edges until their size reaches the expected partition size.  But
> this does not play well with my plan for function reordering, and since the function
> reordering algorithm is essentially the same thing, just producing one linear order
> instead of buckets, I expect that it will combine well with the linear
> partitioning one.  (And the differences are minimal anyway.)

Hm, I don't see the issue with function reordering - it should behave
exactly the same way (and thus is unneeded?)

> > > +/* WHOPR partitioning configuration.  */
> > > +
> > > +DEFPARAM (PARAM_LTO_PARTITIONS,
> > > +	  "lto-partitions",
> > > +	  "Number of paritions program should be split to",
> > 
> > As I understand this is really lto-max-partitions, right?
> 
> At the moment yes, until we get a maximal partition size ;)

I'd prefer to not have this parameter, at least not for this
partitioning algorithm.

> OK, once we agree on the details, I will update the patch and add comments as
> requested.

Thanks,
Richard.


