How to check if two symbols are from same source unit during WPA ?
Prathamesh Kulkarni
prathamesh.kulkarni@linaro.org
Thu Apr 14 12:08:00 GMT 2016
On 14 April 2016 at 13:56, Richard Biener <rguenther@suse.de> wrote:
> On Thu, 14 Apr 2016, Prathamesh Kulkarni wrote:
>
>> On 12 April 2016 at 22:41, Richard Biener <rguenther@suse.de> wrote:
>> > On April 12, 2016 3:47:19 PM GMT+02:00, Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote:
>> >>Hi,
>> >>How to check if two symbols are from same source file during WPA ?
>> >>Is symbol1->lto_file_data == symbol2->lto_file_data true if symbol1
>> >>and symbol2 are from same
>> >>source files ? Would that be a sufficient condition or do I need to
>> >>check for something more ?
>> >
>> > Why would you want to know? The proposed check would verify it comes from the same TU.
>> Hi,
>> I am trying to address issue with section anchors with LTO.
>> In non-LTO mode (or 1to1 map), a TU is mapped to object-file 1:1
>> Therefore we can take advantage of section anchors.
>> IIUC with balanced partitioning, variable is added in the first partition it is
>> referenced in. So it may get separated into other partition from functions
>> referencing it, thus causing it to be loaded as an external reference.
>>
>> My point is that the "first" partition to which the variable is added may not
>> be the best choice for it.
>> For instance:
>> file1.c defines variables 'a' and 'b' and functions f1(), f2()
>> file2.c defines f3().
>> f1, f2, f3 use 'a' and 'b'.
>>
>> It could happen that f3, a and b are mapped to one partition (assuming
>> f3 is processed first), and f1, f2 are mapped to another partition.
>> so f1 and f2 are forced to load a, b as external references.
>> It would be better instead to put a and b in same partition as f1 and f2.
>>
>> As first cut I tried to implement the following very simple approach:
>> Assume V(f) gives a set of all variables referenced by function f.
>> Let S = V(f1) intersect V(f2) ... intersect V(fn).
>> Add f1, f2, .. fn and S to same partition if
>> a) S is non-empty, that is, we add functions and "common" variables
>> referenced by these functions to same partition.
>> b) Function uses at least two global variables. If a function uses
>> only a single global variable, then AFAIU it wouldn't matter if it's loaded
>> as external reference.
>> However the above approach results in quite distorted partitioning
>> creating very large partitions and took 4x time to build chromium with LTO.
>>
>> So I tried to restrain adding number of symbols, by allowing them to be added
>> only if they are from same TU.
>> This would (hopefully) ensure that we are not worse than non-LTO case
>> since we add functions and common set of variables referenced by these
>> functions to same partition provided all of them belong to same TU.
>> As a worst-case, it will add all symbols from the TU to which function
>> belongs, if all the functions in
>> that TU reference >=2 global variables and reference at least one global
>> variable in common from the same TU. We might want to put a higher upper
>> limit than 1 for number of shared referenced global vars to stop adding
>> too many symbols to same partition.
>>
>> For eg:
>> file1.c defines variables 'a', 'b' and functions f1_ab(), f2_ab()
>> file2.c: defines f3_ab()
>> where f1_ab, f2_ab, f3_ab both use a, b
>> If f1_ab is added to partiton, it would try adding 'a', 'b' and f2_ab
>> to the same partition but
>> not f3_ab() since it belongs to another TU.
>> (f3_ab() may get added to the same partition by another iteration if
>> there's enough space in
>> the partition but that's irrelevant).
>>
>> Ideally I would like to put those functions and variables in the same
>> partition such that the use
>> of section anchors can be maximized, which would also involve adding
>> functions from other TU.
>> In above case it's desirable to add f3_ab to the same
>> partition as f1_ab, however I am not sure what constraints to check for since
>> we could potentially end up adding lots of symbols (as in approach 1).
>> For now, I only considered the check if they are from same TU.
>>
>> I have attached a prototype patch that implements the above approach.
>> It passes testing on arm*-*-* and aarch64*-*-* and takes roughly same time
>> to build chromium with LTO as without patch.
>> The patch should only affect targets with section anchor support (arm,
>> aarch64, ppc).
>> I have to yet perform benchmarking with the patch, I will get back
>> with some results
>> after I get that done.
>> I would be grateful for feedback especially regarding correctness.
>
> Ok, I didn't try to fully understand what you did but why not simply
> keep partitioning of functions as-is and simply put variables into
> the partition which contains the most references to it?
>
> Section anchor optimization should be a 2nd choice for partitioning
> and really partitioning functions together like we do should have priority.
Well balanced partitioning would reorder functions without considering section
anchors, so putting variables in partition containing max references may not
be optimal.
For eg:
file1.c defines a, b, and three functions f1, f2, f3 using a, b.
As a worst case, balanced partitioning algo may put f1, f2, f3
in separate partitions which would defeat section anchor optimization.
With the patch, I try to retain {a, b, f1, f2, f3} in same
partition (other symbols from same TU may go elsewhere).
The rationale for this is that the effect of section anchors
remains the same as for non-LTO (although now we would eat up space
of callees which would be put in the same partition).
Complication arises when different functions reference only some
of the global variables.
for eg:
file1.c: defines a, b, c and functions f1, f2, f3, f4, f5.
f1, f2 use a, b and f3, f4, f5 use a, c.
The patch would add all the symbols {a, b, c, f1, f2, f3, f4, f5} to the
same partition. As a worst-case this could end up adding
all symbols from the same TU, which would lean more towards 1to1
rather than balanced.
So I suppose we would rather want to add that subset of symbols that
has "maximum commonality" ?
for eg:
file1.c: defines a, b, c, d and functions:
f1 -> {a, b, c}
f2 -> {a, b, c}
f3 -> {a, b, c, d}
f4 -> {a, c, d}
f5 -> {a, b}
f6 -> {a, c}
f7 -> {a, b}
f8 -> {a, c, d}
Here the pair {a, b, c} is referenced by 2 functions.
{a, b, c, d} referenced by 1 functions.
{a, c, d} referenced by 2 functions
{a, b} referenced by 2 functions
{a, c} referenced by 1 function.
So to maximize "commonality", I suppose we would want to put
{a, b, c, f1, f2} or {a, c, d, f4, f8} in same partition instead of all
symbols.
Finding "max commonality"
Let map<V, F> be a hashtable.
where V = {set of variables referenced}
and F = {set of functions referencing the variables in set V}
for (each function f)
{
V = {variables referenced by f}
map[V].insert (f); // insert f in map.F corresponding to map.V
}
In above example,
map would be:
{a, b, c} -> {f1 f2}
{a, b, c, d} -> {f3}
{a, b} -> {f5, f7}
{a, c, d} -> {f4, f8}
{a, c} -> {f6}
We could define max "commonality" then as those entries in map
which have max number of variables for max number of functions.
In above eg, entries having maximum number of functions (2):
{a, b, c} -> {f1, f2}
{a, b} -> {f5, f7}
{a, c, d} -> {f4, f8}
Among these we want to choose entries having max number of variables
(3), which is:
{a, b, c} -> {f1, f2}
{a, c, d} -> {f4, f8}
Among these we can choose either entry, so we either put {a, b, c, f1, f2} in
same partition or {a, c, d, f4, f8} in same partition.
Which I suppose would be better than putting {a, b, c, d, f1, f2, f3,
f4, f5, f6, f7, f8}
in same partition as is done currently in the patch.
I am not sure if that's the best approach.
for instance if we have 3 functions referencing same 6 variables
and 4 functions referencing same 2 variables,
then the above method would choose to put {4 functions 2 variables }
in same partition which
might be inferior to putting { 3 functions 6 variables } in same partitions.
I suppose we could modify the above method to choose those entries
having number of functions within range [max - CONSTANT, max]
instead of choosing entries having max number of functions but I am
not sure what would be a good value for CONSTANT.
We could also drop the same TU constraint since finding "max" commonality
doesn't check if symbols are in same TU.
This interferes with adding functions in callgraph order to same
partition so perhaps
we could have more constraints to ensure we don't add many symbols in
partition boundary.
Does this sound reasonable ?
If it's not desired by default could we gate it on an option ?
AFIAU, section anchors optimization is important for ARM and AArch64.
Thanks,
Prathamesh
>
> Richard.
More information about the Gcc
mailing list