This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/64058] [5/6 Regression] Performance degradation after r216304
- From: "rguenther at suse dot de" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 10 Mar 2016 08:43:48 +0000
- Subject: [Bug tree-optimization/64058] [5/6 Regression] Performance degradation after r216304
- Auto-submitted: auto-generated
- References: <bug-64058-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64058
--- Comment #16 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 10 Mar 2016, law at redhat dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64058
>
> Jeffrey A. Law <law at redhat dot com> changed:
>
> What |Removed |Added
> ----------------------------------------------------------------------------
> CC| |amacleod at redhat dot com
>
> --- Comment #14 from Jeffrey A. Law <law at redhat dot com> ---
> So working on the assumption that full stability for SSA_NAME_VERSION changes
> isn't in the cards, I went back to my patch which avoids using SSA_NAME_VERSION
> as the tie-breaker given two identical coalescing costs.
Yes, and that's a good change which I think we should put into GCC 7
regardless whether it fixes these PRs or not.
> I can verify for this BZ that change brings stability to the order in which we
> process coalescing candidates with the same cost. Essentially the tie breaker
> becomes which coalescing opportunity is seen first during the coalescer's walk
> of the IL.
>
> That's fine and good, but as we know only addresses half the problem. It does
> not address whether or not adjustments need to be made to the costing mechanism
> itself or whether we can come up with a better tie breaker.
>
> For this BZ, using the referenced trunk versions these are the key coalescings
> in the chain.
> r216303:
> (4004) _1 <-> _316
> (3276) _1 <-> _200
> (3276) u1_lsm.6_253 <-> _316
>
> Coalesce list: (1)_1 & (316)_316 [map: 0, 98] : Success -> 0
> Coalesce list: (1)_1 & (200)_200 [map: 0, 46] : Success -> 0
> Coalesce list: (1)_1 & (253)l1_lsm.7_159 [map: 0, 32] : Fail due to conflict
>
>
> The successful coalescings and subsequent merging of conflicts cause the 3rd
> coalesce to fail (and others). This is actually what we want for this BZ.
>
> In r216304 we have:
> (4004) _315 <-> _316
> (3276) u1_lsm.6_253 <-> _316
> (3276) _314 <-> _315
>
> Coalesce list: (315)_315 & (316)_316 [map: 97, 98] : Success -> 97
> Coalesce list: (253)l1_lsm.7_58 & (316)_315 [map: 4, 97] : Success -> 4
> Coalesce list: (314)_314 & (315)l1_lsm.7_58 [map: 96, 4] : Fail due to conflict
>
>
> Note that the coalescing pairs with cost 3276 are reversed due to
> SSA_NAME_VERSION churn. That allows 253 to coalesce with 316 which in turn
> prevents 314 from coalescing with 315 (and other downstream effects). The net
> result of those downstream effects is an unwanted copy.
>
> Looking at the costing metrics, they are strictly based on the edge frequency
> and flags where the copy would have to be inserted.
>
> What jumps out to me is that the costing metrics don't look at all at the
> conflicts. When we coalesce two objects, we have to merge their conflicts into
> the partition leader. Essentially each time we coalesce, the resulting
> partition leader is going to have a larger conflict list and is less likely to
> participate in further coalescing.
>
> That might argue that peeking at the union of the resulting conflicts and
> giving higher priority to the one that has a smaller resulting conflict set may
> be useful in the cost calculation or as a better tie-breaker.
>
> So going back to r216303 and looking at the original partition conflicts we
> have:
>
> Partition 0 (_1 - 1 )
> Partition 98 (_316 - 316 )
> Partition 46 (_200 - 200 )
> Partition 72 (u1_lsm.6_253 - 253 )
>
> Conflict graph:
> 0: 4, 9, 28, 35, 41, 42, 44, 53, 60, 66, 75, 81, 84, 90, 93, 96, 100, 102
>
> 46: 4, 5, 6, 9, 28, 35, 41, 42, 44, 51, 63, 66, 75, 81, 84, 90, 93, 96, 100,
> 102
>
> 72: 4, 9, 28, 35, 44, 55, 67, 74, 75, 78, 84, 85, 90, 93, 96, 102, 108, 110,
> 111
>
> 98: 4, 7, 8, 9, 10, 11, 14, 15, 27, 28, 35, 41, 42, 44, 47, 48, 50, 52, 56, 57,
> 58, 59, 65, 75, 81, 84, 90, 93, 96, 102
>
> Coalescing _1 and _200 will result in a node with 22 unique conflicts.
> Coalescing _253 and _316 will result in a node with 38 unique conflicts.
>
> So if we used the conflicts sets either directly in the costing model or for
> breaking ties and gave preference to coalescings with smaller resulting
> conflicts, we'd get the right code for this test.
Yeah, this boils down to getting closer to a global optimal coalescing
solution rather than the "1-level" local optimal solution we are
computing.
But this should be addressed in an algorithmically sound way as
generally this is a NP hard problem and we want a "cheap" way
for -O[01g] at least as well as for must-coalesces.
> The question then becomes whether or not that is good in general.
Finding a better solution is certainly good in general (if our
local cost computations are accurate). Whether it's algorithmically
feasible is another question.
I suppose one may find some answers in the theory of SSA based
register allocation.