This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

[Bug tree-optimization/64058] [5/6 Regression] Performance degradation after r216304

From: "rguenther at suse dot de" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Thu, 10 Mar 2016 08:43:48 +0000
Subject: [Bug tree-optimization/64058] [5/6 Regression] Performance degradation after r216304
Auto-submitted: auto-generated
References: <bug-64058-4 at http dot gcc dot gnu dot org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64058

--- Comment #16 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 10 Mar 2016, law at redhat dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64058
> 
> Jeffrey A. Law <law at redhat dot com> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |amacleod at redhat dot com
> 
> --- Comment #14 from Jeffrey A. Law <law at redhat dot com> ---
> So working on the assumption that full stability for SSA_NAME_VERSION changes
> isn't in the cards, I went back to my patch which avoids using SSA_NAME_VERSION
> as the tie-breaker given two identical coalescing costs.

Yes, and that's a good change which I think we should put into GCC 7 
regardless whether it fixes these PRs or not.

> I can verify for this BZ that change brings stability to the order in which we
> process coalescing candidates with the same cost.  Essentially the tie breaker
> becomes which coalescing opportunity is seen first during the coalescer's walk
> of the IL.
> 
> That's fine and good, but as we know only addresses half the problem.  It does
> not address whether or not adjustments need to be made to the costing mechanism
> itself or whether we can come up with a better tie breaker.
> 
> For this BZ, using the referenced trunk versions these are the key coalescings
> in the chain.
> r216303:
> (4004) _1 <-> _316
> (3276) _1 <-> _200
> (3276) u1_lsm.6_253 <-> _316
> 
> Coalesce list: (1)_1 & (316)_316 [map: 0, 98] : Success -> 0
> Coalesce list: (1)_1 & (200)_200 [map: 0, 46] : Success -> 0
> Coalesce list: (1)_1 & (253)l1_lsm.7_159 [map: 0, 32] : Fail due to conflict
> 
> 
> The successful coalescings and subsequent merging of conflicts cause the 3rd
> coalesce to fail (and others).  This is actually what we want for this BZ.
> 
> In r216304 we have:
> (4004) _315 <-> _316
> (3276) u1_lsm.6_253 <-> _316
> (3276) _314 <-> _315
> 
> Coalesce list: (315)_315 & (316)_316 [map: 97, 98] : Success -> 97
> Coalesce list: (253)l1_lsm.7_58 & (316)_315 [map: 4, 97] : Success -> 4
> Coalesce list: (314)_314 & (315)l1_lsm.7_58 [map: 96, 4] : Fail due to conflict
> 
> 
> Note that the coalescing pairs with cost 3276 are reversed due to
> SSA_NAME_VERSION churn.  That allows 253 to coalesce with 316 which in turn
> prevents 314 from coalescing with 315 (and other downstream effects).  The net
> result of those downstream effects is an unwanted copy.
> 
> Looking at the costing metrics, they are strictly based on the edge frequency
> and flags where the copy would have to be inserted.
> 
> What jumps out to me is that the costing metrics don't look at all at the
> conflicts.  When we coalesce two objects, we have to merge their conflicts into
> the partition leader.  Essentially each time we coalesce, the resulting
> partition leader is going to have a larger conflict list and is less likely to
> participate in further coalescing.
> 
> That might argue that peeking at the union of the resulting conflicts and
> giving higher priority to the one that has a smaller resulting conflict set may
> be useful in the cost calculation or as a better tie-breaker.
> 
> So going back to r216303 and looking at the original partition conflicts we
> have:
> 
> Partition 0 (_1 - 1 )
> Partition 98 (_316 - 316 )
> Partition 46 (_200 - 200 )
> Partition 72 (u1_lsm.6_253 - 253 )
> 
> Conflict graph:
> 0: 4, 9, 28, 35, 41, 42, 44, 53, 60, 66, 75, 81, 84, 90, 93, 96, 100, 102
> 
> 46: 4, 5, 6, 9, 28, 35, 41, 42, 44, 51, 63, 66, 75, 81, 84, 90, 93, 96, 100,
> 102
> 
> 72: 4, 9, 28, 35, 44, 55, 67, 74, 75, 78, 84, 85, 90, 93, 96, 102, 108, 110,
> 111
> 
> 98: 4, 7, 8, 9, 10, 11, 14, 15, 27, 28, 35, 41, 42, 44, 47, 48, 50, 52, 56, 57,
> 58, 59, 65, 75, 81, 84, 90, 93, 96, 102
> 
> Coalescing _1 and _200 will result in a node with 22 unique conflicts.  
> Coalescing _253 and _316 will result in a node with 38 unique conflicts.
> 
> So if we used the conflicts sets either directly in the costing model or for
> breaking ties and gave preference to coalescings with smaller resulting
> conflicts, we'd get the right code for this test.

Yeah, this boils down to getting closer to a global optimal coalescing
solution rather than the "1-level" local optimal solution we are
computing.

But this should be addressed in an algorithmically sound way as
generally this is a NP hard problem and we want a "cheap" way
for -O[01g] at least as well as for must-coalesces.

> The question then becomes whether or not that is good in general.

Finding a better solution is certainly good in general (if our
local cost computations are accurate).  Whether it's algorithmically
feasible is another question.

I suppose one may find some answers in the theory of SSA based
register allocation.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]