[Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

Thu Dec 17 10:38:00 GMT 2015

Hello Jeff and Richard:

Here is a summary of the FDO (Feedback Directed Optimization) performance results.

SPEC CPU2000 INT benchmarks.
a) FDO + Splitting Paths enabled + tracer enabled
     Geomean Score = 3907.751673.
b) FDO + No Splitting Paths + tracer enabled
     Geomean Score = 3895.191536.

SPEC CPU2000 FP benchmarks.
a) FDO + Splitting Paths enabled + tracer enabled
     Geomean Score = 4793.321963
b) FDO + No Splitting Paths + tracer enabled
     Geomean Score = 4770.855467

The gains are highest with both Split Paths and the tracer pass enabled, compared to the tracer pass enabled without Split Paths. The
Split Paths pass is therefore very much needed.
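
For scale, the relative gains implied by the geomean scores above can be worked out with a short sketch. The scores are the ones quoted in this message; the helper name `relative_gain_percent` is only illustrative.

```python
def relative_gain_percent(baseline, candidate):
    """Percentage improvement of candidate over baseline."""
    return (candidate / baseline - 1.0) * 100.0


# Geomean scores quoted in this message (FDO, tracer enabled in both runs).
# Each tuple is (without Split Paths, with Split Paths).
results = {
    "SPEC CPU2000 INT": (3895.191536, 3907.751673),
    "SPEC CPU2000 FP": (4770.855467, 4793.321963),
}

for suite, (without_split, with_split) in results.items():
    gain = relative_gain_percent(without_split, with_split)
    print(f"{suite}: {gain:.2f}% with Split Paths")
```

This gives roughly 0.32% for INT and 0.47% for FP.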

Thanks & Regards
Ajit

-----Original Message-----
From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-owner@gcc.gnu.org] On Behalf Of Ajit Kumar Agarwal
Sent: Wednesday, December 16, 2015 3:44 PM
To: Richard Biener
Cc: Jeff Law; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

-----Original Message-----
From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-owner@gcc.gnu.org] On Behalf Of Richard Biener
Sent: Wednesday, December 16, 2015 3:27 PM
To: Ajit Kumar Agarwal
Cc: Jeff Law; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

On Wed, Dec 16, 2015 at 8:43 AM, Ajit Kumar Agarwal <ajit.kumar.agarwal@xilinx.com> wrote:
> Hello Jeff:
>
> Here is more of the data you asked for.
>
> SPEC FP benchmarks.
> a) No Path Splitting + tracer enabled
>     Geomean Score =  4749.726.
> b) Path Splitting enabled + tracer enabled.
>     Geomean Score =  4781.655.
>
> Conclusion: With both Path Splitting and tracer enabled we got the maximum gains. I think we need to have the Path Splitting pass.
>
> SPEC INT benchmarks.
> a) Path Splitting enabled + tracer not enabled.
>     Geomean Score =  3745.193.
> b) No Path Splitting + tracer enabled.
>     Geomean Score = 3738.558.
> c) Path Splitting enabled + tracer enabled.
>     Geomean Score = 3742.833.

>>I suppose with SPEC you mean SPEC CPU 2006?

The performance data is with respect to SPEC CPU 2000 benchmarks.

>>Can you disclose the architecture you did the measurements on and the compile flags you used otherwise?

Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz 
cpu cores       : 10
cache size      : 25600 KB

I used -O3 and enabled the tracer with -ftracer.

Thanks & Regards
Ajit

>>Note that tracer does a very good job only when paired with FDO, so can you re-run SPEC with FDO and compare with path-splitting enabled on top of that?

Thanks,
Richard.

> Conclusion: We are getting more gains with Path Splitting than with tracer. With both Path Splitting and tracer enabled we are also getting gains.
> I think we should have the Path Splitting pass.
>
> One more observation: Richard's concern is that path splitting
> creates multiple exits through duplication. The tracer pass also
> creates multiple exits through duplication. I don't think that is an
> issue in practice, considering the gains we are getting from path
> splitting through more PRE, CSE and DCE.
>
> Thanks & Regards
> Ajit
>
>
>
>
> -----Original Message-----
> From: Jeff Law [mailto:law@redhat.com]
> Sent: Wednesday, December 16, 2015 5:20 AM
> To: Richard Biener
> Cc: Ajit Kumar Agarwal; GCC Patches; Vinod Kathail; Shail Aditya 
> Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
> Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on 
> tree ssa representation
>
> On 12/11/2015 03:05 AM, Richard Biener wrote:
>> On Thu, Dec 10, 2015 at 9:08 PM, Jeff Law <law@redhat.com> wrote:
>>> On 12/03/2015 07:38 AM, Richard Biener wrote:
>>>>
>>>> This pass is now enabled by default with -Os but has no limits on 
>>>> the amount of stmts it copies.
>>>
>>> The more statements it copies, the more likely it is that the path 
>>> splitting will turn out to be useful!  It's counter-intuitive.
>>
>> Well, it's still not appropriate for -Os (nor -O2 I think).  -ftracer 
>> is enabled with -fprofile-use (but it is also properly driven to only 
>> trace hot paths) and otherwise not by default at any optimization level.
> Definitely not appropriate for -Os.  But as I mentioned, I really want to look at the tracer code as it may totally subsume path splitting.
>
>>
>> Don't see how this would work for the CFG pattern it operates on 
>> unless you duplicate the exit condition into that new block creating 
>> an even more obfuscated CFG.
> Agreed, I don't see any way to fix the multiple exit problem.  Then again, this all runs after the tree loop optimizer, so I'm not sure how big of an issue it is in practice.
>
>
>>> It was only after I approved this code after twiddling it for Ajit 
>>> that I came across Honza's tracer implementation, which may in fact 
>>> be retargetable to these loops and do a better job.  I haven't 
>>> experimented with that.
>>
>> Well, I originally suggested to merge this with the tracer pass...
> I missed that, or it didn't sink into my brain.
>
>>> Again, the more statements it copies the more likely it is to be profitable.
>>> Think superblocks to expose CSE, DCE and the like.
>>
>> Ok, so similar to tracer (where I think the main benefit is actually 
>> increasing scheduling opportunities for architectures where it matters).
> Right.  They're both building superblocks, which has the effect of larger windows for scheduling, DCE, CSE, etc.
>
>
>>
>> Note that both passes are placed quite late and thus won't see much 
>> of the GIMPLE optimizations (DOM mainly).  I wonder why they were not 
>> placed adjacent to each other.
> Ajit had it fairly early, but that didn't play well with if-conversion.
>   I just pushed it past if-conversion and vectorization, but before 
> the last DOM pass.  That turns out to be where tracer lives too as you noted.
>
>>>
>>> I wouldn't lose any sleep if we disabled by default or removed, 
>>> particularly if we can repurpose Honza's code.  In fact, I might 
>>> strongly support the former until we hear back from Ajit on performance data.
>>
>> See above for what we do with -ftracer.  path-splitting should at 
>> _least_ restrict itself to operate on optimize_loop_for_speed_p () loops.
> I think we need to decide if we want the code at all, particularly 
> given the multiple-exit problem.
>
> The difficulty is I think Ajit posted some recent data that shows it's 
> helping.  So maybe the thing to do is ask Ajit to try the tracer 
> independent of path splitting and take the obvious actions based on 
> Ajit's data.
>
>
>>
>> It should also (even if counter-intuitive) limit the amount of stmt 
>> copying it does - after all there is something like an instruction 
>> cache size, and exceeding it for loops will never be a good idea (and 
>> there are even smaller special loop caches on some archs).
> Yup.
>
>>
>> Note that a better heuristic than "at least more than one stmt" would 
>> be to have at least one PHI in the merger block.  Otherwise I don't 
>> see how CSE opportunities could exist that we don't see without the duplication.
>> And yes, more PHIs -> more possible CSE.  I wouldn't say so for the 
>> number of stmts.  So please limit the number of stmt copies!
>> (after all we do limit the number of stmts we copy during jump
>> threading!)
> Let's get some more data before we try to tune path splitting.  In an 
> ideal world, the tracer can handle this for us and we just remove path 
> splitting completely.
>
> Jeff
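
The CFG shape discussed in this thread (an if-then-else inside a loop whose arms merge in a join block that is also the loop latch) can be modelled with a toy sketch. This is not GCC's implementation: `Block`, `split_paths`, and the `max_stmts` cap are hypothetical names made up for illustration, and the statements are plain strings. It only shows how duplicating the join block into its predecessors yields longer straight-line blocks, multiplies the loop exits, and how a statement limit of the kind Richard asks for would gate the copy.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """A basic block: straight-line statements plus successor names."""
    stmts: list
    succs: list = field(default_factory=list)


def predecessors(cfg, name):
    """Return the names of all blocks that branch to `name`."""
    return [b for b, blk in cfg.items() if name in blk.succs]


def split_paths(cfg, join, max_stmts):
    """Duplicate `join` into each predecessor when it is small enough.

    Returns True if the CFG was changed. `max_stmts` is a hypothetical
    cap on copied statements; it is not a real GCC parameter.
    """
    preds = predecessors(cfg, join)
    blk = cfg[join]
    if len(preds) < 2 or len(blk.stmts) > max_stmts:
        return False
    for p in preds:
        # Each predecessor absorbs a private copy of the join block and
        # inherits its outgoing edges, including the loop exit edge.
        cfg[p].stmts = cfg[p].stmts + list(blk.stmts)
        cfg[p].succs = [s for s in cfg[p].succs if s != join] + list(blk.succs)
    del cfg[join]
    return True


def build_loop():
    """Loop whose if-then-else arms merge in a join block that is also the latch."""
    return {
        "header": Block(["c = a[i] & 1"], ["then", "else"]),
        "then": Block(["x = a[i] * 3"], ["join"]),
        "else": Block(["x = a[i] + 1"], ["join"]),
        "join": Block(["sum += x + a[i] * 3", "i += 1"], ["header", "exit"]),
        "exit": Block(["return sum"], []),
    }


cfg = build_loop()
print("exits before:", predecessors(cfg, "exit"))
split_paths(cfg, "join", max_stmts=4)
print("exits after: ", predecessors(cfg, "exit"))
print("then block:  ", cfg["then"].stmts)
```

After the split, the `then` block holds both `x = a[i] * 3` and the copied `sum += x + a[i] * 3`, so the repeated `a[i] * 3` becomes visible to CSE within a single block; that is the PHI-driven opportunity mentioned above. The exit block also ends up with two predecessors instead of one, which is the multiple-exit concern.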