[patch,ira]: Improve on updated memory cost in coloring pass of integrated register allocator.

Mon Jan 25 17:33:00 GMT 2016

On 01/23/2016 06:09 AM, Ajit Kumar Agarwal wrote:
> This patch improves the updated memory cost in the coloring pass of the integrated register
> allocator. Only the enter_freq of the loop is considered in the updated memory cost in the
> coloring pass. Considering only enter_freq is based on the concept that a variable live-out
> of the entry or header of the loop is live-in and live-out throughout the loop. The exit
> freq is ignored in the updated memory cost in the coloring pass.
As we put stores for spilled pseudos on loop entry and loads on loop
exits, ignoring loop exits means, to me, that we basically ignore the
cost of the loads, which is probably wrong in the general case.
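
A toy model of that cost argument, not the code in the IRA sources: the
function name, the per-move costs and the frequencies below are all made
up for illustration.  It assumes a pseudo that lives in a register outside
the loop and in memory inside it, so each loop entry pays a store and each
loop exit pays a load.

```python
def boundary_move_cost(store_cost, load_cost, enter_freq, exit_freq,
                       count_exits=True):
    """Cost of the moves placed on the loop border for one spilled pseudo.

    Hypothetical model: one store per loop entry, one load per loop exit.
    """
    cost = store_cost * enter_freq
    if count_exits:
        cost += load_cost * exit_freq
    return cost


# Made-up numbers, chosen only to make the effect visible.
STORE_COST, LOAD_COST = 4, 6
ENTER_FREQ, EXIT_FREQ = 100, 90

full = boundary_move_cost(STORE_COST, LOAD_COST, ENTER_FREQ, EXIT_FREQ)
entry_only = boundary_move_cost(STORE_COST, LOAD_COST, ENTER_FREQ, EXIT_FREQ,
                                count_exits=False)
print(full, entry_only)
```

With these made-up numbers, dropping the exit term leaves out the whole
load part of the cost, which here is more than half of the total.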
> This increases the updated memory cost, which gives more chances of reducing spills and
> fetches and of getting a better assignment.
>
> The concept that a variable live-out of the header of the loop is live-in and live-out
> throughout the loop is based on the following.
>
> If a variable v is live-out at the header of the loop, then v is live-in at every node
> in the loop. To prove this, consider a loop L with header h such that the variable v
> defined at d is live-in at h. Since v is live at h, d is not part of L. This follows
> from the dominance property, i.e. h is strictly dominated by d. Furthermore, there
> exists a path from h to a use of v which does not go through d. For every node p in
> the loop, since the loop is a strongly connected component of the CFG, there exists
> a path, consisting only of nodes of L, from p to h. Concatenating these two paths
> proves that v is live-in and live-out of p.
>
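The quoted claim can be checked on a small example with an ordinary
backward liveness fixed point.  This is only a sketch: the five-block CFG,
the block names and the helper function are invented here, and the code
has nothing to do with how IRA itself computes liveness.

```python
def liveness(succs, use, defs):
    """Iterate live_in/live_out to a fixed point over a CFG."""
    live_in = {n: set() for n in succs}
    live_out = {n: set() for n in succs}
    changed = True
    while changed:
        changed = False
        for n in succs:
            # live_out(n) is the union of live_in over the successors of n.
            out = set().union(*(live_in[s] for s in succs[n]))
            # live_in(n) = use(n) | (live_out(n) - def(n))
            inn = use[n] | (out - defs[n])
            if out != live_out[n] or inn != live_in[n]:
                live_out[n], live_in[n] = out, inn
                changed = True
    return live_in, live_out


# Hypothetical CFG: d defines v, the loop is h -> body -> latch -> h,
# and the only use of v is in the block after the loop.
SUCCS = {"d": ["h"], "h": ["body", "exit"], "body": ["latch"],
         "latch": ["h"], "exit": []}
USE = {"d": set(), "h": set(), "body": set(), "latch": set(), "exit": {"v"}}
DEFS = {"d": {"v"}, "h": set(), "body": set(), "latch": set(), "exit": set()}
LOOP = ["h", "body", "latch"]

live_in, live_out = liveness(SUCCS, USE, DEFS)
for node in LOOP:
    print(node, "v" in live_in[node], "v" in live_out[node])
```

In this example v is never referenced inside the loop, yet it is live-in
and live-out at every loop block because the path back to h and on to the
use never passes through d.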
> Bootstrapped on X86_64.
>
> A performance run was done on the SPEC CPU2000 benchmarks; the results follow.
>
> SPEC INT benchmarks
> (Mean score with this patch vs mean score without this patch = 3729.777 vs 3717.083).
>
> Benchmark      Gain
> 186.crafty     2.78%
> 176.gcc        0.7%
> 253.perlbmk    0.75%
> 255.vortex     0.82%
>
> SPEC FP benchmarks
> (Mean score with this patch vs mean score without this patch = 4774.65 vs 4751.838).
>
> Benchmark      Gain
>
> 168.wupwise    0.77%
> 171.swim       1.5%
> 177.mesa       1.2%
> 200.sixtrack   1.2%
> 178.galgel     0.6%
> 179.art        0.6%
> 183.equake     0.5%
> 187.facerec    0.7%
>
Thanks for trying to improve GCC performance, Ajit.  Unfortunately, I
got different numbers on SPEC2000 with your patch.  The different
results might be a consequence of a different test setup.

I got the following numbers on a 4.2GHz i7-4790K (Haswell) with -Ofast
-mtune=corei7.  Using the tune option is important, as RA will try to
improve code for the Haswell architecture.

64-bit:
Int 5123 5126
FP 6886 6897

32-bit:
Int 4754 4763
FP 6363 6346

Here the first column is GCC with your patch and the second one is
without your patch.  Only the 32-bit FP score was improved by your patch.
These days practically nobody uses 32-bit code for FP benchmarks.
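
For reference, the relative changes behind those four pairs of scores can
be worked out with a few lines of arithmetic.  The scores are the ones
listed above; the helper name and the dictionary layout are just for this
sketch.

```python
# (with patch, without patch), as listed above.
SCORES = {
    ("64-bit", "Int"): (5123, 5126),
    ("64-bit", "FP"): (6886, 6897),
    ("32-bit", "Int"): (4754, 4763),
    ("32-bit", "FP"): (6363, 6346),
}


def relative_change(with_patch, without_patch):
    """Change of the patched score relative to the unpatched one, in %."""
    return (with_patch - without_patch) / without_patch * 100.0


changes = {key: relative_change(*pair) for key, pair in SCORES.items()}
improved = [key for key, delta in changes.items() if delta > 0]

for (mode, suite), delta in changes.items():
    print(f"{mode} {suite}: {delta:+.2f}%")
```

All four changes are well under half a percent in magnitude, and only the
32-bit FP entry comes out positive.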

So, unfortunately, I cannot approve the patch.  Sorry.