IRA performance regressions on PPC

Luis Machado luisgpm@linux.vnet.ibm.com
Sun Sep 14 10:10:00 GMT 2008


Hi Vladimir,

Firstly, thanks for looking into this.

>   Analysis of 187.facerec problem was actually easier than applu one.
> It has one very hot (80%) function localmove::graphRoutines.f90 and
> there is only one hot loop in the function.  Although the loop is
> pretty big because of inlining TopCostFct.  The loop contains a few
> if-statements and several switch-statements.  But only part of the loop
> body is hot.

I've been chasing the specific portion of code that performs badly and
i've come to the same conclusion. I could isolate some parts of the loop
that have a greater effect on the performance, but it's still a big
loop. I was going to open a PR for this. Might be a good idea to keep
track of the progress, and we can attach the testcases there.

>   After comparisons of the hot loop parts I found that IRA generates
> about 20 insns more which are some stores but mostly load.  I did not
> find any problem with spilling in reload (that is after fixing one
> spilling problem about which I wrote in my previous email).  Reload
> spills only two registers which lives throughout the hot parts.

I've seen quite a number of loads/stores using the same one or two
registers over and over again. For example, r12 was heavily used during
that hot loop, according to what i saw.

>   So I concluded that the problem was actually in IRA allocation.  I
> did not find any wrong in IRA implementation of coloring algorithm.
> Changing spill first heuristic there (we choose allocno with smaller
> number to spill) from
> 
>     SPILL_COST / (NUMBER_OF_LEFT_CONFLICTS * NUMBER_HARD_REGISTER_NEEDED 
> + 1)
> 
> to
> 
>     SPILL_COST * log (NREFS) / (NUMBER_OF_LEFT_CONFLICTS
>                                 * NUMBER_HARD_REGISTER_NEEDED
>                 * LIVE_RANGE_LENGTH + 1)
> 
> increased facerec score on Power6 from 1760 to 2039 (vs 1850 for the
> old RA).  It looks very good but unfortunately the overall SPECFP2000
> score was not changed much and SPECINT2000 was not change at all.
> Some programs score became better the rest became worse.
> 
>   This is classical example of heuristic approach drawback.  You never
> get best code using one heuristic for all programs.  Sometimes the old
> RA will generate better code.  What are goal should be is to achieve
> better code in "average", i.e. for some credible benchmark like
> SPECFP2000 and we can achieve this with IRA.
> 
>   It is hard to say what heuristic will work best for given program
> and make the choice automatically (especially without profiling
> because branch probability prediction algorithm is not that accurate).
> Although we could add an option to choose different spill heuristics
> in hope to achieve best overall score using machine learning algorithm
> like Google or Netflix use to find the better prediction
> (e.g. neighborhood based search).  There is one promising project for
> GCC using this approach.  It was reported by Grigory Fursin on this
> year GCC Summit.  I think it is promising because different spill
> heuristics results in very different times (about 20% for facerec).
> 
>   Probably I'll submit the new heuristic because SPECFP2000 score a
> bit better (about 0.5%) with using this.  But I need some time to
> check it on other platforms.

If you have any patches that you think are worth testing, let me know so
i can give them a go. I'll be following closely the progress on this
topic.

Thanks,
Luis



More information about the Gcc mailing list