This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: IRA performance regressions on PPC
Hi Vladimir,
Firstly, thanks for looking into this.
> Analysis of 187.facerec problem was actually easier than applu one.
> It has one very hot (80%) function localmove::graphRoutines.f90 and
> there is only one hot loop in the function. Although the loop is
> pretty big because of inlining TopCostFct. The loop contains a few
> if-statements and several switch-statements. But only part of the loop
> body is hot.
I've been chasing the specific portion of code that performs badly and
i've come to the same conclusion. I could isolate some parts of the loop
that have a greater effect on the performance, but it's still a big
loop. I was going to open a PR for this. Might be a good idea to keep
track of the progress, and we can attach the testcases there.
> After comparisons of the hot loop parts I found that IRA generates
> about 20 insns more which are some stores but mostly load. I did not
> find any problem with spilling in reload (that is after fixing one
> spilling problem about which I wrote in my previous email). Reload
> spills only two registers which lives throughout the hot parts.
I've seen quite a number of loads/stores using the same one or two
registers over and over again. For example, r12 was heavily used during
that hot loop, according to what i saw.
> So I concluded that the problem was actually in IRA allocation. I
> did not find any wrong in IRA implementation of coloring algorithm.
> Changing spill first heuristic there (we choose allocno with smaller
> number to spill) from
>
> SPILL_COST / (NUMBER_OF_LEFT_CONFLICTS * NUMBER_HARD_REGISTER_NEEDED
> + 1)
>
> to
>
> SPILL_COST * log (NREFS) / (NUMBER_OF_LEFT_CONFLICTS
> * NUMBER_HARD_REGISTER_NEEDED
> * LIVE_RANGE_LENGTH + 1)
>
> increased facerec score on Power6 from 1760 to 2039 (vs 1850 for the
> old RA). It looks very good but unfortunately the overall SPECFP2000
> score was not changed much and SPECINT2000 was not change at all.
> Some programs score became better the rest became worse.
>
> This is classical example of heuristic approach drawback. You never
> get best code using one heuristic for all programs. Sometimes the old
> RA will generate better code. What are goal should be is to achieve
> better code in "average", i.e. for some credible benchmark like
> SPECFP2000 and we can achieve this with IRA.
>
> It is hard to say what heuristic will work best for given program
> and make the choice automatically (especially without profiling
> because branch probability prediction algorithm is not that accurate).
> Although we could add an option to choose different spill heuristics
> in hope to achieve best overall score using machine learning algorithm
> like Google or Netflix use to find the better prediction
> (e.g. neighborhood based search). There is one promising project for
> GCC using this approach. It was reported by Grigory Fursin on this
> year GCC Summit. I think it is promising because different spill
> heuristics results in very different times (about 20% for facerec).
>
> Probably I'll submit the new heuristic because SPECFP2000 score a
> bit better (about 0.5%) with using this. But I need some time to
> check it on other platforms.
If you have any patches that you think are worth testing, let me know so
i can give them a go. I'll be following closely the progress on this
topic.
Thanks,
Luis