This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: IRA performance regressions on PPC

From: Luis Machado <luisgpm at linux dot vnet dot ibm dot com>
To: Vladimir Makarov <vmakarov at redhat dot com>
Cc: Jeff Law <law at redhat dot com>, "H.J. Lu" <hjl dot tools at gmail dot com>, gcc at gcc dot gnu dot org
Date: Sat, 13 Sep 2008 02:03:29 -0300
Subject: Re: IRA performance regressions on PPC
References: <1220623176.18640.42.camel@gargoyle> <48C16002.7090300@redhat.com> <1220633125.18640.55.camel@gargoyle> <48C16B57.8030805@redhat.com> <6dc9ffc80809051034r4c07f425lf4f72a5c56343f9e@mail.gmail.com> <1220709930.18640.64.camel@gargoyle> <6dc9ffc80809060749x6aa32cf8m668e785d16d5b491@mail.gmail.com> <1220725474.18640.70.camel@gargoyle> <6dc9ffc80809061135m7c07fa8cp862291a3af1f53e5@mail.gmail.com> <48C2F354.7020608@redhat.com> <48C52D01.20003@redhat.com> <1220984022.18640.98.camel@gargoyle> <48C92C9C.3070809@redhat.com> <48CAE438.1060905@redhat.com>
Reply-to: luisgpm at linux dot vnet dot ibm dot com

Hi Vladimir,

Firstly, thanks for looking into this.

>   Analysis of 187.facerec problem was actually easier than applu one.
> It has one very hot (80%) function localmove::graphRoutines.f90 and
> there is only one hot loop in the function.  Although the loop is
> pretty big because of inlining TopCostFct.  The loop contains a few
> if-statements and several switch-statements.  But only part of the loop
> body is hot.

I've been chasing the specific portion of code that performs badly and
i've come to the same conclusion. I could isolate some parts of the loop
that have a greater effect on the performance, but it's still a big
loop. I was going to open a PR for this. Might be a good idea to keep
track of the progress, and we can attach the testcases there.

>   After comparisons of the hot loop parts I found that IRA generates
> about 20 insns more which are some stores but mostly load.  I did not
> find any problem with spilling in reload (that is after fixing one
> spilling problem about which I wrote in my previous email).  Reload
> spills only two registers which lives throughout the hot parts.

I've seen quite a number of loads/stores using the same one or two
registers over and over again. For example, r12 was heavily used during
that hot loop, according to what i saw.

>   So I concluded that the problem was actually in IRA allocation.  I
> did not find any wrong in IRA implementation of coloring algorithm.
> Changing spill first heuristic there (we choose allocno with smaller
> number to spill) from
> 
>     SPILL_COST / (NUMBER_OF_LEFT_CONFLICTS * NUMBER_HARD_REGISTER_NEEDED 
> + 1)
> 
> to
> 
>     SPILL_COST * log (NREFS) / (NUMBER_OF_LEFT_CONFLICTS
>                                 * NUMBER_HARD_REGISTER_NEEDED
>                 * LIVE_RANGE_LENGTH + 1)
> 
> increased facerec score on Power6 from 1760 to 2039 (vs 1850 for the
> old RA).  It looks very good but unfortunately the overall SPECFP2000
> score was not changed much and SPECINT2000 was not change at all.
> Some programs score became better the rest became worse.
> 
>   This is classical example of heuristic approach drawback.  You never
> get best code using one heuristic for all programs.  Sometimes the old
> RA will generate better code.  What are goal should be is to achieve
> better code in "average", i.e. for some credible benchmark like
> SPECFP2000 and we can achieve this with IRA.
> 
>   It is hard to say what heuristic will work best for given program
> and make the choice automatically (especially without profiling
> because branch probability prediction algorithm is not that accurate).
> Although we could add an option to choose different spill heuristics
> in hope to achieve best overall score using machine learning algorithm
> like Google or Netflix use to find the better prediction
> (e.g. neighborhood based search).  There is one promising project for
> GCC using this approach.  It was reported by Grigory Fursin on this
> year GCC Summit.  I think it is promising because different spill
> heuristics results in very different times (about 20% for facerec).
> 
>   Probably I'll submit the new heuristic because SPECFP2000 score a
> bit better (about 0.5%) with using this.  But I need some time to
> check it on other platforms.

If you have any patches that you think are worth testing, let me know so
i can give them a go. I'll be following closely the progress on this
topic.

Thanks,
Luis

References:
- IRA performance regressions on PPC
  - From: Luis Machado
- Re: IRA performance regressions on PPC
  - From: Vladimir Makarov
- Re: IRA performance regressions on PPC
  - From: Luis Machado
- Re: IRA performance regressions on PPC
  - From: Vladimir Makarov
- Re: IRA performance regressions on PPC
  - From: H.J. Lu
- Re: IRA performance regressions on PPC
  - From: Luis Machado
- Re: IRA performance regressions on PPC
  - From: H.J. Lu
- Re: IRA performance regressions on PPC
  - From: Luis Machado
- Re: IRA performance regressions on PPC
  - From: H.J. Lu
- Re: IRA performance regressions on PPC
  - From: Jeff Law
- Re: IRA performance regressions on PPC
  - From: Vladimir Makarov
- Re: IRA performance regressions on PPC
  - From: Luis Machado
- Re: IRA performance regressions on PPC
  - From: Vladimir Makarov
- Re: IRA performance regressions on PPC
  - From: Vladimir Makarov

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]