This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)

From: Igor Zamyatin <izamyatin at gmail dot com>
To: davidxl at google dot com
Cc: richard dot guenther at gmail dot com, tejohnson at google dot com, reply at codereview dot appspotmail dot com, gcc-patches at gcc dot gnu dot org
Date: Fri, 27 Apr 2012 11:07:41 +0400
Subject: Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)
References: <20120424212648.E6B696136C@tjsboxrox.mtv.corp.google.com> <m2y5pkmx4b.fsf@firstfloor.org> <CAAkRFZJmZWD9KOA5TPZGRi5pXkVN=aWvZg8DvuyiJsKqvV3Epw@mail.gmail.com> <0EFAB2BDD0F67E4FB6CCC8B9F87D756915407451@IRSMSX101.ger.corp.intel.com>

Are you sure that the tree-level unrollers are turned on at -O2? My
impression was that they work only at -O3 or with the -funroll-loops or
-fpeel-loops flags.

On Tue, Apr 24, 2012 at 6:13 PM, Andi Kleen <andi@firstfloor.org> wrote:
> tejohnson@google.com (Teresa Johnson) writes:
>
>> This patch adds heuristics to limit unrolling in loops with branches
>> that may increase branch mispredictions. It affects loops that are
>> not frequently iterated, and that are nested within a hot region of
>> code that already contains many branch instructions.
>>
>> Performance tested with both internal benchmarks and with SPEC
>> 2000/2006 on a variety of Intel systems (Core2, Corei7, SandyBridge)
>> and a couple of different AMD Opteron systems.
>> This improves performance of an internal search indexing benchmark by
>> close to 2% on all the tested Intel platforms. It also consistently
>> improves 445.gobmk (with FDO feedback where unrolling kicks in) by
>> close to 1% on AMD Opteron. Other performance effects are neutral.
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Is this ok for trunk?
>
> One problem with any unrolling heuristics is currently that gcc has
> both the tree level and the rtl level unroller. The tree one is even
> on at -O3. So if you tweak anything for one you have to affect both,
> otherwise the other may still do the wrong thing(tm).

Tree level unrollers (cunrolli and cunroll) do complete unrolling. At -O2,
both of them are turned on, but gcc does not allow any code growth --
which makes them pretty useless at -O2 (very few loops qualify). The
default max complete peel iteration count is also too low compared with
both icc and llvm. This needs to be tuned.

David
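
For illustration, here is a minimal sketch of the kind of loop the
complete unrollers look at: a small trip count known at compile time.
The function name sum4 and the trip count are made up for this example.
Whether gcc actually unrolls it at a given level depends on the version
and on parameters such as max-completely-peel-times, so it is safer to
check a dump (for example from -fdump-tree-cunroll) than to assume.

```c
/* Hypothetical example, not taken from the patch: a loop with a small,
   compile-time-constant trip count. Complete unrolling would replace
   the loop with four straight-line additions and no branch. */
int sum4(const int *a)
{
    int s = 0;
    for (int i = 0; i < 4; i++)
        s += a[i];
    return s;
}
```

Compiling this at -O2 and at -O3 with the dump enabled and comparing the
two outputs is one way to see whether the no-code-growth restriction
keeps the loop intact.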

>
> For some other tweaks I looked into a shared cost model some time ago.
> It may still be needed.
>
> -Andi
>
> --
> ak@linux.intel.com -- Speaking for myself only
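
The heuristic described in the quoted patch summary (limit unrolling for
loops that contain branches, iterate infrequently, and sit in a hot
region that already has many branches) can be sketched roughly as
follows. This is not the patch's code: the struct, the function name
limit_unroll_factor, and both thresholds are assumptions made up for
illustration, and the sketch simply refuses to unroll where the real
patch may only reduce the factor.

```c
/* Hypothetical inputs; none of these names come from the patch. */
struct loop_profile {
    unsigned branches_in_loop;       /* branches inside the loop body */
    unsigned avg_iterations;         /* estimated iterations per loop entry */
    unsigned branches_in_hot_region; /* branches in the enclosing hot region */
};

/* Assumed thresholds, chosen only for illustration. */
enum { ITER_THRESHOLD = 50, REGION_BRANCH_BUDGET = 20 };

/* Return the unroll factor to use, given the factor originally requested.
   A result of 1 means "do not unroll". */
unsigned limit_unroll_factor(const struct loop_profile *p, unsigned requested)
{
    if (p->branches_in_loop == 0)
        return requested;  /* no branches to duplicate */
    if (p->avg_iterations >= ITER_THRESHOLD)
        return requested;  /* frequently iterated: unrolling likely pays off */
    if (p->branches_in_hot_region < REGION_BRANCH_BUDGET)
        return requested;  /* surrounding region is not branch-heavy */
    return 1;              /* all three conditions hold: skip unrolling */
}
```

Only a loop that meets all three conditions is affected, which matches
the summary's claim that the change targets a narrow class of loops.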

Follow-Ups:
- Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)
  - From: Xinliang David Li

References:
- [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)
  - From: Teresa Johnson
- Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)
  - From: Andi Kleen
- Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)
  - From: Xinliang David Li
