This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)
- From: Teresa Johnson <tejohnson at google dot com>
- To: Andi Kleen <andi at firstfloor dot org>
- Cc: reply at codereview dot appspotmail dot com, gcc-patches at gcc dot gnu dot org
- Date: Wed, 25 Apr 2012 08:36:05 -0700
- Subject: Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)
- References: <20120424212648.E6B696136C@tjsboxrox.mtv.corp.google.com> <firstname.lastname@example.org>
On Tue, Apr 24, 2012 at 6:13 PM, Andi Kleen <email@example.com> wrote:
> firstname.lastname@example.org (Teresa Johnson) writes:
>> This patch adds heuristics to limit unrolling in loops with branches that may increase
>> branch mispredictions. It affects loops that are not frequently iterated, and that are
>> nested within a hot region of code that already contains many branch instructions.
>> Performance tested with both internal benchmarks and with SPEC 2000/2006 on a variety
>> of Intel systems (Core2, Corei7, SandyBridge) and a couple of different AMD Opteron systems.
>> This improves performance of an internal search indexing benchmark by close to 2% on
>> all the tested Intel platforms. It also consistently improves 445.gobmk (with FDO feedback
>> where unrolling kicks in) by close to 1% on AMD Opteron. Other performance effects are
>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Is this ok for trunk?
> One problem with any unrolling heuristics is currently that gcc has both
> the tree level and the rtl level unroller. The tree one is even on at
> -O3. So if you tweak anything for one you have to affect both, otherwise the
> other may still do the wrong thing(tm).
It's true that the tree level unroller could benefit from taking
branch mispredict effects into account as well. But since that is only
performing full unrolling of constant trip count loops I suspect that
there will be additional things that need to be considered, such as
whether the full unrolling enables better optimization in the
surrounding code/loop. Hence I wanted to tackle that later.
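To illustrate what "full unrolling enables better optimization in the
surrounding code" can mean, here is a small hand-written sketch. It is
not compiler output; the function names (dot4_rolled, dot4_unrolled,
weighted) and the weight table are made up for the example.

```c
#include <assert.h>

/* Rolled form: the trip count is the compile-time constant 4. */
static int dot4_rolled(const int *a, const int *b)
{
    assert(a && b);
    int sum = 0;
    for (int i = 0; i < 4; i++)
        sum += a[i] * b[i];
    return sum;
}

/* What full unrolling conceptually produces: the loop branch is gone. */
static int dot4_unrolled(const int *a, const int *b)
{
    assert(a && b);
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

/* Hypothetical caller with constant weights. With the loop fully
   unrolled and inlined, the multiplies by 0 and 1 become foldable,
   so the whole body could reduce to a[0] + a[3]. */
static int weighted(const int *a)
{
    static const int w[4] = { 1, 0, 0, 1 };
    return dot4_unrolled(a, w);
}
```

The extra simplification in weighted() is the kind of benefit a
profitability check for full unrolling would have to weigh against any
branch misprediction cost.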
> For some other tweaks I looked into a shared cost model some time ago.
> May be still needed.
Yes, I think it would be good to unify some of the profitability
checks between the two unrolling passes, or at least between the tree
and rtl level full unrollers/peelers.
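As a rough sketch of what a shared check might look like, the
conditions from the patch description (loop has branches, is not
frequently iterated, and sits in a hot region that already contains
many branches) can be written as one predicate. This is not the code
in the patch: struct loop_summary, both function names, and the two
thresholds are hypothetical, and disabling unrolling outright (factor
1) is a simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-loop summary; not an actual GCC data structure. */
struct loop_summary {
    unsigned num_branches;        /* conditional branches in the loop body */
    unsigned avg_iterations;      /* estimated average trip count */
    unsigned hot_region_branches; /* branches in the enclosing hot region */
};

/* Made-up thresholds, chosen only for illustration. */
enum { FEW_ITERATIONS = 8, MANY_BRANCHES = 32 };

/* True when all three conditions from the description hold. */
static bool mispredict_risk(const struct loop_summary *l)
{
    return l->num_branches > 0
        && l->avg_iterations < FEW_ITERATIONS
        && l->hot_region_branches >= MANY_BRANCHES;
}

/* Returns the unroll factor to use: 1 (no unrolling) for risky loops,
   otherwise the factor the caller asked for. */
static unsigned limit_unroll_factor(const struct loop_summary *l,
                                    unsigned requested)
{
    assert(requested >= 1);
    return mispredict_risk(l) ? 1 : requested;
}
```

Both unrolling passes could in principle call a predicate like this
before committing to a factor, which is one way the checks might be
kept in sync.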
> email@example.com -- Speaking for myself only
Teresa Johnson | Software Engineer | firstname.lastname@example.org | 408-460-2413