This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH 2/4][AArch64] Increase the loop peeling limit
- From: James Greenhalgh <james dot greenhalgh at arm dot com>
- To: Evandro Menezes <e dot menezes at samsung dot com>
- Cc: "'gcc-patches'" <gcc-patches at gcc dot gnu dot org>, "'Marcus Shawcroft'" <Marcus dot Shawcroft at arm dot com>, "'Kyrill Tkachov'" <kyrylo dot tkachov at arm dot com>, Andrew Pinski <pinskia at gmail dot com>, richard dot earnshaw at arm dot com, ramana dot radhakrishnan at arm dot com
- Date: Mon, 14 Dec 2015 11:26:15 +0000
- Subject: Re: [PATCH 2/4][AArch64] Increase the loop peeling limit
- References: <001b01d1110d$0008f890$001ae9b0$ at samsung dot com> <563A9040 dot 60805 at samsung dot com> <563BC15D dot 3080608 at samsung dot com> <564E4779 dot 6020702 at samsung dot com> <20151120115334 dot GA12442 at arm dot com> <5660AF1F dot 8040803 at samsung dot com>
On Thu, Dec 03, 2015 at 03:07:43PM -0600, Evandro Menezes wrote:
> On 11/20/2015 05:53 AM, James Greenhalgh wrote:
> >On Thu, Nov 19, 2015 at 04:04:41PM -0600, Evandro Menezes wrote:
> >>On 11/05/2015 02:51 PM, Evandro Menezes wrote:
> >>>2015-11-05 Evandro Menezes <e.menezes@samsung.com>
> >>>
> >>> gcc/
> >>>
> >>> * config/aarch64/aarch64.c (aarch64_override_options_internal):
> >>> Increase loop peeling limit.
> >>>
> >>>This patch increases the limit for the number of peeled insns.
> >>>With this change, I noticed no major regression in either
> >>>Geekbench v3 or SPEC CPU2000 while some benchmarks, typically FP
> >>>ones, improved significantly.
> >>>
> >>>I tested this tuning on Exynos M1 and on A57. ThunderX seems to
> >>>benefit from this tuning too. However, I'd appreciate comments
> >>>from other stakeholders.
> >>
> >>Ping.
> >I'd like to leave this for a call from the port maintainers. I can see why
> >this leads to more opportunities for vectorization, but I'm concerned about
> >the wider impact on code size. Certainly I wouldn't expect this to be our
> >default at -O2 and below.
> >
> >My gut feeling is that this doesn't really belong in the back-end (there are
> >presumably good reasons why the default for this parameter across GCC has
> >fluctuated from 400 to 100 to 200 over recent years), but as I say, I'd
> >like Marcus or Richard to make the call as to whether or not we take this
> >patch.
>
> Please, correct me if I'm wrong, but loop peeling is enabled only
> with loop unrolling (and with PGO). If so, then extra code size is
> not a concern, for this heuristic is only active when unrolling
> loops, when code size is already of secondary importance.
My understanding was that loop peeling is enabled from -O2 upwards, and
is also used to partially peel unaligned loops for vectorization (allowing
the vector code to be well aligned), or to completely peel inner loops which
may then become amenable to SLP vectorization.

If I'm wrong then I take back these objections. But I was sure this
parameter was used in a number of situations outside of just
-funroll-loops/-funroll-all-loops. Certainly I remember seeing performance
sensitivities to this parameter at -O3 in some internal workloads I was
analysing.
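
To make the alignment case concrete, here is a hand-written sketch of what
partially peeling a loop looks like at the source level. It is only an
illustration, not what GCC emits: the names sum_peeled and peel_count are
hypothetical, and the 16-byte alignment target and the 4-element main-loop
body are assumptions chosen to mimic a 128-bit vector width.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: how many leading elements to peel so that the
   remaining pointer is 16-byte aligned (assumed vector alignment).
   The result is capped at n so short arrays are handled safely. */
static size_t peel_count(const int32_t *p, size_t n)
{
    size_t misalign = (size_t)((uintptr_t)p % 16);
    size_t peel = misalign ? (16 - misalign) / sizeof(int32_t) : 0;
    return peel < n ? peel : n;
}

/* Sum n elements using a peeled prologue, a 4-wide main loop that
   starts on an aligned address, and a scalar epilogue. */
int64_t sum_peeled(const int32_t *a, size_t n)
{
    int64_t sum = 0;
    size_t i = 0;
    size_t peel = peel_count(a, n);

    /* Peeled iterations: run until a + i is 16-byte aligned. */
    for (; i < peel; i++)
        sum += a[i];

    /* Main loop: stands in for the aligned, vectorizable body. */
    for (; i + 4 <= n; i += 4)
        sum += (int64_t)a[i] + a[i + 1] + a[i + 2] + a[i + 3];

    /* Epilogue: leftover elements that do not fill a full group. */
    for (; i < n; i++)
        sum += a[i];

    return sum;
}
```

The prologue and epilogue are extra copies of the loop body, which is where
the code-size cost comes from.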
Thanks,
James