This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: [PATCH 2/4][AArch64] Increase the loop peeling limit


On 02/03/16 13:46, Evandro Menezes wrote:
On 01/08/16 16:55, Evandro Menezes wrote:
On 12/16/2015 02:11 PM, Evandro Menezes wrote:
On 12/16/2015 05:24 AM, Richard Earnshaw (lists) wrote:
On 15/12/15 23:34, Evandro Menezes wrote:
On 12/14/2015 05:26 AM, James Greenhalgh wrote:
On Thu, Dec 03, 2015 at 03:07:43PM -0600, Evandro Menezes wrote:
On 11/20/2015 05:53 AM, James Greenhalgh wrote:
On Thu, Nov 19, 2015 at 04:04:41PM -0600, Evandro Menezes wrote:
On 11/05/2015 02:51 PM, Evandro Menezes wrote:
2015-11-05  Evandro Menezes <e.menezes@samsung.com>

    gcc/

        * config/aarch64/aarch64.c
(aarch64_override_options_internal):
        Increase loop peeling limit.

This patch increases the limit for the number of peeled insns.
With this change, I noticed no major regression in either
Geekbench v3 or SPEC CPU2000 while some benchmarks, typically FP
ones, improved significantly.

I tested this tuning on Exynos M1 and on A57.  ThunderX seems to
benefit from this tuning too.  However, I'd appreciate comments
from other stakeholders.

Ping.

I'd like to leave this for a call from the port maintainers.  I can see
why this leads to more opportunities for vectorization, but I'm concerned
about the wider impact on code size.  Certainly I wouldn't expect this to
be our default at -O2 and below.

My gut feeling is that this doesn't really belong in the back-end (there
are presumably good reasons why the default for this parameter across GCC
has fluctuated from 400 to 100 to 200 over recent years), but, as I say,
I'd like Marcus or Richard to make the call as to whether or not we take
this patch.

Please correct me if I'm wrong, but loop peeling is enabled only
with loop unrolling (and with PGO).  If so, then extra code size is
not a concern, for this heuristic is only active when unrolling
loops, at which point code size is already of secondary importance.

My understanding was that loop peeling is enabled from -O2 upwards, and
is also used to partially peel unaligned loops for vectorization
(allowing the vector code to be well aligned), or to completely peel
inner loops which may then become amenable to SLP vectorization.

If I'm wrong, then I take back these objections.  But I was sure this
parameter was used in a number of situations outside of just
-funroll-loops/-funroll-all-loops.  Certainly I remember seeing
performance sensitivities to this parameter at -O3 in some internal
workloads I was analysing.

Vectorization, including SLP, is only enabled at -O3, isn't it?  It
seems to me that peeling is only used by optimizations which already
lead to a potential increase in code size.

For instance, with "-Ofast -funroll-all-loops", the total text size for
the SPEC CPU2000 suite is 26.9MB with this proposed change and 26.8MB
without it; with just "-O2", it is unchanged at 23.1MB regardless of this
setting.

So it seems to me that this proposal should be neutral at -O2 and below.

My preference would be to not diverge from the global parameter
settings. I haven't looked in detail at this parameter but it seems to
me there are two possible paths:

1) We could get agreement globally that the parameter should be increased.
2) We could agree that this specific use of the parameter is distinct
from some other uses and deserves a new param in its own right with a
higher value.

Here's what I have observed, not only on AArch64: architectures benefit differently from certain loop optimizations, especially those dealing with vectorization, be it because some have plenty of registers for more aggressive loop unrolling, or because some have lower costs to vectorize.  With this I'm trying to say that there may be a case for adjusting this parameter to better suit loop optimizations to specific targets.  While it is not the only parameter related to loop optimizations, it seems to be the one with the desired effects, as exemplified by PPC, S390 and x86 (AOSP).  Though there is the possibility that these are actually side effects, as Richard Biener perhaps implied in another reply.

Gents,

Any new thoughts on this proposal?

Ping?

Ping^2

--
Evandro Menezes

