This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH 2/4][AArch64] Increase the loop peeling limit
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: "Richard Earnshaw (lists)" <Richard dot Earnshaw at arm dot com>
- Cc: Evandro Menezes <e dot menezes at samsung dot com>, James Greenhalgh <james dot greenhalgh at arm dot com>, gcc-patches <gcc-patches at gcc dot gnu dot org>, Marcus Shawcroft <Marcus dot Shawcroft at arm dot com>, Kyrill Tkachov <kyrylo dot tkachov at arm dot com>, Andrew Pinski <pinskia at gmail dot com>, Ramana Radhakrishnan <ramana dot radhakrishnan at arm dot com>
- Date: Wed, 16 Dec 2015 13:42:35 +0100
- Subject: Re: [PATCH 2/4][AArch64] Increase the loop peeling limit
- Authentication-results: sourceware.org; auth=none
- References: <001b01d1110d$0008f890$001ae9b0$ at samsung dot com> <563A9040 dot 60805 at samsung dot com> <563BC15D dot 3080608 at samsung dot com> <564E4779 dot 6020702 at samsung dot com> <20151120115334 dot GA12442 at arm dot com> <5660AF1F dot 8040803 at samsung dot com> <20151214112614 dot GA18673 at arm dot com> <5670A38A dot 9030000 at samsung dot com> <56714A03 dot 1010407 at arm dot com>
On Wed, Dec 16, 2015 at 12:24 PM, Richard Earnshaw (lists)
<Richard.Earnshaw@arm.com> wrote:
> On 15/12/15 23:34, Evandro Menezes wrote:
>> On 12/14/2015 05:26 AM, James Greenhalgh wrote:
>>> On Thu, Dec 03, 2015 at 03:07:43PM -0600, Evandro Menezes wrote:
>>>> On 11/20/2015 05:53 AM, James Greenhalgh wrote:
>>>>> On Thu, Nov 19, 2015 at 04:04:41PM -0600, Evandro Menezes wrote:
>>>>>> On 11/05/2015 02:51 PM, Evandro Menezes wrote:
>>>>>>> 2015-11-05 Evandro Menezes <e.menezes@samsung.com>
>>>>>>>
>>>>>>> gcc/
>>>>>>>
>>>>>>> * config/aarch64/aarch64.c (aarch64_override_options_internal):
>>>>>>> Increase loop peeling limit.
>>>>>>>
>>>>>>> This patch increases the limit for the number of peeled insns.
>>>>>>> With this change, I noticed no major regression in either
>>>>>>> Geekbench v3 or SPEC CPU2000 while some benchmarks, typically FP
>>>>>>> ones, improved significantly.
>>>>>>>
>>>>>>> I tested this tuning on Exynos M1 and on A57. ThunderX seems to
>>>>>>> benefit from this tuning too. However, I'd appreciate comments
>>>>>>> from other stakeholders.
>>>>>>
>>>>>> Ping.
>>>>> I'd like to leave this for a call from the port maintainers. I can see
>>>>> why this leads to more opportunities for vectorization, but I'm concerned
>>>>> about the wider impact on code size. Certainly I wouldn't expect this to
>>>>> be our default at -O2 and below.
>>>>>
>>>>> My gut feeling is that this doesn't really belong in the back-end (there
>>>>> are presumably good reasons why the default for this parameter across GCC
>>>>> has fluctuated from 400 to 100 to 200 over recent years), but as I say,
>>>>> I'd like Marcus or Richard to make the call as to whether or not we take
>>>>> this patch.
>>>> Please, correct me if I'm wrong, but loop peeling is enabled only
>>>> with loop unrolling (and with PGO). If so, then extra code size is
>>>> not a concern, for this heuristic is only active when unrolling
>>>> loops, when code size is already of secondary importance.
>>> My understanding was that loop peeling is enabled from -O2 upwards, and
>>> is also used to partially peel unaligned loops for vectorization
>>> (allowing the vector code to be well aligned), or to completely peel
>>> inner loops which may then become amenable to SLP vectorization.
>>>
>>> If I'm wrong then I take back these objections. But I was sure this
>>> parameter was used in a number of situations outside of just
>>> -funroll-loops/-funroll-all-loops. Certainly I remember seeing
>>> performance sensitivities to this parameter at -O3 in some internal
>>> workloads I was analysing.
>>
>> Vectorization, including SLP, is only enabled at -O3, isn't it? It
>> seems to me that peeling is only used by optimizations which already
>> lead to potential increase in code size.
>>
>> For instance, with "-Ofast -funroll-all-loops", the total text size for
>> the SPEC CPU2000 suite is 26.9MB with this proposed change and 26.8MB
>> without it; with just "-O2", it is the same at 23.1MB regardless of this
>> setting.
>>
>> So it seems to me that this proposal should be neutral for up to -O2.
>>
>> Thank you,
>>
>
> My preference would be to not diverge from the global parameter
> settings. I haven't looked in detail at this parameter but it seems to
> me there are two possible paths:
>
> 1) We could get agreement globally that the parameter should be increased.
> 2) We could agree that this specific use of the parameter is distinct
> from some other uses and deserves a new param in its own right with a
> higher value.
I think the fix is to improve the unrolled size estimates by better taking
into account constant propagation and CSE opportunities. I have some ideas
here but am not sure if I have enough free cycles to implement this for GCC 7.
Richard.
> R.
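
For readers following the thread, here is a minimal sketch of the code-size trade-off being discussed. It is a hypothetical illustration, not code from the patch: `sum4_loop` and `sum4_peeled` are made-up names, and the second function only shows by hand what a completely peeled loop could look like once the compiler decides the estimated size fits under the limit.

```c
#include <stddef.h>

/* Hypothetical example, not taken from the patch: a loop whose trip
   count (4) is known at compile time, making it a candidate for
   complete peeling.  */
int sum4_loop(const int *a)
{
    int s = 0;
    for (size_t i = 0; i < 4; i++)
        s += a[i];
    return s;
}

/* The same computation with the loop peeled completely, written out
   by hand: the counter and branch are gone, but the body is
   replicated once per iteration, which is where code size grows.  */
int sum4_peeled(const int *a)
{
    return a[0] + a[1] + a[2] + a[3];
}
```

Assuming the parameter under discussion is `max-completely-peeled-insns` (the thread does not name it), one way to see its effect is to build the same source at -O3 with and without something like `--param max-completely-peeled-insns=400` and compare the text sizes, much as the SPEC CPU2000 figures quoted above were compared.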