[PATCH 2/4][AArch64] Increase the loop peeling limit

Thu Dec 3 21:07:00 GMT 2015

On 11/20/2015 05:53 AM, James Greenhalgh wrote:
> On Thu, Nov 19, 2015 at 04:04:41PM -0600, Evandro Menezes wrote:
>> On 11/05/2015 02:51 PM, Evandro Menezes wrote:
>>> 2015-11-05  Evandro Menezes <e.menezes@samsung.com>
>>>
>>>    gcc/
>>>
>>>        * config/aarch64/aarch64.c (aarch64_override_options_internal):
>>>        Increase loop peeling limit.
>>>
>>> This patch increases the limit for the number of peeled insns.
>>> With this change, I noticed no major regression in either
>>> Geekbench v3 or SPEC CPU2000 while some benchmarks, typically FP
>>> ones, improved significantly.
>>>
>>> I tested this tuning on Exynos M1 and on A57.  ThunderX seems to
>>> benefit from this tuning too.  However, I'd appreciate comments
>>> from other stakeholders.
>>
>> Ping.
> I'd like to leave this for a call from the port maintainers. I can see why
> this leads to more opportunities for vectorization, but I'm concerned about
> the wider impact on code size. Certainly I wouldn't expect this to be our
> default at -O2 and below.
>
> My gut feeling is that this doesn't really belong in the back-end (there are
> presumably good reasons why the default for this parameter across GCC has
> fluctuated from 400 to 100 to 200 over recent years), but as I say, I'd
> like Marcus or Richard to make the call as to whether or not we take this
> patch.

Please correct me if I'm wrong, but loop peeling is enabled only with
loop unrolling (and with PGO).  If so, then extra code size is not a
concern, since this heuristic is active only when unrolling loops, where
code size is already of secondary importance.

Thank you,

-- 
Evandro Menezes