This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [RFC] Tree loop unroller pass

From: Richard Biener <richard dot guenther at gmail dot com>
To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>,Kugan Vivekanandarajah <kugan dot vivekanandarajah at linaro dot org>
Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>,nd <nd at arm dot com>
Date: Fri, 16 Feb 2018 16:00:43 +0100
Subject: Re: [RFC] Tree loop unroller pass
Authentication-results: sourceware.org; auth=none
References: <DB6PR0801MB205363C0CDF8D756E2C2F0D383F60@DB6PR0801MB2053.eurprd08.prod.outlook.com> <CAELXzTPTYH-QMYijxoGD_T=CeqK0p3H5X5FLiqzr9+Hvm76P8g@mail.gmail.com>,<CAFiYyc1MOhnib68UHRBtb6=RGRe17d-qtWReWgy+1brvezQPxw@mail.gmail.com> <DB6PR0801MB20531444EF35A79FC97C059D83CB0@DB6PR0801MB2053.eurprd08.prod.outlook.com>

On February 16, 2018 3:22:22 PM GMT+01:00, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>Richard Biener wrote:
>>> This is a great plan - GCC urgently requires a good unroller!
>>
>> How so?
>
>I thought it has been well known for many years that the rtl unroller doesn't
>work properly. In practically all cases where LLVM beats GCC, it is due to
>unrolling small loops.
>
>You may have noticed how people have been enabling -fprefetch-loop-arrays by
>default in some AArch64 configurations and then stripping out most/all
>prefetches in order to get the effect of tree unrolling... However, the unroll
>parameters of this pass are even worse than -funroll-loops, so it ends up
>using crazy unroll factors.
>
>> To generate more ILP for modern out-of-order processors you need to be
>> able to do followup transforms that remove dependences.  So rather than
>> inventing magic params we should look at those transforms and key
>> unrolling on them.  Like we do in predictive commoning or other passes
>> that end up performing unrolling as part of their transform.
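
To make "followup transforms that remove dependences" concrete, here is a minimal C sketch of a reduction loop. It is not taken from the thread or from any GCC pass; the function names are hypothetical. Unrolling by four with one accumulator keeps a single serial chain of adds, while splitting the accumulator gives four independent chains that an out-of-order core could overlap. Note that the split version reassociates the floating-point sum, so its result can differ in rounding from the rolled loop.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: every add depends on the previous value of sum. */
double sum_rolled(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Unrolled by 4, one accumulator: fewer branches and counter updates,
   but the four adds still form one serial dependence chain. */
double sum_unrolled_chain(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double sum = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
    }
    for (; i < n; i++)          /* remainder iterations */
        sum += a[i];
    return sum;
}

/* Unrolled by 4 with the accumulator split: four independent chains.
   This changes the order of the additions (reassociation). */
double sum_unrolled_split(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    double sum = (s0 + s1) + (s2 + s3);
    for (; i < n; i++)          /* remainder iterations */
        sum += a[i];
    return sum;
}
```

Only the third variant changes the dependence structure; the second gets just the loop-overhead saving. That is the distinction between unrolling for its own sake and unrolling keyed on a transform.
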
>
>This is why unrolling needs to be done at the tree level. Alias info is
>correct, addressing modes end up more optimal and the scheduler can now
>interleave the iterations (often not possible after the rtl-unroller due to
>bad alias info).
> 
>> Our measurements on x86 concluded that unrolling isn't worth it, in fact
>> it very often hurts.  That was of course with saner params than the
>> defaults of the RTL unroller.
>>
>> Often you even have to fight with followup passes doing stuff that ends up
>> increasing register pressure too much so we end up spilling.
>
>Yes, that's why I mentioned we should only unroll small loops where there
>is always a benefit from reduced loop counter increments and branching.
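
The "reduced loop counter increments and branching" argument can be sketched in C as follows. The functions and the trip-count return value are hypothetical instrumentation, not anything from GCC; each trip stands for one counter update plus one compare-and-branch. The sketch only counts loop-control work. It says nothing about run time, and an optimizing compiler may reshape either loop on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled loop: one loop-control step per element.
   Returns the number of trips through loop control. */
size_t scale_rolled(int *y, const int *x, int k, size_t n)
{
    assert((x != NULL && y != NULL) || n == 0);
    size_t trips = 0;
    for (size_t i = 0; i < n; i++) {
        y[i] = k * x[i];
        trips++;
    }
    return trips;
}

/* Unrolled by 4: one loop-control step per four elements,
   plus one step per leftover element in the remainder loop. */
size_t scale_unrolled4(int *y, const int *x, int k, size_t n)
{
    assert((x != NULL && y != NULL) || n == 0);
    size_t trips = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        y[i]     = k * x[i];
        y[i + 1] = k * x[i + 1];
        y[i + 2] = k * x[i + 2];
        y[i + 3] = k * x[i + 3];
        trips++;
    }
    for (; i < n; i++) {        /* remainder iterations */
        y[i] = k * x[i];
        trips++;
    }
    return trips;
}
```

For n = 10 the rolled loop takes 10 trips and the unrolled one takes 4 (two unrolled trips plus two remainder trips), while both write the same results.
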
>
>> So _please_ first get testcases we know unrolling will be beneficial on
>> and _also_ have a thorough description _why_.
>
>I'm sure we can find good examples. The why will be obvious just from
>instruction count.

With OoO CPUs speculatively executing the next iterations, I very much doubt that.

Richard. 

>Wilco

