This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [RFC] Tree loop unroller pass

From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
To: Richard Biener <richard dot guenther at gmail dot com>, Kugan Vivekanandarajah <kugan dot vivekanandarajah at linaro dot org>
Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, nd <nd at arm dot com>
Date: Fri, 16 Feb 2018 14:22:22 +0000
Subject: Re: [RFC] Tree loop unroller pass
References: <DB6PR0801MB205363C0CDF8D756E2C2F0D383F60@DB6PR0801MB2053.eurprd08.prod.outlook.com> <CAELXzTPTYH-QMYijxoGD_T=CeqK0p3H5X5FLiqzr9+Hvm76P8g@mail.gmail.com>,<CAFiYyc1MOhnib68UHRBtb6=RGRe17d-qtWReWgy+1brvezQPxw@mail.gmail.com>

Richard Biener wrote:
>> This is a great plan - GCC urgently requires a good unroller!
>
> How so?

I thought it had been well known for many years that the RTL unroller doesn't work properly.
In practically all cases where LLVM beats GCC, it is due to unrolling small loops.

You may have noticed how people have been enabling -fprefetch-loop-arrays by
default in some AArch64 configurations and then stripping out most/all prefetches in
order to get the effect of tree unrolling... However, the unroll parameters of this
pass are even worse than those of -funroll-loops, so it ends up using crazy unroll factors.

> To generate more ILP for modern out-of-order processors you need to be
> able to do followup transforms that remove dependences.  So rather than
> inventing magic params we should look at those transforms and key
> unrolling on them.  Like we do in predictive commoning or other passes
> that end up performing unrolling as part of their transform.

This is why unrolling needs to be done at the tree level. Alias info is correct,
addressing modes end up better and the scheduler can now interleave
the iterations (often not possible after the RTL unroller due to bad alias info).
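
To make the interleaving point concrete, here is a minimal hand-written sketch. The
function names are hypothetical and the code is not output from GCC or from any patch
in this thread. It assumes the compiler knows dst and src do not alias (expressed here
with restrict); only with that knowledge may the load of src[i + 1] move above the
store to dst[i].

```c
#include <assert.h>
#include <stddef.h>

/* Rolled form: dst[i] += k * src[i]. */
void axpy_rolled(float *restrict dst, const float *restrict src,
                 float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += k * src[i];
}

/* Unrolled by 2 with the two iterations interleaved by hand.
   All loads are issued before any store.  Moving the load of
   src[i + 1] above the store to dst[i] is only valid because
   dst and src are known not to alias. */
void axpy_unrolled2(float *restrict dst, const float *restrict src,
                    float k, size_t n)
{
    size_t i = 0;
    for (; n - i >= 2; i += 2) {
        float s0 = src[i];
        float s1 = src[i + 1];
        float d0 = dst[i];
        float d1 = dst[i + 1];
        dst[i]     = d0 + k * s0;
        dst[i + 1] = d1 + k * s1;
    }
    if (i < n)                      /* leftover element when n is odd */
        dst[i] += k * src[i];
}

/* Usage example: both forms must produce the same result. */
void axpy_self_check(void)
{
    const float src[5] = {1, 2, 3, 4, 5};
    float a[5] = {10, 20, 30, 40, 50};
    float b[5] = {10, 20, 30, 40, 50};
    axpy_rolled(a, src, 2.0f, 5);
    axpy_unrolled2(b, src, 2.0f, 5);
    for (size_t i = 0; i < 5; i++)
        assert(a[i] == b[i]);
}
```

Without the no-alias guarantee, each store to dst would have to be treated as possibly
changing src, which forces the iterations back into strict order.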

> Our measurements on x86 concluded that unrolling isn't worth it, in fact
> it very often hurts.  That was of course with saner params than the defaults
> of the RTL unroller.
>
> Often you even have to fight with followup passes doing stuff that ends up
> increasing register pressure too much so we end up spilling.

Yes, that's why I mentioned we should only unroll small loops where there
is always a benefit from reduced loop counter increments and branching.
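
As a rough illustration of the kind of small loop meant here (a hand-written sketch
with hypothetical names, not compiler output): in the rolled form every element costs
one counter increment plus one compare-and-branch on top of the load and add. Unrolled
by 4, that overhead is paid once per four elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rolled: one increment and one compare/branch per element. */
uint32_t sum_rolled(const uint32_t *a, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by 4 by hand: one increment and one compare/branch
   per four elements, plus a remainder loop for the last n % 4. */
uint32_t sum_unrolled4(const uint32_t *a, size_t n)
{
    uint32_t s = 0;
    size_t i = 0;
    for (; n - i >= 4; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)              /* remainder iterations */
        s += a[i];
    return s;
}

/* Usage example: both forms must agree, including on the remainder. */
void sum_self_check(void)
{
    const uint32_t a[11] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
    for (size_t n = 0; n <= 11; n++)
        assert(sum_rolled(a, n) == sum_unrolled4(a, n));
}
```

The body stays small enough that register pressure is not a concern, which is the case
being argued for above.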

> So _please_ first get testcases we know unrolling will be beneficial on
> and _also_ have a thorough description _why_.

I'm sure we can find good examples. The why will be obvious just from instruction
count.

Wilco

Follow-Ups:
- Re: [RFC] Tree loop unroller pass
  - From: Richard Biener
- Re: [RFC] Tree loop unroller pass
  - From: Michael Matz

References:
- Re: [RFC] Tree loop unroller pass
  - From: Wilco Dijkstra
- Re: [RFC] Tree loop unroller pass
  - From: Kugan Vivekanandarajah
- Re: [RFC] Tree loop unroller pass
  - From: Richard Biener
