This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFC] Tree loop unroller pass


Richard Biener wrote:
>> This is a great plan - GCC urgently requires a good unroller!
>
> How so?

I thought it has been well known for many years that the RTL unroller doesn't
work properly. In practically all cases where LLVM beats GCC, it is due to LLVM
unrolling small loops.

You may have noticed how people have been enabling -fprefetch-loop-arrays by
default in some AArch64 configurations and then stripping out most or all of the
prefetches in order to get the effect of tree unrolling. However, the unroll
parameters of this pass are even worse than those of -funroll-loops, so it ends
up using crazy unroll factors.

> To generate more ILP for modern out-of-order processors you need to be
> able to do followup transforms that remove dependences.  So rather than
> inventing magic params we should look at those transforms and key
> unrolling on them.  Like we do in predictive commoning or other passes
> that end up performing unrolling as part of their transform.

This is why unrolling needs to be done at the tree level. Alias info is correct,
addressing modes end up more optimal, and the scheduler can then interleave
the iterations (often not possible after the RTL unroller due to bad alias info).
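
To illustrate the interleaving point, here is a minimal hand-written sketch (not
GCC output; the function names scale_rolled and scale_unrolled2 and the unroll
factor of 2 are made up for illustration). After unrolling, both loads can be
issued before either store, but only if the compiler can prove that dst and src
do not overlap, which restrict asserts here:

```c
#include <stddef.h>

/* Baseline: one load, one multiply and one store per iteration. */
void scale_rolled(float *restrict dst, const float *restrict src,
                  float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Hand-unrolled by 2 with the two iterations interleaved.  Hoisting the
   second load above the first store is only valid because dst and src
   are known not to alias (assumed here via restrict). */
void scale_unrolled2(float *restrict dst, const float *restrict src,
                     float k, size_t n)
{
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        float a = src[i];
        float b = src[i + 1];
        dst[i] = a * k;
        dst[i + 1] = b * k;
    }
    if (i < n)              /* odd trip count: one leftover element */
        dst[i] = src[i] * k;
}
```

Without the no-alias guarantee the store to dst[i] could change src[i + 1], so
the second load would have to stay behind the first store and the two
iterations would remain serialized.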
 
> Our measurements on x86 concluded that unrolling isn't worth it, in fact
> it very often hurts.  That was of course with saner params than the defaults
> of the RTL unroller.
>
> Often you even have to fight with followup passes doing stuff that ends up
> increasing register pressure too much so we end up spilling.

Yes, that's why I mentioned we should only unroll small loops where there
is always a benefit from reduced loop counter increments and branching.
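
A minimal sketch of the kind of small loop meant here, written by hand rather
than taken from a GCC testcase (the names sum_rolled and sum_unrolled4 and the
unroll factor of 4 are assumptions for illustration):

```c
#include <stddef.h>

/* Baseline: every element costs an add plus a counter increment,
   a compare and a branch. */
unsigned sum_rolled(const unsigned *a, size_t n)
{
    unsigned s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by 4: the counter increment, compare and branch are paid
   once per four elements instead of once per element. */
unsigned sum_unrolled4(const unsigned *a, size_t n)
{
    unsigned s = 0;
    size_t i = 0;
    for (; i + 3 < n; i += 4)
        s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)      /* remainder: up to three leftover elements */
        s += a[i];
    return s;
}
```

The loop body is tiny, so the extra code size and register pressure from
unrolling are small, while the loop overhead per element drops to roughly a
quarter.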

> So _please_ first get testcases we know unrolling will be beneficial on
> and _also_ have a thorough description _why_.

I'm sure we can find good examples. The why will be obvious just from instruction
count.

Wilco
