This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [Patch for suggestions]: How do we know a loop is the peeled version?
- From: Zdenek Dvorak <rakdver at kam dot mff dot cuni dot cz>
- To: "Fang, Changpeng" <Changpeng dot Fang at amd dot com>
- Cc: Richard Guenther <richard dot guenther at gmail dot com>, Sebastian Pop <sebpop at gmail dot com>, Christian Borntraeger <borntraeger at de dot ibm dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, "uweigand at de dot ibm dot com" <uweigand at de dot ibm dot com>
- Date: Thu, 1 Jul 2010 21:38:18 +0200
- Subject: Re: [Patch for suggestions]: How do we know a loop is the peeled version?
- References: <D4C76825A6780047854A11E93CDE84D02F7763@SAUSEXMBP01.amd.com>
Hi,
> Just found that many optimizations (prefetch, loop unrolling) are performed on the peeled loops.
> This causes code size and compilation time increase without benefit.
>
> MODULE kinds
> INTEGER, PARAMETER :: RK8 = SELECTED_REAL_KIND(15, 300)
> END MODULE kinds
> ! --------------------------------------------------------------------
> PROGRAM TEST_FPU ! A number-crunching benchmark using matrix inversion.
> USE kinds ! Implemented by: David Frank Dave_Frank@hotmail.com
> IMPLICIT NONE ! Gauss routine by: Tim Prince N8TM@aol.com
> ! Crout routine by: James Van Buskirk torsop@ix.netcom.com
> ! Lapack routine by: Jos Bergervoet bergervo@IAEhv.nl
>
> REAL(RK8) :: pool(101, 101,1000), a(101, 101)
> INTEGER :: i
>
> DO i = 1,1000
> a = pool(:,:,i) ! get next matrix to invert
> END DO
>
> END PROGRAM TEST_FPU
>
> For this example (-O3 -fprefetch-loop-arrays -funroll-loops), the vectorizer peels the loop.
> And the prefetching and loop unrolling are performed on the peeled loops.
>
> In the attached patch, the vectorizer marked the loop as peeled, and the prefetching
> gives up. However, the RTL unroller could not get this information and still unroll the peeled
> loop.
>
> I need suggestion: How the optimizer recognizes that the loop is the peeled version (preloop or postloop)?
Instead of the "peeled" flag, it might be better to make sure that the later optimizers know that
the peeled loop does not roll enough (set nb_iterations_upper_bound/nb_iterations_estimate).
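
As a rough illustration of the idea (a self-contained sketch, not GCC code): the struct and the toy_* helpers below are hypothetical; only the two field names echo the ones mentioned above, and the any_* validity flags are an assumption of this sketch. The pass that peels records how many times the peeled loop can roll at most, and a later pass consults that bound instead of a dedicated "peeled" flag.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a loop descriptor.  Only the two
   nb_iterations_* names come from the discussion; the rest is assumed.  */
struct toy_loop
{
  bool any_upper_bound;                    /* bound below is valid */
  unsigned long nb_iterations_upper_bound; /* hard limit on iterations */
  bool any_estimate;                       /* estimate below is valid */
  unsigned long nb_iterations_estimate;    /* expected iterations */
};

/* Record that a peeled (pre- or post-) loop runs at most MAX_ITERS times.
   The estimate is tightened too, since it cannot exceed the hard bound.  */
static void
toy_record_peel_bound (struct toy_loop *loop, unsigned long max_iters)
{
  loop->any_upper_bound = true;
  loop->nb_iterations_upper_bound = max_iters;
  if (!loop->any_estimate || loop->nb_iterations_estimate > max_iters)
    {
      loop->any_estimate = true;
      loop->nb_iterations_estimate = max_iters;
    }
}

/* Hypothetical check a later pass (unroller, prefetcher) might apply:
   a loop known to roll fewer than FACTOR times is not worth transforming.  */
static bool
toy_worth_unrolling_p (const struct toy_loop *loop, unsigned long factor)
{
  if (loop->any_upper_bound && loop->nb_iterations_upper_bound < factor)
    return false;
  if (loop->any_estimate && loop->nb_iterations_estimate < factor)
    return false;
  return true;
}
```

With something along these lines, a peeled loop whose bound was recorded as, say, 1 would be rejected for an unroll factor of 4, while a loop with no recorded information would still be considered.
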
Passing the information from the gimple to the RTL loop optimizer is a long-standing problem. One possibility is
to keep the information about the loops up-to-date between the optimizers (a few years ago, I made
sure that this is possible up to tree->rtl expansion, so making this work again should not be too hard;
handling the expansion and RTL passes might be a little more challenging, but not terribly so).
A less involved solution is to drop some notes in the instruction stream at the end of the gimple loop
optimizer, and pick them up in the RTL one.
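
To make the "notes" variant concrete, here is a toy sketch under heavy assumptions: the instruction stream is modelled as a plain array, and the TOY_NOTE_LOOP_BOUND kind, the loop_id key and the helper name are all invented for illustration; none of this is an existing GCC note kind or API. The producer leaves a note carrying the bound, and the consumer scans for it.

```c
#include <stddef.h>

/* Hypothetical record kinds: ordinary instructions and a note that
   carries an iteration bound for one loop.  */
enum toy_insn_kind { TOY_INSN, TOY_NOTE_LOOP_BOUND };

struct toy_insn
{
  enum toy_insn_kind kind;
  int loop_id;           /* meaningful for notes only */
  unsigned long bound;   /* meaningful for notes only */
};

/* Scan the N records in STREAM for a bound note attached to LOOP_ID.
   Returns 1 and stores the bound in *BOUND if one is found, else 0.  */
static int
toy_find_loop_bound (const struct toy_insn *stream, size_t n,
                     int loop_id, unsigned long *bound)
{
  for (size_t i = 0; i < n; i++)
    if (stream[i].kind == TOY_NOTE_LOOP_BOUND
        && stream[i].loop_id == loop_id)
      {
        *bound = stream[i].bound;
        return 1;
      }
  return 0;
}
```

The consumer treats a missing note as "nothing known", so passes that do not emit notes keep working unchanged.
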
Zdenek