This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH] Fix PR18754: add early loop pass, 2nd try
- From: Dorit Naishlos <DORIT at il dot ibm dot com>
- To: "Giovanni Bajo" <giovannibajo at libero dot it>
- Cc: gcc-patches at gcc dot gnu dot org, "Richard Guenther" <rguenth at tat dot physik dot uni-tuebingen dot de>
- Date: Thu, 20 Jan 2005 17:32:33 +0200
- Subject: Re: [PATCH] Fix PR18754: add early loop pass, 2nd try
> Richard Guenther <firstname.lastname@example.org> wrote:
> >> For instance, if it is a little too expensive we could activate it
> >> only at -O3, but I can't see numbers (again) supporting this patch.
> > The main reason for not enabling this by default is that it may
> > interact badly with vectorization according to Dorit.
> This is actually not a concern since -ftree-vectorize is disabled by
> Anyhow, I guess this is a bug in the vectorizer.
It's a loop vectorizer, so I don't really see it as a bug. The loop
vectorizer is supposed to look for vectorization opportunities in loops,
looking for parallelism across loop iterations.
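As an illustration of cross-iteration parallelism, here is a minimal sketch of the kind of loop a loop vectorizer targets. The function name and signature are made up for this example and do not come from the patch or the PR.

```c
#include <stddef.h>

/* Hypothetical example: iteration i reads a[i] and b[i] and writes dst[i],
   and no iteration depends on the result of another. The restrict
   qualifiers promise the arrays do not overlap, so several consecutive
   iterations can be combined into one vector operation. */
void add_arrays(int *restrict dst, const int *restrict a,
                const int *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Whether such a loop is actually vectorized depends on the target and on the flags in use (-ftree-vectorize in this discussion).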
> If this really really comes
> into Dorit's way badly, you can always make -ftree-vectorize disable it,
then we've crippled code quality when vectorization is on
> but it
> really should not be needed.
Once you have completely unrolled a loop you have eliminated it from being
a candidate for loop-vectorization (obviously, it's not a loop anymore). It
may not be such a tragedy because these are small loops to begin with, so
vectorization overhead may not justify vectorizing them, but I think it
should be the vectorizer's decision.
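A source-level sketch of the effect, assuming a hypothetical 4-element loop (the function names are illustrative only; the real transformation happens on GCC's intermediate representation, not on C source):

```c
/* Before complete unrolling: a small loop with a known trip count.
   A loop vectorizer can still analyze it and decide whether
   vectorizing it is worth the overhead. */
void scale4_loop(float *v, float s)
{
    for (int i = 0; i < 4; i++)
        v[i] *= s;
}

/* After complete unrolling (conceptually): the same computation as
   straight-line code. There is no loop left, so a loop-based
   vectorizer has nothing to consider here. */
void scale4_unrolled(float *v, float s)
{
    v[0] *= s;
    v[1] *= s;
    v[2] *= s;
    v[3] *= s;
}
```

Both functions compute the same result; the difference is only in which later passes can still see a loop.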
Indeed it will be a candidate for straight-line code vectorization (SLP
like), but this would be a different algorithm and a separate optimization
pass, with its own limitations and costs. By the way, the way icc
implements SLP is by rerolling such straight-line code sequences, and then
these newly formed small loops are fed to the loop vectorizer like any
other loop. If we ever choose to take this route, we'll find ourselves
unrolling (for SRA) then rerolling for vectorization then unrolling
(Richard Guenther wrote in
> I did experiments with scheduling a second SRA pass after
> loop optimization, but that doesn't help ivopts - we'd need
> another loop pass after the second SRA this way.
Maybe adding this extra ivopts pass isn't that bad? If that would allow
scheduling vectorization then loop-unrolling then SRA I think that would be
One final thought - if we did end up enabling complete loop unrolling
before vectorization, could we maintain the information that this was once
a loop (i.e., so that flow_loop_find will find it as a loop with 1 iteration)?
I don't know if it makes much sense, but at least this would allow a loop
based vectorizer to do something with it in the future, if/when it is
extended to look for intra-iteration opportunities, rather than only
cross-iteration opportunities (which is something I am considering doing,
but this would still be looking only at loops).
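To make the "loop with 1 iteration" idea concrete, here is a rough source-level analogy. It only assumes what is said above; the function name is hypothetical and this is not how GCC represents loops internally.

```c
/* Hypothetical sketch: the body has been unrolled four times, but the
   loop structure is kept. The loop runs exactly once (i = 0 only), and
   the four statements inside that single iteration are independent of
   each other, which is the intra-iteration opportunity a loop-based
   vectorizer could look for. */
void add4_one_iteration(int *dst, const int *a, const int *b)
{
    for (int i = 0; i < 4; i += 4) {
        dst[i + 0] = a[i + 0] + b[i + 0];
        dst[i + 1] = a[i + 1] + b[i + 1];
        dst[i + 2] = a[i + 2] + b[i + 2];
        dst[i + 3] = a[i + 3] + b[i + 3];
    }
}
```

The point of the analogy is only that the statements stay grouped under something a loop pass can find, rather than dissolving into the surrounding straight-line code.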
> > I'll get you numbers for
> > PR8361 and POOMA once a disable-checking bootstrap completed
> Giovanni Bajo