[patch] Loop distribution for single nested loops

Dorit Nuzman DORIT@il.ibm.com
Thu Dec 6 08:12:00 GMT 2007

Hi Sebastian,

> I could also "sell" this pass as fixing all the vectorization bugs
> concerning unaligned stores, so to some extent I did fix bugs by
> proposing this new pass.  I hope you don't mind if in the next days
> I'm going to fix some missed-vectorization PRs ;-)

Not at all... :-) This is really good news! One thing to keep in mind is
that distributing a loop, thereby splitting two stores into separate loops,
is not necessarily the best way to handle misaligned stores. So we'd still
want to be able to handle misaligned stores in the vectorizer (by means
other than avoiding them using loop distribution). (This is not saying
anything against applying loop distribution, since loop distribution is
applied for other considerations (e.g. locality), and just happens to
help out the vectorizer in terms of handling misalignment.)
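
[To make the point about splitting two stores concrete, here is a minimal
sketch in C. It is not taken from the patch: the function names are
hypothetical, and the restrict qualifiers are an assumption that makes the
two forms trivially equivalent.]

```c
#include <stddef.h>

/* Original loop: two stores (to a and to b) whose misalignments may differ
   and may be unknown at compile time. */
void fused(float *restrict a, float *restrict b,
           const float *restrict c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        a[i] = c[i] + 1.0f;
        b[i] = c[i] * 2.0f;
    }
}

/* After loop distribution: each loop contains a single store, so peeling
   for alignment can be applied to each loop independently. */
void distributed(float *restrict a, float *restrict b,
                 const float *restrict c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = c[i] + 1.0f;

    for (size_t i = 0; i < n; i++)
        b[i] = c[i] * 2.0f;
}
```

Note that the distributed form reads c twice, which is one reason
distribution is not automatically the best answer to misalignment.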

[Just as background for whoever is not familiar with this: the vectorizer
is able to handle only one misaligned store per loop because it doesn't
really have support for misaligned stores, so instead it peels a few
iterations in order to make that store aligned. Peeling would align all
stores (and loads) in the loop that have the same misalignment, but if for
example the misalignment of two stores in the loop is unknown - we
currently can't vectorize the loop, unless we use loop versioning.
There was a simple patch suggested a while back to add support for
misaligned stores for targets that directly support a misaligned store
(e.g. the movdqu in SSE) -
http://gcc.gnu.org/ml/gcc-patches/2007-01/msg00604.html - but it was never
picked up. One of these days we'll also add support for misaligned stores
using a "realign_store" (like the "realign_load" we have for misaligned
loads).]
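
[As a rough illustration of the peeling described above, the following C
sketch computes how many scalar iterations would have to be peeled to align
one access, and checks whether a second access shares that misalignment.
This is not the vectorizer's code: the 16-byte vector size and both helper
names are assumptions for illustration.]

```c
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 16u   /* assumed vector size in bytes */

/* Number of scalar iterations to peel so that p + peel is aligned to
   VEC_BYTES. Assumes p is already aligned to sizeof(float). */
size_t peel_count(const float *p)
{
    uintptr_t mis = (uintptr_t)p % VEC_BYTES;
    return mis ? (VEC_BYTES - mis) / sizeof(float) : 0;
}

/* Peeling for one access also aligns another only if both have the same
   misalignment relative to VEC_BYTES. */
int same_misalignment(const float *a, const float *b)
{
    return (uintptr_t)a % VEC_BYTES == (uintptr_t)b % VEC_BYTES;
}
```

When same_misalignment() cannot be decided at compile time for two stores,
a single peel count cannot be chosen for both, which is the case described
above.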

One issue is that loop distribution may take away opportunities from the
outer-loop vectorizer. But the ways to deal with that should probably be to
extend the outer-loop vectorizer to work on loops that have more than one
loop nested in them, and/or provide ways for other loop-transformation
passes to query if a certain loop could/would be vectorized and at what
expected benefit, or have a single engine that applies most of our
loop-transformations taking both locality and vectorizability into account.
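
[A small C sketch of the shape change in question. It is illustrative only:
the function names and the inner trip count N are hypothetical, and the
arrays are assumed not to overlap.]

```c
#include <stddef.h>

#define N 4   /* inner trip count, chosen arbitrarily for the sketch */

/* Outer loop containing exactly one nested loop. */
void nest_one_inner(float (*a)[N], float (*b)[N], float (*c)[N], size_t m)
{
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < N; j++) {
            a[i][j] = c[i][j] + 1.0f;
            b[i][j] = c[i][j] * 2.0f;
        }
}

/* Same computation after distributing the inner loop: the outer loop now
   has two loops nested in it, the case the outer-loop vectorizer would
   need to be extended to handle. */
void nest_two_inner(float (*a)[N], float (*b)[N], float (*c)[N], size_t m)
{
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < N; j++)
            a[i][j] = c[i][j] + 1.0f;
        for (size_t j = 0; j < N; j++)
            b[i][j] = c[i][j] * 2.0f;
    }
}
```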

About the vectorizer testsuite fails you are seeing (I'm sorry, I currently
don't have time to check myself; so in none of the testcases do we lose
vectorization opportunities due to loop distribution, right?) - one easy
way to deal with it is to disable loop-distribution by default in vect.exp
to preserve the current behavior of the tests, and then add other tests,
prefixed with "loop-dist-", for which vect.exp would enable
loop-distribution. These tests could be duplicates of some of the original
tests whose behavior changed due to loop distribution.


> > But, if it doesn't touch generic code and isn't on by default
> This is the case for the proposed pass.
> > I am not absolutely 100% against it.  But then I also do not see the
> > need to push it to 4.3.
> That's what we called "technology preview" when Daniel Berlin, David
> Edelsohn and I wanted to integrate the loop interchange in GCC 4.0 ;-)
> Except that this time I spent much more time to fix the bugs in this
> "technology preview" than I spent back in September 2004.  If I'm
> submitting this pass at this late point, it is also because I wanted
> to fix all the bugs that I could find, going through the transformed
> cases one by one ensuring that there is no wrong code generated, etc.
> Sebastian
> --
> AMD - GNU Tools
