
[patch] Loop-aware SLP


This patch adds support for vectorization of straight-line code inside loops.
It is a merge from the autovect branch.
So, for example, if you have an unrolled loop like this:

for (i=0; i<N; i++){
 a[4*i] = b[4*i] + x0;
 a[4*i+1] = b[4*i+1] + x1;
 a[4*i+2] = b[4*i+2] + x2;
 a[4*i+3] = b[4*i+3] + x3;
}

it will get vectorized as follows:

for (i=0; i<N; i++){
 va[4*i:4*i+3] = vb[4*i:4*i+3] + {x0,x1,x2,x3};
}

Note that, as opposed to loop vectorization, the loop trip count is not
divided by the vectorization factor.
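
To make the transformation concrete, here is roughly what the resulting loop
corresponds to if written by hand with GCC's generic vector extensions.  This
is only a sketch (the vectorizer of course emits GIMPLE, not source); it
assumes a 4-element integer vector and ignores alignment:

typedef int v4si __attribute__ ((vector_size (16)));

void
foo (int *a, int *b, int x0, int x1, int x2, int x3, int N)
{
  v4si vx = { x0, x1, x2, x3 };
  int i;

  /* Same trip count as the scalar loop; each iteration now does one
     vector load, one vector add and one vector store instead of four
     scalar ones.  */
  for (i = 0; i < N; i++)
    *(v4si *) &a[4 * i] = *(v4si *) &b[4 * i] + vx;
}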

This implementation also handles cases in which there are not enough scalar
stmts to fill up a vector, and unrolls the loop as necessary during the
vectorization analysis. So the following loop:

for (i=0; i<N; i++){
 a[2*i] = b[2*i] + x0;
 a[2*i+1] = b[2*i+1] + x1;
}

is vectorized into the following:

for (i=0; i<N/2; i++){
 va[4*i:4*i+3] = vb[4*i:4*i+3] + {x0,x1,x0,x1};
}
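
Purely as an illustration of the arithmetic involved (this helper is
hypothetical, not code from the patch): in this example the unrolling factor
is simply the number of elements per vector divided by the size of the store
group, which is why the loop above runs N/2 times and uses {x0,x1,x0,x1}:

/* With 4 ints per vector and a group of 2 scalar stores per iteration,
   the analysis has to unroll by 4/2 = 2.  */
static int
slp_unroll_factor (int vector_units, int group_size)
{
  return vector_units / group_size;
}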


SLP opportunities are discovered based on the strided-access support
analysis: we start off from groups of adjacent memory references (which is
something we already have), and analyze their def-use chains in search of
sequences of isomorphic operations that can be packed into vector operations
(a rough sketch of this recursive matching appears after the list of
differences below).  The main differences between this implementation and
the approach described in the original SLP paper
(http://portal.acm.org/citation.cfm?id=349320) are the following:

- We focus on loops, rather than arbitrary basic blocks in the program.  We
do that because this way we can reuse all the existing loop-based
vectorization infrastructure.  Also, we expect that most of the worthwhile
opportunities are in loops, and in that case you lose if you work without
the loop context.

- The original SLP algorithm starts off from "seeds" of pairs of adjacent
loads/stores, and keeps growing and merging them in the combination stage of
the algorithm (to obtain groups of VS (Vector Size) elements).  Our seed is
a group of adjacent (interleaved) stores identified beforehand by the
interleaving analysis, so we don't need a combination stage at all.  We
don't yet use loads as seeds, and we are not sure we'd need to.

- Our approach combines both SLP-based and loop-based vectorization. So,
for example, if we have a loop like this:

for (i=0; i<N; i++){
 a[4*i] = b[4*i] + x0;
 a[4*i+1] = b[4*i+1] + x1;
 a[4*i+2] = b[4*i+2] + x2;
 a[4*i+3] = b[4*i+3] + x3;

 c[i] = 0;
}

we vectorize it as follows:

for (i=0; i<N/4; i++){
 va[16*i:16*i+3] = vb[16*i:16*i+3] + {x0,x1,x2,x3};
 va[16*i+4:16*i+7] = vb[16*i+4:16*i+7] + {x0,x1,x2,x3};
 va[16*i+8:16*i+11] = vb[16*i+8:16*i+11] + {x0,x1,x2,x3};
 va[16*i+12:16*i+15] = vb[16*i+12:16*i+15] + {x0,x1,x2,x3};

 vc[4*i:4*i+3] = {0,0,0,0};
}

(note that stmts 1-4 were vectorized using SLP, and stmt 5 (c[i] = 0) was
vectorized using regular loop vectorization).
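
As mentioned above, here is a minimal, self-contained sketch of the
recursive def-use matching that grows an SLP instance from a group of
adjacent stores.  The 'stmt' structure and all the names below are purely
illustrative; the patch itself works on GIMPLE statements and the
vectorizer's per-statement info, and additionally checks operand ordering,
data-reference adjacency for loads, and costs:

#include <stdbool.h>
#include <stddef.h>

#define MAX_OPERANDS 2
#define MAX_GROUP_SIZE 16

struct stmt
{
  int op;                            /* operation code, e.g. PLUS */
  int num_ops;                       /* number of operands of OP */
  struct stmt *def[MAX_OPERANDS];    /* defining stmts of the operands;
                                        NULL for loads and invariants */
};

/* Return true if the GROUP_SIZE statements in STMTS perform the same
   operation and, recursively, so do the statements feeding each of
   their operands.  */
static bool
build_slp_node (struct stmt *stmts[], int group_size)
{
  int i, k;

  /* All statements in the group must be isomorphic.  */
  for (i = 1; i < group_size; i++)
    if (stmts[i]->op != stmts[0]->op
        || stmts[i]->num_ops != stmts[0]->num_ops)
      return false;

  /* Recurse into each operand position.  A NULL definition stands for
     a load or a loop invariant and terminates the recursion (the real
     analysis also verifies that such loads are adjacent, and turns
     invariants into a vector constant).  */
  for (k = 0; k < stmts[0]->num_ops; k++)
    {
      struct stmt *defs[MAX_GROUP_SIZE];
      bool all_leaves = true;

      for (i = 0; i < group_size; i++)
        {
          defs[i] = stmts[i]->def[k];
          if (defs[i])
            all_leaves = false;
        }

      if (all_leaves)
        continue;

      for (i = 0; i < group_size; i++)
        if (!defs[i])
          return false;              /* mixed defs: not isomorphic */

      if (!build_slp_node (defs, group_size))
        return false;
    }

  return true;
}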


We plan to extend this initial "SLP-like vectorization in loops" in the
following directions:
- support reductions
- support data permutations
- support interleaved accesses with gaps
- support loops with multiple types

More information can be found in our summit paper (
http://gcc.gnu.org/wiki/HomePage?action=AttachFile&do=get&target=GCC2007-Proceedings.pdf).


This is joint work with Dorit.

The patch is divided into 5 parts and is relative to this cleanup patch
(http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00806.html).  I am going to
submit these parts as replies to this note.
Each part can be compiled (though with warnings about unused arguments and
variables) and passes the vectorization tests.  The whole patch passes
bootstrap with vectorization enabled and the vectorization testcases on
x86_64-linux.  I am going to run full regression testing and bootstrap on
PowerPC.

O.K. for mainline once the testing is completed?

Thanks,
Ira

:ADDPATCH SSA (vectorizer):



