This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
This patch extends the initial support for outer-loop vectorization
(http://gcc.gnu.org/ml/gcc-patches/2007-04/msg00044.html) to also support
outer-loops with memory-references in their inner-loop. For now, this
doesn't include support for misaligned accesses in the inner-loop (will add
this later), and also doesn't yet include support for 0-stride accesses,
i.e. accesses in the inner-loop that don't advance in the outer-loop. For
example, the following loop:
  for (i=0; i<N; i++){
    s=0;
    for (j=0; j<M; j+=4)
      s += a[i+j] * b[j];
    a[i]=s;
  }
...when vectorized would look something like this:
  for (i=0; i<N; i+=4){
    vs=[0,0,0,0]
    for (j=0; j<M; j+=4)
      vs += a[i+j,i+1+j,i+2+j,i+3+j] * b[j,j,j,j]
    a[i,i+1,i+2,i+3] = vs
  }
With this patch we know how to vectorize the a[i+j] access, but we don't
yet know how to vectorize the b[j] access: because it has no evolution in
the outer-loop, we have to "splat" b[j] into all entries of the vector,
which requires special support.
To analyze the initial-address and step of inner-loop references relative
to the outer-loop I used the new datarefs stuff that Zdenek recently added
to mainline (the function split_constant_offset, and bits from
dr_analyze_innermost), so part of this patch is redundant with the next
merge of mainline to the branch. I basically take the BASE+INIT+OFFSET that
was computed relative to the inner-loop and analyze it relative to the
outer-loop (as discussed in
http://gcc.gnu.org/ml/gcc-patches/2007-05/msg00648.html). Comments are
welcome.
The testcases that get vectorized with this patch are
vect-outer-[2,2a,2c,3,3c,4d].c. E.g., the following i-loops get vectorized:
  float image[2*N][2*N] __attribute__ ((__aligned__(16)));
  for (i = 0; i < N; i++) {
    diff = 0;
    for (j = 0; j < N; j++) {
      diff += image[j][i];
    }
    out[i]=diff;
  }

  for (k=0; k<N; k++) {
    for (i = 0; i < N; i++) {
      for (j = 0; j < N; j+=2) {
        image[k][j][i] = j+i+k;
      }
    }
  }

  for (i = 0; i < N; i++) {
    diff = 0;
    for (j = 0; j < M; j+=4) {
      diff += in[j+i];
    }
    out[i]=diff;
  }
Outer-loop vectorization is planned to be incorporated into 4.3. The main
functionality that is still missing is support for:
1- 0-stride accesses (as explained above)
2- misaligned accesses in the inner-loop
3- multiple types in the inner-loop
4- strided-accesses in the inner-loop
5- unknown loop bound in the outer-loop
6- misaligned accesses in the outer-loop that require peeling/versioning
7- reduction detection improvements
I'm debating how much of (1)-(4) I want to implement on the branch before
I start sending outer-loop-vectorization patches to mainline. (?)
Items (5),(6) depend on developing loop-transformation utilities on nested
loops (peeling, unrolling, versioning). Item (7) is a general limitation we
have regardless of outer-loops (i.e. - generalizing reduction detection).
These may or may not make it to 4.3.
Bootstrapped with vectorization enabled and tested on the vectorizer
testcases on powerpc-linux and i386-linux. Committed to autovect-branch.
dorit
* tree-vect-transform.c (vect_create_addr_base_for_vector_ref): Takes
additional argument loop. Updated the function documentation. Handle
the 'nested_in_vect_loop' case (the case when the dataref is in an
inner-loop nested in an outer-loop that is now being vectorized).
(vect_create_data_ref_ptr): Call vect_create_addr_base_for_vector_ref
with additional argument. Handle the 'nested_in_vect_loop' case.
(vectorizable_load): Handle the 'nested_in_vect_loop' case.
(vect_gen_niters_for_prolog_loop): Call
vect_create_addr_base_for_vector_ref with additional argument.
(vect_create_cond_for_align_checks): Likewise.
* tree-vect-analyze.c (vect_compute_data_ref_alignment): Handle the
'nested_in_vect_loop' case.
(vect_analyze_data_ref_access): Likewise.
(vect_analyze_data_refs): Likewise. Call split_constant_offset.
* tree-vectorizer.c (new_stmt_vec_info): Initialize new fields.
(vect_supportable_dr_alignment): Handle the 'nested_in_vect_loop' case.
* tree-vectorizer.h (_stmt_vec_info): Added new fields:
dr_base_address, dr_init, dr_offset, dr_step, and dr_aligned_to, along
with new access functions for these fields: STMT_VINFO_DR_BASE_ADDRESS,
STMT_VINFO_DR_INIT, STMT_VINFO_DR_OFFSET, STMT_VINFO_DR_STEP, and
STMT_VINFO_DR_ALIGNED_TO.
* tree-data-refs.c (split_constant_offset): New (brought over from
mainline).
* tree-data-refs.h (split_constant_offset): Likewise.
* gcc.dg/vect/vect-outer-1.c: New test.
* gcc.dg/vect/vect-outer-1a.c: New test.
* gcc.dg/vect/vect-outer-1b.c: New test.
* gcc.dg/vect/vect-outer-2.c: New test.
* gcc.dg/vect/vect-outer-2a.c: New test.
* gcc.dg/vect/vect-outer-2b.c: New test.
* gcc.dg/vect/vect-outer-3.c: New test.
* gcc.dg/vect/vect-outer-3a.c: New test.
* gcc.dg/vect/vect-outer-3b.c: New test.
* gcc.dg/vect/vect-outer-3c.c: New test.
* gcc.dg/vect/vect-outer-4.c: New test.
* gcc.dg/vect/vect-outer-4a.c: New test.
* gcc.dg/vect/vect-outer-4b.c: New test.
* gcc.dg/vect/vect-outer-4c.c: New test.
* gcc.dg/vect/vect-outer-4d.c: New test.
* gcc.dg/vect/vect-outer-5.c: New test.
* gcc.dg/vect/vect-outer-6.c: New test.
* gcc.dg/vect/vect-outer-fir.c: New test.
Patch:
(See attached file: memrefs.may23.txt)