This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
This is a follow-up to http://gcc.gnu.org/ml/gcc-patches/2007-05/msg01498.html
that removes the restriction that memory-references in the inner-loop have
to have a nonzero step in the outer-loop. For example, with this patch we can
vectorize the 'b[j]' access in the following loop:

    for (i=0; i<N; i++){
      s=0;
      for (j=0; j<M; j+=4)
        s += a[i+j] * b[j];
      a[i]=s;
    }

...into the following:

    for (i=0; i<N; i+=4){
      vs=[0,0,0,0]
      for (j=0; j<M; j+=4){
        va = a[i+j,i+1+j,i+2+j,i+3+j]
        vb = b[j,j,j,j]
        vs += va * vb
      }
      a[i,i+1,i+2,i+3] = vs
    }

Note that because the access b[j] has no evolution in the outer-loop, we have
to duplicate the value b[j] into all entries of the vector vb. At the moment
this is done by simply adding this duplication on top of the current scheme:
i.e. we continue to generate a regular vector load, and then we extract the
first element and duplicate it:

    vb = b[j,j+1,j+2,j+3]
    sb = BIT_FIELD_REF (vb, bitpos, bitsize)
    vb = {sb, sb, sb, sb}

Another alternative would be to generate a scalar load instead of the vector
load + BIT_FIELD_REF. Regardless of how 'sb' is obtained (via a scalar load or
vector load + BIT_FIELD_REF), we get pretty ugly code generated for Altivec,
because of the same problem reported in PR32107. I don't know if there's a
solution for it at the rtl level, so I may try to do something about it at the
tree level. In short, the stmt sequence above is something we'll want to
revisit.

Bootstrapped with vectorization enabled and tested on the vectorizer
testcases on powerpc-linux and i386-linux.

Committed to autovect-branch.

dorit

        * tree-vect-analyze.c (vect_analyze_data_ref_access): Don't fail on
        zero step in the outer-loop for loads.
        * tree-vect-transform.c (vect_create_data_ref_ptr): Takes additional
        argument (inv_p). Support zero step in the outer-loop.
        (vect_init_vector): Takes additional argument (bsi). Use it, if
        available, to insert the vector initialization.
        (get_initial_def_for_induction): Pass additional argument in call to
        vect_init_vector.
        (vect_get_vec_def_for_operand): Likewise.
        (vectorizable_store): Pass additional argument in call to
        vect_create_data_ref_ptr.
        (vect_setup_realignment): Likewise.
        (vectorizable_load): Likewise. Handle invariant load.

        * gcc.dg/vect/vect-outer-4.c: Loop now vectorized.
        * gcc.dg/vect/vect-outer-4c.c: Loop now vectorized.
        * gcc.dg/vect/vect-outer-5.c: Loop now vectorized.
        * gcc.dg/vect/vect-outer-6.c: Loop now vectorized.

Patch:
(See attached file: invload.may29.txt)
Attachment:
invload.may29.txt
Description: Text document