This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[autovect] [committed] support zero-step in outer-loop


This is a follow-up to
http://gcc.gnu.org/ml/gcc-patches/2007-05/msg01498.html, which removes the
restriction that memory references in the inner loop must have a nonzero
step in the outer loop. For example, with this patch we can vectorize the
'b[j]' access in the following loop:
            for (i=0; i<N; i++){
                  s=0;
                  for (j=0; j<M; j+=4)
                        s += a[i+j] * b[j];
                  a[i]=s;
            }

...into the following:
            for (i=0; i<N; i+=4){
                  vs=[0,0,0,0]
                  for (j=0; j<M; j+=4){
                        va = a[i+j,i+1+j,i+2+j,i+3+j]
                        vb = b[j,j,j,j]
                        vs += va * vb
                  }
                  a[i,i+1,i+2,i+3] = vs
            }
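The following is a small C program that lets you run the transformation yourself. It writes the outer-loop-vectorized nest by hand and checks it against the scalar nest. It is a minimal sketch, not what GCC emits. It assumes the GCC/Clang generic vector extension (vector_size), 32-bit int elements, and trip counts that are multiples of 4. N, M and all function names are hypothetical. The splat of b[j] is written directly as a vector initializer instead of through the load/extract sequence described below.

```c
#include <assert.h>
#include <string.h>

/* GCC/Clang generic vector extension: four 32-bit ints in one 16-byte vector. */
typedef int v4si __attribute__((vector_size(16)));

#define N 8   /* outer trip count (assumed to be a multiple of 4) */
#define M 16  /* inner bound (assumed to be a multiple of 4) */

/* The original scalar loop nest; a[] must hold N + M elements. */
static void scalar_nest(int *a, const int *b)
{
    for (int i = 0; i < N; i++) {
        int s = 0;
        for (int j = 0; j < M; j += 4)
            s += a[i + j] * b[j];
        a[i] = s;
    }
}

/* Hand-written equivalent of the outer-loop-vectorized nest. */
static void outer_vectorized_nest(int *a, const int *b)
{
    assert(N % 4 == 0 && M % 4 == 0);
    for (int i = 0; i < N; i += 4) {
        v4si vs = {0, 0, 0, 0};
        for (int j = 0; j < M; j += 4) {
            v4si va;
            memcpy(&va, &a[i + j], sizeof va);   /* va = a[i+j .. i+3+j] */
            v4si vb = {b[j], b[j], b[j], b[j]};  /* b[j] has zero outer step: splat it */
            vs += va * vb;
        }
        memcpy(&a[i], &vs, sizeof vs);           /* a[i .. i+3] = vs */
    }
}

/* Runs both nests on identical inputs and asserts that the results agree. */
static void check_nests_agree(void)
{
    int a_scalar[N + M], a_vector[N + M], b[M];
    for (int k = 0; k < N + M; k++)
        a_scalar[k] = a_vector[k] = k % 7 + 1;
    for (int k = 0; k < M; k++)
        b[k] = k % 5 - 2;

    scalar_nest(a_scalar, b);
    outer_vectorized_nest(a_vector, b);
    assert(memcmp(a_scalar, a_vector, sizeof a_scalar) == 0);
}
```

Calling check_nests_agree() from main should pass. Each outer iteration i reads only a[k] with k >= i, and each group of four results is stored only after all of its reads. As a result, the vector version sees the same input values as the scalar one.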

Because the access b[j] has no evolution in the outer loop, we have to
duplicate the value b[j] into all elements of the vector vb. At the moment
this is done by simply adding the duplication on top of the current scheme,
i.e. we continue to generate a regular vector load, and then extract the
first element and duplicate it:
                        vb = b[j,j+1,j+2,j+3]
                        sb = BIT_FIELD_REF (vb, bitpos, bitsize)
                        vb = {sb, sb, sb, sb}
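The three-statement scheme above can be imitated in C. This is a sketch that assumes the GCC/Clang vector extension, and the function names are hypothetical. BIT_FIELD_REF is a GCC tree code, not C, so the vector subscript vb[0] stands in for it here. In this sketch the full vector load also reads b[j+1] through b[j+3], so those elements must be valid memory.

```c
#include <assert.h>
#include <string.h>

typedef int v4si __attribute__((vector_size(16)));

/* Mimics:
     vb = b[j,j+1,j+2,j+3]      regular vector load
     sb = BIT_FIELD_REF (...)   extract the first element
     vb = {sb, sb, sb, sb}      duplicate it into every lane */
static v4si splat_via_vector_load(const int *b, int j)
{
    v4si vb;
    memcpy(&vb, &b[j], sizeof vb);   /* vb = b[j .. j+3] */
    int sb = vb[0];                  /* lane 0 holds b[j] */
    v4si splat = {sb, sb, sb, sb};   /* duplicate into all four lanes */
    return splat;
}

/* Asserts that every lane of the result equals b[j]. */
static void check_splat(const int *b, int j)
{
    v4si v = splat_via_vector_load(b, j);
    for (int k = 0; k < 4; k++)
        assert(v[k] == b[j]);
}
```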

An alternative would be to generate a scalar load instead of the vector
load + BIT_FIELD_REF. Regardless of how 'sb' is obtained (via a scalar load
or via a vector load + BIT_FIELD_REF), the code generated for AltiVec is
pretty ugly, due to the same problem reported in PR32107. I don't know
whether there's a solution for it at the RTL level, so I may try to do
something about it at the tree level. In short, the statement sequence
above is something we'll want to revisit.
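Here is the scalar-load alternative, written in the same hypothetical C setting (GCC/Clang vector extension, illustrative names). It reads only b[j], so it also works when j is the last element of b. The final {sb, sb, sb, sb} construction is the same as in the vector-load variant.

```c
#include <assert.h>

typedef int v4si __attribute__((vector_size(16)));

/* Alternative: one scalar load of b[j], then the same duplication. */
static v4si splat_via_scalar_load(const int *b, int j)
{
    int sb = b[j];                   /* touches only b[j] */
    v4si splat = {sb, sb, sb, sb};   /* duplicate into all four lanes */
    return splat;
}

/* Asserts that every lane of the result equals b[j]. */
static void check_scalar_splat(const int *b, int j)
{
    v4si v = splat_via_scalar_load(b, j);
    for (int k = 0; k < 4; k++)
        assert(v[k] == b[j]);
}
```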

Bootstrapped with vectorization enabled and tested on the vectorizer
testcases on powerpc-linux and i386-linux. Committed to autovect-branch.

dorit

        * tree-vect-analyze.c (vect_analyze_data_ref_access): Don't fail on
        zero step in the outer-loop for loads.
        * tree-vect-transform.c (vect_create_data_ref_ptr): Takes additional
        argument (inv_p). Support zero step in the outer-loop.
        (vect_init_vector): Takes additional argument (bsi). Use it, if
        available, to insert the vector initialization.
        (get_initial_def_for_induction): Pass additional argument in call to
        vect_init_vector.
        (vect_get_vec_def_for_operand): Likewise.
        (vectorizable_store): Pass additional argument in call to
        vect_create_data_ref_ptr.
        (vect_setup_realignment): Likewise.
        (vectorizable_load): Likewise. Handle invariant load.

        * gcc.dg/vect/vect-outer-4.c: Loop now vectorized.
        * gcc.dg/vect/vect-outer-4c.c: Loop now vectorized.
        * gcc.dg/vect/vect-outer-5.c: Loop now vectorized.
        * gcc.dg/vect/vect-outer-6.c: Loop now vectorized.
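The analysis change in the first ChangeLog entry ("Don't fail on zero step in the outer-loop for loads") can be modeled as a small decision function. This is a hypothetical simplification, not GCC's vect_analyze_data_ref_access. The struct, the enum, and all names are invented for illustration. Nonzero steps are only passed on, because the checks applied to them are not described here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of one inner-loop data reference. */
struct inner_dr {
    long outer_step;   /* address change (bytes) per outer-loop iteration */
    bool is_load;
};

enum outer_access_kind {
    OUTER_STRIDED,     /* nonzero step: left to the pre-existing checks */
    OUTER_INVARIANT,   /* zero-step load: load it, then splat to all lanes */
    OUTER_UNSUPPORTED  /* zero-step store: still rejected */
};

/* A zero outer-loop step no longer causes failure for loads. */
static enum outer_access_kind classify_outer_access(struct inner_dr dr)
{
    if (dr.outer_step != 0)
        return OUTER_STRIDED;
    return dr.is_load ? OUTER_INVARIANT : OUTER_UNSUPPORTED;
}

/* The two loads from the example loop, assuming 4-byte int elements. */
static void check_examples(void)
{
    struct inner_dr a_ref = {4, true};   /* a[i+j]: advances one int per i */
    struct inner_dr b_ref = {0, true};   /* b[j]: no evolution in i */
    assert(classify_outer_access(a_ref) == OUTER_STRIDED);
    assert(classify_outer_access(b_ref) == OUTER_INVARIANT);
}
```

In this model, a[i+j] is classified as strided. b[j] has an outer step of 0 and, being a load, becomes an invariant load, which matches "Handle invariant load" in the vectorizable_load entry.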

Patch:

(See attached file: invload.may29.txt)

Attachment: invload.may29.txt
Description: Text document

