This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [patch] [4.3 projects] outer-loop vectorization patch 2/n


Attached is the updated patch (part 2), updated to a more recent snapshot,
since a few changes that went into mainline in the meantime made the
previous patch inapplicable.

Bootstrapped on powerpc64-linux,
bootstrapped with vectorization enabled on i386-linux,
and tested on the vectorizer testcases.

dorit

(See attached file: updated-outerloop-patch2.txt)

> Hi,

> This is the second part of
> http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00461.html. It adds support
> for memory references in the inner loop when the outer loop is
> vectorized. I'll use the following example to describe the features that
> were added:

> for (i=0; i<N; i++){
>   s=0;
>   for (j=0; j<M; j++)
>     s += a[i+j] * b[j];
>   a[i]=s;
> }
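As an illustration of what vectorizing the *outer* loop of this example means, here is a minimal hand-written sketch (not the vectorizer's actual output): four consecutive values of i are computed at once, one per vector lane. It assumes int elements, a vectorization factor of 4, and N divisible by 4, and it uses GCC's generic vector extension (`vector_size`). The function names `scalar_version` and `outer_vectorized` are hypothetical. The outer-loop-invariant b[j] is handled with the load/extract/duplicate scheme the patch description gives for this case.

```c
#include <assert.h>
#include <string.h>

/* GCC generic vector extension: four ints per vector (assumed VF = 4). */
typedef int v4si __attribute__ ((vector_size (16)));
#define VF 4

/* Scalar reference: the loop nest from the example.
   A must hold at least n + m - 1 elements.  */
void scalar_version (int *a, const int *b, int n, int m)
{
  for (int i = 0; i < n; i++)
    {
      int s = 0;
      for (int j = 0; j < m; j++)
        s += a[i + j] * b[j];
      a[i] = s;
    }
}

/* Hypothetical hand-vectorized outer loop: lane k computes iteration i + k.
   B must hold at least m + VF - 1 elements because of the full vector
   load of b[j..j+3].  */
void outer_vectorized (int *a, const int *b, int n, int m)
{
  assert (n % VF == 0);   /* this sketch has no epilogue loop */
  for (int i = 0; i < n; i += VF)
    {
      v4si vs = {0, 0, 0, 0};
      for (int j = 0; j < m; j++)
        {
          v4si va, vb;
          /* a[i+j] advances with i: the lanes hold a[i+j] .. a[i+j+3].  */
          memcpy (&va, &a[i + j], sizeof va);
          /* b[j] has zero step in the outer loop: do a regular vector load,
             extract element 0 and duplicate it into every lane.  */
          memcpy (&vb, &b[j], sizeof vb);
          int sb = vb[0];
          vb = (v4si) {sb, sb, sb, sb};
          vs += va * vb;
        }
      /* Store the four results only after all loads of this block.  */
      memcpy (&a[i], &vs, sizeof vs);
    }
}
```

Because every load for a block of four i values happens before that block's store, each a[] element is still read before it is overwritten, which matches the order of the scalar loop.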

> The patch includes the following changes to the vectorizer:

> - To analyze the initial address and step of inner-loop references
> relative to the outer loop, I used the function split_constant_offset.
> I basically take the BASE+INIT+OFFSET that was computed relative to the
> inner loop and analyze it relative to the outer loop (as discussed here:
> http://gcc.gnu.org/ml/gcc-patches/2007-05/msg00648.html).
> - Generally the vectorizer does not support invariant accesses. One
> exception that we add here is memory references in the inner loop that
> have a zero step in the outer loop, for example the b[j] access in the
> example loop above. Because the access b[j] has no evolution in the
> outer loop, we have to duplicate the value b[j] into all entries of the
> vector. At the moment this is done by simply adding this duplication on
> top of the current scheme, i.e. we continue to generate a regular vector
> load, and then we extract the first element and duplicate it:
>   vb = b[j,j+1,j+2,j+3]
>   sb = BIT_FIELD_REF (vb, bitpos, bitsize)    # extract b[j]
>   vb = {sb, sb, sb, sb}
> (There are better ways to do this, so this will be improved later.)
> In order to be able to use the function 'vect_init_vector' to create the
> vector vb above, we extend it to place the vector initialization at BSI
> (a new argument passed to the function) instead of always inserting it
> in the loop preheader.

> - Support misaligned accesses. In case the misalignment remains fixed
> (i.e. the step (stride) of the accesses in the inner loop is a multiple
> of the Vector Size (VS)), this can be vectorized using the optimized
> realignment scheme (which used to be called the "software-pipelined"
> scheme, and is now called "optimized_explicit_realign"): the computation
> of the misalignment can be taken out of the loop, and only one additional
> vector load is generated (before the loop) instead of two in each
> iteration (we basically do predictive commoning here). In case the
> misalignment does *not* remain fixed throughout the iterations of the
> loop (as is the case in the example loop above), we cannot use the
> optimized scheme. Instead we need to compute the misalignment inside the
> inner loop along with the two vector loads (this is the newly added
> "explicit_realign" scheme; for more details see also
> http://gcc.gnu.org/ml/gcc-patches/2007-06/msg00248.html).
> (This is also planned to be improved later.)
> To support this, two functions that used to support only the optimized
> realignment scheme, 'vect_setup_realignment' and 'vectorizable_load',
> were modified to support both schemes. See the detailed documentation in
> 'vect_setup_realignment' and 'vect_supportable_dr_alignment'.

> - The functions that deal with creating/initializing/updating the pointer
> that is used for the vector loads/stores had to be modified a bit:
> * 'vect_create_data_ref_ptr' now needs to create an update chain both
> in the inner loop and in the outer loop. It also needs to consider
> whether the outer-loop step is 0 (the only case where the pointer will
> not be bumped by VS (Vector Size)). See the detailed documentation in
> this function.
> * 'vect_create_addr_base_for_vector_ref' needs to know relative to
> which loop the address base is requested (to know whether to use the
> step/offset/init relative to the inner or outer loop).
> * 'bump_vector_ptr' is extended to support bump amounts other than VS
> (we need to bump by VS-1 for the "explicit_realign" scheme).
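The two realignment schemes described above can be illustrated with a plain-C emulation. This is only a sketch of the data flow, assuming 4-element int vectors, an array whose start is vector-aligned, and an "aligned load" that may only start at a multiple of 4 elements. All helper names (`aligned_load`, `realign`, `explicit_realign_load`, `optimized_realign_copy`) are hypothetical; the vectorizer itself emits target realignment operations (roughly the role of REALIGN_LOAD_EXPR) rather than C code like this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VL 4  /* elements per vector (assumed); VS = VL * sizeof (int) */

/* Hypothetical "aligned load": rounds IDX down to a multiple of VL.  */
static void aligned_load (int dst[VL], const int *base, size_t idx)
{
  memcpy (dst, base + (idx - idx % VL), VL * sizeof (int));
}

/* Take VL elements starting at offset MIS of the concatenation LO:HI.  */
static void realign (int dst[VL], const int lo[VL], const int hi[VL],
                     size_t mis)
{
  assert (mis < VL);
  for (size_t k = 0; k < VL; k++)
    dst[k] = (k + mis < VL) ? lo[k + mis] : hi[k + mis - VL];
}

/* "explicit_realign": misalignment and two loads computed per access.
   The second address is bumped by VL-1 elements (the patch bumps by VS-1
   bytes, which selects the same aligned vector for element-aligned
   addresses), so an aligned IDX loads the same vector twice.  */
void explicit_realign_load (int dst[VL], const int *base, size_t idx)
{
  int lo[VL], hi[VL];
  aligned_load (lo, base, idx);
  aligned_load (hi, base, idx + VL - 1);
  realign (dst, lo, hi, idx % VL);
}

/* "optimized_explicit_realign": the step is a multiple of VL, so the
   misalignment is loop-invariant.  One extra load before the loop, then
   one load per iteration.  BASE must be readable up to the aligned vector
   containing idx + nvec * VL.  */
void optimized_realign_copy (int *dst, const int *base, size_t idx,
                             size_t nvec)
{
  int lo[VL], hi[VL];
  size_t mis = idx % VL;          /* hoisted out of the loop */
  aligned_load (lo, base, idx);   /* the additional load before the loop */
  for (size_t v = 0; v < nvec; v++)
    {
      aligned_load (hi, base, idx + (v + 1) * VL);
      realign (dst + v * VL, lo, hi, mis);
      memcpy (lo, hi, sizeof lo); /* reuse this load in the next iteration */
    }
}
```

The explicit version pays for two loads and a misalignment computation on every access, which is why it is only needed when the misalignment changes between iterations, as with a[i+j] in the example.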

> Bootstrapped on powerpc64-linux,
> bootstrapped with vectorization enabled on i386-linux,
> passed full regression testing on both platforms.

> I will wait at least a week to give people a chance to review and
> comment.

> thanks,
> dorit

> ChangeLog:

> * tree-data-refs.c (split_constant_offset): Expose.
> * tree-data-refs.h (split_constant_offset): Add declaration.
>
> * tree-vectorizer.h (dr_alignment_support): Renamed
> dr_unaligned_software_pipeline to dr_explicit_realign_optimized.
> Added a new value dr_explicit_realign.
> (_stmt_vec_info): Added new fields: dr_base_address, dr_init,
> dr_offset, dr_step, and dr_aligned_to, along with new access
> functions for these fields: STMT_VINFO_DR_BASE_ADDRESS,
> STMT_VINFO_DR_INIT, STMT_VINFO_DR_OFFSET, STMT_VINFO_DR_STEP, and
> STMT_VINFO_DR_ALIGNED_TO.
>
> * tree-vectorizer.c (vect_supportable_dr_alignment): Add
> documentation. In case of outer-loop vectorization with non-fixed
> misalignment, use the dr_explicit_realign scheme instead of the
> optimized realignment scheme.
> (new_stmt_vec_info): Initialize new fields.
>
> * tree-vect-analyze.c (vect_compute_data_ref_alignment): Handle the
> 'nested_in_vect_loop' case. Change verbosity level.
> (vect_analyze_data_ref_access): Handle the 'nested_in_vect_loop' case.
> Don't fail on zero step in the outer-loop for loads.
> (vect_analyze_data_refs): Call split_constant_offset to calculate base,
> offset and init relative to the outer-loop.
>
> * tree-vect-transform.c (vect_create_data_ref_ptr): Replace the unused
> BSI function argument with a new function argument, at_loop.
> Simplify the condition that determines STEP. Takes additional argument
> INV_P. Support outer-loop vectorization (handle the nested_in_vect_loop
> case), including zero step in the outer-loop. Call
> vect_create_addr_base_for_vector_ref with additional argument.
> (vect_create_addr_base_for_vector_ref): Takes additional argument LOOP.
> Updated function documentation. Handle the 'nested_in_vect_loop' case.
> Fixed and simplified calculation of step.
> (vectorizable_store): Call vect_create_data_ref_ptr with loop instead
> of bsi, and with additional argument. Call bump_vector_ptr with
> additional argument. Fix typos. Handle the 'nested_in_vect_loop' case.
> (vect_setup_realignment): Takes additional arguments INIT_ADDR and
> DR_ALIGNMENT_SUPPORT. Returns another value AT_LOOP. Handle the case
> when the realignment setup needs to take place inside the loop. Support
> the dr_explicit_realign scheme. Allow generating the optimized
> realignment scheme for outer-loop vectorization. Added documentation.
> (vectorizable_load): Support the dr_explicit_realign scheme. Handle the
> 'nested_in_vect_loop' case, including loads that are invariant in the
> outer-loop and the realignment schemes. Handle the case when the
> realignment setup needs to take place inside the loop. Call
> vect_setup_realignment with additional arguments. Call
> vect_create_data_ref_ptr with additional argument and with loop instead
> of bsi. Fix 80-column overflow. Fix typos. Rename PHI_STMT to PHI.
> (vect_gen_niters_for_prolog_loop): Call
> vect_create_addr_base_for_vector_ref with additional arguments.
> (vect_create_cond_for_align_checks): Likewise.
> (bump_vector_ptr): Updated to support the new dr_explicit_realign
> scheme: takes additional argument bump; argument ptr_incr is now
> optional; updated documentation.
> (vect_init_vector): Takes additional argument (bsi). Use it, if
> available, to insert the vector initialization.
> (get_initial_def_for_induction): Pass additional argument in call to
> vect_init_vector.
> (vect_get_vec_def_for_operand): Likewise.
> (vect_setup_realignment): Likewise.
> (vectorizable_load): Likewise.
>
> testsuite/ChangeLog:

> * gcc.dg/vect/vect-117.c: Change inner-loop bound to
> unknown (so that the outer loop won't get analyzed).
> * gcc.dg/vect/vect-outer-1a.c: New test.
> * gcc.dg/vect/vect-outer-1b.c: New test.
> * gcc.dg/vect/vect-outer-1.c: New test.
> * gcc.dg/vect/vect-outer-2a.c: New test.
> * gcc.dg/vect/vect-outer-2b.c: New test.
> * gcc.dg/vect/vect-outer-2c.c: New test.
> * gcc.dg/vect/vect-outer-2.c: New test.
> * gcc.dg/vect/vect-outer-3a.c: New test.
> * gcc.dg/vect/vect-outer-3b.c: New test.
> * gcc.dg/vect/vect-outer-3c.c: New test.
> * gcc.dg/vect/vect-outer-3.c: New test.
> * gcc.dg/vect/vect-outer-4a.c: New test.
> * gcc.dg/vect/vect-outer-4b.c: New test.
> * gcc.dg/vect/vect-outer-4c.c: New test.
> * gcc.dg/vect/vect-outer-4d.c: New test.
> * gcc.dg/vect/vect-outer-4e.c: New test.
> * gcc.dg/vect/vect-outer-4f.c: New test.
> * gcc.dg/vect/vect-outer-4g.c: New test.
> * gcc.dg/vect/no-section-anchors-vect-outer-4h.c: New test.
> * gcc.dg/vect/vect-outer-4i.c: New test.
> * gcc.dg/vect/vect-outer-4j.c: New test.
> * gcc.dg/vect/vect-outer-4k.c: New test.
> * gcc.dg/vect/vect-outer-4l.c: New test.
> * gcc.dg/vect/vect-outer-4m.c: New test.
> * gcc.dg/vect/vect-outer-4.c: New test.
> * gcc.dg/vect/vect-outer-5.c: New test.
> * gcc.dg/vect/vect-outer-6.c: New test.
> * gcc.dg/vect/vect-outer-fir.c: New test.
> * gcc.dg/vect/vect-outer-fir-lb.c: New test.
> * gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c: New test.
>
> (See attached file: mainlineouterloopdiff23t.txt)


