This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Attached is the updated patch (part 2, updated to a more recent snapshot,
as a few changes that went into mainline in the meantime made the previous
patch inapplicable).

Bootstrapped on powerpc64-linux, bootstrapped with vectorization enabled on
i386-linux, and tested on the vectorizer testcases.

dorit

(See attached file: updated-outerloop-patch2.txt)

> Hi,
>
> This is the second part of
> http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00461.html. It adds support
> for memory-references in the inner-loop of outer-loop-vectorization. I'll
> use the following example to describe the features that were added:
>
>     for (i=0; i<N; i++){
>       s=0;
>       for (j=0; j<M; j++)
>         s += a[i+j] * b[j];
>       a[i]=s;
>     }
>
> The patch includes the following changes to the vectorizer:
>
> - To analyze the initial-address and step of inner-loop references relative
>   to the outer-loop, I used the function split_constant_offset. I basically
>   take the BASE+INIT+OFFSET that was computed relative to the inner-loop and
>   analyze it relative to the outer-loop (as discussed here
>   http://gcc.gnu.org/ml/gcc-patches/2007-05/msg00648.html).
> - Generally the vectorizer does not support invariant accesses. One
>   exception that we add here is memory-references in the inner-loop that
>   have a zero step in the outer-loop, for example the b[j] access in the
>   loop example above. Because the access b[j] has no evolution in the
>   outer-loop we have to duplicate the value b[j] into all entries of the
>   vector. At the moment this is done by simply adding this duplication on
>   top of the current scheme: i.e. we continue to generate a regular vector
>   load, and then we extract the first element and duplicate it:
>
>     vb = b[j,j+1,j+2,j+3]
>     sb = BIT_FIELD_REF (vb, bitpos, bitsize)   # extract b[j]
>     vb = {sb, sb, sb, sb}
>
>   (there are better ways to do this - so this will be improved later).
>   In order to be able to use the function 'vect_init_vector' to create the
>   vector vb above, we extend it to place the vector initialization at BSI (a
>   new argument passed to the function) instead of always inserting it at the
>   loop preheader.
> - Support misaligned accesses. In case the misalignment remains fixed (i.e.
>   the step (stride) of the accesses in the inner-loop is a multiple of the
>   Vector Size (VS)), this can be vectorized using the optimized
>   realignment-scheme (which used to be called the "software-pipelined"
>   scheme, and is now called "optimized_explicit_realign"): the computation
>   of the misalignment can be taken out of the loop, and only one additional
>   vector load is generated (before the loop) instead of 2 in each iteration
>   (we basically do predictive-commoning here). In case the misalignment does
>   *not* remain fixed throughout the iterations of the loop (as is the case
>   in the example loop above), we cannot use the optimized scheme. Instead we
>   need to compute the misalignment inside the inner-loop along with the two
>   vector loads (this is the newly added "explicit_realign" scheme; for more
>   details see also
>   http://gcc.gnu.org/ml/gcc-patches/2007-06/msg00248.html).
>   (this is also planned to be improved later).
>   To support this, a couple of functions - 'vect_setup_realignment' and
>   'vectorizable_load' - that used to support only the optimized realignment
>   scheme were modified to support both schemes. See detailed documentation
>   in 'vect_setup_realignment' and 'vect_supportable_dr_alignment'.
> - The functions that deal with creating/initializing/updating the pointer
>   that is used for the vector loads/stores had to be modified a bit:
>   * 'vect_create_data_ref_ptr' now needs to create an update chain both
>     in the inner-loop and in the outer-loop. It also needs to consider if
>     the outer-loop step is 0 (the only case where the pointer will not be
>     bumped by VS (Vector Size)). See detailed documentation in this function.
>   * 'vect_create_addr_base_for_vector_ref' needs to know relative to
>     which loop the address-base is requested (to know whether to use the
>     step/offset/init relative to the inner or outer loop).
>   * 'bump_vector_ptr' is extended to support bump amounts other than VS
>     (we need to bump by VS-1 for the "explicit_realign" scheme).
>
> Bootstrapped on powerpc64-linux,
> bootstrapped with vectorization enabled on i386-linux,
> passed full regression testing on both platforms.
>
> I will wait at least a week to give people a chance to review and comment.
>
> thanks,
> dorit
>
> ChangeLog:
>
>         * tree-data-refs.c (split_constant_offset): Expose.
>         * tree-data-refs.h (split_constant_offset): Add declaration.
>
>         * tree-vectorizer.h (dr_alignment_support): Renamed
>         dr_unaligned_software_pipeline to dr_explicit_realign_optimized.
>         Added a new value dr_explicit_realign.
>         (_stmt_vec_info): Added new fields: dr_base_address, dr_init,
>         dr_offset, dr_step, and dr_aligned_to, along with new access
>         functions for these fields: STMT_VINFO_DR_BASE_ADDRESS,
>         STMT_VINFO_DR_INIT, STMT_VINFO_DR_OFFSET, STMT_VINFO_DR_STEP, and
>         STMT_VINFO_DR_ALIGNED_TO.
>
>         * tree-vectorizer.c (vect_supportable_dr_alignment): Add
>         documentation.
>         In case of outer-loop vectorization with non-fixed misalignment -
>         use the dr_explicit_realign scheme instead of the optimized
>         realignment scheme.
>         (new_stmt_vec_info): Initialize new fields.
>
>         * tree-vect-analyze.c (vect_compute_data_ref_alignment): Handle the
>         'nested_in_vect_loop' case. Change verbosity level.
>         (vect_analyze_data_ref_access): Handle the 'nested_in_vect_loop'
>         case.
>         Don't fail on zero step in the outer-loop for loads.
>         (vect_analyze_data_refs): Call split_constant_offset to calculate
>         base, offset and init relative to the outer-loop.
>
>         * tree-vect-transform.c (vect_create_data_ref_ptr): Replace the
>         unused BSI function argument with a new function argument - at_loop.
>         Simplify the condition that determines STEP.
>         Takes additional argument INV_P. Support outer-loop vectorization
>         (handle the nested_in_vect_loop case), including zero step in the
>         outer-loop. Call vect_create_addr_base_for_vector_ref with
>         additional argument.
>         (vect_create_addr_base_for_vector_ref): Takes additional argument
>         LOOP.
>         Updated function documentation. Handle the 'nested_in_vect_loop'
>         case.
>         Fixed and simplified calculation of step.
>         (vectorizable_store): Call vect_create_data_ref_ptr with loop
>         instead of bsi, and with additional argument. Call bump_vector_ptr
>         with additional argument. Fix typos. Handle the
>         'nested_in_vect_loop' case.
>         (vect_setup_realignment): Takes additional arguments INIT_ADDR and
>         DR_ALIGNMENT_SUPPORT. Returns another value AT_LOOP. Handle the
>         case when the realignment setup needs to take place inside the
>         loop. Support the dr_explicit_realign scheme. Allow generating the
>         optimized realignment scheme for outer-loop vectorization. Added
>         documentation.
>         (vectorizable_load): Support the dr_explicit_realign scheme. Handle
>         the 'nested_in_vect_loop' case, including loads that are invariant
>         in the outer-loop and the realignment schemes. Handle the case when
>         the realignment setup needs to take place inside the loop. Call
>         vect_setup_realignment with additional arguments. Call
>         vect_create_data_ref_ptr with additional argument and with loop
>         instead of bsi. Fix 80-column overflow. Fix typos. Rename PHI_STMT
>         to PHI.
>         (vect_gen_niters_for_prolog_loop): Call
>         vect_create_addr_base_for_vector_ref with additional arguments.
>         (vect_create_cond_for_align_checks): Likewise.
>         (bump_vector_ptr): Updated to support the new dr_explicit_realign
>         scheme: takes additional argument bump; argument ptr_incr is now
>         optional; updated documentation.
>         (vect_init_vector): Takes additional argument (bsi). Use it, if
>         available, to insert the vector initialization.
>         (get_initial_def_for_induction): Pass additional argument in call
>         to vect_init_vector.
>         (vect_get_vec_def_for_operand): Likewise.
>         (vect_setup_realignment): Likewise.
>         (vectorizable_load): Likewise.
>
> testsuite/ChangeLog:
>
>         * gcc.dg/vect/vect-117.c: Change inner-loop bound to
>         unknown (so that outer-loop won't get analyzed).
>         * gcc.dg/vect/vect-outer-1a.c: New test.
>         * gcc.dg/vect/vect-outer-1b.c: New test.
>         * gcc.dg/vect/vect-outer-1.c: New test.
>         * gcc.dg/vect/vect-outer-2a.c: New test.
>         * gcc.dg/vect/vect-outer-2b.c: New test.
>         * gcc.dg/vect/vect-outer-2c.c: New test.
>         * gcc.dg/vect/vect-outer-2.c: New test.
>         * gcc.dg/vect/vect-outer-3a.c: New test.
>         * gcc.dg/vect/vect-outer-3b.c: New test.
>         * gcc.dg/vect/vect-outer-3c.c: New test.
>         * gcc.dg/vect/vect-outer-3.c: New test.
>         * gcc.dg/vect/vect-outer-4a.c: New test.
>         * gcc.dg/vect/vect-outer-4b.c: New test.
>         * gcc.dg/vect/vect-outer-4c.c: New test.
>         * gcc.dg/vect/vect-outer-4d.c: New test.
>         * gcc.dg/vect/vect-outer-4e.c: New test.
>         * gcc.dg/vect/vect-outer-4f.c: New test.
>         * gcc.dg/vect/vect-outer-4g.c: New test.
>         * gcc.dg/vect/no-section-anchors-vect-outer-4h.c: New test.
>         * gcc.dg/vect/vect-outer-4i.c: New test.
>         * gcc.dg/vect/vect-outer-4j.c: New test.
>         * gcc.dg/vect/vect-outer-4k.c: New test.
>         * gcc.dg/vect/vect-outer-4l.c: New test.
>         * gcc.dg/vect/vect-outer-4m.c: New test.
>         * gcc.dg/vect/vect-outer-4.c: New test.
>         * gcc.dg/vect/vect-outer-5.c: New test.
>         * gcc.dg/vect/vect-outer-6.c: New test.
>         * gcc.dg/vect/vect-outer-fir.c: New test.
>         * gcc.dg/vect/vect-outer-fir-lb.c: New test.
>         * gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c: New test.
>
> (See attached file: mainlineouterloopdiff23t.txt)
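
For readers following the description, here is a minimal scalar C model of how the example loop might look once the outer loop is vectorized. It is only a sketch under stated assumptions, not the code GCC generates. It models the a[i+j] access with the "explicit_realign" idea (two aligned loads, with the second address bumped by VS-1 and the misalignment computed inside the inner loop) and the b[j] access with the load/extract/duplicate sequence. The names vec_t, load_floor, load_realign and load_invariant, the constants VS/N/M, and the padded array lengths are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sizes: VS elements per vector; array lengths are multiples
   of VS so that every aligned chunk touched below stays in bounds. */
enum { VS = 4, N = 8, M = 5, A_LEN = N + M - 1, B_LEN = 8 };

typedef struct { int e[VS]; } vec_t;   /* scalar stand-in for a vector register */

/* Aligned vector load: rounds idx down to a multiple of VS, like floor(p). */
static vec_t load_floor(const int *base, int idx)
{
    vec_t v;
    assert(idx >= 0);
    memcpy(v.e, base + (idx - idx % VS), sizeof v.e);
    return v;
}

/* Model of the explicit_realign idea: two aligned loads plus a misalignment
   that is computed at the point of use (i.e. inside the inner loop). */
static vec_t load_realign(const int *base, int idx)
{
    vec_t msq = load_floor(base, idx);
    vec_t lsq = load_floor(base, idx + (VS - 1));   /* address bumped by VS-1 */
    int mis = idx % VS;
    vec_t v;
    for (int k = 0; k < VS; k++)
        v.e[k] = (mis + k < VS) ? msq.e[mis + k] : lsq.e[mis + k - VS];
    return v;
}

/* Zero step in the outer loop: vector load, extract b[j], duplicate it. */
static vec_t load_invariant(const int *base, int idx)
{
    vec_t vb = load_floor(base, idx);
    int sb = vb.e[idx % VS];                        /* extract b[j] */
    for (int k = 0; k < VS; k++)
        vb.e[k] = sb;                               /* vb = {sb, sb, sb, sb} */
    return vb;
}

/* The original scalar loop nest from the message. */
static void scalar_loop(int *a, const int *b)
{
    for (int i = 0; i < N; i++) {
        int s = 0;
        for (int j = 0; j < M; j++)
            s += a[i + j] * b[j];
        a[i] = s;
    }
}

/* Outer-loop vectorized model: VS consecutive i iterations run in lockstep. */
static void outer_vectorized_loop(int *a, const int *b)
{
    for (int i = 0; i < N; i += VS) {               /* N assumed multiple of VS */
        vec_t vs = {{0}};
        for (int j = 0; j < M; j++) {
            vec_t va = load_realign(a, i + j);      /* a[i+j .. i+j+VS-1] */
            vec_t vb = load_invariant(b, j);        /* b[j] in every lane */
            for (int k = 0; k < VS; k++)
                vs.e[k] += va.e[k] * vb.e[k];
        }
        memcpy(a + i, vs.e, sizeof vs.e);           /* a[i .. i+VS-1] = vs */
    }
}

/* Returns 1 if the vectorized model and the scalar loop agree. */
int outer_loop_sketch_matches_scalar(void)
{
    int a1[A_LEN], a2[A_LEN], b[B_LEN] = {0};
    for (int i = 0; i < A_LEN; i++)
        a1[i] = a2[i] = i + 1;
    for (int j = 0; j < M; j++)
        b[j] = 2 * j - 3;
    scalar_loop(a1, b);
    outer_vectorized_loop(a2, b);
    return memcmp(a1, a2, sizeof a1) == 0;
}
```

Calling outer_loop_sketch_matches_scalar() from a small main is enough to compare the two versions. In this model the misalignment of the a[i+j] access changes with j, which is why it is recomputed on every inner-loop iteration rather than hoisted out of the loop.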
Attachment:
updated-outerloop-patch2.txt
Description: Text document