[RFT] [patch] Improve realignment scheme in vectorizer
Ira Rosen
IRAR@il.ibm.com
Wed Jun 23 11:17:00 GMT 2010
Hi,
One of the techniques the vectorizer uses to handle misaligned accesses is
to peel several scalar iterations so that the vector loop starts from an
aligned access (or accesses). Currently, peeling is used simply to align the
first unaligned store in the loop. The attached patch tries to improve the
choice of the data-ref to peel for.
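To make the idea of peeling concrete, here is a minimal sketch in plain C. It is not code from the patch: the names peel_iterations and add_one, the fixed vectorization factor VF = 4, and the inner loop standing in for one vector operation are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VF 4  /* assumed vectorization factor (elements per vector) */

/* Hypothetical helper: number of scalar iterations to peel so that an
   access starting at ADDR becomes aligned to VF * ELEM_SIZE bytes.
   Assumes ADDR is a multiple of ELEM_SIZE and the stride is one element.  */
size_t peel_iterations(uintptr_t addr, size_t elem_size, size_t vf)
{
    assert(elem_size > 0 && vf > 0);
    size_t misalign = (size_t)(addr % (vf * elem_size)) / elem_size;
    return (vf - misalign) % vf;
}

/* Adds 1 to every element of A[0..N-1] using a peeled prolog loop, a main
   loop that processes VF elements at a time, and a scalar epilog loop.  */
void add_one(int *a, size_t n)
{
    size_t i = 0;
    size_t npeel = peel_iterations((uintptr_t)a, sizeof(int), VF);
    if (npeel > n)
        npeel = n;

    /* Prolog: scalar iterations until &a[i] is aligned.  */
    for (; i < npeel; i++)
        a[i] += 1;

    /* Main loop: each outer iteration models one aligned vector access.  */
    for (; i + VF <= n; i += VF)
        for (size_t j = 0; j < VF; j++)
            a[i + j] += 1;

    /* Epilog: leftover scalar iterations.  */
    for (; i < n; i++)
        a[i] += 1;
}
```

With 4-byte elements and VF = 4, an access that starts 4 bytes past a 16-byte boundary needs 3 peeled iterations, and one that starts exactly on the boundary needs none.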
When one of the data accesses can be vectorized only if we peel to
align it, the decision is easy. Otherwise, we distinguish between cases
with known and unknown misalignment values. For loops containing accesses
with known misalignment, we either count the data-refs that will be aligned
or calculate the cost (if the cost model is enabled) for every possible
number of iterations of the peeled loop. When all the alignments are
unknown, we use peeling if there is a store in the loop or if it may align
other accesses as well, i.e., if misaligned accesses are supported by the
target (unlike the load realignment scheme used for Altivec). We arbitrarily
choose either the first data access in the loop or, if the unaligned store
cost is greater than the unaligned load cost, the first store.
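The selection described above can be sketched roughly as follows. This is a simplified model, not the code in the patch: it assumes every data-ref has the same element size and unit stride, uses a small array instead of a hash table, ignores the cost model, and the function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4  /* assumed vectorization factor (elements per vector) */

/* Known misalignment, cost model disabled: MISALIGN[i] is the misalignment
   of data-ref i in elements (0 .. VF-1).  Returns the peel count that
   aligns the most data-refs; ties go to the smaller peel count.  If
   ALIGNED_OUT is non-NULL, stores how many data-refs end up aligned.  */
unsigned choose_peel_most_frequent(const unsigned *misalign, size_t nrefs,
                                   size_t *aligned_out)
{
    size_t count[VF] = {0};

    for (size_t i = 0; i < nrefs; i++) {
        assert(misalign[i] < VF);
        /* Peeling (VF - misalign) % VF iterations aligns this data-ref.  */
        count[(VF - misalign[i]) % VF]++;
    }

    unsigned best = 0;
    for (unsigned n = 1; n < VF; n++)
        if (count[n] > count[best])
            best = n;

    if (aligned_out)
        *aligned_out = count[best];
    return best;
}

/* Unknown misalignment: returns the index of the data-ref to peel for.
   IS_STORE[i] is nonzero when data-ref i is a store.  The first store is
   chosen when unaligned stores cost more than unaligned loads and a store
   exists; otherwise the first data access (index 0) is chosen.  */
size_t choose_ref_unknown(const int *is_store, size_t nrefs,
                          int unaligned_store_cost, int unaligned_load_cost)
{
    if (unaligned_store_cost > unaligned_load_cost)
        for (size_t i = 0; i < nrefs; i++)
            if (is_store[i])
                return i;
    return 0;
}
```

For example, with misalignments {1, 1, 3, 0} the first function picks a peel count of 3, which aligns the two data-refs with misalignment 1. The sketch only models which data-ref or peel count is picked, not the decision whether peeling is worthwhile at all.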
No cost is assigned to misaligned stores in i386.c, so I just made it
the same as the cost of a misaligned load.
Bootstrapped and tested on x86_64-suse-linux and powerpc64-suse-linux.
Tuned on Power7. I'd appreciate (performance) testing on other platforms.
Are the non-vectorizer parts OK for trunk?
Thanks,
Ira
ChangeLog:
2010-06-22 Ira Rosen <irar@il.ibm.com>
Revital Eres <eres@il.ibm.com>
* doc/tm.texi (TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST): Document
new arguments.
* targhooks.c (default_builtin_vectorization_cost): Add new arguments.
Handle unaligned store.
* targhooks.h (default_builtin_vectorization_cost): Add new
arguments.
* target.h (enum vect_cost_for_stmt): Add unaligned_store.
(builtin_vectorization_cost): Add new arguments.
* tree-vect-loop-manip.c (vect_gen_niters_for_prolog_loop): Take number
of iterations of prolog loop directly from LOOP_PEELING_FOR_ALIGNMENT.
(vect_vfa_segment_size): Fix indentation.
* tree-vectorizer.h (struct _vect_peel_info): New.
(struct _vect_peel_extended_info): New.
(struct _loop_vec_info): Add new field for peeling hash table and a
macro for its access.
(VECT_MAX_COST): Define.
(vect_get_load_cost): Declare.
(vect_get_store_cost, vect_get_known_peeling_cost,
vect_get_single_scalar_iteraion_cost): Likewise.
(vect_supportable_dr_alignment): Add new argument.
* tree-vect-loop.c (new_loop_vec_info): Initialize peeling hash table
field.
(destroy_loop_vec_info): Free peeling hash table.
(vect_analyze_loop_form): Update call to builtin_vectorization_cost.
(vect_analyze_loop): Move vect_enhance_data_refs_alignment before
vect_analyze_slp. Fix indentation.
(vect_get_single_scalar_iteraion_cost): New function.
(vect_get_known_peeling_cost): Likewise.
(vect_estimate_min_profitable_iters): Rename byte_misalign to npeel.
Call vect_get_single_scalar_iteraion_cost instead of cost_for_stmt per
statement. Move outside cost calculation inside unknown peeling case.
Call vect_get_known_peeling_cost for known amount of peeling.
* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Add data
reference to the print message of forced alignment.
(vect_verify_datarefs_alignment): Update call to
vect_supportable_dr_alignment.
(vect_get_data_access_cost): New function.
(vect_peeling_hash, vect_peeling_hash_eq, vect_peeling_hash_insert,
vect_peeling_hash_get_most_frequent,
vect_peeling_hash_get_lowest_cost,
vect_peeling_hash_choose_best_peeling): Likewise.
(vect_enhance_data_refs_alignment): Fix documentation. Use hash table
to store all the accesses in the loop and find best possible access to
align using peeling for known alignment case. For unknown alignment
check if stores are preferred or if peeling is worthy.
(vect_find_same_alignment_drs): Analyze pairs of loads too.
(vect_supportable_dr_alignment): Add new argument and check aligned
accesses according to it.
* tree-vect-stmts.c (vect_get_stmt_cost): New function.
(cost_for_stmt): Call vect_get_stmt_cost.
(vect_model_simple_cost): Likewise.
(vect_model_store_cost): Call vect_get_stmt_cost. Call
vect_get_store_cost to calculate the cost of the statement.
(vect_get_store_cost): New function.
(vect_model_load_cost): Call vect_get_stmt_cost. Call
vect_get_load_cost to calculate the cost of the statement.
(vect_get_load_cost): New function.
(vectorizable_store): Update call to vect_supportable_dr_alignment.
(vectorizable_load): Likewise.
* config/spu/spu.c (spu_builtin_vectorization_cost): Add new
arguments.
* config/i386/i386.c (ix86_builtin_vectorization_cost): Add new
arguments. Handle unaligned store.
* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): New.
(rs6000_builtin_support_vector_misalignment): Return true for word and
double word alignments for VSX.
* tree-vect-slp.c (vect_build_slp_tree): Update calls to
vect_supportable_dr_alignment and builtin_vectorization_cost.
testsuite/ChangeLog:
2010-06-22 Ira Rosen <irar@il.ibm.com>
Revital Eres <eres@il.ibm.com>
* gcc.dg/vect/vect-42.c: Don't expect peeling on targets that support
misaligned stores.
* gcc.dg/vect/vect-60.c, gcc.dg/vect/vect-56.c, gcc.dg/vect/vect-93.c,
gcc.dg/vect/vect-96.c: Likewise.
* gcc.dg/vect/vect-109.c: Expect vectorization only on targets that
support misaligned stores. Change the number of expected misaligned
accesses.
* gcc.dg/vect/vect-peel-1.c: New test.
* gcc.dg/vect/vect-peel-2.c, gcc.dg/vect/vect-peel-3.c,
gcc.dg/vect/vect-peel-1.c: Likewise.
* gcc.dg/vect/vect-multitypes-1.c: Change the test to make it
vectorizable on all targets that support realignment.
* gcc.dg/vect/vect-multitypes-4.c: Likewise.
(See attached file: alignment.txt)
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: alignment.txt
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20100623/e9493e92/attachment.txt>