This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Hi,

One of the techniques that the vectorizer uses to align misaligned accesses is to peel several scalar iterations and start the vector loop from an aligned access (or accesses). Currently peeling is used to simply align the first unaligned store in the loop. The attached patch tries to improve the choice of a data-ref to peel for.

When one of the data accesses can be vectorized only if we peel to align it, the decision is easy. Otherwise, we distinguish between cases with known and unknown misalignment values. For loops with accesses with known misalignment we either count the data-refs that will be aligned or calculate the cost (if the cost model is enabled) for every possible number of iterations of the peeled loop.

When all the alignments are unknown we use peeling if there is a store in the loop or if it may align other accesses as well, i.e., if misaligned accesses are supported by the target (unlike the load realignment scheme used for Altivec). We arbitrarily choose either the first data access in the loop or, if the unaligned store cost is greater than the unaligned load cost, the first store.

There is no cost assigned to a misaligned store in i386.c, so I just made it the same as the cost of a misaligned load.

Bootstrapped and tested on x86_64-suse-linux and powerpc64-suse-linux. Tuned on Power7. I'd appreciate (performance) testing on other platforms.

Are the non-vectorizer parts OK for trunk?

Thanks,
Ira

ChangeLog:

2010-06-22  Ira Rosen  <irar@il.ibm.com>
	    Revital Eres  <eres@il.ibm.com>

	* doc/tm.texi (TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST):
	Document new arguments.
	* targhooks.c (default_builtin_vectorization_cost): Add new
	arguments.  Handle unaligned store.
	* targhooks.h (default_builtin_vectorization_cost): Add new
	arguments.
	* target.h (enum vect_cost_for_stmt): Add unaligned_store.
	(builtin_vectorization_cost): Add new arguments.
	* tree-vect-loop-manip.c (vect_gen_niters_for_prolog_loop):
	Take number of iterations of prolog loop directly from
	LOOP_PEELING_FOR_ALIGNMENT.
	(vect_vfa_segment_size): Fix indentation.
	* tree-vectorizer.h (struct _vect_peel_info): New.
	(struct _vect_peel_extended_info): New.
	(struct _loop_vec_info): Add new field for peeling hash table
	and a macro for its access.
	(VECT_MAX_COST): Define.
	(vect_get_load_cost): Declare.
	(vect_get_store_cost, vect_get_known_peeling_cost,
	vect_get_single_scalar_iteraion_cost): Likewise.
	(vect_supportable_dr_alignment): Add new argument.
	* tree-vect-loop.c (new_loop_vec_info): Initialize peeling hash
	table field.
	(destroy_loop_vec_info): Free peeling hash table.
	(vect_analyze_loop_form): Update call to
	builtin_vectorization_cost.
	(vect_analyze_loop): Move vect_enhance_data_refs_alignment
	before vect_analyze_slp.  Fix indentation.
	(vect_get_single_scalar_iteraion_cost): New function.
	(vect_get_known_peeling_cost): Likewise.
	(vect_estimate_min_profitable_iters): Rename byte_misalign to
	npeel.  Call vect_get_single_scalar_iteraion_cost instead of
	cost_for_stmt per statement.  Move outside cost calculation
	inside unknown peeling case.  Call vect_get_known_peeling_cost
	for known amount of peeling.
	* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Add
	data reference to the print message of forced alignment.
	(vect_verify_datarefs_alignment): Update call to
	vect_supportable_dr_alignment.
	(vect_get_data_access_cost): New function.
	(vect_peeling_hash, vect_peeling_hash_eq,
	vect_peeling_hash_insert, vect_peeling_hash_get_most_frequent,
	vect_peeling_hash_get_lowest_cost,
	vect_peeling_hash_choose_best_peeling): Likewise.
	(vect_enhance_data_refs_alignment): Fix documentation.  Use
	hash table to store all the accesses in the loop and find best
	possible access to align using peeling for known alignment
	case.  For unknown alignment check if stores are preferred or
	if peeling is worthy.
	(vect_find_same_alignment_drs): Analyze pairs of loads too.
	(vect_supportable_dr_alignment): Add new argument and check
	aligned accesses according to it.
	* tree-vect-stmts.c (vect_get_stmt_cost): New function.
	(cost_for_stmt): Call vect_get_stmt_cost.
	(vect_model_simple_cost): Likewise.
	(vect_model_store_cost): Call vect_get_stmt_cost.  Call
	vect_get_store_cost to calculate the cost of the statement.
	(vect_get_store_cost): New function.
	(vect_model_load_cost): Call vect_get_stmt_cost.  Call
	vect_get_load_cost to calculate the cost of the statement.
	(vect_get_load_cost): New function.
	(vectorizable_store): Update call to
	vect_supportable_dr_alignment.
	(vectorizable_load): Likewise.
	* config/spu/spu.c (spu_builtin_vectorization_cost): Add new
	arguments.
	* config/i386/i386.c (ix86_builtin_vectorization_cost): Add new
	arguments.  Handle unaligned store.
	* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
	New.
	(rs6000_builtin_support_vector_misalignment): Return true for
	word and double word alignments for VSX.
	* tree-vect-slp.c (vect_build_slp_tree): Update calls to
	vect_supportable_dr_alignment and builtin_vectorization_cost.

testsuite/ChangeLog:

2010-06-22  Ira Rosen  <irar@il.ibm.com>
	    Revital Eres  <eres@il.ibm.com>

	* gcc.dg/vect/vect-42.c: Don't expect peeling on targets that
	support misaligned stores.
	* gcc.dg/vect/vect-60.c, gcc.dg/vect/vect-56.c,
	gcc.dg/vect/vect-93.c, gcc.dg/vect/vect-96.c: Likewise.
	* gcc.dg/vect/vect-109.c: Expect vectorization only on targets
	that support misaligned stores.  Change the number of expected
	misaligned accesses.
	* gcc.dg/vect/vect-peel-1.c: New test.
	* gcc.dg/vect/vect-peel-2.c, gcc.dg/vect/vect-peel-3.c,
	gcc.dg/vect/vect-peel-1.c: Likewise.
	* gcc.dg/vect/vect-multitypes-1.c: Change the test to make it
	vectorizable on all targets that support realignment.
	* gcc.dg/vect/vect-multitypes-4.c: Likewise.

(See attached file: alignment.txt)
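
For readers who want a concrete picture of the known-misalignment case without the cost model, here is a minimal sketch in C. It is not the code from the patch: struct data_ref, VECTOR_BYTES, peel_to_align, aligned_after_peel and choose_npeel are hypothetical names made up for this illustration, and it uses a plain double loop rather than the peeling hash table. It assumes 16-byte vectors, unit-stride accesses, and a byte misalignment that is a multiple of the element size. Each access proposes the peel count that would align it, and the candidate that aligns the most accesses is chosen.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed vector size in bytes (illustrative, not taken from the patch). */
#define VECTOR_BYTES 16u

/* Hypothetical description of one unit-stride data access in the loop. */
struct data_ref {
    unsigned misalign;   /* known misalignment of the first access, in bytes */
    unsigned elem_size;  /* element size in bytes */
};

/* Number of scalar iterations to peel so that DR starts aligned. */
static unsigned peel_to_align(const struct data_ref *dr)
{
    assert(dr->elem_size != 0);
    assert(dr->misalign < VECTOR_BYTES);
    assert(dr->misalign % dr->elem_size == 0);
    return ((VECTOR_BYTES - dr->misalign) % VECTOR_BYTES) / dr->elem_size;
}

/* Nonzero if DR is aligned once NPEEL scalar iterations have been peeled. */
static int aligned_after_peel(const struct data_ref *dr, unsigned npeel)
{
    return (dr->misalign + npeel * dr->elem_size) % VECTOR_BYTES == 0;
}

/* Try the peel count proposed by each access and return the one that
   aligns the most accesses; ties go to the smaller peel count.  The
   number of aligned accesses is stored in *ALIGNED_OUT if non-NULL.  */
unsigned choose_npeel(const struct data_ref *drs, size_t n,
                      unsigned *aligned_out)
{
    unsigned best_npeel = 0, best_count = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned cand = peel_to_align(&drs[i]);
        unsigned count = 0;

        for (size_t j = 0; j < n; j++)
            count += aligned_after_peel(&drs[j], cand) ? 1u : 0u;

        if (count > best_count
            || (count == best_count && cand < best_npeel)) {
            best_count = count;
            best_npeel = cand;
        }
    }

    if (aligned_out)
        *aligned_out = best_count;
    return best_npeel;
}
```

For example, with three 4-byte accesses misaligned by 4, 4 and 8 bytes, peeling 3 iterations aligns the first two, while peeling 2 aligns only the third, so the sketch picks 3. With the cost model enabled, the comparison would be on estimated cost per candidate rather than on this count.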
Attachment:
alignment.txt
Description: Text document