This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Let the target choose a vectorisation alignment
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: GCC Patches <gcc-patches at gcc dot gnu dot org>, Richard Sandiford <richard dot sandiford at linaro dot org>
- Date: Mon, 18 Sep 2017 15:39:45 +0200
- Subject: Re: Let the target choose a vectorisation alignment
- References: <877ewwgqvy.fsf@linaro.org>
On Mon, Sep 18, 2017 at 1:58 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> The vectoriser aligned vectors to TYPE_ALIGN unconditionally, although
> there was also a hard-coded assumption that this was equal to the type
> size. This was inconvenient for SVE for two reasons:
>
> - When compiling for a specific power-of-2 SVE vector length, we might
> want to align to a full vector. However, the TYPE_ALIGN is governed
> by the ABI alignment, which is 128 bits regardless of size.
>
> - For vector-length-agnostic code it doesn't usually make sense to align,
> since the runtime vector length might not be a power of two. Even for
> power of two sizes, there's no guarantee that aligning to the previous
> 16 bytes will be an improvement.
>
> This patch therefore adds a target hook to control the preferred
> vectoriser (as opposed to ABI) alignment.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> Also tested by comparing the testsuite assembly output on at least one
> target per CPU directory. OK to install?
Did you specifically choose to pass the hook a vector type rather than
a mode? I suppose in peeling for alignment the target should be able to
prevent peeling by returning element alignment from the hook?
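For reference, the peeling arithmetic that the quoted patch introduces in vect_enhance_data_refs_alignment can be modelled outside GCC. The following is a minimal sketch, not GCC code: peel_count and its parameter names are hypothetical, all quantities are in bytes, and it assumes that the target alignment is a power of two. It suggests that when the hook returns the element alignment, the computed peel count is always zero, because the masked value is then smaller than the element size.

```c
#include <assert.h>

/* Hypothetical standalone model of the patch's peel computation:
     mis = negative ? misalign : -misalign;
     npeel = (mis & (target_align - 1)) / elem_size;
   MISALIGN, TARGET_ALIGN and ELEM_SIZE are in bytes.  NEGATIVE is
   nonzero when the access steps backwards through memory.  */
static unsigned int
peel_count (unsigned int misalign, unsigned int target_align,
            unsigned int elem_size, int negative)
{
  /* Assumption: the target alignment is a nonzero power of two.  */
  assert (target_align != 0 && (target_align & (target_align - 1)) == 0);
  assert (elem_size != 0);

  /* Unsigned negation wraps, so masking yields the distance in bytes
     to the next alignment boundary.  */
  unsigned int mis = negative ? misalign : -misalign;
  return (mis & (target_align - 1)) / elem_size;
}
```

Under these assumptions, peel_count (8, 16, 4, 0) gives 2 and peel_count (8, 32, 4, 0) gives 6, while any call with target_align equal to elem_size gives 0.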
Ok.
Thanks,
Richard.
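The vect_known_alignment_in_bytes helper in the quoted patch relies on the fact that x & -x isolates the lowest set bit of a positive x, which is the largest power of two dividing it. The following is a minimal sketch of that logic together with the gap comparison used in get_group_load_store_type. It is not GCC code: the function names, the MISALIGNMENT_UNKNOWN constant (standing in for DR_MISALIGNMENT_UNKNOWN) and the plain integer parameters are assumptions made for illustration.

```c
#include <assert.h>

/* Stand-in for DR_MISALIGNMENT_UNKNOWN.  */
#define MISALIGNMENT_UNKNOWN (-1)

/* Return the alignment in bytes that an access is guaranteed to have,
   given its misalignment relative to TARGET_ALIGN.  ELEM_ALIGN is the
   alignment of the scalar type, used when nothing else is known.  */
static unsigned int
known_alignment_in_bytes (int misalign, unsigned int target_align,
                          unsigned int elem_align)
{
  if (misalign == MISALIGNMENT_UNKNOWN)
    return elem_align;
  if (misalign == 0)
    return target_align;
  /* Lowest set bit of MISALIGN.  */
  return (unsigned int) (misalign & -misalign);
}

/* Return nonzero if a trailing gap of GAP elements fits inside one
   guaranteed-aligned block, mirroring the patch's comparison
   gap < known alignment / scalar size.  */
static int
overrun_is_harmless (unsigned int gap, unsigned int known_align,
                     unsigned int elem_size)
{
  assert (elem_size != 0);
  return gap < known_align / elem_size;
}
```

For example, a misalignment of 12 bytes against a 16-byte target alignment yields a guaranteed alignment of 4 bytes. With 4-byte elements and a fully aligned 16-byte access, a gap of 3 elements passes the check and a gap of 4 does not.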
> Richard
>
>
> 2017-09-18 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * target.def (preferred_vector_alignment): New hook.
> * doc/tm.texi.in (TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT): New
> hook.
> * doc/tm.texi: Regenerate.
> * targhooks.h (default_preferred_vector_alignment): Declare.
> * targhooks.c (default_preferred_vector_alignment): New function.
> * tree-vectorizer.h (dataref_aux): Add a target_alignment field.
> Expand commentary.
> (DR_TARGET_ALIGNMENT): New macro.
> (aligned_access_p): Update commentary.
> (vect_known_alignment_in_bytes): New function.
> * tree-vect-data-refs.c (vect_calculate_target_alignment): New
> function.
> (vect_compute_data_ref_alignment): Set DR_TARGET_ALIGNMENT.
> Calculate the misalignment based on the target alignment rather than
> the vector size.
> (vect_update_misalignment_for_peel): Use DR_TARGET_ALIGNMENT
> rather than TYPE_ALIGN / BITS_PER_UNIT to update the misalignment.
> (vect_enhance_data_refs_alignment): Mask the byte misalignment with
> the target alignment, rather than masking the element misalignment
> with the number of elements in a vector. Also use the target
> alignment when calculating the maximum number of peels.
> (vect_find_same_alignment_drs): Use vect_calculate_target_alignment
> instead of TYPE_ALIGN_UNIT.
> (vect_duplicate_ssa_name_ptr_info): Remove stmt_info parameter.
> Measure DR_MISALIGNMENT relative to DR_TARGET_ALIGNMENT.
> (vect_create_addr_base_for_vector_ref): Update call accordingly.
> (vect_create_data_ref_ptr): Likewise.
> (vect_setup_realignment): Realign by ANDing with
> -DR_TARGET_ALIGNMENT.
> * tree-vect-loop-manip.c (vect_gen_prolog_loop_niters): Calculate
> the number of peels based on DR_TARGET_ALIGNMENT.
> * tree-vect-stmts.c (get_group_load_store_type): Compare the gap
> with the guaranteed alignment boundary when deciding whether
> overrun is OK.
> (vectorizable_mask_load_store): Interpret DR_MISALIGNMENT
> relative to DR_TARGET_ALIGNMENT instead of TYPE_ALIGN_UNIT.
> (ensure_base_align): Remove stmt_info parameter. Get the
> target base alignment from DR_TARGET_ALIGNMENT.
> (vectorizable_store): Update call accordingly. Interpret
> DR_MISALIGNMENT relative to DR_TARGET_ALIGNMENT instead of
> TYPE_ALIGN_UNIT.
> (vectorizable_load): Likewise.
>
> gcc/testsuite/
> * gcc.dg/vect/vect-outer-3a.c: Adjust dump scan for new wording
> of alignment message.
> * gcc.dg/vect/vect-outer-3a-big-array.c: Likewise.
>
> Index: gcc/target.def
> ===================================================================
> *** gcc/target.def 2017-09-18 12:56:24.635070853 +0100
> --- gcc/target.def 2017-09-18 12:56:24.847378559 +0100
> *************** misalignment value (@var{misalign}).",
> *** 1820,1825 ****
> --- 1820,1839 ----
> int, (enum vect_cost_for_stmt type_of_cost, tree vectype, int misalign),
> default_builtin_vectorization_cost)
>
> + DEFHOOK
> + (preferred_vector_alignment,
> + "This hook returns the preferred alignment in bits for accesses to\n\
> + vectors of type @var{type} in vectorized code. This might be less than\n\
> + or greater than the ABI-defined value returned by\n\
> + @code{TARGET_VECTOR_ALIGNMENT}. It can be equal to the alignment of\n\
> + a single element, in which case the vectorizer will not try to optimize\n\
> + for alignment.\n\
> + \n\
> + The default hook returns @code{TYPE_ALIGN (@var{type})}, which is\n\
> + correct for most targets.",
> + HOST_WIDE_INT, (const_tree type),
> + default_preferred_vector_alignment)
> +
> /* Return true if vector alignment is reachable (by peeling N
> iterations) for the given scalar type. */
> DEFHOOK
> Index: gcc/doc/tm.texi.in
> ===================================================================
> *** gcc/doc/tm.texi.in 2017-09-18 12:56:24.635070853 +0100
> --- gcc/doc/tm.texi.in 2017-09-18 12:56:24.846475122 +0100
> *************** address; but often a machine-dependent
> *** 4086,4091 ****
> --- 4086,4093 ----
>
> @hook TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
>
> + @hook TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT
> +
> @hook TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE
>
> @hook TARGET_VECTORIZE_VEC_PERM_CONST_OK
> Index: gcc/doc/tm.texi
> ===================================================================
> *** gcc/doc/tm.texi 2017-09-18 12:56:24.635070853 +0100
> --- gcc/doc/tm.texi 2017-09-18 12:56:24.846475122 +0100
> *************** For vector memory operations the cost ma
> *** 5754,5759 ****
> --- 5754,5771 ----
> misalignment value (@var{misalign}).
> @end deftypefn
>
> + @deftypefn {Target Hook} HOST_WIDE_INT TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT (const_tree @var{type})
> + This hook returns the preferred alignment in bits for accesses to
> + vectors of type @var{type} in vectorized code. This might be less than
> + or greater than the ABI-defined value returned by
> + @code{TARGET_VECTOR_ALIGNMENT}. It can be equal to the alignment of
> + a single element, in which case the vectorizer will not try to optimize
> + for alignment.
> +
> + The default hook returns @code{TYPE_ALIGN (@var{type})}, which is
> + correct for most targets.
> + @end deftypefn
> +
> @deftypefn {Target Hook} bool TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE (const_tree @var{type}, bool @var{is_packed})
> Return true if vector alignment is reachable (by peeling N iterations) for the given scalar type @var{type}. @var{is_packed} is false if the scalar access using @var{type} is known to be naturally aligned.
> @end deftypefn
> Index: gcc/targhooks.h
> ===================================================================
> *** gcc/targhooks.h 2017-09-18 12:56:24.635070853 +0100
> --- gcc/targhooks.h 2017-09-18 12:56:24.847378559 +0100
> *************** extern tree default_builtin_reciprocal (
> *** 95,100 ****
> --- 95,101 ----
>
> extern HOST_WIDE_INT default_vector_alignment (const_tree);
>
> + extern HOST_WIDE_INT default_preferred_vector_alignment (const_tree);
> extern bool default_builtin_vector_alignment_reachable (const_tree, bool);
> extern bool
> default_builtin_support_vector_misalignment (machine_mode mode,
> Index: gcc/targhooks.c
> ===================================================================
> *** gcc/targhooks.c 2017-09-18 12:56:24.635070853 +0100
> --- gcc/targhooks.c 2017-09-18 12:56:24.847378559 +0100
> *************** default_vector_alignment (const_tree typ
> *** 1175,1180 ****
> --- 1175,1189 ----
> return align;
> }
>
> + /* The default implementation of
> + TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT. */
> +
> + HOST_WIDE_INT
> + default_preferred_vector_alignment (const_tree type)
> + {
> + return TYPE_ALIGN (type);
> + }
> +
> /* By default assume vectors of element TYPE require a multiple of the natural
> alignment of TYPE. TYPE is naturally aligned if IS_PACKED is false. */
> bool
> Index: gcc/tree-vectorizer.h
> ===================================================================
> *** gcc/tree-vectorizer.h 2017-09-18 12:56:24.635070853 +0100
> --- gcc/tree-vectorizer.h 2017-09-18 12:56:24.850088870 +0100
> *************** #define PURE_SLP_STMT(S)
> *** 790,796 ****
> --- 790,800 ----
> #define STMT_SLP_TYPE(S) (S)->slp_type
>
> struct dataref_aux {
> + /* The misalignment in bytes of the reference, or -1 if not known. */
> int misalignment;
> + /* The byte alignment that we'd ideally like the reference to have,
> + and the value that misalignment is measured against. */
> + int target_alignment;
> /* If true the alignment of base_decl needs to be increased. */
> bool base_misaligned;
> tree base_decl;
> *************** #define DR_MISALIGNMENT(DR) dr_misalignm
> *** 1037,1043 ****
> #define SET_DR_MISALIGNMENT(DR, VAL) set_dr_misalignment (DR, VAL)
> #define DR_MISALIGNMENT_UNKNOWN (-1)
>
> ! /* Return TRUE if the data access is aligned, and FALSE otherwise. */
>
> static inline bool
> aligned_access_p (struct data_reference *data_ref_info)
> --- 1041,1051 ----
> #define SET_DR_MISALIGNMENT(DR, VAL) set_dr_misalignment (DR, VAL)
> #define DR_MISALIGNMENT_UNKNOWN (-1)
>
> ! /* Only defined once DR_MISALIGNMENT is defined. */
> ! #define DR_TARGET_ALIGNMENT(DR) DR_VECT_AUX (DR)->target_alignment
> !
> ! /* Return true if data access DR is aligned to its target alignment
> ! (which may be less than a full vector). */
>
> static inline bool
> aligned_access_p (struct data_reference *data_ref_info)
> *************** known_alignment_for_access_p (struct dat
> *** 1054,1059 ****
> --- 1062,1080 ----
> return (DR_MISALIGNMENT (data_ref_info) != DR_MISALIGNMENT_UNKNOWN);
> }
>
> + /* Return the minimum alignment in bytes that the vectorized version
> + of DR is guaranteed to have. */
> +
> + static inline unsigned int
> + vect_known_alignment_in_bytes (struct data_reference *dr)
> + {
> + if (DR_MISALIGNMENT (dr) == DR_MISALIGNMENT_UNKNOWN)
> + return TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr)));
> + if (DR_MISALIGNMENT (dr) == 0)
> + return DR_TARGET_ALIGNMENT (dr);
> + return DR_MISALIGNMENT (dr) & -DR_MISALIGNMENT (dr);
> + }
> +
> /* Return the behavior of DR with respect to the vectorization context
> (which for outer loop vectorization might not be the behavior recorded
> in DR itself). */
> Index: gcc/tree-vect-data-refs.c
> ===================================================================
> *** gcc/tree-vect-data-refs.c 2017-09-18 12:56:24.635070853 +0100
> --- gcc/tree-vect-data-refs.c 2017-09-18 12:56:24.849185433 +0100
> *************** vect_record_base_alignments (vec_info *v
> *** 775,780 ****
> --- 775,791 ----
> }
> }
>
> + /* Return the target alignment for the vectorized form of DR. */
> +
> + static unsigned int
> + vect_calculate_target_alignment (struct data_reference *dr)
> + {
> + gimple *stmt = DR_STMT (dr);
> + stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> + tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> + return targetm.vectorize.preferred_vector_alignment (vectype);
> + }
> +
> /* Function vect_compute_data_ref_alignment
>
> Compute the misalignment of the data reference DR.
> *************** vect_compute_data_ref_alignment (struct
> *** 811,816 ****
> --- 822,831 ----
> innermost_loop_behavior *drb = vect_dr_behavior (dr);
> bool step_preserves_misalignment_p;
>
> + unsigned HOST_WIDE_INT vector_alignment
> + = vect_calculate_target_alignment (dr) / BITS_PER_UNIT;
> + DR_TARGET_ALIGNMENT (dr) = vector_alignment;
> +
> /* No step for BB vectorization. */
> if (!loop)
> {
> *************** vect_compute_data_ref_alignment (struct
> *** 823,865 ****
> relative to the outer-loop (LOOP). This is ok only if the misalignment
> stays the same throughout the execution of the inner-loop, which is why
> we have to check that the stride of the dataref in the inner-loop evenly
> ! divides by the vector size. */
> else if (nested_in_vect_loop_p (loop, stmt))
> {
> step_preserves_misalignment_p
> ! = (DR_STEP_ALIGNMENT (dr)
> ! % GET_MODE_SIZE (TYPE_MODE (vectype))) == 0;
>
> if (dump_enabled_p ())
> {
> if (step_preserves_misalignment_p)
> dump_printf_loc (MSG_NOTE, vect_location,
> ! "inner step divides the vector-size.\n");
> else
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> ! "inner step doesn't divide the vector-size.\n");
> }
> }
>
> /* Similarly we can only use base and misalignment information relative to
> an innermost loop if the misalignment stays the same throughout the
> execution of the loop. As above, this is the case if the stride of
> ! the dataref evenly divides by the vector size. */
> else
> {
> unsigned vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> step_preserves_misalignment_p
> ! = ((DR_STEP_ALIGNMENT (dr) * vf)
> ! % GET_MODE_SIZE (TYPE_MODE (vectype))) == 0;
>
> if (!step_preserves_misalignment_p && dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> ! "step doesn't divide the vector-size.\n");
> }
>
> unsigned int base_alignment = drb->base_alignment;
> unsigned int base_misalignment = drb->base_misalignment;
> - unsigned HOST_WIDE_INT vector_alignment = TYPE_ALIGN_UNIT (vectype);
>
> /* Calculate the maximum of the pooled base address alignment and the
> alignment that we can compute for DR itself. */
> --- 838,878 ----
> relative to the outer-loop (LOOP). This is ok only if the misalignment
> stays the same throughout the execution of the inner-loop, which is why
> we have to check that the stride of the dataref in the inner-loop evenly
> ! divides by the vector alignment. */
> else if (nested_in_vect_loop_p (loop, stmt))
> {
> step_preserves_misalignment_p
> ! = (DR_STEP_ALIGNMENT (dr) % vector_alignment) == 0;
>
> if (dump_enabled_p ())
> {
> if (step_preserves_misalignment_p)
> dump_printf_loc (MSG_NOTE, vect_location,
> ! "inner step divides the vector alignment.\n");
> else
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> ! "inner step doesn't divide the vector"
> ! " alignment.\n");
> }
> }
>
> /* Similarly we can only use base and misalignment information relative to
> an innermost loop if the misalignment stays the same throughout the
> execution of the loop. As above, this is the case if the stride of
> ! the dataref evenly divides by the alignment. */
> else
> {
> unsigned vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> step_preserves_misalignment_p
> ! = ((DR_STEP_ALIGNMENT (dr) * vf) % vector_alignment) == 0;
>
> if (!step_preserves_misalignment_p && dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> ! "step doesn't divide the vector alignment.\n");
> }
>
> unsigned int base_alignment = drb->base_alignment;
> unsigned int base_misalignment = drb->base_misalignment;
>
> /* Calculate the maximum of the pooled base address alignment and the
> alignment that we can compute for DR itself. */
> *************** vect_update_misalignment_for_peel (struc
> *** 1007,1015 ****
> {
> bool negative = tree_int_cst_compare (DR_STEP (dr), size_zero_node) < 0;
> int misal = DR_MISALIGNMENT (dr);
> - tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> misal += negative ? -npeel * dr_size : npeel * dr_size;
> ! misal &= (TYPE_ALIGN (vectype) / BITS_PER_UNIT) - 1;
> SET_DR_MISALIGNMENT (dr, misal);
> return;
> }
> --- 1020,1027 ----
> {
> bool negative = tree_int_cst_compare (DR_STEP (dr), size_zero_node) < 0;
> int misal = DR_MISALIGNMENT (dr);
> misal += negative ? -npeel * dr_size : npeel * dr_size;
> ! misal &= DR_TARGET_ALIGNMENT (dr) - 1;
> SET_DR_MISALIGNMENT (dr, misal);
> return;
> }
> *************** vect_enhance_data_refs_alignment (loop_v
> *** 1657,1672 ****
> {
> if (known_alignment_for_access_p (dr))
> {
> ! unsigned int npeel_tmp = 0;
> bool negative = tree_int_cst_compare (DR_STEP (dr),
> size_zero_node) < 0;
>
> ! vectype = STMT_VINFO_VECTYPE (stmt_info);
> ! nelements = TYPE_VECTOR_SUBPARTS (vectype);
> ! mis = DR_MISALIGNMENT (dr) / vect_get_scalar_dr_size (dr);
> if (DR_MISALIGNMENT (dr) != 0)
> ! npeel_tmp = (negative ? (mis - nelements)
> ! : (nelements - mis)) & (nelements - 1);
>
> /* For multiple types, it is possible that the bigger type access
> will have more than one peeling option. E.g., a loop with two
> --- 1669,1685 ----
> {
> if (known_alignment_for_access_p (dr))
> {
> ! unsigned int npeel_tmp = 0;
> bool negative = tree_int_cst_compare (DR_STEP (dr),
> size_zero_node) < 0;
>
> ! vectype = STMT_VINFO_VECTYPE (stmt_info);
> ! nelements = TYPE_VECTOR_SUBPARTS (vectype);
> ! unsigned int target_align = DR_TARGET_ALIGNMENT (dr);
> ! unsigned int dr_size = vect_get_scalar_dr_size (dr);
> ! mis = (negative ? DR_MISALIGNMENT (dr) : -DR_MISALIGNMENT (dr));
> if (DR_MISALIGNMENT (dr) != 0)
> ! npeel_tmp = (mis & (target_align - 1)) / dr_size;
>
> /* For multiple types, it is possible that the bigger type access
> will have more than one peeling option. E.g., a loop with two
> *************** vect_enhance_data_refs_alignment (loop_v
> *** 1701,1707 ****
> {
> vect_peeling_hash_insert (&peeling_htab, loop_vinfo,
> dr, npeel_tmp);
> ! npeel_tmp += nelements;
> }
>
> one_misalignment_known = true;
> --- 1714,1720 ----
> {
> vect_peeling_hash_insert (&peeling_htab, loop_vinfo,
> dr, npeel_tmp);
> ! npeel_tmp += target_align / dr_size;
> }
>
> one_misalignment_known = true;
> *************** vect_enhance_data_refs_alignment (loop_v
> *** 1922,1928 ****
> stmt = DR_STMT (dr0);
> stmt_info = vinfo_for_stmt (stmt);
> vectype = STMT_VINFO_VECTYPE (stmt_info);
> - nelements = TYPE_VECTOR_SUBPARTS (vectype);
>
> if (known_alignment_for_access_p (dr0))
> {
> --- 1935,1940 ----
> *************** vect_enhance_data_refs_alignment (loop_v
> *** 1935,1943 ****
> updating DR_MISALIGNMENT values. The peeling factor is the
> vectorization factor minus the misalignment as an element
> count. */
> ! mis = DR_MISALIGNMENT (dr0) / vect_get_scalar_dr_size (dr0);
> ! npeel = ((negative ? mis - nelements : nelements - mis)
> ! & (nelements - 1));
> }
>
> /* For interleaved data access every iteration accesses all the
> --- 1947,1956 ----
> updating DR_MISALIGNMENT values. The peeling factor is the
> vectorization factor minus the misalignment as an element
> count. */
> ! mis = negative ? DR_MISALIGNMENT (dr0) : -DR_MISALIGNMENT (dr0);
> ! unsigned int target_align = DR_TARGET_ALIGNMENT (dr0);
> ! npeel = ((mis & (target_align - 1))
> ! / vect_get_scalar_dr_size (dr0));
> }
>
> /* For interleaved data access every iteration accesses all the
> *************** vect_enhance_data_refs_alignment (loop_v
> *** 1976,1985 ****
> unsigned max_peel = npeel;
> if (max_peel == 0)
> {
> ! gimple *dr_stmt = DR_STMT (dr0);
> ! stmt_vec_info vinfo = vinfo_for_stmt (dr_stmt);
> ! tree vtype = STMT_VINFO_VECTYPE (vinfo);
> ! max_peel = TYPE_VECTOR_SUBPARTS (vtype) - 1;
> }
> if (max_peel > max_allowed_peel)
> {
> --- 1989,1996 ----
> unsigned max_peel = npeel;
> if (max_peel == 0)
> {
> ! unsigned int target_align = DR_TARGET_ALIGNMENT (dr0);
> ! max_peel = target_align / vect_get_scalar_dr_size (dr0) - 1;
> }
> if (max_peel > max_allowed_peel)
> {
> *************** vect_find_same_alignment_drs (struct dat
> *** 2201,2208 ****
> if (diff != 0)
> {
> /* Get the wider of the two alignments. */
> ! unsigned int align_a = TYPE_ALIGN_UNIT (STMT_VINFO_VECTYPE (stmtinfo_a));
> ! unsigned int align_b = TYPE_ALIGN_UNIT (STMT_VINFO_VECTYPE (stmtinfo_b));
> unsigned int max_align = MAX (align_a, align_b);
>
> /* Require the gap to be a multiple of the larger vector alignment. */
> --- 2212,2221 ----
> if (diff != 0)
> {
> /* Get the wider of the two alignments. */
> ! unsigned int align_a = (vect_calculate_target_alignment (dra)
> ! / BITS_PER_UNIT);
> ! unsigned int align_b = (vect_calculate_target_alignment (drb)
> ! / BITS_PER_UNIT);
> unsigned int max_align = MAX (align_a, align_b);
>
> /* Require the gap to be a multiple of the larger vector alignment. */
> *************** vect_get_new_ssa_name (tree type, enum v
> *** 3995,4010 ****
> /* Duplicate ptr info and set alignment/misaligment on NAME from DR. */
>
> static void
> ! vect_duplicate_ssa_name_ptr_info (tree name, data_reference *dr,
> ! stmt_vec_info stmt_info)
> {
> duplicate_ssa_name_ptr_info (name, DR_PTR_INFO (dr));
> - unsigned int align = TYPE_ALIGN_UNIT (STMT_VINFO_VECTYPE (stmt_info));
> int misalign = DR_MISALIGNMENT (dr);
> if (misalign == DR_MISALIGNMENT_UNKNOWN)
> mark_ptr_info_alignment_unknown (SSA_NAME_PTR_INFO (name));
> else
> ! set_ptr_info_alignment (SSA_NAME_PTR_INFO (name), align, misalign);
> }
>
> /* Function vect_create_addr_base_for_vector_ref.
> --- 4008,4022 ----
> /* Duplicate ptr info and set alignment/misaligment on NAME from DR. */
>
> static void
> ! vect_duplicate_ssa_name_ptr_info (tree name, data_reference *dr)
> {
> duplicate_ssa_name_ptr_info (name, DR_PTR_INFO (dr));
> int misalign = DR_MISALIGNMENT (dr);
> if (misalign == DR_MISALIGNMENT_UNKNOWN)
> mark_ptr_info_alignment_unknown (SSA_NAME_PTR_INFO (name));
> else
> ! set_ptr_info_alignment (SSA_NAME_PTR_INFO (name),
> ! DR_TARGET_ALIGNMENT (dr), misalign);
> }
>
> /* Function vect_create_addr_base_for_vector_ref.
> *************** vect_create_addr_base_for_vector_ref (gi
> *** 4109,4115 ****
> && TREE_CODE (addr_base) == SSA_NAME
> && !SSA_NAME_PTR_INFO (addr_base))
> {
> ! vect_duplicate_ssa_name_ptr_info (addr_base, dr, stmt_info);
> if (offset || byte_offset)
> mark_ptr_info_alignment_unknown (SSA_NAME_PTR_INFO (addr_base));
> }
> --- 4121,4127 ----
> && TREE_CODE (addr_base) == SSA_NAME
> && !SSA_NAME_PTR_INFO (addr_base))
> {
> ! vect_duplicate_ssa_name_ptr_info (addr_base, dr);
> if (offset || byte_offset)
> mark_ptr_info_alignment_unknown (SSA_NAME_PTR_INFO (addr_base));
> }
> *************** vect_create_data_ref_ptr (gimple *stmt,
> *** 4368,4375 ****
> /* Copy the points-to information if it exists. */
> if (DR_PTR_INFO (dr))
> {
> ! vect_duplicate_ssa_name_ptr_info (indx_before_incr, dr, stmt_info);
> ! vect_duplicate_ssa_name_ptr_info (indx_after_incr, dr, stmt_info);
> }
> if (ptr_incr)
> *ptr_incr = incr;
> --- 4380,4387 ----
> /* Copy the points-to information if it exists. */
> if (DR_PTR_INFO (dr))
> {
> ! vect_duplicate_ssa_name_ptr_info (indx_before_incr, dr);
> ! vect_duplicate_ssa_name_ptr_info (indx_after_incr, dr);
> }
> if (ptr_incr)
> *ptr_incr = incr;
> *************** vect_create_data_ref_ptr (gimple *stmt,
> *** 4398,4405 ****
> /* Copy the points-to information if it exists. */
> if (DR_PTR_INFO (dr))
> {
> ! vect_duplicate_ssa_name_ptr_info (indx_before_incr, dr, stmt_info);
> ! vect_duplicate_ssa_name_ptr_info (indx_after_incr, dr, stmt_info);
> }
> if (ptr_incr)
> *ptr_incr = incr;
> --- 4410,4417 ----
> /* Copy the points-to information if it exists. */
> if (DR_PTR_INFO (dr))
> {
> ! vect_duplicate_ssa_name_ptr_info (indx_before_incr, dr);
> ! vect_duplicate_ssa_name_ptr_info (indx_after_incr, dr);
> }
> if (ptr_incr)
> *ptr_incr = incr;
> *************** vect_setup_realignment (gimple *stmt, gi
> *** 5003,5012 ****
> new_temp = copy_ssa_name (ptr);
> else
> new_temp = make_ssa_name (TREE_TYPE (ptr));
> new_stmt = gimple_build_assign
> (new_temp, BIT_AND_EXPR, ptr,
> ! build_int_cst (TREE_TYPE (ptr),
> ! -(HOST_WIDE_INT)TYPE_ALIGN_UNIT (vectype)));
> new_bb = gsi_insert_on_edge_immediate (pe, new_stmt);
> gcc_assert (!new_bb);
> data_ref
> --- 5015,5024 ----
> new_temp = copy_ssa_name (ptr);
> else
> new_temp = make_ssa_name (TREE_TYPE (ptr));
> + unsigned int align = DR_TARGET_ALIGNMENT (dr);
> new_stmt = gimple_build_assign
> (new_temp, BIT_AND_EXPR, ptr,
> ! build_int_cst (TREE_TYPE (ptr), -(HOST_WIDE_INT) align));
> new_bb = gsi_insert_on_edge_immediate (pe, new_stmt);
> gcc_assert (!new_bb);
> data_ref
> Index: gcc/tree-vect-loop-manip.c
> ===================================================================
> *** gcc/tree-vect-loop-manip.c 2017-09-18 12:56:24.635070853 +0100
> --- gcc/tree-vect-loop-manip.c 2017-09-18 12:56:24.849185433 +0100
> *************** vect_gen_prolog_loop_niters (loop_vec_in
> *** 956,963 ****
> gimple *dr_stmt = DR_STMT (dr);
> stmt_vec_info stmt_info = vinfo_for_stmt (dr_stmt);
> tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> ! int vectype_align = TYPE_ALIGN (vectype) / BITS_PER_UNIT;
> ! int nelements = TYPE_VECTOR_SUBPARTS (vectype);
>
> if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) > 0)
> {
> --- 956,962 ----
> gimple *dr_stmt = DR_STMT (dr);
> stmt_vec_info stmt_info = vinfo_for_stmt (dr_stmt);
> tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> ! unsigned int target_align = DR_TARGET_ALIGNMENT (dr);
>
> if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) > 0)
> {
> *************** vect_gen_prolog_loop_niters (loop_vec_in
> *** 978,1009 ****
> tree start_addr = vect_create_addr_base_for_vector_ref (dr_stmt,
> &stmts, offset);
> tree type = unsigned_type_for (TREE_TYPE (start_addr));
> ! tree vectype_align_minus_1 = build_int_cst (type, vectype_align - 1);
> ! HOST_WIDE_INT elem_size =
> ! int_cst_value (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
> tree elem_size_log = build_int_cst (type, exact_log2 (elem_size));
> ! tree nelements_minus_1 = build_int_cst (type, nelements - 1);
> ! tree nelements_tree = build_int_cst (type, nelements);
> ! tree byte_misalign;
> ! tree elem_misalign;
> !
> ! /* Create: byte_misalign = addr & (vectype_align - 1) */
> ! byte_misalign =
> ! fold_build2 (BIT_AND_EXPR, type, fold_convert (type, start_addr),
> ! vectype_align_minus_1);
> !
> ! /* Create: elem_misalign = byte_misalign / element_size */
> ! elem_misalign =
> ! fold_build2 (RSHIFT_EXPR, type, byte_misalign, elem_size_log);
>
> ! /* Create: (niters_type) (nelements - elem_misalign)&(nelements - 1) */
> if (negative)
> ! iters = fold_build2 (MINUS_EXPR, type, elem_misalign, nelements_tree);
> else
> ! iters = fold_build2 (MINUS_EXPR, type, nelements_tree, elem_misalign);
> ! iters = fold_build2 (BIT_AND_EXPR, type, iters, nelements_minus_1);
> iters = fold_convert (niters_type, iters);
> ! *bound = nelements - 1;
> }
>
> if (dump_enabled_p ())
> --- 977,1012 ----
> tree start_addr = vect_create_addr_base_for_vector_ref (dr_stmt,
> &stmts, offset);
> tree type = unsigned_type_for (TREE_TYPE (start_addr));
> ! tree target_align_minus_1 = build_int_cst (type, target_align - 1);
> ! HOST_WIDE_INT elem_size
> ! = int_cst_value (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
> tree elem_size_log = build_int_cst (type, exact_log2 (elem_size));
> ! HOST_WIDE_INT align_in_elems = target_align / elem_size;
> ! tree align_in_elems_minus_1 = build_int_cst (type, align_in_elems - 1);
> ! tree align_in_elems_tree = build_int_cst (type, align_in_elems);
> ! tree misalign_in_bytes;
> ! tree misalign_in_elems;
> !
> ! /* Create: misalign_in_bytes = addr & (target_align - 1). */
> ! misalign_in_bytes
> ! = fold_build2 (BIT_AND_EXPR, type, fold_convert (type, start_addr),
> ! target_align_minus_1);
> !
> ! /* Create: misalign_in_elems = misalign_in_bytes / element_size. */
> ! misalign_in_elems
> ! = fold_build2 (RSHIFT_EXPR, type, misalign_in_bytes, elem_size_log);
>
> ! /* Create: (niters_type) ((align_in_elems - misalign_in_elems)
> ! & (align_in_elems - 1)). */
> if (negative)
> ! iters = fold_build2 (MINUS_EXPR, type, misalign_in_elems,
> ! align_in_elems_tree);
> else
> ! iters = fold_build2 (MINUS_EXPR, type, align_in_elems_tree,
> ! misalign_in_elems);
> ! iters = fold_build2 (BIT_AND_EXPR, type, iters, align_in_elems_minus_1);
> iters = fold_convert (niters_type, iters);
> ! *bound = align_in_elems - 1;
> }
>
> if (dump_enabled_p ())
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> *** gcc/tree-vect-stmts.c 2017-09-18 12:56:24.635070853 +0100
> --- gcc/tree-vect-stmts.c 2017-09-18 12:56:24.850088870 +0100
> *************** get_group_load_store_type (gimple *stmt,
> *** 1737,1742 ****
> --- 1737,1743 ----
> loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> struct loop *loop = loop_vinfo ? LOOP_VINFO_LOOP (loop_vinfo) : NULL;
> gimple *first_stmt = GROUP_FIRST_ELEMENT (stmt_info);
> + data_reference *first_dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt));
> unsigned int group_size = GROUP_SIZE (vinfo_for_stmt (first_stmt));
> bool single_element_p = (stmt == first_stmt
> && !GROUP_NEXT_ELEMENT (stmt_info));
> *************** get_group_load_store_type (gimple *stmt,
> *** 1780,1789 ****
> " non-consecutive accesses\n");
> return false;
> }
> ! /* If the access is aligned an overrun is fine. */
> if (overrun_p
> ! && aligned_access_p
> ! (STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt))))
> overrun_p = false;
> if (overrun_p && !can_overrun_p)
> {
> --- 1781,1793 ----
> " non-consecutive accesses\n");
> return false;
> }
> ! /* An overrun is fine if the trailing elements are smaller
> ! than the alignment boundary B. Every vector access will
> ! be a multiple of B and so we are guaranteed to access a
> ! non-gap element in the same B-sized block. */
> if (overrun_p
> ! && gap < (vect_known_alignment_in_bytes (first_dr)
> ! / vect_get_scalar_dr_size (first_dr)))
> overrun_p = false;
> if (overrun_p && !can_overrun_p)
> {
> *************** get_group_load_store_type (gimple *stmt,
> *** 1804,1817 ****
> /* If there is a gap at the end of the group then these optimizations
> would access excess elements in the last iteration. */
> bool would_overrun_p = (gap != 0);
> ! /* If the access is aligned an overrun is fine, but only if the
> ! overrun is not inside an unused vector (if the gap is as large
> ! or larger than a vector). */
> if (would_overrun_p
> ! && gap < nunits
> ! && aligned_access_p
> ! (STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt))))
> would_overrun_p = false;
> if (!STMT_VINFO_STRIDED_P (stmt_info)
> && (can_overrun_p || !would_overrun_p)
> && compare_step_with_zero (stmt) > 0)
> --- 1808,1822 ----
> /* If there is a gap at the end of the group then these optimizations
> would access excess elements in the last iteration. */
> bool would_overrun_p = (gap != 0);
> ! /* An overrun is fine if the trailing elements are smaller than the
> ! alignment boundary B. Every vector access will be a multiple of B
> ! and so we are guaranteed to access a non-gap element in the
> ! same B-sized block. */
> if (would_overrun_p
> ! && gap < (vect_known_alignment_in_bytes (first_dr)
> ! / vect_get_scalar_dr_size (first_dr)))
> would_overrun_p = false;
> +
> if (!STMT_VINFO_STRIDED_P (stmt_info)
> && (can_overrun_p || !would_overrun_p)
> && compare_step_with_zero (stmt) > 0)
> *************** vectorizable_mask_load_store (gimple *st
> *** 2351,2357 ****
> TYPE_SIZE_UNIT (vectype));
> }
>
> ! align = TYPE_ALIGN_UNIT (vectype);
> if (aligned_access_p (dr))
> misalign = 0;
> else if (DR_MISALIGNMENT (dr) == -1)
> --- 2356,2362 ----
> TYPE_SIZE_UNIT (vectype));
> }
>
> ! align = DR_TARGET_ALIGNMENT (dr);
> if (aligned_access_p (dr))
> misalign = 0;
> else if (DR_MISALIGNMENT (dr) == -1)
> *************** vectorizable_mask_load_store (gimple *st
> *** 2404,2410 ****
> TYPE_SIZE_UNIT (vectype));
> }
>
> ! align = TYPE_ALIGN_UNIT (vectype);
> if (aligned_access_p (dr))
> misalign = 0;
> else if (DR_MISALIGNMENT (dr) == -1)
> --- 2409,2415 ----
> TYPE_SIZE_UNIT (vectype));
> }
>
> ! align = DR_TARGET_ALIGNMENT (dr);
> if (aligned_access_p (dr))
> misalign = 0;
> else if (DR_MISALIGNMENT (dr) == -1)
> *************** vectorizable_operation (gimple *stmt, gi
> *** 5553,5577 ****
> return true;
> }
>
> ! /* A helper function to ensure data reference DR's base alignment
> ! for STMT_INFO. */
>
> static void
> ! ensure_base_align (stmt_vec_info stmt_info, struct data_reference *dr)
> {
> if (!dr->aux)
> return;
>
> if (DR_VECT_AUX (dr)->base_misaligned)
> {
> - tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> tree base_decl = DR_VECT_AUX (dr)->base_decl;
>
> if (decl_in_symtab_p (base_decl))
> ! symtab_node::get (base_decl)->increase_alignment (TYPE_ALIGN (vectype));
> else
> {
> ! SET_DECL_ALIGN (base_decl, TYPE_ALIGN (vectype));
> DECL_USER_ALIGN (base_decl) = 1;
> }
> DR_VECT_AUX (dr)->base_misaligned = false;
> --- 5558,5582 ----
> return true;
> }
>
> ! /* A helper function to ensure data reference DR's base alignment. */
>
> static void
> ! ensure_base_align (struct data_reference *dr)
> {
> if (!dr->aux)
> return;
>
> if (DR_VECT_AUX (dr)->base_misaligned)
> {
> tree base_decl = DR_VECT_AUX (dr)->base_decl;
>
> + unsigned int align_base_to = DR_TARGET_ALIGNMENT (dr) * BITS_PER_UNIT;
> +
> if (decl_in_symtab_p (base_decl))
> ! symtab_node::get (base_decl)->increase_alignment (align_base_to);
> else
> {
> ! SET_DECL_ALIGN (base_decl, align_base_to);
> DECL_USER_ALIGN (base_decl) = 1;
> }
> DR_VECT_AUX (dr)->base_misaligned = false;
> *************** vectorizable_store (gimple *stmt, gimple
> *** 5775,5781 ****
>
> /* Transform. */
>
> ! ensure_base_align (stmt_info, dr);
>
> if (memory_access_type == VMAT_GATHER_SCATTER)
> {
> --- 5780,5786 ----
>
> /* Transform. */
>
> ! ensure_base_align (dr);
>
> if (memory_access_type == VMAT_GATHER_SCATTER)
> {
> *************** vectorizable_store (gimple *stmt, gimple
> *** 6417,6423 ****
> dataref_offset
> ? dataref_offset
> : build_int_cst (ref_type, 0));
> ! align = TYPE_ALIGN_UNIT (vectype);
> if (aligned_access_p (first_dr))
> misalign = 0;
> else if (DR_MISALIGNMENT (first_dr) == -1)
> --- 6422,6428 ----
> dataref_offset
> ? dataref_offset
> : build_int_cst (ref_type, 0));
> ! align = DR_TARGET_ALIGNMENT (first_dr);
> if (aligned_access_p (first_dr))
> misalign = 0;
> else if (DR_MISALIGNMENT (first_dr) == -1)
> *************** vectorizable_load (gimple *stmt, gimple_
> *** 6813,6819 ****
>
> /* Transform. */
>
> ! ensure_base_align (stmt_info, dr);
>
> if (memory_access_type == VMAT_GATHER_SCATTER)
> {
> --- 6818,6824 ----
>
> /* Transform. */
>
> ! ensure_base_align (dr);
>
> if (memory_access_type == VMAT_GATHER_SCATTER)
> {
> *************** vectorizable_load (gimple *stmt, gimple_
> *** 7512,7518 ****
> dataref_offset
> ? dataref_offset
> : build_int_cst (ref_type, 0));
> ! align = TYPE_ALIGN_UNIT (vectype);
> if (alignment_support_scheme == dr_aligned)
> {
> gcc_assert (aligned_access_p (first_dr));
> --- 7517,7523 ----
> dataref_offset
> ? dataref_offset
> : build_int_cst (ref_type, 0));
> ! align = DR_TARGET_ALIGNMENT (dr);
> if (alignment_support_scheme == dr_aligned)
> {
> gcc_assert (aligned_access_p (first_dr));
> *************** vectorizable_load (gimple *stmt, gimple_
> *** 7555,7565 ****
> ptr = copy_ssa_name (dataref_ptr);
> else
> ptr = make_ssa_name (TREE_TYPE (dataref_ptr));
> new_stmt = gimple_build_assign
> (ptr, BIT_AND_EXPR, dataref_ptr,
> build_int_cst
> (TREE_TYPE (dataref_ptr),
> ! -(HOST_WIDE_INT)TYPE_ALIGN_UNIT (vectype)));
> vect_finish_stmt_generation (stmt, new_stmt, gsi);
> data_ref
> = build2 (MEM_REF, vectype, ptr,
> --- 7560,7571 ----
> ptr = copy_ssa_name (dataref_ptr);
> else
> ptr = make_ssa_name (TREE_TYPE (dataref_ptr));
> + unsigned int align = DR_TARGET_ALIGNMENT (first_dr);
> new_stmt = gimple_build_assign
> (ptr, BIT_AND_EXPR, dataref_ptr,
> build_int_cst
> (TREE_TYPE (dataref_ptr),
> ! -(HOST_WIDE_INT) align));
> vect_finish_stmt_generation (stmt, new_stmt, gsi);
> data_ref
> = build2 (MEM_REF, vectype, ptr,
> *************** vectorizable_load (gimple *stmt, gimple_
> *** 7581,7588 ****
> new_stmt = gimple_build_assign
> (NULL_TREE, BIT_AND_EXPR, ptr,
> build_int_cst
> ! (TREE_TYPE (ptr),
> ! -(HOST_WIDE_INT)TYPE_ALIGN_UNIT (vectype)));
> ptr = copy_ssa_name (ptr, new_stmt);
> gimple_assign_set_lhs (new_stmt, ptr);
> vect_finish_stmt_generation (stmt, new_stmt, gsi);
> --- 7587,7593 ----
> new_stmt = gimple_build_assign
> (NULL_TREE, BIT_AND_EXPR, ptr,
> build_int_cst
> ! (TREE_TYPE (ptr), -(HOST_WIDE_INT) align));
> ptr = copy_ssa_name (ptr, new_stmt);
> gimple_assign_set_lhs (new_stmt, ptr);
> vect_finish_stmt_generation (stmt, new_stmt, gsi);
> *************** vectorizable_load (gimple *stmt, gimple_
> *** 7592,7611 ****
> break;
> }
> case dr_explicit_realign_optimized:
> ! if (TREE_CODE (dataref_ptr) == SSA_NAME)
> ! new_temp = copy_ssa_name (dataref_ptr);
> ! else
> ! new_temp = make_ssa_name (TREE_TYPE (dataref_ptr));
> ! new_stmt = gimple_build_assign
> ! (new_temp, BIT_AND_EXPR, dataref_ptr,
> ! build_int_cst
> ! (TREE_TYPE (dataref_ptr),
> ! -(HOST_WIDE_INT)TYPE_ALIGN_UNIT (vectype)));
> ! vect_finish_stmt_generation (stmt, new_stmt, gsi);
> ! data_ref
> ! = build2 (MEM_REF, vectype, new_temp,
> ! build_int_cst (ref_type, 0));
> ! break;
> default:
> gcc_unreachable ();
> }
> --- 7597,7618 ----
> break;
> }
> case dr_explicit_realign_optimized:
> ! {
> ! if (TREE_CODE (dataref_ptr) == SSA_NAME)
> ! new_temp = copy_ssa_name (dataref_ptr);
> ! else
> ! new_temp = make_ssa_name (TREE_TYPE (dataref_ptr));
> ! unsigned int align = DR_TARGET_ALIGNMENT (first_dr);
> ! new_stmt = gimple_build_assign
> ! (new_temp, BIT_AND_EXPR, dataref_ptr,
> ! build_int_cst (TREE_TYPE (dataref_ptr),
> ! -(HOST_WIDE_INT) align));
> ! vect_finish_stmt_generation (stmt, new_stmt, gsi);
> ! data_ref
> ! = build2 (MEM_REF, vectype, new_temp,
> ! build_int_cst (ref_type, 0));
> ! break;
> ! }
> default:
> gcc_unreachable ();
> }
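
The realignment hunks above all build a BIT_AND_EXPR of the pointer
with -(HOST_WIDE_INT) align. A minimal C sketch of that masking
follows, assuming the alignment is a nonzero power of two. The helper
name round_down_to_alignment is hypothetical and is not part of the
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Round ADDR down to a multiple of ALIGN.  ALIGN must be a nonzero
   power of two; negating it as an unsigned value gives a mask with
   the low log2 (ALIGN) bits clear.  This mirrors the effect of the
   BIT_AND_EXPR with -(HOST_WIDE_INT) align in the patch, but is not
   the vectoriser code itself.  */
static uintptr_t
round_down_to_alignment (uintptr_t addr, unsigned int align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return addr & -(uintptr_t) align;
}
```

The same address can round to different bases depending on which
alignment is used: 0x12ff rounds down to 0x12f0 with align == 16 but
to 0x12e0 with align == 32. That is why the mask has to use the same
alignment as the rest of the alignment analysis for the data
reference.
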
> Index: gcc/testsuite/gcc.dg/vect/vect-outer-3a.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/vect/vect-outer-3a.c 2017-09-18 12:56:24.635070853 +0100
> --- gcc/testsuite/gcc.dg/vect/vect-outer-3a.c 2017-09-18 12:56:24.849185433 +0100
> *************** int main (void)
> *** 49,52 ****
> }
>
> /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
> ! /* { dg-final { scan-tree-dump-times "step doesn't divide the vector-size" 1 "vect" } } */
> --- 49,52 ----
> }
>
> /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
> ! /* { dg-final { scan-tree-dump-times "step doesn't divide the vector alignment" 1 "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-outer-3a-big-array.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/vect/vect-outer-3a-big-array.c 2017-09-18 12:56:24.635070853 +0100
> --- gcc/testsuite/gcc.dg/vect/vect-outer-3a-big-array.c 2017-09-18 12:56:24.847378559 +0100
> *************** int main (void)
> *** 49,52 ****
> }
>
> /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
> ! /* { dg-final { scan-tree-dump-times "step doesn't divide the vector-size" 1 "vect" } } */
> --- 49,52 ----
> }
>
> /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
> ! /* { dg-final { scan-tree-dump-times "step doesn't divide the vector alignment" 1 "vect" } } */