[RFC] #pragma omp scan inclusive vectorization
Richard Biener
rguenther@suse.de
Mon Jun 17 06:35:00 GMT 2019
On Fri, 14 Jun 2019, Jakub Jelinek wrote:
> Hi!
>
> OpenMP 5.0 introduced scan reductions, like:
> #pragma omp simd reduction (inscan, +:r)
> for (int i = 0; i < 1024; i++)
> {
> r += a[i];
> #pragma omp scan inclusive(r)
> b[i] = r;
> }
> where each iteration has two parts: one that computes the value of the
> privatized reduction variable (the private copy is initialized with the
> neutral element of the operation at the start of that part), and then
> the #pragma omp scan changes that private variable to hold (in this
> case) the inclusive partial sums.  E.g. the PSTL we now have in
> libstdc++-v3/include/pstl/ makes use of these when available to
> implement std::*_scan.  Scans can also be used in worksharing loops,
> but I'll get to that later.
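
For reference, a scalar model of what the directive computes (my own sketch, not code from the patch): without vectorization the loop is just a running sum, and "inclusive" means the use after the scan sees the partial sum including the current iteration.

```c
#include <assert.h>

/* Scalar model of the loop above: with an inclusive scan, the code
   after the scan directive sees the running sum *including* the
   current element, so on exit b[i] == a[0] + a[1] + ... + a[i].  */
static int
inclusive_scan (const int *a, int *b, int n)
{
  int r = 0;
  for (int i = 0; i < n; i++)
    {
      r += a[i];   /* input phase */
      b[i] = r;    /* scan phase: sees the inclusive partial sum */
    }
  return r;
}
```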
>
> Anyway, the problem is that e.g. with OpenMP user defined reductions
> the initializer and combiner of the reduction aren't simple operations
> during OpenMP lowering; either can be a method call, constructor call,
> etc.  So we need something that preserves those initializer and
> combiner snippets in the IL so that the vectorizer can optimize them
> if they end up simplified enough, while at the same time being
> something the normal optimizers can optimize and that still works
> correctly when vectorization isn't performed.
>
> The following patch (incomplete, but far enough along that it handles
> the inclusive scan for non-user-defined reductions) handles that with
> more magic: it adds variants to the .GOMP_SIMD_LANE internal function
> and uses the old one (0) in the user code, another variant (1) in the
> initializer and a third (2) in the combiner pattern, which the
> vectorizer then needs to pattern recognize and either vectorize or
> punt on.
> If it vectorizes it, it emits code like (optimized dump):
> <bb 5> [local count: 708669599]:
> # ivtmp.27_45 = PHI <0(4), ivtmp.27_12(5)>
> # D__lsm.39_80 = PHI <D__lsm.39_47(4), _64(5)>
> vect__4.15_49 = MEM[base: a_23(D), index: ivtmp.27_45, offset: 0B];
> _57 = VEC_PERM_EXPR <{ 0, 0, 0, 0, 0, 0, 0, 0 }, vect__4.15_49, { 0, 8, 9, 10, 11, 12, 13, 14 }>;
> _58 = vect__4.15_49 + _57;
> _59 = VEC_PERM_EXPR <{ 0, 0, 0, 0, 0, 0, 0, 0 }, _58, { 0, 1, 8, 9, 10, 11, 12, 13 }>;
> _60 = _58 + _59;
> _61 = VEC_PERM_EXPR <{ 0, 0, 0, 0, 0, 0, 0, 0 }, _60, { 0, 1, 2, 3, 8, 9, 10, 11 }>;
> _62 = _60 + _61;
> _63 = _62 + D__lsm.39_80;
> _64 = VEC_PERM_EXPR <_63, _63, { 7, 7, 7, 7, 7, 7, 7, 7 }>;
> MEM[base: b_32(D), index: ivtmp.27_45, offset: 0B] = _63;
> ivtmp.27_12 = ivtmp.27_45 + 32;
> if (ivtmp.27_12 != 4096)
> goto <bb 5>; [83.33%]
> else
> goto <bb 6>; [16.67%]
> where the _57 ... _64 sequence is the implementation of the scan directive.
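
The _57 ... _64 statements are the usual log2(VF) shift-and-add in-register prefix sum: each VEC_PERM_EXPR shifts the lanes up by 2^step positions, pulling lanes from the zero vector (the neutral element of +) into the vacated low lanes. A scalar model of one 8-lane vector (my sketch, not patch code):

```c
#include <assert.h>

#define VF 8

/* Log-step inclusive prefix sum over a single "vector": each step
   models a VEC_PERM_EXPR that shifts the lanes up by SHIFT positions,
   pulling the neutral element 0 into the vacated low lanes, followed
   by a lane-wise add.  After log2(VF) steps, lane i holds
   v[0] + ... + v[i].  */
static void
vector_prefix_sum (int v[VF])
{
  for (int shift = 1; shift < VF; shift <<= 1)
    {
      int t[VF];
      for (int i = 0; i < VF; i++)
        t[i] = i < shift ? 0 : v[i - shift];  /* the VEC_PERM_EXPR */
      for (int i = 0; i < VF; i++)
        v[i] += t[i];                         /* the add */
    }
}
```

The remaining statements in the dump then add the carry-in from the previous vector iteration (_63) and broadcast the last lane as the carry for the next one (_64).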
>
> Does this look reasonable?
Ugh, not pretty but probably best we can do. Btw, can you please
add support for the SLP case and group_size == 1? I know I'm slow
with the branch ripping out the non-SLP path but it would save me
some extra work (possibly).
Thanks,
Richard.
> BTW, unfortunately SSE2 can't handle these permutations (only SSE4
> can), so I'll probably need an optional alternative sequence for when
> they aren't supported.  In particular, what could be done is to use
> whole vector shifts and VEC_COND_EXPR to blend the neutral element in.
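
A scalar sketch of that fallback (assumptions mine: an 8-lane vector and a reduction whose neutral element isn't 0, e.g. * with neutral element 1, where the blend actually matters):

```c
#include <assert.h>

#define VF 8

/* Sketch of the shift-and-blend variant: a whole-vector lane shift
   vacates the low lanes, then a VEC_COND_EXPR-style blend fills them
   with the neutral element.  Shown for * (neutral element 1); for +
   the zeros shifted in by an integer whole-vector shift would already
   be the neutral element, so no blend would be needed.  */
static void
vector_prefix_product (int v[VF])
{
  for (int shift = 1; shift < VF; shift <<= 1)
    {
      int t[VF];
      for (int i = 0; i < VF; i++)
        t[i] = i < shift ? 1 : v[i - shift];  /* shift, blend in 1 */
      for (int i = 0; i < VF; i++)
        v[i] *= t[i];
    }
}
```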
>
> --- gcc/tree-vect-stmts.c.jj 2019-06-13 13:28:36.636155362 +0200
> +++ gcc/tree-vect-stmts.c 2019-06-14 19:05:18.150502242 +0200
> @@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.
> #include "tree-ssa-loop-niter.h"
> #include "gimple-fold.h"
> #include "regs.h"
> +#include "attribs.h"
>
> /* For lang_hooks.types.type_for_mode. */
> #include "langhooks.h"
> @@ -3257,7 +3258,7 @@ vectorizable_call (stmt_vec_info stmt_in
> if (nargs == 0 || nargs > 4)
> return false;
>
> - /* Ignore the argument of IFN_GOMP_SIMD_LANE, it is magic. */
> + /* Ignore the arguments of IFN_GOMP_SIMD_LANE, they are magic. */
> combined_fn cfn = gimple_call_combined_fn (stmt);
> if (cfn == CFN_GOMP_SIMD_LANE)
> {
> @@ -6320,6 +6321,456 @@ get_group_alias_ptr_type (stmt_vec_info
> }
>
>
> +/* Function check_scan_store.
> +
> + Check magic stores for #pragma omp scan {in,ex}clusive reductions. */
> +
> +static bool
> +check_scan_store (stmt_vec_info stmt_info, tree vectype,
> + enum vect_def_type rhs_dt, bool slp, tree mask,
> + vect_memory_access_type memory_access_type)
> +{
> + loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> + dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info);
> + tree ref_type;
> +
> + gcc_assert (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) > 1);
> + if (slp
> + || mask
> + || memory_access_type != VMAT_CONTIGUOUS
> + || TREE_CODE (DR_BASE_ADDRESS (dr_info->dr)) != ADDR_EXPR
> + || !VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (dr_info->dr), 0))
> + || loop_vinfo == NULL
> + || LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> + || STMT_VINFO_GROUPED_ACCESS (stmt_info)
> + || !integer_zerop (DR_OFFSET (dr_info->dr))
> + || !integer_zerop (DR_INIT (dr_info->dr))
> + || !(ref_type = reference_alias_ptr_type (DR_REF (dr_info->dr)))
> + || !alias_sets_conflict_p (get_alias_set (vectype),
> + get_alias_set (TREE_TYPE (ref_type))))
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "unsupported OpenMP scan store.\n");
> + return false;
> + }
> +
> + /* We need to pattern match code built by OpenMP lowering and simplified
> + by following optimizations into something we can handle.
> + #pragma omp simd reduction(inscan,+:r)
> + for (...)
> + {
> + r += something ();
> + #pragma omp scan inclusive (r)
> + use (r);
> + }
> + shall have body with:
> + // Initialization for input phase, store the reduction initializer:
> + _20 = .GOMP_SIMD_LANE (simduid.3_14(D), 0);
> + _21 = .GOMP_SIMD_LANE (simduid.3_14(D), 1);
> + D.2042[_21] = 0;
> + // Actual input phase:
> + ...
> + r.0_5 = D.2042[_20];
> + _6 = _4 + r.0_5;
> + D.2042[_20] = _6;
> + // Initialization for scan phase:
> + _25 = .GOMP_SIMD_LANE (simduid.3_14(D), 2);
> + _26 = D.2043[_25];
> + _27 = D.2042[_25];
> + _28 = _26 + _27;
> + D.2043[_25] = _28;
> + D.2042[_25] = _28;
> + // Actual scan phase:
> + ...
> + r.1_8 = D.2042[_20];
> + ...
> + The "omp simd array" variable D.2042 holds the privatized copy used
> + inside of the loop and D.2043 is another one that holds copies of
> + the current original list item. The separate GOMP_SIMD_LANE ifn
> + kinds are there in order to allow optimizing the initializer store
> + and combiner sequence, e.g. if it is originally some C++ish user
> + defined reduction, but allow the vectorizer to pattern recognize it
> + and turn it into the appropriate vectorized scan. */
> +
> + if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) == 2)
> + {
> + /* Match the D.2042[_21] = 0; store above. Just require that
> + it is a constant or external definition store. */
> + if (rhs_dt != vect_constant_def && rhs_dt != vect_external_def)
> + {
> + fail_init:
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "unsupported OpenMP scan initializer store.\n");
> + return false;
> + }
> +
> + if (! loop_vinfo->scan_map)
> + loop_vinfo->scan_map = new hash_map<tree, tree>;
> + tree var = TREE_OPERAND (DR_BASE_ADDRESS (dr_info->dr), 0);
> + tree &cached = loop_vinfo->scan_map->get_or_insert (var);
> + if (cached)
> + goto fail_init;
> + cached = gimple_assign_rhs1 (STMT_VINFO_STMT (stmt_info));
> +
> + /* These stores can be vectorized normally. */
> + return true;
> + }
> +
> + if (rhs_dt != vect_internal_def)
> + {
> + fail:
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "unsupported OpenMP scan combiner pattern.\n");
> + return false;
> + }
> +
> + gimple *stmt = STMT_VINFO_STMT (stmt_info);
> + tree rhs = gimple_assign_rhs1 (stmt);
> + if (TREE_CODE (rhs) != SSA_NAME)
> + goto fail;
> +
> + use_operand_p use_p;
> + imm_use_iterator iter;
> + gimple *other_store_stmt = NULL;
> + FOR_EACH_IMM_USE_FAST (use_p, iter, rhs)
> + {
> + gimple *use_stmt = USE_STMT (use_p);
> + if (use_stmt == stmt || is_gimple_debug (use_stmt))
> + continue;
> + if (gimple_bb (use_stmt) != gimple_bb (stmt)
> + || !gimple_store_p (use_stmt)
> + || other_store_stmt)
> + goto fail;
> + other_store_stmt = use_stmt;
> + }
> + if (other_store_stmt == NULL)
> + goto fail;
> + stmt_vec_info other_store_stmt_info
> + = loop_vinfo->lookup_stmt (other_store_stmt);
> + if (other_store_stmt_info == NULL
> + || STMT_VINFO_SIMD_LANE_ACCESS_P (other_store_stmt_info) != 3)
> + goto fail;
> +
> + gimple *def_stmt = SSA_NAME_DEF_STMT (rhs);
> + if (gimple_bb (def_stmt) != gimple_bb (stmt)
> + || !is_gimple_assign (def_stmt)
> + || gimple_assign_rhs_class (def_stmt) != GIMPLE_BINARY_RHS)
> + goto fail;
> +
> + enum tree_code code = gimple_assign_rhs_code (def_stmt);
> + /* For pointer addition, we should use the normal plus for the vector
> + operation. */
> + switch (code)
> + {
> + case POINTER_PLUS_EXPR:
> + code = PLUS_EXPR;
> + break;
> + case MULT_HIGHPART_EXPR:
> + goto fail;
> + default:
> + break;
> + }
> + if (TREE_CODE_LENGTH (code) != binary_op || !commutative_tree_code (code))
> + goto fail;
> +
> + tree rhs1 = gimple_assign_rhs1 (def_stmt);
> + tree rhs2 = gimple_assign_rhs2 (def_stmt);
> + if (TREE_CODE (rhs1) != SSA_NAME
> + || TREE_CODE (rhs2) != SSA_NAME)
> + goto fail;
> +
> + gimple *load1_stmt = SSA_NAME_DEF_STMT (rhs1);
> + gimple *load2_stmt = SSA_NAME_DEF_STMT (rhs2);
> + if (gimple_bb (load1_stmt) != gimple_bb (stmt)
> + || !gimple_assign_load_p (load1_stmt)
> + || gimple_bb (load2_stmt) != gimple_bb (stmt)
> + || !gimple_assign_load_p (load2_stmt))
> + goto fail;
> +
> + stmt_vec_info load1_stmt_info = loop_vinfo->lookup_stmt (load1_stmt);
> + stmt_vec_info load2_stmt_info = loop_vinfo->lookup_stmt (load2_stmt);
> + if (load1_stmt_info == NULL
> + || load2_stmt_info == NULL
> + || STMT_VINFO_SIMD_LANE_ACCESS_P (load1_stmt_info) != 3
> + || STMT_VINFO_SIMD_LANE_ACCESS_P (load2_stmt_info) != 3)
> + goto fail;
> +
> + if (operand_equal_p (gimple_assign_lhs (stmt),
> + gimple_assign_rhs1 (load2_stmt), 0))
> + {
> + std::swap (rhs1, rhs2);
> + std::swap (load1_stmt, load2_stmt);
> + std::swap (load1_stmt_info, load2_stmt_info);
> + }
> + if (!operand_equal_p (gimple_assign_lhs (stmt),
> + gimple_assign_rhs1 (load1_stmt), 0)
> + || !operand_equal_p (gimple_assign_lhs (other_store_stmt),
> + gimple_assign_rhs1 (load2_stmt), 0))
> + goto fail;
> +
> + dr_vec_info *other_dr_info = STMT_VINFO_DR_INFO (other_store_stmt_info);
> + if (TREE_CODE (DR_BASE_ADDRESS (other_dr_info->dr)) != ADDR_EXPR
> + || !VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (other_dr_info->dr), 0)))
> + goto fail;
> +
> + tree var1 = TREE_OPERAND (DR_BASE_ADDRESS (dr_info->dr), 0);
> + tree var2 = TREE_OPERAND (DR_BASE_ADDRESS (other_dr_info->dr), 0);
> + if (!lookup_attribute ("omp simd array", DECL_ATTRIBUTES (var1))
> + || !lookup_attribute ("omp simd array", DECL_ATTRIBUTES (var2))
> + || (!lookup_attribute ("omp simd inscan", DECL_ATTRIBUTES (var1)))
> + == (!lookup_attribute ("omp simd inscan", DECL_ATTRIBUTES (var2))))
> + goto fail;
> +
> + if (lookup_attribute ("omp simd inscan", DECL_ATTRIBUTES (var1)))
> + std::swap (var1, var2);
> +
> + if (loop_vinfo->scan_map == NULL)
> + goto fail;
> + tree *init = loop_vinfo->scan_map->get (var1);
> + if (init == NULL)
> + goto fail;
> +
> + /* The IL is as expected, now check if we can actually vectorize it.
> + _26 = D.2043[_25];
> + _27 = D.2042[_25];
> + _28 = _26 + _27;
> + D.2043[_25] = _28;
> + D.2042[_25] = _28;
> + should be vectorized as (where _40 is the vectorized rhs
> + from the D.2042[_21] = 0; store):
> + _30 = MEM <vector(8) int> [(int *)&D.2043];
> + _31 = MEM <vector(8) int> [(int *)&D.2042];
> + _32 = VEC_PERM_EXPR <_31, _40, { 8, 0, 1, 2, 3, 4, 5, 6 }>;
> + _33 = _31 + _32;
> + // _33 = { _31[0], _31[0]+_31[1], _31[1]+_31[2], ..., _31[6]+_31[7] };
> + _34 = VEC_PERM_EXPR <_33, _40, { 8, 9, 0, 1, 2, 3, 4, 5 }>;
> + _35 = _33 + _34;
> + // _35 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2], _31[0]+.._31[3],
> + // _31[1]+.._31[4], ... _31[4]+.._31[7] };
> + _36 = VEC_PERM_EXPR <_35, _40, { 8, 9, 10, 11, 0, 1, 2, 3 }>;
> + _37 = _35 + _36;
> + // _37 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2], _31[0]+.._31[3],
> + // _31[0]+.._31[4], ... _31[0]+.._31[7] };
> + _38 = _30 + _37;
> + _39 = VEC_PERM_EXPR <_38, _38, { 7, 7, 7, 7, 7, 7, 7, 7 }>;
> + MEM <vector(8) int> [(int *)&D.2043] = _39;
> + MEM <vector(8) int> [(int *)&D.2042] = _38; */
> + enum machine_mode vec_mode = TYPE_MODE (vectype);
> + optab optab = optab_for_tree_code (code, vectype, optab_default);
> + if (!optab || optab_handler (optab, vec_mode) == CODE_FOR_nothing)
> + goto fail;
> +
> + unsigned HOST_WIDE_INT nunits;
> + if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits))
> + goto fail;
> + int units_log2 = exact_log2 (nunits);
> + if (units_log2 <= 0)
> + goto fail;
> +
> + for (int i = 0; i <= units_log2; ++i)
> + {
> + unsigned HOST_WIDE_INT j, k;
> + vec_perm_builder sel (nunits, nunits, 1);
> + sel.quick_grow (nunits);
> + if (i == units_log2)
> + {
> + for (j = 0; j < nunits; ++j)
> + sel[j] = nunits - 1;
> + }
> + else
> + {
> + for (j = 0; j < (HOST_WIDE_INT_1U << i); ++j)
> + sel[j] = nunits + j;
> + for (k = 0; j < nunits; ++j, ++k)
> + sel[j] = k;
> + }
> + vec_perm_indices indices (sel, i == units_log2 ? 1 : 2, nunits);
> + if (!can_vec_perm_const_p (vec_mode, indices))
> + goto fail;
> + }
> +
> + return true;
> +}
> +
> +
> +/* Function vectorizable_scan_store.
> +
> + Helper of vectorizable_store, arguments like on vectorizable_store.
> + Handle only the transformation, checking is done in check_scan_store. */
> +
> +static bool
> +vectorizable_scan_store (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> + stmt_vec_info *vec_stmt, int ncopies)
> +{
> + loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> + dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info);
> + tree ref_type = reference_alias_ptr_type (DR_REF (dr_info->dr));
> + vec_info *vinfo = stmt_info->vinfo;
> + tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_NOTE, vect_location,
> + "transform scan store. ncopies = %d\n", ncopies);
> +
> + gimple *stmt = STMT_VINFO_STMT (stmt_info);
> + tree rhs = gimple_assign_rhs1 (stmt);
> + gcc_assert (TREE_CODE (rhs) == SSA_NAME);
> +
> + gimple *def_stmt = SSA_NAME_DEF_STMT (rhs);
> + enum tree_code code = gimple_assign_rhs_code (def_stmt);
> + if (code == POINTER_PLUS_EXPR)
> + code = PLUS_EXPR;
> + gcc_assert (TREE_CODE_LENGTH (code) == binary_op
> + && commutative_tree_code (code));
> + tree rhs1 = gimple_assign_rhs1 (def_stmt);
> + tree rhs2 = gimple_assign_rhs2 (def_stmt);
> + gcc_assert (TREE_CODE (rhs1) == SSA_NAME && TREE_CODE (rhs2) == SSA_NAME);
> + gimple *load1_stmt = SSA_NAME_DEF_STMT (rhs1);
> + gimple *load2_stmt = SSA_NAME_DEF_STMT (rhs2);
> + stmt_vec_info load1_stmt_info = loop_vinfo->lookup_stmt (load1_stmt);
> + stmt_vec_info load2_stmt_info = loop_vinfo->lookup_stmt (load2_stmt);
> + dr_vec_info *load1_dr_info = STMT_VINFO_DR_INFO (load1_stmt_info);
> + dr_vec_info *load2_dr_info = STMT_VINFO_DR_INFO (load2_stmt_info);
> + tree var1 = TREE_OPERAND (DR_BASE_ADDRESS (load1_dr_info->dr), 0);
> + tree var2 = TREE_OPERAND (DR_BASE_ADDRESS (load2_dr_info->dr), 0);
> +
> + if (lookup_attribute ("omp simd inscan", DECL_ATTRIBUTES (var1)))
> + {
> + std::swap (rhs1, rhs2);
> + std::swap (var1, var2);
> + }
> +
> + tree *init = loop_vinfo->scan_map->get (var1);
> + gcc_assert (init);
> +
> + tree var = TREE_OPERAND (DR_BASE_ADDRESS (dr_info->dr), 0);
> + bool inscan_var_store
> + = lookup_attribute ("omp simd inscan", DECL_ATTRIBUTES (var)) != NULL;
> +
> + unsigned HOST_WIDE_INT nunits;
> + if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits))
> + gcc_unreachable ();
> + int units_log2 = exact_log2 (nunits);
> + gcc_assert (units_log2 > 0);
> + auto_vec<tree, 16> perms;
> + perms.quick_grow (units_log2 + 1);
> + for (int i = 0; i <= units_log2; ++i)
> + {
> + unsigned HOST_WIDE_INT j, k;
> + vec_perm_builder sel (nunits, nunits, 1);
> + sel.quick_grow (nunits);
> + if (i == units_log2)
> + {
> + for (j = 0; j < nunits; ++j)
> + sel[j] = nunits - 1;
> + }
> + else
> + {
> + for (j = 0; j < (HOST_WIDE_INT_1U << i); ++j)
> + sel[j] = nunits + j;
> + for (k = 0; j < nunits; ++j, ++k)
> + sel[j] = k;
> + }
> + vec_perm_indices indices (sel, i == units_log2 ? 1 : 2, nunits);
> + perms[i] = vect_gen_perm_mask_checked (vectype, indices);
> + }
> +
> + stmt_vec_info prev_stmt_info = NULL;
> + tree vec_oprnd1 = NULL_TREE;
> + tree vec_oprnd2 = NULL_TREE;
> + tree vec_oprnd3 = NULL_TREE;
> + tree dataref_ptr = unshare_expr (DR_BASE_ADDRESS (dr_info->dr));
> + tree dataref_offset = build_int_cst (ref_type, 0);
> + tree bump = vect_get_data_ptr_increment (dr_info, vectype, VMAT_CONTIGUOUS);
> + tree orig = NULL_TREE;
> + for (int j = 0; j < ncopies; j++)
> + {
> + stmt_vec_info new_stmt_info;
> + if (j == 0)
> + {
> + vec_oprnd1 = vect_get_vec_def_for_operand (*init, stmt_info);
> + vec_oprnd2 = vect_get_vec_def_for_operand (rhs1, stmt_info);
> + vec_oprnd3 = vect_get_vec_def_for_operand (rhs2, stmt_info);
> + orig = vec_oprnd3;
> + }
> + else
> + {
> + vec_oprnd1 = vect_get_vec_def_for_stmt_copy (vinfo, vec_oprnd1);
> + vec_oprnd2 = vect_get_vec_def_for_stmt_copy (vinfo, vec_oprnd2);
> + vec_oprnd3 = vect_get_vec_def_for_stmt_copy (vinfo, vec_oprnd3);
> + if (!inscan_var_store)
> + dataref_offset = int_const_binop (PLUS_EXPR, dataref_offset, bump);
> + }
> +
> + tree v = vec_oprnd2;
> + for (int i = 0; i < units_log2; ++i)
> + {
> + tree new_temp = make_ssa_name (vectype);
> + gimple *g = gimple_build_assign (new_temp, VEC_PERM_EXPR, v,
> + vec_oprnd1, perms[i]);
> + new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
> + if (prev_stmt_info == NULL)
> + STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt_info;
> + else
> + STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> + prev_stmt_info = new_stmt_info;
> +
> + tree new_temp2 = make_ssa_name (vectype);
> + g = gimple_build_assign (new_temp2, code, v, new_temp);
> + new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
> + STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> + prev_stmt_info = new_stmt_info;
> +
> + v = new_temp2;
> + }
> +
> + tree new_temp = make_ssa_name (vectype);
> + gimple *g = gimple_build_assign (new_temp, code, orig, v);
> + new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
> + STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> + prev_stmt_info = new_stmt_info;
> +
> + orig = make_ssa_name (vectype);
> + g = gimple_build_assign (orig, VEC_PERM_EXPR, new_temp, new_temp,
> + perms[units_log2]);
> + new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
> + STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> + prev_stmt_info = new_stmt_info;
> +
> + if (!inscan_var_store)
> + {
> + tree data_ref = fold_build2 (MEM_REF, vectype, dataref_ptr,
> + dataref_offset);
> + vect_copy_ref_info (data_ref, DR_REF (dr_info->dr));
> + g = gimple_build_assign (data_ref, new_temp);
> + new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
> + STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> + prev_stmt_info = new_stmt_info;
> + }
> + }
> +
> + if (inscan_var_store)
> + for (int j = 0; j < ncopies; j++)
> + {
> + if (j != 0)
> + dataref_offset = int_const_binop (PLUS_EXPR, dataref_offset, bump);
> +
> + tree data_ref = fold_build2 (MEM_REF, vectype, dataref_ptr,
> + dataref_offset);
> + vect_copy_ref_info (data_ref, DR_REF (dr_info->dr));
> + gimple *g = gimple_build_assign (data_ref, orig);
> + stmt_vec_info new_stmt_info
> + = vect_finish_stmt_generation (stmt_info, g, gsi);
> + STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> + prev_stmt_info = new_stmt_info;
> + }
> + return true;
> +}
> +
> +
> /* Function vectorizable_store.
>
> Check if STMT_INFO defines a non scalar data-ref (array/pointer/structure)
> @@ -6514,6 +6965,13 @@ vectorizable_store (stmt_vec_info stmt_i
> group_size = vec_num = 1;
> }
>
> + if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) > 1 && !vec_stmt)
> + {
> + if (!check_scan_store (stmt_info, vectype, rhs_dt, slp, mask,
> + memory_access_type))
> + return false;
> + }
> +
> if (!vec_stmt) /* transformation not required. */
> {
> STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) = memory_access_type;
> @@ -6737,6 +7195,8 @@ vectorizable_store (stmt_vec_info stmt_i
> }
> return true;
> }
> + else if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) == 3)
> + return vectorizable_scan_store (stmt_info, gsi, vec_stmt, ncopies);
>
> if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> DR_GROUP_STORE_COUNT (DR_GROUP_FIRST_ELEMENT (stmt_info))++;
> @@ -7162,7 +7622,7 @@ vectorizable_store (stmt_vec_info stmt_i
> gcc_assert (useless_type_conversion_p (vectype,
> TREE_TYPE (vec_oprnd)));
> bool simd_lane_access_p
> - = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info);
> + = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) != 0;
> if (simd_lane_access_p
> && !loop_masks
> && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
> @@ -8347,7 +8807,7 @@ vectorizable_load (stmt_vec_info stmt_in
> if (j == 0)
> {
> bool simd_lane_access_p
> - = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info);
> + = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) != 0;
> if (simd_lane_access_p
> && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
> && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr), 0))
> --- gcc/tree-vect-data-refs.c.jj 2019-06-13 12:06:17.786472401 +0200
> +++ gcc/tree-vect-data-refs.c 2019-06-14 09:52:14.920718040 +0200
> @@ -3003,6 +3003,13 @@ vect_analyze_data_ref_accesses (vec_info
> || TREE_CODE (DR_INIT (drb)) != INTEGER_CST)
> break;
>
> + /* Different .GOMP_SIMD_LANE calls still give the same lane,
> + just hold extra information. */
> + if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmtinfo_a)
> + && STMT_VINFO_SIMD_LANE_ACCESS_P (stmtinfo_b)
> + && data_ref_compare_tree (DR_INIT (dra), DR_INIT (drb)) == 0)
> + break;
> +
> /* Sorting has ensured that DR_INIT (dra) <= DR_INIT (drb). */
> HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_INIT (dra));
> HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_INIT (drb));
> @@ -4101,7 +4108,8 @@ vect_find_stmt_data_reference (loop_p lo
> DR_STEP_ALIGNMENT (newdr)
> = highest_pow2_factor (step);
> /* Mark as simd-lane access. */
> - newdr->aux = (void *)-1;
> + tree arg2 = gimple_call_arg (def, 1);
> + newdr->aux = (void *) (-1 - tree_to_uhwi (arg2));
> free_data_ref (dr);
> datarefs->safe_push (newdr);
> return opt_result::success ();
> @@ -4210,14 +4218,17 @@ vect_analyze_data_refs (vec_info *vinfo,
> }
>
> /* See if this was detected as SIMD lane access. */
> - if (dr->aux == (void *)-1)
> + if (dr->aux == (void *)-1
> + || dr->aux == (void *)-2
> + || dr->aux == (void *)-3)
> {
> if (nested_in_vect_loop_p (loop, stmt_info))
> return opt_result::failure_at (stmt_info->stmt,
> "not vectorized:"
> " data ref analysis failed: %G",
> stmt_info->stmt);
> - STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) = true;
> + STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info)
> + = -(uintptr_t) dr->aux;
> }
>
> tree base = get_base_address (DR_REF (dr));
> --- gcc/tree-vectorizer.h.jj 2019-06-13 12:50:31.597926603 +0200
> +++ gcc/tree-vectorizer.h 2019-06-14 16:51:53.155792356 +0200
> @@ -491,6 +491,10 @@ typedef struct _loop_vec_info : public v
> /* Map of IV base/step expressions to inserted name in the preheader. */
> hash_map<tree_operand_hash, tree> *ivexpr_map;
>
> + /* Map of OpenMP "omp simd array" scan variables to corresponding
> + rhs of the store of the initializer. */
> + hash_map<tree, tree> *scan_map;
> +
> /* The unrolling factor needed to SLP the loop. In case of that pure SLP is
> applied to the loop, i.e., no unrolling is needed, this is 1. */
> poly_uint64 slp_unrolling_factor;
> @@ -913,7 +917,7 @@ struct _stmt_vec_info {
> bool strided_p;
>
> /* For both loads and stores. */
> - bool simd_lane_access_p;
> + unsigned simd_lane_access_p : 2;
>
> /* Classifies how the load or store is going to be implemented
> for loop vectorization. */
> --- gcc/tree-ssa-dce.c.jj 2019-06-13 13:28:36.763153374 +0200
> +++ gcc/tree-ssa-dce.c 2019-06-13 14:20:14.889711910 +0200
> @@ -1339,14 +1339,14 @@ eliminate_unnecessary_stmts (void)
> update_stmt (stmt);
> release_ssa_name (name);
>
> - /* GOMP_SIMD_LANE (unless two argument) or ASAN_POISON
> + /* GOMP_SIMD_LANE (unless three argument) or ASAN_POISON
> without lhs is not needed. */
> if (gimple_call_internal_p (stmt))
> switch (gimple_call_internal_fn (stmt))
> {
> case IFN_GOMP_SIMD_LANE:
> - if (gimple_call_num_args (stmt) >= 2
> - && !integer_nonzerop (gimple_call_arg (stmt, 1)))
> + if (gimple_call_num_args (stmt) >= 3
> + && !integer_nonzerop (gimple_call_arg (stmt, 2)))
> break;
> /* FALLTHRU */
> case IFN_ASAN_POISON:
> --- gcc/testsuite/gcc.dg/vect/vect-simd-8.c.jj 2019-06-14 19:00:40.918765225 +0200
> +++ gcc/testsuite/gcc.dg/vect/vect-simd-8.c 2019-06-14 19:01:43.755798987 +0200
> @@ -0,0 +1,66 @@
> +/* { dg-require-effective-target size32plus } */
> +/* { dg-additional-options "-fopenmp-simd" } */
> +
> +#include "tree-vect.h"
> +
> +int r, a[1024], b[1024];
> +
> +__attribute__((noipa)) void
> +foo (int *a, int *b)
> +{
> + #pragma omp simd reduction (inscan, +:r)
> + for (int i = 0; i < 1024; i++)
> + {
> + r += a[i];
> + #pragma omp scan inclusive(r)
> + b[i] = r;
> + }
> +}
> +
> +__attribute__((noipa)) int
> +bar (void)
> +{
> + int s = 0;
> + #pragma omp simd reduction (inscan, +:s)
> + for (int i = 0; i < 1024; i++)
> + {
> + s += 2 * a[i];
> + #pragma omp scan inclusive(s)
> + b[i] = s;
> + }
> + return s;
> +}
> +
> +int
> +main ()
> +{
> + int s = 0;
> + check_vect ();
> + for (int i = 0; i < 1024; ++i)
> + {
> + a[i] = i;
> + b[i] = -1;
> + asm ("" : "+g" (i));
> + }
> + foo (a, b);
> + if (r != 1024 * 1023 / 2)
> + abort ();
> + for (int i = 0; i < 1024; ++i)
> + {
> + s += i;
> + if (b[i] != s)
> + abort ();
> + else
> + b[i] = 25;
> + }
> + if (bar () != 1024 * 1023)
> + abort ();
> + s = 0;
> + for (int i = 0; i < 1024; ++i)
> + {
> + s += 2 * i;
> + if (b[i] != s)
> + abort ();
> + }
> + return 0;
> +}
> --- gcc/omp-low.c.jj 2019-06-13 13:28:36.611155753 +0200
> +++ gcc/omp-low.c 2019-06-14 18:54:14.976699854 +0200
> @@ -141,6 +141,9 @@ struct omp_context
> /* True if lower_omp_1 should look up lastprivate conditional in parent
> context. */
> bool combined_into_simd_safelen0;
> +
> + /* True if there is a nested scan context with an inclusive clause. */
> + bool scan_inclusive;
> };
>
> static splay_tree all_contexts;
> @@ -3329,11 +3332,15 @@ scan_omp_1_stmt (gimple_stmt_iterator *g
> scan_omp_single (as_a <gomp_single *> (stmt), ctx);
> break;
>
> + case GIMPLE_OMP_SCAN:
> + if (tree clauses = gimple_omp_scan_clauses (as_a <gomp_scan *> (stmt)))
> + if (OMP_CLAUSE_CODE (clauses) == OMP_CLAUSE_INCLUSIVE)
> + ctx->scan_inclusive = true;
> + /* FALLTHRU */
> case GIMPLE_OMP_SECTION:
> case GIMPLE_OMP_MASTER:
> case GIMPLE_OMP_ORDERED:
> case GIMPLE_OMP_CRITICAL:
> - case GIMPLE_OMP_SCAN:
> case GIMPLE_OMP_GRID_BODY:
> ctx = new_omp_context (stmt, ctx);
> scan_omp (gimple_omp_body_ptr (stmt), ctx);
> @@ -3671,6 +3678,7 @@ struct omplow_simd_context {
> omplow_simd_context () { memset (this, 0, sizeof (*this)); }
> tree idx;
> tree lane;
> + tree lastlane;
> vec<tree, va_heap> simt_eargs;
> gimple_seq simt_dlist;
> poly_uint64_pod max_vf;
> @@ -3682,7 +3690,8 @@ struct omplow_simd_context {
>
> static bool
> lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,
> - omplow_simd_context *sctx, tree &ivar, tree &lvar)
> + omplow_simd_context *sctx, tree &ivar,
> + tree &lvar, tree *rvar = NULL)
> {
> if (known_eq (sctx->max_vf, 0U))
> {
> @@ -3738,7 +3747,27 @@ lower_rec_simd_input_clauses (tree new_v
> = tree_cons (get_identifier ("omp simd array"), NULL,
> DECL_ATTRIBUTES (avar));
> gimple_add_tmp_var (avar);
> - ivar = build4 (ARRAY_REF, TREE_TYPE (new_var), avar, sctx->idx,
> + tree iavar = avar;
> + if (rvar)
> + {
> + /* For inscan reductions, create another array temporary,
> + which will hold the reduced value. */
> + iavar = create_tmp_var_raw (atype);
> + if (TREE_ADDRESSABLE (new_var))
> + TREE_ADDRESSABLE (iavar) = 1;
> + DECL_ATTRIBUTES (iavar)
> + = tree_cons (get_identifier ("omp simd array"), NULL,
> + tree_cons (get_identifier ("omp simd inscan"), NULL,
> + DECL_ATTRIBUTES (iavar)));
> + gimple_add_tmp_var (iavar);
> + ctx->cb.decl_map->put (avar, iavar);
> + if (sctx->lastlane == NULL_TREE)
> + sctx->lastlane = create_tmp_var (unsigned_type_node);
> + *rvar = build4 (ARRAY_REF, TREE_TYPE (new_var), iavar,
> + sctx->lastlane, NULL_TREE, NULL_TREE);
> + TREE_THIS_NOTRAP (*rvar) = 1;
> + }
> + ivar = build4 (ARRAY_REF, TREE_TYPE (new_var), iavar, sctx->idx,
> NULL_TREE, NULL_TREE);
> lvar = build4 (ARRAY_REF, TREE_TYPE (new_var), avar, sctx->lane,
> NULL_TREE, NULL_TREE);
> @@ -3814,7 +3843,7 @@ lower_rec_input_clauses (tree clauses, g
> omplow_simd_context sctx = omplow_simd_context ();
> tree simt_lane = NULL_TREE, simtrec = NULL_TREE;
> tree ivar = NULL_TREE, lvar = NULL_TREE, uid = NULL_TREE;
> - gimple_seq llist[3] = { };
> + gimple_seq llist[4] = { };
> tree nonconst_simd_if = NULL_TREE;
>
> copyin_seq = NULL;
> @@ -5324,12 +5353,32 @@ lower_rec_input_clauses (tree clauses, g
> new_vard = TREE_OPERAND (new_var, 0);
> gcc_assert (DECL_P (new_vard));
> }
> + tree rvar = NULL_TREE, *rvarp = NULL;
> + if (is_simd
> + && OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
> + && OMP_CLAUSE_REDUCTION_INSCAN (c))
> + rvarp = &rvar;
> if (is_simd
> && lower_rec_simd_input_clauses (new_var, ctx, &sctx,
> - ivar, lvar))
> + ivar, lvar, rvarp))
> {
> + if (new_vard != new_var)
> + {
> + SET_DECL_VALUE_EXPR (new_vard,
> + build_fold_addr_expr (lvar));
> + DECL_HAS_VALUE_EXPR_P (new_vard) = 1;
> + }
> +
> tree ref = build_outer_var_ref (var, ctx);
>
> + if (rvarp)
> + {
> + gimplify_assign (ivar, ref, &llist[0]);
> + ref = build_outer_var_ref (var, ctx);
> + gimplify_assign (ref, rvar, &llist[3]);
> + break;
> + }
> +
> gimplify_assign (unshare_expr (ivar), x, &llist[0]);
>
> if (sctx.is_simt)
> @@ -5346,12 +5395,6 @@ lower_rec_input_clauses (tree clauses, g
> ref = build_outer_var_ref (var, ctx);
> gimplify_assign (ref, x, &llist[1]);
>
> - if (new_vard != new_var)
> - {
> - SET_DECL_VALUE_EXPR (new_vard,
> - build_fold_addr_expr (lvar));
> - DECL_HAS_VALUE_EXPR_P (new_vard) = 1;
> - }
> }
> else
> {
> @@ -5456,14 +5499,23 @@ lower_rec_input_clauses (tree clauses, g
> if (sctx.lane)
> {
> gimple *g = gimple_build_call_internal (IFN_GOMP_SIMD_LANE,
> - 1 + (nonconst_simd_if != NULL),
> - uid, nonconst_simd_if);
> + 2 + (nonconst_simd_if != NULL),
> + uid, integer_zero_node,
> + nonconst_simd_if);
> gimple_call_set_lhs (g, sctx.lane);
> gimple_stmt_iterator gsi = gsi_start_1 (gimple_omp_body_ptr (ctx->stmt));
> gsi_insert_before_without_update (&gsi, g, GSI_SAME_STMT);
> g = gimple_build_assign (sctx.lane, INTEGER_CST,
> build_int_cst (unsigned_type_node, 0));
> gimple_seq_add_stmt (ilist, g);
> + if (sctx.lastlane)
> + {
> + g = gimple_build_call_internal (IFN_GOMP_SIMD_LAST_LANE,
> + 2, uid, sctx.lane);
> + gimple_call_set_lhs (g, sctx.lastlane);
> + gimple_seq_add_stmt (dlist, g);
> + gimple_seq_add_seq (dlist, llist[3]);
> + }
> /* Emit reductions across SIMT lanes in log_2(simt_vf) steps. */
> if (llist[2])
> {
> @@ -5865,6 +5917,7 @@ lower_lastprivate_clauses (tree clauses,
> new_var = build4 (ARRAY_REF, TREE_TYPE (val),
> TREE_OPERAND (val, 0), lastlane,
> NULL_TREE, NULL_TREE);
> + TREE_THIS_NOTRAP (new_var) = 1;
> }
> }
> else if (maybe_simt)
> @@ -8371,6 +8424,108 @@ lower_omp_ordered (gimple_stmt_iterator
> }
>
>
> +/* Expand code for an OpenMP scan directive and the structured block
> + before the scan directive. */
> +
> +static void
> +lower_omp_scan (gimple_stmt_iterator *gsi_p, omp_context *ctx)
> +{
> + gimple *stmt = gsi_stmt (*gsi_p);
> + bool has_clauses
> + = gimple_omp_scan_clauses (as_a <gomp_scan *> (stmt)) != NULL;
> + tree lane = NULL_TREE;
> + gimple_seq before = NULL;
> + omp_context *octx = ctx->outer;
> + gcc_assert (octx);
> + bool input_phase = has_clauses ^ octx->scan_inclusive;
> + if (gimple_code (octx->stmt) == GIMPLE_OMP_FOR
> + && (gimple_omp_for_kind (octx->stmt) & GF_OMP_FOR_SIMD)
> + && !gimple_omp_for_combined_into_p (octx->stmt)
> + && octx->scan_inclusive)
> + {
> + if (tree c = omp_find_clause (gimple_omp_for_clauses (octx->stmt),
> + OMP_CLAUSE__SIMDUID_))
> + {
> + tree uid = OMP_CLAUSE__SIMDUID__DECL (c);
> + lane = create_tmp_var (unsigned_type_node);
> + tree t = build_int_cst (integer_type_node, 1 + !input_phase);
> + gimple *g
> + = gimple_build_call_internal (IFN_GOMP_SIMD_LANE, 2, uid, t);
> + gimple_call_set_lhs (g, lane);
> + gimple_seq_add_stmt (&before, g);
> + }
> + for (tree c = gimple_omp_for_clauses (octx->stmt);
> + c; c = OMP_CLAUSE_CHAIN (c))
> + if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
> + && OMP_CLAUSE_REDUCTION_INSCAN (c))
> + {
> + tree var = OMP_CLAUSE_DECL (c);
> + tree new_var = lookup_decl (var, octx);
> + tree val = new_var;
> + tree var2 = NULL_TREE;
> + if (DECL_HAS_VALUE_EXPR_P (new_var))
> + {
> + val = DECL_VALUE_EXPR (new_var);
> + if (TREE_CODE (val) == ARRAY_REF
> + && VAR_P (TREE_OPERAND (val, 0)))
> + {
> + tree v = TREE_OPERAND (val, 0);
> + if (lookup_attribute ("omp simd array",
> + DECL_ATTRIBUTES (v)))
> + {
> + val = unshare_expr (val);
> + TREE_OPERAND (val, 1) = lane;
> + if (!input_phase)
> + {
> + var2 = lookup_decl (v, octx);
> + var2 = build4 (ARRAY_REF, TREE_TYPE (val),
> + var2, lane, NULL_TREE, NULL_TREE);
> + TREE_THIS_NOTRAP (var2) = 1;
> + }
> + else
> + var2 = val;
> + }
> + }
> + }
> + if (var2 == NULL_TREE)
> + gcc_unreachable ();
> + if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
> + {
> + gcc_unreachable ();
> + }
> + else
> + {
> + if (input_phase)
> + {
> + /* input phase. Set val to initializer before
> + the body. */
> + tree x = omp_reduction_init (c, TREE_TYPE (new_var));
> + gimplify_assign (val, x, &before);
> + }
> + else
> + {
> + /* scan phase. */
> + enum tree_code code = OMP_CLAUSE_REDUCTION_CODE (c);
> + if (code == MINUS_EXPR)
> + code = PLUS_EXPR;
> +
> + tree x = build2 (code, TREE_TYPE (var2),
> + unshare_expr (var2), unshare_expr (val));
> + gimplify_assign (unshare_expr (var2), x, &before);
> + gimplify_assign (val, var2, &before);
> + }
> + }
> + }
> + }
> + else if (has_clauses)
> + sorry_at (gimple_location (stmt),
> + "%<#pragma omp scan%> not supported yet");
> + gsi_insert_seq_after (gsi_p, gimple_omp_body (stmt), GSI_SAME_STMT);
> + gsi_insert_seq_after (gsi_p, before, GSI_SAME_STMT);
> + gsi_replace (gsi_p, gimple_build_nop (), true);
> +}
> +
> +
> /* Gimplify a GIMPLE_OMP_CRITICAL statement. This is a relatively simple
> substitution of a couple of function calls. But in the NAMED case,
> requires that languages coordinate a symbol name. It is therefore
> @@ -10843,11 +10998,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p
> case GIMPLE_OMP_SCAN:
> ctx = maybe_lookup_ctx (stmt);
> gcc_assert (ctx);
> - gsi_insert_seq_after (gsi_p, gimple_omp_body (stmt), GSI_SAME_STMT);
> - if (gimple_omp_scan_clauses (as_a <gomp_scan *> (stmt)))
> - sorry_at (gimple_location (stmt),
> - "%<#pragma omp scan%> not supported yet");
> - gsi_replace (gsi_p, gimple_build_nop (), true);
> + lower_omp_scan (gsi_p, ctx);
> break;
> case GIMPLE_OMP_CRITICAL:
> ctx = maybe_lookup_ctx (stmt);
> --- gcc/tree-vect-loop.c.jj 2019-06-13 13:28:36.581156223 +0200
> +++ gcc/tree-vect-loop.c 2019-06-14 14:53:10.734986707 +0200
> @@ -824,6 +824,7 @@ _loop_vec_info::_loop_vec_info (struct l
> peeling_for_alignment (0),
> ptr_mask (0),
> ivexpr_map (NULL),
> + scan_map (NULL),
> slp_unrolling_factor (1),
> single_scalar_iteration_cost (0),
> vectorizable (false),
> @@ -863,8 +864,8 @@ _loop_vec_info::_loop_vec_info (struct l
> gimple *stmt = gsi_stmt (si);
> gimple_set_uid (stmt, 0);
> add_stmt (stmt);
> - /* If .GOMP_SIMD_LANE call for the current loop has 2 arguments, the
> - second argument is the #pragma omp simd if (x) condition, when 0,
> + /* If .GOMP_SIMD_LANE call for the current loop has 3 arguments, the
> + third argument is the #pragma omp simd if (x) condition, when 0,
> loop shouldn't be vectorized, when non-zero constant, it should
> be vectorized normally, otherwise versioned with vectorized loop
> done if the condition is non-zero at runtime. */
> @@ -872,12 +873,12 @@ _loop_vec_info::_loop_vec_info (struct l
> && is_gimple_call (stmt)
> && gimple_call_internal_p (stmt)
> && gimple_call_internal_fn (stmt) == IFN_GOMP_SIMD_LANE
> - && gimple_call_num_args (stmt) >= 2
> + && gimple_call_num_args (stmt) >= 3
> && TREE_CODE (gimple_call_arg (stmt, 0)) == SSA_NAME
> && (loop_in->simduid
> == SSA_NAME_VAR (gimple_call_arg (stmt, 0))))
> {
> - tree arg = gimple_call_arg (stmt, 1);
> + tree arg = gimple_call_arg (stmt, 2);
> if (integer_zerop (arg) || TREE_CODE (arg) == SSA_NAME)
> simd_if_cond = arg;
> else
> @@ -959,6 +960,7 @@ _loop_vec_info::~_loop_vec_info ()
>
> release_vec_loop_masks (&masks);
> delete ivexpr_map;
> + delete scan_map;
>
> loop->aux = NULL;
> }
>
> Jakub
>
>
--
Richard Biener <rguenther@suse.de>
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)