[RFC] #pragma omp scan inclusive vectorization

Richard Biener rguenther@suse.de
Mon Jun 17 06:35:00 GMT 2019


On Fri, 14 Jun 2019, Jakub Jelinek wrote:

> Hi!
> 
> OpenMP 5.0 introduced scan reductions, like:
>   #pragma omp simd reduction (inscan, +:r)
>   for (int i = 0; i < 1024; i++)
>     {
>       r += a[i];
>       #pragma omp scan inclusive(r)
>       b[i] = r;
>     }
> where each iteration has two parts of code: one that computes the value
> of the privatized reduction variable (the private copy is initialized
> with a neutral element of the operation at the start of that part), and
> one after the #pragma omp scan directive, which is supposed to change
> that private variable to hold (in this case) the inclusive partial
> sums.  E.g. the PSTL we now have in libstdc++-v3/include/pstl/ makes
> use of these, when available, to implement std::*_scan.  Scan
> reductions can also be used in worksharing loops, but I'll get to that
> later.
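> 
> For reference, with r initially 0 the loop above has the sequential
> semantics of (a minimal scalar sketch):
>   int r = 0;
>   for (int i = 0; i < 1024; i++)
>     {
>       r += a[i];  /* input phase */
>       b[i] = r;   /* scan phase: r holds the inclusive prefix sum */
>     }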
> 
> Anyway, the problem is that e.g. with OpenMP user defined reductions
> the initializer and combiner of the reduction aren't simple operations
> during OpenMP lowering; either can be a method call, constructor call,
> etc.  So we need something that preserves those initializer and
> combiner snippets in the IL so that the vectorizer can optimize them if
> they are simplified enough; on the other hand, it needs to be something
> that the normal optimizers are able to optimize and that actually works
> even when the vectorization isn't performed.
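> 
> E.g. a user defined reduction where the combiner is a method call and
> the initializer a constructor call could look like this (a hypothetical
> type, just to illustrate the shape):
>   struct Acc { int v; Acc () : v (0) {} void merge (const Acc &o) { v += o.v; } };
>   #pragma omp declare reduction (merge : Acc : omp_out.merge (omp_in)) \
>     initializer (omp_priv = Acc ())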
> 
> The following patch (incomplete, but far enough along that it handles
> the inclusive scan for non-user-defined reductions) handles that by
> using more magic: it adds variants to the .GOMP_SIMD_LANE builtin and
> uses the old one (0) in the user code, another variant (1) in the
> initializer, and another variant (2) in the combiner pattern, which the
> vectorizer then needs to pattern recognize and either vectorize or punt
> on.  If it vectorizes it, it emits code like this (optimized dump):
>   <bb 5> [local count: 708669599]:
>   # ivtmp.27_45 = PHI <0(4), ivtmp.27_12(5)>
>   # D__lsm.39_80 = PHI <D__lsm.39_47(4), _64(5)>
>   vect__4.15_49 = MEM[base: a_23(D), index: ivtmp.27_45, offset: 0B];
>   _57 = VEC_PERM_EXPR <{ 0, 0, 0, 0, 0, 0, 0, 0 }, vect__4.15_49, { 0, 8, 9, 10, 11, 12, 13, 14 }>;
>   _58 = vect__4.15_49 + _57;
>   _59 = VEC_PERM_EXPR <{ 0, 0, 0, 0, 0, 0, 0, 0 }, _58, { 0, 1, 8, 9, 10, 11, 12, 13 }>;
>   _60 = _58 + _59;
>   _61 = VEC_PERM_EXPR <{ 0, 0, 0, 0, 0, 0, 0, 0 }, _60, { 0, 1, 2, 3, 8, 9, 10, 11 }>;
>   _62 = _60 + _61;
>   _63 = _62 + D__lsm.39_80;
>   _64 = VEC_PERM_EXPR <_63, _63, { 7, 7, 7, 7, 7, 7, 7, 7 }>;
>   MEM[base: b_32(D), index: ivtmp.27_45, offset: 0B] = _63;
>   ivtmp.27_12 = ivtmp.27_45 + 32;
>   if (ivtmp.27_12 != 4096)
>     goto <bb 5>; [83.33%]
>   else
>     goto <bb 6>; [16.67%]
> where the _57 ... _64 sequence is the implementation of the scan directive.
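> 
> In scalar terms, per vector of 8 lanes that sequence computes (a rough
> model of the emitted code, where v holds the 8 a[] values loaded in the
> current iteration and carry is the last lane of the previous
> iteration's result, 0 initially):
>   for (int step = 1; step < 8; step *= 2)  /* the 3 VEC_PERM/add pairs */
>     for (int lane = 7; lane >= step; --lane)
>       v[lane] += v[lane - step];  /* lanes < step get the 0 shifted in */
>   for (int lane = 0; lane < 8; ++lane)
>     v[lane] += carry;             /* _63 = _62 + D__lsm.39_80 */
>   carry = v[7];                   /* _64 broadcasts the last lane */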
> 
> Does this look reasonable?

Ugh, not pretty, but probably the best we can do.  Btw, can you please
add support for the SLP case with group_size == 1?  I know I'm slow
with the branch ripping out the non-SLP path, but it would (possibly)
save me some extra work.

Thanks,
Richard.

> BTW, unfortunately SSE2 can't handle these permutations (only SSE4
> does), so I'll probably need to optionally emit some other sequence
> when they aren't supported.  In particular, what could be done is to
> use whole-vector shifts and a VEC_COND_EXPR to blend the neutral
> element in.
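> 
> E.g. for a 4 x int32 plus-scan something like the following sketch
> (for addition the whole-vector byte shift already shifts in the
> neutral element 0, so no blend is needed in this particular case;
> other neutral elements would need the VEC_COND_EXPR blend):
>   #include <emmintrin.h>  /* SSE2 */
>   static inline __m128i
>   scan_plus_epi32 (__m128i v)
>   {
>     v = _mm_add_epi32 (v, _mm_slli_si128 (v, 4));  /* shift by 1 lane */
>     v = _mm_add_epi32 (v, _mm_slli_si128 (v, 8));  /* shift by 2 lanes */
>     return v;
>   }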
> 
> --- gcc/tree-vect-stmts.c.jj	2019-06-13 13:28:36.636155362 +0200
> +++ gcc/tree-vect-stmts.c	2019-06-14 19:05:18.150502242 +0200
> @@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.
>  #include "tree-ssa-loop-niter.h"
>  #include "gimple-fold.h"
>  #include "regs.h"
> +#include "attribs.h"
>  
>  /* For lang_hooks.types.type_for_mode.  */
>  #include "langhooks.h"
> @@ -3257,7 +3258,7 @@ vectorizable_call (stmt_vec_info stmt_in
>    if (nargs == 0 || nargs > 4)
>      return false;
>  
> -  /* Ignore the argument of IFN_GOMP_SIMD_LANE, it is magic.  */
> +  /* Ignore the arguments of IFN_GOMP_SIMD_LANE, they are magic.  */
>    combined_fn cfn = gimple_call_combined_fn (stmt);
>    if (cfn == CFN_GOMP_SIMD_LANE)
>      {
> @@ -6320,6 +6321,456 @@ get_group_alias_ptr_type (stmt_vec_info
>  }
>  
>  
> +/* Function check_scan_store.
> +
> +   Check magic stores for #pragma omp scan {in,ex}clusive reductions.  */
> +
> +static bool
> +check_scan_store (stmt_vec_info stmt_info, tree vectype,
> +		  enum vect_def_type rhs_dt, bool slp, tree mask,
> +		  vect_memory_access_type memory_access_type)
> +{
> +  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> +  dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info);
> +  tree ref_type;
> +
> +  gcc_assert (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) > 1);
> +  if (slp
> +      || mask
> +      || memory_access_type != VMAT_CONTIGUOUS
> +      || TREE_CODE (DR_BASE_ADDRESS (dr_info->dr)) != ADDR_EXPR
> +      || !VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (dr_info->dr), 0))
> +      || loop_vinfo == NULL
> +      || LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> +      || STMT_VINFO_GROUPED_ACCESS (stmt_info)
> +      || !integer_zerop (DR_OFFSET (dr_info->dr))
> +      || !integer_zerop (DR_INIT (dr_info->dr))
> +      || !(ref_type = reference_alias_ptr_type (DR_REF (dr_info->dr)))
> +      || !alias_sets_conflict_p (get_alias_set (vectype),
> +				 get_alias_set (TREE_TYPE (ref_type))))
> +    {
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			 "unsupported OpenMP scan store.\n");
> +      return false;
> +    }
> +
> +  /* We need to pattern match code built by OpenMP lowering and simplified
> +     by following optimizations into something we can handle.
> +     #pragma omp simd reduction(inscan,+:r)
> +     for (...)
> +       {
> +	 r += something ();
> +	 #pragma omp scan inclusive (r)
> +	 use (r);
> +       }
> +     shall have body with:
> +       // Initialization for input phase, store the reduction initializer:
> +       _20 = .GOMP_SIMD_LANE (simduid.3_14(D), 0);
> +       _21 = .GOMP_SIMD_LANE (simduid.3_14(D), 1);
> +       D.2042[_21] = 0;
> +       // Actual input phase:
> +       ...
> +       r.0_5 = D.2042[_20];
> +       _6 = _4 + r.0_5;
> +       D.2042[_20] = _6;
> +       // Initialization for scan phase:
> +       _25 = .GOMP_SIMD_LANE (simduid.3_14(D), 2);
> +       _26 = D.2043[_25];
> +       _27 = D.2042[_25];
> +       _28 = _26 + _27;
> +       D.2043[_25] = _28;
> +       D.2042[_25] = _28;
> +       // Actual scan phase:
> +       ...
> +       r.1_8 = D.2042[_20];
> +       ...
> +     The "omp simd array" variable D.2042 holds the privatized copy used
> +     inside of the loop and D.2043 is another one that holds copies of
> +     the current original list item.  The separate GOMP_SIMD_LANE ifn
> +     kinds are there in order to allow optimizing the initializer store
> +     and combiner sequence, e.g. if it is originally some C++ish user
> +     defined reduction, but allow the vectorizer to pattern recognize it
> +     and turn into the appropriate vectorized scan.  */
> +
> +  if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) == 2)
> +    {
> +      /* Match the D.2042[_21] = 0; store above.  Just require that
> +	 it is a constant or external definition store.  */
> +      if (rhs_dt != vect_constant_def && rhs_dt != vect_external_def)
> +	{
> +	 fail_init:
> +	  if (dump_enabled_p ())
> +	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			     "unsupported OpenMP scan initializer store.\n");
> +	  return false;
> +	}
> +
> +      if (! loop_vinfo->scan_map)
> +	loop_vinfo->scan_map = new hash_map<tree, tree>;
> +      tree var = TREE_OPERAND (DR_BASE_ADDRESS (dr_info->dr), 0);
> +      tree &cached = loop_vinfo->scan_map->get_or_insert (var);
> +      if (cached)
> +	goto fail_init;
> +      cached = gimple_assign_rhs1 (STMT_VINFO_STMT (stmt_info));
> +
> +      /* These stores can be vectorized normally.  */
> +      return true;
> +    }
> +
> +  if (rhs_dt != vect_internal_def)
> +    {
> +     fail:
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			 "unsupported OpenMP scan combiner pattern.\n");
> +      return false;
> +    }
> +
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  tree rhs = gimple_assign_rhs1 (stmt);
> +  if (TREE_CODE (rhs) != SSA_NAME)
> +    goto fail;
> +
> +  use_operand_p use_p;
> +  imm_use_iterator iter;
> +  gimple *other_store_stmt = NULL;
> +  FOR_EACH_IMM_USE_FAST (use_p, iter, rhs)
> +    {
> +      gimple *use_stmt = USE_STMT (use_p);
> +      if (use_stmt == stmt || is_gimple_debug (use_stmt))
> +	continue;
> +      if (gimple_bb (use_stmt) != gimple_bb (stmt)
> +	  || !gimple_store_p (use_stmt)
> +	  || other_store_stmt)
> +	goto fail;
> +      other_store_stmt = use_stmt;
> +    }
> +  if (other_store_stmt == NULL)
> +    goto fail;
> +  stmt_vec_info other_store_stmt_info
> +    = loop_vinfo->lookup_stmt (other_store_stmt);
> +  if (other_store_stmt_info == NULL
> +      || STMT_VINFO_SIMD_LANE_ACCESS_P (other_store_stmt_info) != 3)
> +    goto fail;
> +
> +  gimple *def_stmt = SSA_NAME_DEF_STMT (rhs);
> +  if (gimple_bb (def_stmt) != gimple_bb (stmt)
> +      || !is_gimple_assign (def_stmt)
> +      || gimple_assign_rhs_class (def_stmt) != GIMPLE_BINARY_RHS)
> +    goto fail;
> +
> +  enum tree_code code = gimple_assign_rhs_code (def_stmt);
> +  /* For pointer addition, we should use the normal plus for the vector
> +     operation.  */
> +  switch (code)
> +    {
> +    case POINTER_PLUS_EXPR:
> +      code = PLUS_EXPR;
> +      break;
> +    case MULT_HIGHPART_EXPR:
> +      goto fail;
> +    default:
> +      break;
> +    }
> +  if (TREE_CODE_LENGTH (code) != binary_op || !commutative_tree_code (code))
> +    goto fail;
> +
> +  tree rhs1 = gimple_assign_rhs1 (def_stmt);
> +  tree rhs2 = gimple_assign_rhs2 (def_stmt);
> +  if (TREE_CODE (rhs1) != SSA_NAME
> +      || TREE_CODE (rhs2) != SSA_NAME)
> +    goto fail;
> +
> +  gimple *load1_stmt = SSA_NAME_DEF_STMT (rhs1);
> +  gimple *load2_stmt = SSA_NAME_DEF_STMT (rhs2);
> +  if (gimple_bb (load1_stmt) != gimple_bb (stmt)
> +      || !gimple_assign_load_p (load1_stmt)
> +      || gimple_bb (load2_stmt) != gimple_bb (stmt)
> +      || !gimple_assign_load_p (load2_stmt))
> +    goto fail;
> +
> +  stmt_vec_info load1_stmt_info = loop_vinfo->lookup_stmt (load1_stmt);
> +  stmt_vec_info load2_stmt_info = loop_vinfo->lookup_stmt (load2_stmt);
> +  if (load1_stmt_info == NULL
> +      || load2_stmt_info == NULL
> +      || STMT_VINFO_SIMD_LANE_ACCESS_P (load1_stmt_info) != 3
> +      || STMT_VINFO_SIMD_LANE_ACCESS_P (load2_stmt_info) != 3)
> +    goto fail;
> +
> +  if (operand_equal_p (gimple_assign_lhs (stmt),
> +		       gimple_assign_rhs1 (load2_stmt), 0))
> +    {
> +      std::swap (rhs1, rhs2);
> +      std::swap (load1_stmt, load2_stmt);
> +      std::swap (load1_stmt_info, load2_stmt_info);
> +    }
> +  if (!operand_equal_p (gimple_assign_lhs (stmt),
> +			gimple_assign_rhs1 (load1_stmt), 0)
> +      || !operand_equal_p (gimple_assign_lhs (other_store_stmt),
> +			   gimple_assign_rhs1 (load2_stmt), 0))
> +    goto fail;
> +
> +  dr_vec_info *other_dr_info = STMT_VINFO_DR_INFO (other_store_stmt_info);
> +  if (TREE_CODE (DR_BASE_ADDRESS (other_dr_info->dr)) != ADDR_EXPR
> +      || !VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (other_dr_info->dr), 0)))
> +    goto fail;
> +
> +  tree var1 = TREE_OPERAND (DR_BASE_ADDRESS (dr_info->dr), 0);
> +  tree var2 = TREE_OPERAND (DR_BASE_ADDRESS (other_dr_info->dr), 0);
> +  if (!lookup_attribute ("omp simd array", DECL_ATTRIBUTES (var1))
> +      || !lookup_attribute ("omp simd array", DECL_ATTRIBUTES (var2))
> +      || (!lookup_attribute ("omp simd inscan", DECL_ATTRIBUTES (var1)))
> +	 == (!lookup_attribute ("omp simd inscan", DECL_ATTRIBUTES (var2))))
> +    goto fail;
> +
> +  if (lookup_attribute ("omp simd inscan", DECL_ATTRIBUTES (var1)))
> +    std::swap (var1, var2);
> +
> +  if (loop_vinfo->scan_map == NULL)
> +    goto fail;
> +  tree *init = loop_vinfo->scan_map->get (var1);
> +  if (init == NULL)
> +    goto fail;
> +
> +  /* The IL is as expected, now check if we can actually vectorize it.
> +       _26 = D.2043[_25];
> +       _27 = D.2042[_25];
> +       _28 = _26 + _27;
> +       D.2043[_25] = _28;
> +       D.2042[_25] = _28;
> +     should be vectorized as (where _40 is the vectorized rhs
> +     from the D.2042[_21] = 0; store):
> +       _30 = MEM <vector(8) int> [(int *)&D.2043];
> +       _31 = MEM <vector(8) int> [(int *)&D.2042];
> +       _32 = VEC_PERM_EXPR <_31, _40, { 8, 0, 1, 2, 3, 4, 5, 6 }>;
> +       _33 = _31 + _32;
> +       // _33 = { _31[0], _31[0]+_31[1], _31[1]+_31[2], ..., _31[6]+_31[7] };
> +       _34 = VEC_PERM_EXPR <_33, _40, { 8, 9, 0, 1, 2, 3, 4, 5 }>;
> +       _35 = _33 + _34;
> +       // _35 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2], _31[0]+.._31[3],
> +       //         _31[1]+.._31[4], ... _31[4]+.._31[7] };
> +       _36 = VEC_PERM_EXPR <_35, _40, { 8, 9, 10, 11, 0, 1, 2, 3 }>;
> +       _37 = _35 + _36;
> +       // _37 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2], _31[0]+.._31[3],
> +       //         _31[0]+.._31[4], ... _31[0]+.._31[7] };
> +       _38 = _30 + _37;
> +       _39 = VEC_PERM_EXPR <_38, _38, { 7, 7, 7, 7, 7, 7, 7, 7 }>;
> +       MEM <vector(8) int> [(int *)&D.2043] = _39;
> +       MEM <vector(8) int> [(int *)&D.2042] = _38;  */
> +  enum machine_mode vec_mode = TYPE_MODE (vectype);
> +  optab optab = optab_for_tree_code (code, vectype, optab_default);
> +  if (!optab || optab_handler (optab, vec_mode) == CODE_FOR_nothing)
> +    goto fail;
> +
> +  unsigned HOST_WIDE_INT nunits;
> +  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits))
> +    goto fail;
> +  int units_log2 = exact_log2 (nunits);
> +  if (units_log2 <= 0)
> +    goto fail;
> +
> +  for (int i = 0; i <= units_log2; ++i)
> +    {
> +      unsigned HOST_WIDE_INT j, k;
> +      vec_perm_builder sel (nunits, nunits, 1);
> +      sel.quick_grow (nunits);
> +      if (i == units_log2)
> +	{
> +	  for (j = 0; j < nunits; ++j)
> +	    sel[j] = nunits - 1;
> +	}
> +      else
> +	{
> +	  for (j = 0; j < (HOST_WIDE_INT_1U << i); ++j)
> +	    sel[j] = nunits + j;
> +	  for (k = 0; j < nunits; ++j, ++k)
> +	    sel[j] = k;
> +	}
> +      vec_perm_indices indices (sel, i == units_log2 ? 1 : 2, nunits);
> +      if (!can_vec_perm_const_p (vec_mode, indices))
> +	goto fail;
> +    }
> +
> +  return true;
> +}
> +
> +
> +/* Function vectorizable_scan_store.
> +
> +   Helper of vectorizable_store, arguments like on vectorizable_store.
> +   Handle only the transformation, checking is done in check_scan_store.  */
> +
> +static bool
> +vectorizable_scan_store (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> +			 stmt_vec_info *vec_stmt, int ncopies)
> +{
> +  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> +  dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info);
> +  tree ref_type = reference_alias_ptr_type (DR_REF (dr_info->dr));
> +  vec_info *vinfo = stmt_info->vinfo;
> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location,
> +		     "transform scan store. ncopies = %d\n", ncopies);
> +
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  tree rhs = gimple_assign_rhs1 (stmt);
> +  gcc_assert (TREE_CODE (rhs) == SSA_NAME);
> +
> +  gimple *def_stmt = SSA_NAME_DEF_STMT (rhs);
> +  enum tree_code code = gimple_assign_rhs_code (def_stmt);
> +  if (code == POINTER_PLUS_EXPR)
> +    code = PLUS_EXPR;
> +  gcc_assert (TREE_CODE_LENGTH (code) == binary_op
> +	      && commutative_tree_code (code));
> +  tree rhs1 = gimple_assign_rhs1 (def_stmt);
> +  tree rhs2 = gimple_assign_rhs2 (def_stmt);
> +  gcc_assert (TREE_CODE (rhs1) == SSA_NAME && TREE_CODE (rhs2) == SSA_NAME);
> +  gimple *load1_stmt = SSA_NAME_DEF_STMT (rhs1);
> +  gimple *load2_stmt = SSA_NAME_DEF_STMT (rhs2);
> +  stmt_vec_info load1_stmt_info = loop_vinfo->lookup_stmt (load1_stmt);
> +  stmt_vec_info load2_stmt_info = loop_vinfo->lookup_stmt (load2_stmt);
> +  dr_vec_info *load1_dr_info = STMT_VINFO_DR_INFO (load1_stmt_info);
> +  dr_vec_info *load2_dr_info = STMT_VINFO_DR_INFO (load2_stmt_info);
> +  tree var1 = TREE_OPERAND (DR_BASE_ADDRESS (load1_dr_info->dr), 0);
> +  tree var2 = TREE_OPERAND (DR_BASE_ADDRESS (load2_dr_info->dr), 0);
> +
> +  if (lookup_attribute ("omp simd inscan", DECL_ATTRIBUTES (var1)))
> +    {
> +      std::swap (rhs1, rhs2);
> +      std::swap (var1, var2);
> +    }
> +
> +  tree *init = loop_vinfo->scan_map->get (var1);
> +  gcc_assert (init);
> +
> +  tree var = TREE_OPERAND (DR_BASE_ADDRESS (dr_info->dr), 0);
> +  bool inscan_var_store
> +    = lookup_attribute ("omp simd inscan", DECL_ATTRIBUTES (var)) != NULL;
> +
> +  unsigned HOST_WIDE_INT nunits;
> +  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits))
> +    gcc_unreachable ();
> +  int units_log2 = exact_log2 (nunits);
> +  gcc_assert (units_log2 > 0);
> +  auto_vec<tree, 16> perms;
> +  perms.quick_grow (units_log2 + 1);
> +  for (int i = 0; i <= units_log2; ++i)
> +    {
> +      unsigned HOST_WIDE_INT j, k;
> +      vec_perm_builder sel (nunits, nunits, 1);
> +      sel.quick_grow (nunits);
> +      if (i == units_log2)
> +	{
> +	  for (j = 0; j < nunits; ++j)
> +	    sel[j] = nunits - 1;
> +	}
> +      else
> +	{
> +	  for (j = 0; j < (HOST_WIDE_INT_1U << i); ++j)
> +	    sel[j] = nunits + j;
> +	  for (k = 0; j < nunits; ++j, ++k)
> +	    sel[j] = k;
> +	}
> +      vec_perm_indices indices (sel, i == units_log2 ? 1 : 2, nunits);
> +      perms[i] = vect_gen_perm_mask_checked (vectype, indices);
> +    }
> +
> +  stmt_vec_info prev_stmt_info = NULL;
> +  tree vec_oprnd1 = NULL_TREE;
> +  tree vec_oprnd2 = NULL_TREE;
> +  tree vec_oprnd3 = NULL_TREE;
> +  tree dataref_ptr = unshare_expr (DR_BASE_ADDRESS (dr_info->dr));
> +  tree dataref_offset = build_int_cst (ref_type, 0);
> +  tree bump = vect_get_data_ptr_increment (dr_info, vectype, VMAT_CONTIGUOUS);
> +  tree orig = NULL_TREE;
> +  for (int j = 0; j < ncopies; j++)
> +    {
> +      stmt_vec_info new_stmt_info;
> +      if (j == 0)
> +	{
> +	  vec_oprnd1 = vect_get_vec_def_for_operand (*init, stmt_info);
> +	  vec_oprnd2 = vect_get_vec_def_for_operand (rhs1, stmt_info);
> +	  vec_oprnd3 = vect_get_vec_def_for_operand (rhs2, stmt_info);
> +	  orig = vec_oprnd3;
> +	}
> +      else
> +	{
> +	  vec_oprnd1 = vect_get_vec_def_for_stmt_copy (vinfo, vec_oprnd1);
> +	  vec_oprnd2 = vect_get_vec_def_for_stmt_copy (vinfo, vec_oprnd2);
> +	  vec_oprnd3 = vect_get_vec_def_for_stmt_copy (vinfo, vec_oprnd3);
> +	  if (!inscan_var_store)
> +	    dataref_offset = int_const_binop (PLUS_EXPR, dataref_offset, bump);
> +	}
> +
> +      tree v = vec_oprnd2;
> +      for (int i = 0; i < units_log2; ++i)
> +	{
> +	  tree new_temp = make_ssa_name (vectype);
> +	  gimple *g = gimple_build_assign (new_temp, VEC_PERM_EXPR, v,
> +					   vec_oprnd1, perms[i]);
> +	  new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
> +	  if (prev_stmt_info == NULL)
> +	    STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt_info;
> +	  else
> +	    STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> +	  prev_stmt_info = new_stmt_info;
> +
> +	  tree new_temp2 = make_ssa_name (vectype);
> +	  g = gimple_build_assign (new_temp2, code, v, new_temp);
> +	  new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
> +	  STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> +	  prev_stmt_info = new_stmt_info;
> +
> +	  v = new_temp2;
> +	}
> +
> +      tree new_temp = make_ssa_name (vectype);
> +      gimple *g = gimple_build_assign (new_temp, code, orig, v);
> +      new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
> +      STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> +      prev_stmt_info = new_stmt_info;
> +
> +      orig = make_ssa_name (vectype);
> +      g = gimple_build_assign (orig, VEC_PERM_EXPR, new_temp, new_temp,
> +			       perms[units_log2]);
> +      new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
> +      STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> +      prev_stmt_info = new_stmt_info;
> +
> +      if (!inscan_var_store)
> +	{
> +	  tree data_ref = fold_build2 (MEM_REF, vectype, dataref_ptr,
> +				       dataref_offset);
> +	  vect_copy_ref_info (data_ref, DR_REF (dr_info->dr));
> +	  g = gimple_build_assign (data_ref, new_temp);
> +	  new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
> +	  STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> +	  prev_stmt_info = new_stmt_info;
> +	}
> +    }
> +
> +  if (inscan_var_store)
> +    for (int j = 0; j < ncopies; j++)
> +      {
> +	if (j != 0)
> +	  dataref_offset = int_const_binop (PLUS_EXPR, dataref_offset, bump);
> +
> +	tree data_ref = fold_build2 (MEM_REF, vectype, dataref_ptr,
> +				     dataref_offset);
> +	vect_copy_ref_info (data_ref, DR_REF (dr_info->dr));
> +	gimple *g = gimple_build_assign (data_ref, orig);
> +	stmt_vec_info new_stmt_info
> +	  = vect_finish_stmt_generation (stmt_info, g, gsi);
> +	STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> +	prev_stmt_info = new_stmt_info;
> +      }
> +  return true;
> +}
> +
> +
>  /* Function vectorizable_store.
>  
>     Check if STMT_INFO defines a non scalar data-ref (array/pointer/structure)
> @@ -6514,6 +6965,13 @@ vectorizable_store (stmt_vec_info stmt_i
>        group_size = vec_num = 1;
>      }
>  
> +  if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) > 1 && !vec_stmt)
> +    {
> +      if (!check_scan_store (stmt_info, vectype, rhs_dt, slp, mask,
> +			     memory_access_type))
> +	return false;
> +    }
> +
>    if (!vec_stmt) /* transformation not required.  */
>      {
>        STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) = memory_access_type;
> @@ -6737,6 +7195,8 @@ vectorizable_store (stmt_vec_info stmt_i
>  	}
>        return true;
>      }
> +  else if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) == 3)
> +    return vectorizable_scan_store (stmt_info, gsi, vec_stmt, ncopies);
>  
>    if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
>      DR_GROUP_STORE_COUNT (DR_GROUP_FIRST_ELEMENT (stmt_info))++;
> @@ -7162,7 +7622,7 @@ vectorizable_store (stmt_vec_info stmt_i
>  	  gcc_assert (useless_type_conversion_p (vectype,
>  						 TREE_TYPE (vec_oprnd)));
>  	  bool simd_lane_access_p
> -	    = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info);
> +	    = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) != 0;
>  	  if (simd_lane_access_p
>  	      && !loop_masks
>  	      && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
> @@ -8347,7 +8807,7 @@ vectorizable_load (stmt_vec_info stmt_in
>        if (j == 0)
>  	{
>  	  bool simd_lane_access_p
> -	    = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info);
> +	    = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) != 0;
>  	  if (simd_lane_access_p
>  	      && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
>  	      && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr), 0))
> --- gcc/tree-vect-data-refs.c.jj	2019-06-13 12:06:17.786472401 +0200
> +++ gcc/tree-vect-data-refs.c	2019-06-14 09:52:14.920718040 +0200
> @@ -3003,6 +3003,13 @@ vect_analyze_data_ref_accesses (vec_info
>  	      || TREE_CODE (DR_INIT (drb)) != INTEGER_CST)
>  	    break;
>  
> +	  /* Different .GOMP_SIMD_LANE calls still give the same lane,
> +	     just hold extra information.  */
> +	  if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmtinfo_a)
> +	      && STMT_VINFO_SIMD_LANE_ACCESS_P (stmtinfo_b)
> +	      && data_ref_compare_tree (DR_INIT (dra), DR_INIT (drb)) == 0)
> +	    break;
> +
>  	  /* Sorting has ensured that DR_INIT (dra) <= DR_INIT (drb).  */
>  	  HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_INIT (dra));
>  	  HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_INIT (drb));
> @@ -4101,7 +4108,8 @@ vect_find_stmt_data_reference (loop_p lo
>  			  DR_STEP_ALIGNMENT (newdr)
>  			    = highest_pow2_factor (step);
>  			  /* Mark as simd-lane access.  */
> -			  newdr->aux = (void *)-1;
> +			  tree arg2 = gimple_call_arg (def, 1);
> +			  newdr->aux = (void *) (-1 - tree_to_uhwi (arg2));
>  			  free_data_ref (dr);
>  			  datarefs->safe_push (newdr);
>  			  return opt_result::success ();
> @@ -4210,14 +4218,17 @@ vect_analyze_data_refs (vec_info *vinfo,
>          }
>  
>        /* See if this was detected as SIMD lane access.  */
> -      if (dr->aux == (void *)-1)
> +      if (dr->aux == (void *)-1
> +	  || dr->aux == (void *)-2
> +	  || dr->aux == (void *)-3)
>  	{
>  	  if (nested_in_vect_loop_p (loop, stmt_info))
>  	    return opt_result::failure_at (stmt_info->stmt,
>  					   "not vectorized:"
>  					   " data ref analysis failed: %G",
>  					   stmt_info->stmt);
> -	  STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) = true;
> +	  STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info)
> +	    = -(uintptr_t) dr->aux;
>  	}
>  
>        tree base = get_base_address (DR_REF (dr));
> --- gcc/tree-vectorizer.h.jj	2019-06-13 12:50:31.597926603 +0200
> +++ gcc/tree-vectorizer.h	2019-06-14 16:51:53.155792356 +0200
> @@ -491,6 +491,10 @@ typedef struct _loop_vec_info : public v
>    /* Map of IV base/step expressions to inserted name in the preheader.  */
>    hash_map<tree_operand_hash, tree> *ivexpr_map;
>  
> +  /* Map of OpenMP "omp simd array" scan variables to corresponding
> +     rhs of the store of the initializer.  */
> +  hash_map<tree, tree> *scan_map;
> +
>    /* The unrolling factor needed to SLP the loop. In case of that pure SLP is
>       applied to the loop, i.e., no unrolling is needed, this is 1.  */
>    poly_uint64 slp_unrolling_factor;
> @@ -913,7 +917,7 @@ struct _stmt_vec_info {
>    bool strided_p;
>  
>    /* For both loads and stores.  */
> -  bool simd_lane_access_p;
> +  unsigned simd_lane_access_p : 2;
>  
>    /* Classifies how the load or store is going to be implemented
>       for loop vectorization.  */
> --- gcc/tree-ssa-dce.c.jj	2019-06-13 13:28:36.763153374 +0200
> +++ gcc/tree-ssa-dce.c	2019-06-13 14:20:14.889711910 +0200
> @@ -1339,14 +1339,14 @@ eliminate_unnecessary_stmts (void)
>  		  update_stmt (stmt);
>  		  release_ssa_name (name);
>  
> -		  /* GOMP_SIMD_LANE (unless two argument) or ASAN_POISON
> +		  /* GOMP_SIMD_LANE (unless three argument) or ASAN_POISON
>  		     without lhs is not needed.  */
>  		  if (gimple_call_internal_p (stmt))
>  		    switch (gimple_call_internal_fn (stmt))
>  		      {
>  		      case IFN_GOMP_SIMD_LANE:
> -			if (gimple_call_num_args (stmt) >= 2
> -			    && !integer_nonzerop (gimple_call_arg (stmt, 1)))
> +			if (gimple_call_num_args (stmt) >= 3
> +			    && !integer_nonzerop (gimple_call_arg (stmt, 2)))
>  			  break;
>  			/* FALLTHRU */
>  		      case IFN_ASAN_POISON:
> --- gcc/testsuite/gcc.dg/vect/vect-simd-8.c.jj	2019-06-14 19:00:40.918765225 +0200
> +++ gcc/testsuite/gcc.dg/vect/vect-simd-8.c	2019-06-14 19:01:43.755798987 +0200
> @@ -0,0 +1,66 @@
> +/* { dg-require-effective-target size32plus } */
> +/* { dg-additional-options "-fopenmp-simd" } */
> +
> +#include "tree-vect.h"
> +
> +int r, a[1024], b[1024];
> +
> +__attribute__((noipa)) void
> +foo (int *a, int *b)
> +{
> +  #pragma omp simd reduction (inscan, +:r)
> +  for (int i = 0; i < 1024; i++)
> +    {
> +      r += a[i];
> +      #pragma omp scan inclusive(r)
> +      b[i] = r;
> +    }
> +}
> +
> +__attribute__((noipa)) int
> +bar (void)
> +{
> +  int s = 0;
> +  #pragma omp simd reduction (inscan, +:s)
> +  for (int i = 0; i < 1024; i++)
> +    {
> +      s += 2 * a[i];
> +      #pragma omp scan inclusive(s)
> +      b[i] = s;
> +    }
> +  return s;
> +}
> +
> +int
> +main ()
> +{
> +  int s = 0;
> +  check_vect ();
> +  for (int i = 0; i < 1024; ++i)
> +    {
> +      a[i] = i;
> +      b[i] = -1;
> +      asm ("" : "+g" (i));
> +    }
> +  foo (a, b);
> +  if (r != 1024 * 1023 / 2)
> +    abort ();
> +  for (int i = 0; i < 1024; ++i)
> +    {
> +      s += i;
> +      if (b[i] != s)
> +	abort ();
> +      else
> +	b[i] = 25;
> +    }
> +  if (bar () != 1024 * 1023)
> +    abort ();
> +  s = 0;
> +  for (int i = 0; i < 1024; ++i)
> +    {
> +      s += 2 * i;
> +      if (b[i] != s)
> +	abort ();
> +    }
> +  return 0;
> +}
> --- gcc/omp-low.c.jj	2019-06-13 13:28:36.611155753 +0200
> +++ gcc/omp-low.c	2019-06-14 18:54:14.976699854 +0200
> @@ -141,6 +141,9 @@ struct omp_context
>    /* True if lower_omp_1 should look up lastprivate conditional in parent
>       context.  */
>    bool combined_into_simd_safelen0;
> +
> +  /* True if there is nested scan context with inclusive clause.  */
> +  bool scan_inclusive;
>  };
>  
>  static splay_tree all_contexts;
> @@ -3329,11 +3332,15 @@ scan_omp_1_stmt (gimple_stmt_iterator *g
>        scan_omp_single (as_a <gomp_single *> (stmt), ctx);
>        break;
>  
> +    case GIMPLE_OMP_SCAN:
> +      if (tree clauses = gimple_omp_scan_clauses (as_a <gomp_scan *> (stmt)))
> +	if (OMP_CLAUSE_CODE (clauses) == OMP_CLAUSE_INCLUSIVE)
> +	  ctx->scan_inclusive = true;
> +      /* FALLTHRU */
>      case GIMPLE_OMP_SECTION:
>      case GIMPLE_OMP_MASTER:
>      case GIMPLE_OMP_ORDERED:
>      case GIMPLE_OMP_CRITICAL:
> -    case GIMPLE_OMP_SCAN:
>      case GIMPLE_OMP_GRID_BODY:
>        ctx = new_omp_context (stmt, ctx);
>        scan_omp (gimple_omp_body_ptr (stmt), ctx);
> @@ -3671,6 +3678,7 @@ struct omplow_simd_context {
>    omplow_simd_context () { memset (this, 0, sizeof (*this)); }
>    tree idx;
>    tree lane;
> +  tree lastlane;
>    vec<tree, va_heap> simt_eargs;
>    gimple_seq simt_dlist;
>    poly_uint64_pod max_vf;
> @@ -3682,7 +3690,8 @@ struct omplow_simd_context {
>  
>  static bool
>  lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,
> -			      omplow_simd_context *sctx, tree &ivar, tree &lvar)
> +			      omplow_simd_context *sctx, tree &ivar,
> +			      tree &lvar, tree *rvar = NULL)
>  {
>    if (known_eq (sctx->max_vf, 0U))
>      {
> @@ -3738,7 +3747,27 @@ lower_rec_simd_input_clauses (tree new_v
>  	= tree_cons (get_identifier ("omp simd array"), NULL,
>  		     DECL_ATTRIBUTES (avar));
>        gimple_add_tmp_var (avar);
> -      ivar = build4 (ARRAY_REF, TREE_TYPE (new_var), avar, sctx->idx,
> +      tree iavar = avar;
> +      if (rvar)
> +	{
> +	  /* For inscan reductions, create another array temporary,
> +	     which will hold the reduced value.  */
> +	  iavar = create_tmp_var_raw (atype);
> +	  if (TREE_ADDRESSABLE (new_var))
> +	    TREE_ADDRESSABLE (iavar) = 1;
> +	  DECL_ATTRIBUTES (iavar)
> +	    = tree_cons (get_identifier ("omp simd array"), NULL,
> +			 tree_cons (get_identifier ("omp simd inscan"), NULL,
> +				    DECL_ATTRIBUTES (iavar)));
> +	  gimple_add_tmp_var (iavar);
> +	  ctx->cb.decl_map->put (avar, iavar);
> +	  if (sctx->lastlane == NULL_TREE)
> +	    sctx->lastlane = create_tmp_var (unsigned_type_node);
> +	  *rvar = build4 (ARRAY_REF, TREE_TYPE (new_var), iavar,
> +			  sctx->lastlane, NULL_TREE, NULL_TREE);
> +	  TREE_THIS_NOTRAP (*rvar) = 1;
> +	}
> +      ivar = build4 (ARRAY_REF, TREE_TYPE (new_var), iavar, sctx->idx,
>  		     NULL_TREE, NULL_TREE);
>        lvar = build4 (ARRAY_REF, TREE_TYPE (new_var), avar, sctx->lane,
>  		     NULL_TREE, NULL_TREE);
> @@ -3814,7 +3843,7 @@ lower_rec_input_clauses (tree clauses, g
>    omplow_simd_context sctx = omplow_simd_context ();
>    tree simt_lane = NULL_TREE, simtrec = NULL_TREE;
>    tree ivar = NULL_TREE, lvar = NULL_TREE, uid = NULL_TREE;
> -  gimple_seq llist[3] = { };
> +  gimple_seq llist[4] = { };
>    tree nonconst_simd_if = NULL_TREE;
>  
>    copyin_seq = NULL;
> @@ -5324,12 +5353,32 @@ lower_rec_input_clauses (tree clauses, g
>  		      new_vard = TREE_OPERAND (new_var, 0);
>  		      gcc_assert (DECL_P (new_vard));
>  		    }
> +		  tree rvar = NULL_TREE, *rvarp = NULL;
> +		  if (is_simd
> +		      && OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
> +		      && OMP_CLAUSE_REDUCTION_INSCAN (c))
> +		    rvarp = &rvar;
>  		  if (is_simd
>  		      && lower_rec_simd_input_clauses (new_var, ctx, &sctx,
> -						       ivar, lvar))
> +						       ivar, lvar, rvarp))
>  		    {
> +		      if (new_vard != new_var)
> +			{
> +			  SET_DECL_VALUE_EXPR (new_vard,
> +					       build_fold_addr_expr (lvar));
> +			  DECL_HAS_VALUE_EXPR_P (new_vard) = 1;
> +			}
> +
>  		      tree ref = build_outer_var_ref (var, ctx);
>  
> +		      if (rvarp)
> +			{
> +			  gimplify_assign (ivar, ref, &llist[0]);
> +			  ref = build_outer_var_ref (var, ctx);
> +			  gimplify_assign (ref, rvar, &llist[3]);
> +			  break;
> +			}
> +
>  		      gimplify_assign (unshare_expr (ivar), x, &llist[0]);
>  
>  		      if (sctx.is_simt)
> @@ -5346,12 +5395,6 @@ lower_rec_input_clauses (tree clauses, g
>  		      ref = build_outer_var_ref (var, ctx);
>  		      gimplify_assign (ref, x, &llist[1]);
>  
> -		      if (new_vard != new_var)
> -			{
> -			  SET_DECL_VALUE_EXPR (new_vard,
> -					       build_fold_addr_expr (lvar));
> -			  DECL_HAS_VALUE_EXPR_P (new_vard) = 1;
> -			}
>  		    }
>  		  else
>  		    {
> @@ -5456,14 +5499,23 @@ lower_rec_input_clauses (tree clauses, g
>    if (sctx.lane)
>      {
>        gimple *g = gimple_build_call_internal (IFN_GOMP_SIMD_LANE,
> -					      1 + (nonconst_simd_if != NULL),
> -					      uid, nonconst_simd_if);
> +					      2 + (nonconst_simd_if != NULL),
> +					      uid, integer_zero_node,
> +					      nonconst_simd_if);
>        gimple_call_set_lhs (g, sctx.lane);
>        gimple_stmt_iterator gsi = gsi_start_1 (gimple_omp_body_ptr (ctx->stmt));
>        gsi_insert_before_without_update (&gsi, g, GSI_SAME_STMT);
>        g = gimple_build_assign (sctx.lane, INTEGER_CST,
>  			       build_int_cst (unsigned_type_node, 0));
>        gimple_seq_add_stmt (ilist, g);
> +      if (sctx.lastlane)
> +	{
> +	  g = gimple_build_call_internal (IFN_GOMP_SIMD_LAST_LANE,
> +					  2, uid, sctx.lane);
> +	  gimple_call_set_lhs (g, sctx.lastlane);
> +	  gimple_seq_add_stmt (dlist, g);
> +	  gimple_seq_add_seq (dlist, llist[3]);
> +	}
>        /* Emit reductions across SIMT lanes in log_2(simt_vf) steps.  */
>        if (llist[2])
>  	{
> @@ -5865,6 +5917,7 @@ lower_lastprivate_clauses (tree clauses,
>  		  new_var = build4 (ARRAY_REF, TREE_TYPE (val),
>  				    TREE_OPERAND (val, 0), lastlane,
>  				    NULL_TREE, NULL_TREE);
> +		  TREE_THIS_NOTRAP (new_var) = 1;
>  		}
>  	    }
>  	  else if (maybe_simt)
> @@ -8371,6 +8424,108 @@ lower_omp_ordered (gimple_stmt_iterator
>  }
>  
>  
> +/* Expand code for an OpenMP scan directive and the structured block
> +   before the scan directive.  */
> +
> +static void
> +lower_omp_scan (gimple_stmt_iterator *gsi_p, omp_context *ctx)
> +{
> +  gimple *stmt = gsi_stmt (*gsi_p);
> +  bool has_clauses
> +    = gimple_omp_scan_clauses (as_a <gomp_scan *> (stmt)) != NULL;
> +  tree lane = NULL_TREE;
> +  gimple_seq before = NULL;
> +  omp_context *octx = ctx->outer;
> +  gcc_assert (octx);
> +  bool input_phase = has_clauses ^ octx->scan_inclusive;
> +  if (gimple_code (octx->stmt) == GIMPLE_OMP_FOR
> +      && (gimple_omp_for_kind (octx->stmt) & GF_OMP_FOR_SIMD)
> +      && !gimple_omp_for_combined_into_p (octx->stmt)
> +      && octx->scan_inclusive)
> +    {
> +      if (tree c = omp_find_clause (gimple_omp_for_clauses (octx->stmt),
> +				    OMP_CLAUSE__SIMDUID_))
> +	{
> +	  tree uid = OMP_CLAUSE__SIMDUID__DECL (c);
> +	  lane = create_tmp_var (unsigned_type_node);
> +	  tree t = build_int_cst (integer_type_node, 1 + !input_phase);
> +	  gimple *g
> +	    = gimple_build_call_internal (IFN_GOMP_SIMD_LANE, 2, uid, t);
> +	  gimple_call_set_lhs (g, lane);
> +	  gimple_seq_add_stmt (&before, g);
> +	}
> +      for (tree c = gimple_omp_for_clauses (octx->stmt);
> +	   c; c = OMP_CLAUSE_CHAIN (c))
> +	if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
> +	    && OMP_CLAUSE_REDUCTION_INSCAN (c))
> +	  {
> +	    tree var = OMP_CLAUSE_DECL (c);
> +	    tree new_var = lookup_decl (var, octx);
> +	    tree val = new_var;
> +	    tree var2 = NULL_TREE;
> +	    if (DECL_HAS_VALUE_EXPR_P (new_var))
> +	      {
> +		val = DECL_VALUE_EXPR (new_var);
> +		if (TREE_CODE (val) == ARRAY_REF
> +		    && VAR_P (TREE_OPERAND (val, 0)))
> +		  {
> +		    tree v = TREE_OPERAND (val, 0);
> +		    if (lookup_attribute ("omp simd array",
> +					  DECL_ATTRIBUTES (v)))
> +		      {
> +			val = unshare_expr (val);
> +			TREE_OPERAND (val, 1) = lane;
> +			if (!input_phase)
> +			  {
> +			    var2 = lookup_decl (v, octx);
> +			    var2 = build4 (ARRAY_REF, TREE_TYPE (val),
> +					   var2, lane, NULL_TREE, NULL_TREE);
> +			    TREE_THIS_NOTRAP (var2) = 1;
> +			  }
> +			else
> +			  var2 = val;
> +		      }
> +		  }
> +	      }
> +	    if (var2 == NULL_TREE)
> +	      gcc_unreachable ();
> +	    if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
> +	      {
> +		gcc_unreachable ();
> +	      }
> +	    else
> +	      {
> +		if (input_phase)
> +		  {
> +		    /* input phase.  Set val to initializer before
> +		       the body.  */
> +		    tree x = omp_reduction_init (c, TREE_TYPE (new_var));
> +		    gimplify_assign (val, x, &before);
> +		  }
> +		else
> +		  {
> +		    /* scan phase.  */
> +		    enum tree_code code = OMP_CLAUSE_REDUCTION_CODE (c);
> +		    if (code == MINUS_EXPR)
> +		      code = PLUS_EXPR;
> +
> +		    tree x = build2 (code, TREE_TYPE (var2),
> +				     unshare_expr (var2), unshare_expr (val));
> +		    gimplify_assign (unshare_expr (var2), x, &before);
> +		    gimplify_assign (val, var2, &before);
> +		  }
> +	      }
> +	  }
> +    }
> +  else if (has_clauses)
> +    sorry_at (gimple_location (stmt),
> +	      "%<#pragma omp scan%> not supported yet");
> +  gsi_insert_seq_after (gsi_p, gimple_omp_body (stmt), GSI_SAME_STMT);
> +  gsi_insert_seq_after (gsi_p, before, GSI_SAME_STMT);
> +  gsi_replace (gsi_p, gimple_build_nop (), true);
> +}
> +
> +
>  /* Gimplify a GIMPLE_OMP_CRITICAL statement.  This is a relatively simple
>     substitution of a couple of function calls.  But in the NAMED case,
>     requires that languages coordinate a symbol name.  It is therefore
> @@ -10843,11 +10998,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p
>      case GIMPLE_OMP_SCAN:
>        ctx = maybe_lookup_ctx (stmt);
>        gcc_assert (ctx);
> -      gsi_insert_seq_after (gsi_p, gimple_omp_body (stmt), GSI_SAME_STMT);
> -      if (gimple_omp_scan_clauses (as_a <gomp_scan *> (stmt)))
> -	sorry_at (gimple_location (stmt),
> -		  "%<#pragma omp scan%> not supported yet");
> -      gsi_replace (gsi_p, gimple_build_nop (), true);
> +      lower_omp_scan (gsi_p, ctx);
>        break;
>      case GIMPLE_OMP_CRITICAL:
>        ctx = maybe_lookup_ctx (stmt);
> --- gcc/tree-vect-loop.c.jj	2019-06-13 13:28:36.581156223 +0200
> +++ gcc/tree-vect-loop.c	2019-06-14 14:53:10.734986707 +0200
> @@ -824,6 +824,7 @@ _loop_vec_info::_loop_vec_info (struct l
>      peeling_for_alignment (0),
>      ptr_mask (0),
>      ivexpr_map (NULL),
> +    scan_map (NULL),
>      slp_unrolling_factor (1),
>      single_scalar_iteration_cost (0),
>      vectorizable (false),
> @@ -863,8 +864,8 @@ _loop_vec_info::_loop_vec_info (struct l
>  	  gimple *stmt = gsi_stmt (si);
>  	  gimple_set_uid (stmt, 0);
>  	  add_stmt (stmt);
> -	  /* If .GOMP_SIMD_LANE call for the current loop has 2 arguments, the
> -	     second argument is the #pragma omp simd if (x) condition, when 0,
> +	  /* If .GOMP_SIMD_LANE call for the current loop has 3 arguments, the
> +	     third argument is the #pragma omp simd if (x) condition, when 0,
>  	     loop shouldn't be vectorized, when non-zero constant, it should
>  	     be vectorized normally, otherwise versioned with vectorized loop
>  	     done if the condition is non-zero at runtime.  */
> @@ -872,12 +873,12 @@ _loop_vec_info::_loop_vec_info (struct l
>  	      && is_gimple_call (stmt)
>  	      && gimple_call_internal_p (stmt)
>  	      && gimple_call_internal_fn (stmt) == IFN_GOMP_SIMD_LANE
> -	      && gimple_call_num_args (stmt) >= 2
> +	      && gimple_call_num_args (stmt) >= 3
>  	      && TREE_CODE (gimple_call_arg (stmt, 0)) == SSA_NAME
>  	      && (loop_in->simduid
>  		  == SSA_NAME_VAR (gimple_call_arg (stmt, 0))))
>  	    {
> -	      tree arg = gimple_call_arg (stmt, 1);
> +	      tree arg = gimple_call_arg (stmt, 2);
>  	      if (integer_zerop (arg) || TREE_CODE (arg) == SSA_NAME)
>  		simd_if_cond = arg;
>  	      else
> @@ -959,6 +960,7 @@ _loop_vec_info::~_loop_vec_info ()
>  
>    release_vec_loop_masks (&masks);
>    delete ivexpr_map;
> +  delete scan_map;
>  
>    loop->aux = NULL;
>  }
> 
> 	Jakub
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)

