[PATCH][RFC] Add versioning for constant strides for vectorization

Jack Howarth howarth@bromo.med.uc.edu
Sat Mar 13 19:19:00 GMT 2010


On Fri, Jan 23, 2009 at 05:08:43PM +0100, Richard Guenther wrote:
> 
> This patch adds the capability to the vectorizer to perform versioning
> for the case of a constant (suitable) stride.  For example, for
> 
> subroutine to_product_of(self,a,b,a1,a2)
>   complex(kind=8) :: self (:)
>   complex(kind=8), intent(in) :: a(:,:)
>   complex(kind=8), intent(in) :: b(:)
>   integer a1,a2
>   do i = 1,a1
>     do j = 1,a2
>       self(i) = self(i) + a(j,i)*b(j)
>     end do
>   end do
> end subroutine
> 
> we can only apply vectorization if the strides of the fastest dimension
> of self, a and b are one (they are loaded from the passed array
> descriptors and thus appear as (loop invariant) variables).
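
For readers unfamiliar with the idea, here is a hand-written C sketch of what versioning for a unit stride amounts to at the source level. It is only an illustration and not code from the patch: the function name dot_versioned, the use of plain double instead of complex, and strides counted in elements are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration, not part of the patch: sum a[j*sa] * b[j*sb]
   for j in [0, n), where the strides sa and sb (in elements) are only
   known at run time, as they would be when loaded from an array
   descriptor.  */
double
dot_versioned (const double *a, ptrdiff_t sa,
               const double *b, ptrdiff_t sb, ptrdiff_t n)
{
  double sum = 0.0;
  ptrdiff_t j;

  assert (n >= 0);

  if (sa == 1 && sb == 1)
    {
      /* Versioned copy: both strides are known to be one here, so the
         accesses are contiguous and a vectorizer can handle the loop.  */
      for (j = 0; j < n; j++)
        sum += a[j] * b[j];
    }
  else
    {
      /* Generic copy: arbitrary loop-invariant strides, left scalar.  */
      for (j = 0; j < n; j++)
        sum += a[j * sa] * b[j * sb];
    }

  return sum;
}
```

The run-time test on sa and sb plays the role of the versioning condition; the patch builds the equivalent check from the stride SSA names it records during analysis.
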
> 
> During the implementation of this I noticed that peeling for
> number of iterations (we have to unroll the above loop twice, and so
> for an odd number of iterations have an epilogue loop for the remaining
> iteration(s)) does not play well with versioning, and we end up
> vectorizing the wrong loop.  So I just disabled versioning if we
> apply peeling with an epilogue loop and instead attach the versioning
> condition to the pre-condition of the main loop, which skips directly
> to the epilogue if the number of iterations is too small.  We obviously
> can use the epilogue loop as the non-vectorized version.
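
As a rough source-level picture of that arrangement (again a sketch under assumptions, not code from the patch): the main loop below is unrolled by a hypothetical factor VF of 2 and guarded by a single pre-condition that combines the iteration-count test with the stride test, and the scalar epilogue serves as the fallback for every case the guard rejects. The name dot_peeled and the use of double rather than complex are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define VF 2  /* assumed unroll/vectorization factor for the sketch */

/* Hypothetical illustration, not part of the patch: the stride check is
   folded into the guard of the main loop, so the epilogue loop doubles
   as the non-vectorized version.  */
double
dot_peeled (const double *a, ptrdiff_t sa,
            const double *b, ptrdiff_t sb, ptrdiff_t n)
{
  double sum0 = 0.0, sum1 = 0.0;
  ptrdiff_t j = 0;

  assert (n >= 0);

  /* Guard: skip straight to the epilogue if there are too few
     iterations or if any stride differs from one.  */
  if (n >= VF && sa == 1 && sb == 1)
    {
      ptrdiff_t main_n = n - n % VF;

      /* Main loop: unit strides, two iterations per trip.  Splitting
         the sum into two accumulators reassociates the reduction.  */
      for (; j < main_n; j += VF)
        {
          sum0 += a[j] * b[j];
          sum1 += a[j + 1] * b[j + 1];
        }
    }

  /* Epilogue: handles the leftover iterations after the main loop, or
     all iterations with arbitrary strides when the guard failed.  */
  for (; j < n; j++)
    sum0 += a[j * sa] * b[j * sb];

  return sum0 + sum1;
}
```

Because the epilogue already handles arbitrary strides, no separate scalar copy of the whole loop is needed, which is the point of attaching the check to the pre-condition instead of versioning.
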
> 
> This patch also inserts an extra copyprop and dce pass before the
> vectorizer so it can recognize the reduction in the above testcase
> (LIM has made that reduction non-obvious).  While doing so I noticed
> that copyprop does not preserve loop-closed SSA form and fixed that
> as well.
> 
> An earlier version bootstrapped and tested OK on
> x86_64-unknown-linux-gnu; a final attempt is still running.
> 
> I haven't performance-tested this extensively yet; it might need
> cost-model adjustments and/or need to wait until we have profile
> feedback to properly seed vectorizer analysis here.  A micro-benchmark
> based on the above loop shows around 15% improvement on AMD K10.
> 
> Feedback (and ppc testing) is still welcome of course.
> 
> Thanks,
> Richard.
> 
> 2009-01-23  Richard Guenther  <rguenther@suse.de>
> 
> 	* passes.c (init_optimization_passes): Add copy-prop and dce
> 	before vectorization.
> 	* Makefile.in (tree-ssa-copy.o): Add $(CFGLOOP_H) dependency.
> 	* tree-ssa-copy.c (init_copy_prop): Do not propagate through
> 	single-argument PHIs if we are in loop-closed SSA form.
> 	* tree-data-ref.c (dr_analyze_innermost): Allow affine offsets.
> 	* tree-vect-analyze.c (vect_check_interleaving): Check that
> 	DR_STEP is constant.
> 	(vect_enhance_data_refs_alignment): If versioning for strides
> 	is required do not peel.
> 	(vect_analyze_data_ref_access): Allow non-constant step of
> 	a specific form, remember them for versioning.
> 	* params.def (vect-max-version-for-stride-checks): New param.
> 	(vect-version-for-stride-value): Likewise.
> 	* tree-vectorizer.c (slpeel_add_loop_guard): Pass extra guards
> 	for the pre-condition.
> 	(slpeel_tree_peel_loop_to_edge): Likewise.
> 	(new_loop_vec_info): Allocate stride versioning data.
> 	(destroy_loop_vec_info): Free stride versioning data.
> 	* tree-vectorizer.h (struct _loop_vec_info): Add variable_strides
> 	field.
> 	(LOOP_VINFO_VARIABLE_STRIDES): Define.
> 	(slpeel_tree_peel_loop_to_edge): Adjust declaration.
> 	* tree-vect-transform.c (vect_build_loop_niters): Take an
> 	optional sequence to append stmts.
> 	(vect_generate_tmps_on_preheader): Likewise.
> 	(vect_do_peeling_for_loop_bound): Take extra guards for the
> 	pre-condition.
> 	(vect_do_peeling_for_alignment): Adjust.
> 	(vect_create_cond_for_stride_checks): New function.
> 	(vect_loop_versioning): Take stmt and stmt list to put pre-condition
> 	guards if we are going to peel.  Do not apply versioning in that
> 	case.
> 	(vect_transform_loop): If we are peeling for loop bound only
> 	record extra pre-conditions, do not apply loop versioning.
> 
> 	* gcc.dg/vect/fast-math-vect-complex-5.c: New testcase.
> 	* gfortran.dg/vect/fast-math-vect-complex-1.f90: Likewise.
> 	* gfortran.dg/vect/fast-math-vect-stride-1.f90: Likewise.
> 
> Index: trunk/gcc/passes.c
> ===================================================================
> *** trunk.orig/gcc/passes.c	2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/passes.c	2009-01-23 16:48:50.000000000 +0100
> *************** init_optimization_passes (void)
> *** 659,664 ****
> --- 659,666 ----
>   	  NEXT_PASS (pass_graphite_transforms);
>   	  NEXT_PASS (pass_iv_canon);
>   	  NEXT_PASS (pass_if_conversion);
> + 	  NEXT_PASS (pass_copy_prop);
> + 	  NEXT_PASS (pass_dce_loop);
>   	  NEXT_PASS (pass_vectorize);
>   	    {
>   	      struct opt_pass **p = &pass_vectorize.pass.sub;
> Index: trunk/gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-5.c
> ===================================================================
> *** /dev/null	1970-01-01 00:00:00.000000000 +0000
> --- trunk/gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-5.c	2009-01-23 16:48:50.000000000 +0100
> ***************
> *** 0 ****
> --- 1,18 ----
> + /* { dg-do compile } */
> + /* { dg-require-effective-target vect_double } */
> + 
> + #define NUM 64
> + _Complex double ad[NUM], bd[NUM], cd[NUM];
> + 
> + void testd(void)
> + {
> +   int i;
> +   int j;
> + 
> +   for (i = 0; i < NUM; i++)
> +     for (j = 0; j < NUM; j++)
> +       cd[i] = cd[i] + ad[j] * bd[j];
> + }
> + 
> + /* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
> + /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: trunk/gcc/testsuite/gfortran.dg/vect/fast-math-vect-complex-1.f90
> ===================================================================
> *** /dev/null	1970-01-01 00:00:00.000000000 +0000
> --- trunk/gcc/testsuite/gfortran.dg/vect/fast-math-vect-complex-1.f90	2009-01-23 16:48:50.000000000 +0100
> ***************
> *** 0 ****
> --- 1,16 ----
> + ! { dg-do compile }
> + 
> + subroutine to_product_of(self,a,b,a1,a2)
> +   complex(kind=8) :: self (:)
> +   complex(kind=8), intent(in) :: a(:,:)
> +   complex(kind=8), intent(in) :: b(:)
> +   integer a1,a2
> +   do i = 1,a1
> +     do j = 1,a2
> +       self(i) = self(i) + a(i,j)*b(j)
> +     end do
> +   end do
> + end subroutine
> + 
> + ! { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } }
> + ! { dg-final { cleanup-tree-dump "vect" } }
> Index: trunk/gcc/tree-data-ref.c
> ===================================================================
> *** trunk.orig/gcc/tree-data-ref.c	2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/tree-data-ref.c	2009-01-23 16:48:50.000000000 +0100
> *************** dr_analyze_innermost (struct data_refere
> *** 708,714 ****
>         offset_iv.base = ssize_int (0);
>         offset_iv.step = ssize_int (0);
>       }
> !   else if (!simple_iv (loop, stmt, poffset, &offset_iv, false))
>       {
>         if (dump_file && (dump_flags & TDF_DETAILS))
>   	fprintf (dump_file, "failed: evolution of offset is not affine.\n");
> --- 708,714 ----
>         offset_iv.base = ssize_int (0);
>         offset_iv.step = ssize_int (0);
>       }
> !   else if (!simple_iv (loop, stmt, poffset, &offset_iv, true))
>       {
>         if (dump_file && (dump_flags & TDF_DETAILS))
>   	fprintf (dump_file, "failed: evolution of offset is not affine.\n");
> Index: trunk/gcc/tree-vect-analyze.c
> ===================================================================
> *** trunk.orig/gcc/tree-vect-analyze.c	2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/tree-vect-analyze.c	2009-01-23 16:48:50.000000000 +0100
> *************** vect_check_interleaving (struct data_ref
> *** 1109,1114 ****
> --- 1109,1116 ----
>     type_size_b = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (drb))));
>   
>     if (type_size_a != type_size_b
> +       || TREE_CODE (DR_STEP (dra)) != INTEGER_CST
> +       || TREE_CODE (DR_STEP (drb)) != INTEGER_CST
>         || tree_int_cst_compare (DR_STEP (dra), DR_STEP (drb))
>         || !types_compatible_p (TREE_TYPE (DR_REF (dra)), 
>                                 TREE_TYPE (DR_REF (drb))))
> *************** vect_enhance_data_refs_alignment (loop_v
> *** 1825,1830 ****
> --- 1827,1833 ----
>     gimple stmt;
>     stmt_vec_info stmt_info;
>     int vect_versioning_for_alias_required;
> +   int vect_versioning_for_strides_required;
>   
>     if (vect_print_dump_info (REPORT_DETAILS))
>       fprintf (vect_dump, "=== vect_enhance_data_refs_alignment ===");
> *************** vect_enhance_data_refs_alignment (loop_v
> *** 1892,1904 ****
>   
>     vect_versioning_for_alias_required =
>       (VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo)) > 0);
>   
>     /* Temporarily, if versioning for alias is required, we disable peeling
>        until we support peeling and versioning.  Often peeling for alignment
>        will require peeling for loop-bound, which in turn requires that we
>        know how to adjust the loop ivs after the loop.  */
>     if (vect_versioning_for_alias_required
> !        || !vect_can_advance_ivs_p (loop_vinfo)
>         || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
>       do_peeling = false;
>   
> --- 1895,1910 ----
>   
>     vect_versioning_for_alias_required =
>       (VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo)) > 0);
> +   vect_versioning_for_strides_required =
> +     !bitmap_empty_p (LOOP_VINFO_VARIABLE_STRIDES (loop_vinfo));
>   
>     /* Temporarily, if versioning for alias is required, we disable peeling
>        until we support peeling and versioning.  Often peeling for alignment
>        will require peeling for loop-bound, which in turn requires that we
>        know how to adjust the loop ivs after the loop.  */
>     if (vect_versioning_for_alias_required
> !       || vect_versioning_for_strides_required
> !       || !vect_can_advance_ivs_p (loop_vinfo)
>         || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
>       do_peeling = false;
>   
> *************** vect_analyze_data_ref_access (struct dat
> *** 2349,2357 ****
>     stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>     loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>     struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> !   HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step);
>   
> !   if (!step)
>       {
>         if (vect_print_dump_info (REPORT_DETAILS))
>   	fprintf (vect_dump, "bad data-ref access");
> --- 2355,2364 ----
>     stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>     loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>     struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> !   HOST_WIDE_INT dr_step;
>   
> !   if (!step
> !       || TREE_CODE (step) != INTEGER_CST)
>       {
>         if (vect_print_dump_info (REPORT_DETAILS))
>   	fprintf (vect_dump, "bad data-ref access");
> *************** vect_analyze_data_ref_access (struct dat
> *** 2359,2364 ****
> --- 2366,2372 ----
>       }
>   
>     /* Don't allow invariant accesses.  */
> +   dr_step = TREE_INT_CST_LOW (step);
>     if (dr_step == 0)
>       return false; 
>   
> *************** vect_analyze_data_refs (loop_vec_info lo
> *** 3563,3568 ****
> --- 3571,3620 ----
>             return false;
>           }
>   
> +       /* If the non-constant (but loop invariant) step is of the
> + 	 form NAME or NAME * CST where CST is the element size mark
> + 	 this ddr for versioning for strides and re-set DR_STEP
> + 	 to the value we will version for.  Otherwise reject
> + 	 non-constant steps.  */
> +       if (TREE_CODE (DR_STEP (dr)) != INTEGER_CST)
> + 	{
> + 	  tree step = DR_STEP (dr);
> + 
> + 	  STRIP_NOPS (step);
> + 	  if (flag_tree_vect_loop_version
> + 	      && (TREE_CODE (step) == SSA_NAME
> + 		  || (TREE_CODE (step) == MULT_EXPR
> + 		      && TREE_CODE (TREE_OPERAND (step, 1)) == INTEGER_CST)))
> + 	    {
> + 	      tree stride;
> + 	      tree newstep;
> + 
> + 	      stride = step;
> + 	      if (TREE_CODE (step) == MULT_EXPR)
> + 		stride = TREE_OPERAND (step, 0);
> + 	      STRIP_NOPS (stride);
> + 	      if (TREE_CODE (stride) != SSA_NAME)
> + 		return false;
> + 
> + 	      bitmap_set_bit (LOOP_VINFO_VARIABLE_STRIDES (loop_vinfo),
> + 			      SSA_NAME_VERSION (stride));
> + 	      if (bitmap_count_bits (LOOP_VINFO_VARIABLE_STRIDES (loop_vinfo))
> + 		  > (unsigned)PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_STRIDE_CHECKS))
> + 		return false;
> + 
> + 	      /* ???  Delay this change until after versioning or
> + 	         preserve the original step somewhere.  */
> + 	      newstep = build_int_cst (TREE_TYPE (step),
> + 		       PARAM_VALUE (PARAM_VECT_VERSION_FOR_STRIDE_VALUE));
> + 	      if (TREE_CODE (step) == MULT_EXPR)
> + 		newstep = int_const_binop (MULT_EXPR, newstep,
> + 					   TREE_OPERAND (step, 1), false);
> + 	      DR_STEP (dr) = newstep;
> + 	    }
> + 	  else
> + 	    return false;
> + 	}
> + 
>         if (!DR_SYMBOL_TAG (dr))
>           {
>             if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
> Index: trunk/gcc/params.def
> ===================================================================
> *** trunk.orig/gcc/params.def	2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/params.def	2009-01-23 16:48:50.000000000 +0100
> *************** DEFPARAM(PARAM_VECT_MAX_VERSION_FOR_ALIA
> *** 506,511 ****
> --- 506,521 ----
>            "Bound on number of runtime checks inserted by the vectorizer's loop versioning for alias check",
>            10, 0, 0)
>   
> + DEFPARAM(PARAM_VECT_MAX_VERSION_FOR_STRIDE_CHECKS,
> +          "vect-max-version-for-stride-checks",
> +          "Bound on number of runtime checks inserted by the vectorizer's loop versioning for stride check",
> +          4, 0, 0)
> + 
> + DEFPARAM(PARAM_VECT_VERSION_FOR_STRIDE_VALUE,
> +          "vect-version-for-stride-value",
> +          "The constant stride in elements the vectorizer uses for loop versioning",
> +          1, 0, 0)
> + 
>   DEFPARAM(PARAM_MAX_CSELIB_MEMORY_LOCATIONS,
>   	 "max-cselib-memory-locations",
>   	 "The maximum memory locations recorded by cselib",
> Index: trunk/gcc/tree-vectorizer.c
> ===================================================================
> *** trunk.orig/gcc/tree-vectorizer.c	2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/tree-vectorizer.c	2009-01-23 16:48:50.000000000 +0100
> *************** slpeel_tree_duplicate_loop_to_edge_cfg (
> *** 927,934 ****
>      Returns the skip edge.  */
>   
>   static edge
> ! slpeel_add_loop_guard (basic_block guard_bb, tree cond, basic_block exit_bb,
> ! 		       basic_block dom_bb)
>   {
>     gimple_stmt_iterator gsi;
>     edge new_e, enter_e;
> --- 927,935 ----
>      Returns the skip edge.  */
>   
>   static edge
> ! slpeel_add_loop_guard (basic_block guard_bb, tree cond,
> ! 		       gimple_seq cond_expr_stmt_list,
> ! 		       basic_block exit_bb, basic_block dom_bb)
>   {
>     gimple_stmt_iterator gsi;
>     edge new_e, enter_e;
> *************** slpeel_add_loop_guard (basic_block guard
> *** 941,951 ****
>     gsi = gsi_last_bb (guard_bb);
>   
>     cond = force_gimple_operand (cond, &gimplify_stmt_list, true, NULL_TREE);
>     cond_stmt = gimple_build_cond (NE_EXPR,
>   				 cond, build_int_cst (TREE_TYPE (cond), 0),
>   				 NULL_TREE, NULL_TREE);
> !   if (gimplify_stmt_list)
> !     gsi_insert_seq_after (&gsi, gimplify_stmt_list, GSI_NEW_STMT);
>   
>     gsi = gsi_last_bb (guard_bb);
>     gsi_insert_after (&gsi, cond_stmt, GSI_NEW_STMT);
> --- 942,954 ----
>     gsi = gsi_last_bb (guard_bb);
>   
>     cond = force_gimple_operand (cond, &gimplify_stmt_list, true, NULL_TREE);
> +   if (gimplify_stmt_list)
> +     gimple_seq_add_seq (&cond_expr_stmt_list, gimplify_stmt_list);
>     cond_stmt = gimple_build_cond (NE_EXPR,
>   				 cond, build_int_cst (TREE_TYPE (cond), 0),
>   				 NULL_TREE, NULL_TREE);
> !   if (cond_expr_stmt_list)
> !     gsi_insert_seq_after (&gsi, cond_expr_stmt_list, GSI_NEW_STMT);
>   
>     gsi = gsi_last_bb (guard_bb);
>     gsi_insert_after (&gsi, cond_stmt, GSI_NEW_STMT);
> *************** struct loop*
> *** 1151,1157 ****
>   slpeel_tree_peel_loop_to_edge (struct loop *loop, 
>   			       edge e, tree first_niters, 
>   			       tree niters, bool update_first_loop_count,
> ! 			       unsigned int th, bool check_profitability)
>   {
>     struct loop *new_loop = NULL, *first_loop, *second_loop;
>     edge skip_e;
> --- 1154,1161 ----
>   slpeel_tree_peel_loop_to_edge (struct loop *loop, 
>   			       edge e, tree first_niters, 
>   			       tree niters, bool update_first_loop_count,
> ! 			       unsigned int th, bool check_profitability,
> ! 			       tree cond_expr, gimple_seq cond_expr_stmt_list)
>   {
>     struct loop *new_loop = NULL, *first_loop, *second_loop;
>     edge skip_e;
> *************** slpeel_tree_peel_loop_to_edge (struct lo
> *** 1325,1330 ****
> --- 1329,1342 ----
>   	  pre_condition = fold_build2 (TRUTH_OR_EXPR, boolean_type_node,
>   				       cost_pre_condition, pre_condition);
>   	}
> +       if (cond_expr)
> + 	{
> + 	  pre_condition =
> + 	    fold_build2 (TRUTH_OR_EXPR, boolean_type_node,
> + 			 pre_condition,
> + 			 fold_build1 (TRUTH_NOT_EXPR, boolean_type_node,
> + 				      cond_expr));
> + 	}
>       }
>   
>     /* Prologue peeling.  */  
> *************** slpeel_tree_peel_loop_to_edge (struct lo
> *** 1340,1345 ****
> --- 1352,1358 ----
>       }
>   
>     skip_e = slpeel_add_loop_guard (bb_before_first_loop, pre_condition,
> + 				  cond_expr_stmt_list,
>                                     bb_before_second_loop, bb_before_first_loop);
>     slpeel_update_phi_nodes_for_guard1 (skip_e, first_loop,
>   				      first_loop == new_loop,
> *************** slpeel_tree_peel_loop_to_edge (struct lo
> *** 1377,1383 ****
>   
>     pre_condition = 
>   	fold_build2 (EQ_EXPR, boolean_type_node, first_niters, niters);
> !   skip_e = slpeel_add_loop_guard (bb_between_loops, pre_condition,
>                                     bb_after_second_loop, bb_before_first_loop);
>     slpeel_update_phi_nodes_for_guard2 (skip_e, second_loop,
>                                        second_loop == new_loop, &new_exit_bb);
> --- 1390,1396 ----
>   
>     pre_condition = 
>   	fold_build2 (EQ_EXPR, boolean_type_node, first_niters, niters);
> !   skip_e = slpeel_add_loop_guard (bb_between_loops, pre_condition, NULL,
>                                     bb_after_second_loop, bb_before_first_loop);
>     slpeel_update_phi_nodes_for_guard2 (skip_e, second_loop,
>                                        second_loop == new_loop, &new_exit_bb);
> *************** new_loop_vec_info (struct loop *loop)
> *** 1714,1719 ****
> --- 1727,1733 ----
>     LOOP_VINFO_MAY_ALIAS_DDRS (res) =
>       VEC_alloc (ddr_p, heap,
>   	       PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS));
> +   LOOP_VINFO_VARIABLE_STRIDES (res) = BITMAP_ALLOC (NULL);
>     LOOP_VINFO_STRIDED_STORES (res) = VEC_alloc (gimple, heap, 10);
>     LOOP_VINFO_SLP_INSTANCES (res) = VEC_alloc (slp_instance, heap, 10);
>     LOOP_VINFO_SLP_UNROLLING_FACTOR (res) = 1;
> *************** destroy_loop_vec_info (loop_vec_info loo
> *** 1800,1805 ****
> --- 1814,1820 ----
>     free_dependence_relations (LOOP_VINFO_DDRS (loop_vinfo));
>     VEC_free (gimple, heap, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo));
>     VEC_free (ddr_p, heap, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo));
> +   BITMAP_FREE (LOOP_VINFO_VARIABLE_STRIDES (loop_vinfo));
>     slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
>     for (j = 0; VEC_iterate (slp_instance, slp_instances, j, instance); j++)
>       vect_free_slp_instance (instance);
> Index: trunk/gcc/tree-vectorizer.h
> ===================================================================
> *** trunk.orig/gcc/tree-vectorizer.h	2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/tree-vectorizer.h	2009-01-23 16:48:50.000000000 +0100
> *************** typedef struct _loop_vec_info {
> *** 210,215 ****
> --- 210,219 ----
>     /* All data dependences in the loop.  */
>     VEC (ddr_p, heap) *ddrs;
>   
> +   /* SSA_NAMEs representing variable strides in data references.
> +      Candidates for a run-time stride check.  */
> +   bitmap variable_strides;
> + 
>     /* Data Dependence Relations defining address ranges that are candidates
>        for a run-time aliasing check.  */
>     VEC (ddr_p, heap) *may_alias_ddrs;
> *************** typedef struct _loop_vec_info {
> *** 254,259 ****
> --- 258,264 ----
>   #define LOOP_VINFO_LOC(L)             (L)->loop_line_number
>   #define LOOP_VINFO_MAY_ALIAS_DDRS(L)  (L)->may_alias_ddrs
>   #define LOOP_VINFO_STRIDED_STORES(L)  (L)->strided_stores
> + #define LOOP_VINFO_VARIABLE_STRIDES(L) (L)->variable_strides
>   #define LOOP_VINFO_SLP_INSTANCES(L)   (L)->slp_instances
>   #define LOOP_VINFO_SLP_UNROLLING_FACTOR(L) (L)->slp_unrolling_factor
>   
> *************** extern bitmap vect_memsyms_to_rename;
> *** 707,713 ****
>      divide by the vectorization factor, and to peel the first few iterations
>      to force the alignment of data references in the loop.  */
>   extern struct loop *slpeel_tree_peel_loop_to_edge 
> !   (struct loop *, edge, tree, tree, bool, unsigned int, bool);
>   extern void set_prologue_iterations (basic_block, tree,
>   				     struct loop *, unsigned int);
>   struct loop *tree_duplicate_loop_on_edge (struct loop *, edge);
> --- 712,718 ----
>      divide by the vectorization factor, and to peel the first few iterations
>      to force the alignment of data references in the loop.  */
>   extern struct loop *slpeel_tree_peel_loop_to_edge 
> !   (struct loop *, edge, tree, tree, bool, unsigned int, bool, tree, gimple_seq);
>   extern void set_prologue_iterations (basic_block, tree,
>   				     struct loop *, unsigned int);
>   struct loop *tree_duplicate_loop_on_edge (struct loop *, edge);
> Index: trunk/gcc/tree-vect-transform.c
> ===================================================================
> *** trunk.orig/gcc/tree-vect-transform.c	2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/tree-vect-transform.c	2009-01-23 16:48:50.000000000 +0100
> *************** static tree get_initial_def_for_reductio
> *** 65,72 ****
>   
>   /* Utility function dealing with loop peeling (not peeling itself).  */
>   static void vect_generate_tmps_on_preheader 
> !   (loop_vec_info, tree *, tree *, tree *);
> ! static tree vect_build_loop_niters (loop_vec_info);
>   static void vect_update_ivs_after_vectorizer (loop_vec_info, tree, edge); 
>   static tree vect_gen_niters_for_prolog_loop (loop_vec_info, tree);
>   static void vect_update_init_of_dr (struct data_reference *, tree niters);
> --- 65,72 ----
>   
>   /* Utility function dealing with loop peeling (not peeling itself).  */
>   static void vect_generate_tmps_on_preheader 
> !   (loop_vec_info, tree *, tree *, tree *, gimple_seq);
> ! static tree vect_build_loop_niters (loop_vec_info, gimple_seq);
>   static void vect_update_ivs_after_vectorizer (loop_vec_info, tree, edge); 
>   static tree vect_gen_niters_for_prolog_loop (loop_vec_info, tree);
>   static void vect_update_init_of_dr (struct data_reference *, tree niters);
> *************** vect_transform_stmt (gimple stmt, gimple
> *** 7199,7205 ****
>      on the loop preheader.  */
>   
>   static tree
> ! vect_build_loop_niters (loop_vec_info loop_vinfo)
>   {
>     tree ni_name, var;
>     gimple_seq stmts = NULL;
> --- 7199,7205 ----
>      on the loop preheader.  */
>   
>   static tree
> ! vect_build_loop_niters (loop_vec_info loop_vinfo, gimple_seq seq)
>   {
>     tree ni_name, var;
>     gimple_seq stmts = NULL;
> *************** vect_build_loop_niters (loop_vec_info lo
> *** 7214,7221 ****
>     pe = loop_preheader_edge (loop);
>     if (stmts)
>       {
> !       basic_block new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
> !       gcc_assert (!new_bb);
>       }
>         
>     return ni_name;
> --- 7214,7226 ----
>     pe = loop_preheader_edge (loop);
>     if (stmts)
>       {
> !       if (seq)
> ! 	gimple_seq_add_seq (&seq, stmts);
> !       else
> ! 	{
> ! 	  basic_block new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
> ! 	  gcc_assert (!new_bb);
> ! 	}
>       }
>         
>     return ni_name;
> *************** static void
> *** 7234,7240 ****
>   vect_generate_tmps_on_preheader (loop_vec_info loop_vinfo, 
>   				 tree *ni_name_ptr,
>   				 tree *ratio_mult_vf_name_ptr, 
> ! 				 tree *ratio_name_ptr)
>   {
>   
>     edge pe;
> --- 7239,7246 ----
>   vect_generate_tmps_on_preheader (loop_vec_info loop_vinfo, 
>   				 tree *ni_name_ptr,
>   				 tree *ratio_mult_vf_name_ptr, 
> ! 				 tree *ratio_name_ptr,
> ! 				 gimple_seq cond_expr_stmt_list)
>   {
>   
>     edge pe;
> *************** vect_generate_tmps_on_preheader (loop_ve
> *** 7254,7260 ****
>     /* Generate temporary variable that contains 
>        number of iterations loop executes.  */
>   
> !   ni_name = vect_build_loop_niters (loop_vinfo);
>     log_vf = build_int_cst (TREE_TYPE (ni), exact_log2 (vf));
>   
>     /* Create: ratio = ni >> log2(vf) */
> --- 7260,7266 ----
>     /* Generate temporary variable that contains 
>        number of iterations loop executes.  */
>   
> !   ni_name = vect_build_loop_niters (loop_vinfo, cond_expr_stmt_list);
>     log_vf = build_int_cst (TREE_TYPE (ni), exact_log2 (vf));
>   
>     /* Create: ratio = ni >> log2(vf) */
> *************** vect_generate_tmps_on_preheader (loop_ve
> *** 7267,7275 ****
>   
>         stmts = NULL;
>         ratio_name = force_gimple_operand (ratio_name, &stmts, true, var);
> !       pe = loop_preheader_edge (loop);
> !       new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
> !       gcc_assert (!new_bb);
>       }
>          
>     /* Create: ratio_mult_vf = ratio << log2 (vf).  */
> --- 7273,7286 ----
>   
>         stmts = NULL;
>         ratio_name = force_gimple_operand (ratio_name, &stmts, true, var);
> !       if (cond_expr_stmt_list)
> ! 	gimple_seq_add_seq (&cond_expr_stmt_list, stmts);
> !       else
> ! 	{
> ! 	  pe = loop_preheader_edge (loop);
> ! 	  new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
> ! 	  gcc_assert (!new_bb);
> ! 	}
>       }
>          
>     /* Create: ratio_mult_vf = ratio << log2 (vf).  */
> *************** vect_generate_tmps_on_preheader (loop_ve
> *** 7284,7292 ****
>         stmts = NULL;
>         ratio_mult_vf_name = force_gimple_operand (ratio_mult_vf_name, &stmts,
>   						 true, var);
> !       pe = loop_preheader_edge (loop);
> !       new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
> !       gcc_assert (!new_bb);
>       }
>   
>     *ni_name_ptr = ni_name;
> --- 7295,7308 ----
>         stmts = NULL;
>         ratio_mult_vf_name = force_gimple_operand (ratio_mult_vf_name, &stmts,
>   						 true, var);
> !       if (cond_expr_stmt_list)
> ! 	gimple_seq_add_seq (&cond_expr_stmt_list, stmts);
> !       else
> ! 	{
> ! 	  pe = loop_preheader_edge (loop);
> ! 	  new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
> ! 	  gcc_assert (!new_bb);
> ! 	}
>       }
>   
>     *ni_name_ptr = ni_name;
> *************** conservative_cost_threshold (loop_vec_in
> *** 7470,7476 ****
>      NITERS / VECTORIZATION_FACTOR times (this value is placed into RATIO).  */
>   
>   static void 
> ! vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, tree *ratio)
>   {
>     tree ni_name, ratio_mult_vf_name;
>     struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> --- 7486,7493 ----
>      NITERS / VECTORIZATION_FACTOR times (this value is placed into RATIO).  */
>   
>   static void 
> ! vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, tree *ratio,
> ! 				tree cond_expr, gimple_seq cond_expr_stmt_list)
>   {
>     tree ni_name, ratio_mult_vf_name;
>     struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> *************** vect_do_peeling_for_loop_bound (loop_vec
> *** 7493,7499 ****
>        ratio = ni_name / vf
>        ratio_mult_vf_name = ratio * vf  */
>     vect_generate_tmps_on_preheader (loop_vinfo, &ni_name,
> ! 				   &ratio_mult_vf_name, ratio);
>   
>     loop_num  = loop->num; 
>   
> --- 7510,7517 ----
>        ratio = ni_name / vf
>        ratio_mult_vf_name = ratio * vf  */
>     vect_generate_tmps_on_preheader (loop_vinfo, &ni_name,
> ! 				   &ratio_mult_vf_name, ratio,
> ! 				   cond_expr_stmt_list);
>   
>     loop_num  = loop->num; 
>   
> *************** vect_do_peeling_for_loop_bound (loop_vec
> *** 7501,7507 ****
>        peeling for alignment.  */
>     if (!VEC_length (gimple, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo))
>         && !VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo))
> !       && !LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo))
>       {
>         check_profitability = true;
>   
> --- 7519,7526 ----
>        peeling for alignment.  */
>     if (!VEC_length (gimple, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo))
>         && !VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo))
> !       && !LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo)
> !       && !cond_expr)
>       {
>         check_profitability = true;
>   
> *************** vect_do_peeling_for_loop_bound (loop_vec
> *** 7514,7520 ****
>   
>     new_loop = slpeel_tree_peel_loop_to_edge (loop, single_exit (loop),
>                                               ratio_mult_vf_name, ni_name, false,
> !                                             th, check_profitability);
>     gcc_assert (new_loop);
>     gcc_assert (loop_num == loop->num);
>   #ifdef ENABLE_CHECKING
> --- 7533,7540 ----
>   
>     new_loop = slpeel_tree_peel_loop_to_edge (loop, single_exit (loop),
>                                               ratio_mult_vf_name, ni_name, false,
> !                                             th, check_profitability,
> ! 					    cond_expr, cond_expr_stmt_list);
>     gcc_assert (new_loop);
>     gcc_assert (loop_num == loop->num);
>   #ifdef ENABLE_CHECKING
> *************** vect_do_peeling_for_alignment (loop_vec_
> *** 7738,7744 ****
>   
>     initialize_original_copy_tables ();
>   
> !   ni_name = vect_build_loop_niters (loop_vinfo);
>     niters_of_prolog_loop = vect_gen_niters_for_prolog_loop (loop_vinfo, ni_name);
>     
>   
> --- 7758,7764 ----
>   
>     initialize_original_copy_tables ();
>   
> !   ni_name = vect_build_loop_niters (loop_vinfo, NULL);
>     niters_of_prolog_loop = vect_gen_niters_for_prolog_loop (loop_vinfo, ni_name);
>     
>   
> *************** vect_do_peeling_for_alignment (loop_vec_
> *** 7759,7765 ****
>     new_loop =
>       slpeel_tree_peel_loop_to_edge (loop, loop_preheader_edge (loop),
>   				   niters_of_prolog_loop, ni_name, true,
> ! 				   th, check_profitability);
>   
>     gcc_assert (new_loop);
>   #ifdef ENABLE_CHECKING
> --- 7779,7785 ----
>     new_loop =
>       slpeel_tree_peel_loop_to_edge (loop, loop_preheader_edge (loop),
>   				   niters_of_prolog_loop, ni_name, true,
> ! 				   th, check_profitability, NULL_TREE, NULL);
>   
>     gcc_assert (new_loop);
>   #ifdef ENABLE_CHECKING
> *************** vect_create_cond_for_align_checks (loop_
> *** 7909,7914 ****
> --- 7929,7981 ----
>       *cond_expr = part_cond_expr;
>   }
>   
> + /* Function vect_create_cond_for_stride_checks.
> + 
> +    Create a conditional expression that represents the stride checks for
> +    all of the stride SSA_NAMEs used in data references (array element
> +    references) whose stride must be checked at runtime.
> + 
> +    Input:
> +    COND_EXPR  - input conditional expression.  New conditions will be chained
> +                 with logical AND operation.
> +    LOOP_VINFO - one field of the loop information is used.
> +                 LOOP_VINFO_VARIABLE_STRIDES is a bitmap of SSA_NAMEs to be
> + 		checked.
> + 
> +    Output:
> +    COND_EXPR_STMT_LIST - statements needed to construct the conditional
> +                          expression.
> +    The returned value is the conditional expression to be used in the if
> +    statement that controls which version of the loop gets executed at runtime.
> + 
> +    The stride we do versioning for is currently specified by a compile-time
> +    param.  In future the stride should be chosen by information from
> +    profile-feedback.  */
> + 
> + static void
> + vect_create_cond_for_stride_checks (loop_vec_info loop_vinfo,
> + 				    tree *cond_expr)
> + {
> +   bitmap_iterator bi;
> +   unsigned int i;
> +   HOST_WIDE_INT stride;
> + 
> +   stride = PARAM_VALUE (PARAM_VECT_VERSION_FOR_STRIDE_VALUE);
> + 
> +   EXECUTE_IF_SET_IN_BITMAP (LOOP_VINFO_VARIABLE_STRIDES (loop_vinfo), 0, i, bi)
> +     {
> +       tree name = ssa_name (i);
> +       tree cond = fold_build2 (EQ_EXPR, boolean_type_node,
> + 			       name,
> + 			       build_int_cst (TREE_TYPE (name), stride));
> +       if (*cond_expr)
> + 	*cond_expr = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
> + 				  *cond_expr, cond);
> +       else
> + 	*cond_expr = cond;
> +     }
> + }
> + 
>   /* Function vect_vfa_segment_size.
>   
>      Create an expression that computes the size of segment
> *************** vect_create_cond_for_alias_checks (loop_
> *** 8076,8087 ****
>      cost model initially.  */
>   
>   static void
> ! vect_loop_versioning (loop_vec_info loop_vinfo)
>   {
>     struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>     struct loop *nloop;
> -   tree cond_expr = NULL_TREE;
> -   gimple_seq cond_expr_stmt_list = NULL;
>     basic_block condition_bb;
>     gimple_stmt_iterator gsi, cond_exp_gsi;
>     basic_block merge_bb;
> --- 8143,8153 ----
>      cost model initially.  */
>   
>   static void
> ! vect_loop_versioning (loop_vec_info loop_vinfo, bool do_versioning,
> ! 		      tree *cond_expr, gimple_seq *cond_expr_stmt_list)
>   {
>     struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>     struct loop *nloop;
>     basic_block condition_bb;
>     gimple_stmt_iterator gsi, cond_exp_gsi;
>     basic_block merge_bb;
> *************** vect_loop_versioning (loop_vec_info loop
> *** 8101,8129 ****
>     th = conservative_cost_threshold (loop_vinfo,
>   				    min_profitable_iters);
>   
> !   cond_expr =
> !     build2 (GT_EXPR, boolean_type_node, scalar_loop_iters, 
> ! 	    build_int_cst (TREE_TYPE (scalar_loop_iters), th));
>   
> !   cond_expr = force_gimple_operand (cond_expr, &cond_expr_stmt_list,
> ! 				    false, NULL_TREE);
>   
>     if (VEC_length (gimple, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo)))
> !       vect_create_cond_for_align_checks (loop_vinfo, &cond_expr,
> ! 					 &cond_expr_stmt_list);
>   
>     if (VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo)))
> !     vect_create_cond_for_alias_checks (loop_vinfo, &cond_expr, 
> ! 				       &cond_expr_stmt_list);
>   
> !   cond_expr =
> !     fold_build2 (NE_EXPR, boolean_type_node, cond_expr, integer_zero_node);
> !   cond_expr =
> !     force_gimple_operand (cond_expr, &gimplify_stmt_list, true, NULL_TREE);
> !   gimple_seq_add_seq (&cond_expr_stmt_list, gimplify_stmt_list);
>   
>     initialize_original_copy_tables ();
> !   nloop = loop_version (loop, cond_expr, &condition_bb,
>   			prob, prob, REG_BR_PROB_BASE - prob, true);
>     free_original_copy_tables();
>   
> --- 8167,8200 ----
>     th = conservative_cost_threshold (loop_vinfo,
>   				    min_profitable_iters);
>   
> !   *cond_expr =
> !     fold_build2 (GT_EXPR, boolean_type_node, scalar_loop_iters,
> ! 		 build_int_cst (TREE_TYPE (scalar_loop_iters), th));
>   
> !   *cond_expr = force_gimple_operand (*cond_expr, cond_expr_stmt_list,
> ! 				     false, NULL_TREE);
>   
>     if (VEC_length (gimple, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo)))
> !       vect_create_cond_for_align_checks (loop_vinfo, cond_expr,
> ! 					 cond_expr_stmt_list);
>   
>     if (VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo)))
> !     vect_create_cond_for_alias_checks (loop_vinfo, cond_expr,
> ! 				       cond_expr_stmt_list);
>   
> !   if (!bitmap_empty_p (LOOP_VINFO_VARIABLE_STRIDES (loop_vinfo)))
> !     vect_create_cond_for_stride_checks (loop_vinfo, cond_expr);
> ! 
> !   *cond_expr =
> !     fold_build2 (NE_EXPR, boolean_type_node, *cond_expr, integer_zero_node);
> !   *cond_expr =
> !     force_gimple_operand (*cond_expr, &gimplify_stmt_list, true, NULL_TREE);
> !   gimple_seq_add_seq (cond_expr_stmt_list, gimplify_stmt_list);
> !   if (!do_versioning)
> !     return;
>   
>     initialize_original_copy_tables ();
> !   nloop = loop_version (loop, *cond_expr, &condition_bb,
>   			prob, prob, REG_BR_PROB_BASE - prob, true);
>     free_original_copy_tables();
>   
> *************** vect_loop_versioning (loop_vec_info loop
> *** 8154,8164 ****
>     /* End loop-exit-fixes after versioning.  */
>   
>     update_ssa (TODO_update_ssa);
> !   if (cond_expr_stmt_list)
>       {
>         cond_exp_gsi = gsi_last_bb (condition_bb);
> !       gsi_insert_seq_before (&cond_exp_gsi, cond_expr_stmt_list, GSI_SAME_STMT);
>       }
>   }
>   
>   /* Remove a group of stores (for SLP or interleaving), free their 
> --- 8225,8238 ----
>     /* End loop-exit-fixes after versioning.  */
>   
>     update_ssa (TODO_update_ssa);
> !   if (*cond_expr_stmt_list)
>       {
>         cond_exp_gsi = gsi_last_bb (condition_bb);
> !       gsi_insert_seq_before (&cond_exp_gsi, *cond_expr_stmt_list,
> ! 			     GSI_SAME_STMT);
> !       *cond_expr_stmt_list = NULL;
>       }
> +   *cond_expr = NULL_TREE;
>   }
>   
>   /* Remove a group of stores (for SLP or interleaving), free their 
> *************** vect_transform_loop (loop_vec_info loop_
> *** 8320,8342 ****
>     bool strided_store;
>     bool slp_scheduled = false;
>     unsigned int nunits;
>   
>     if (vect_print_dump_info (REPORT_DETAILS))
>       fprintf (vect_dump, "=== vec_transform_loop ===");
>   
> -   if (VEC_length (gimple, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo))
> -       || VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo)))
> -     vect_loop_versioning (loop_vinfo);
> - 
> -   /* CHECKME: we wouldn't need this if we called update_ssa once
> -      for all loops.  */
> -   bitmap_zero (vect_memsyms_to_rename);
> - 
>     /* Peel the loop if there are data refs with unknown alignment.
>        Only one data ref with unknown store is allowed.  */
>   
>     if (LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo))
>       vect_do_peeling_for_alignment (loop_vinfo);
>     
>     /* If the loop has a symbolic number of iterations 'n' (i.e. it's not a
>        compile time constant), or it is a constant that doesn't divide by the
> --- 8394,8427 ----
>     bool strided_store;
>     bool slp_scheduled = false;
>     unsigned int nunits;
> +   tree cond_expr = NULL_TREE;
> +   gimple_seq cond_expr_stmt_list = NULL;
> +   bool do_peeling_for_loop_bound;
>   
>     if (vect_print_dump_info (REPORT_DETAILS))
>       fprintf (vect_dump, "=== vec_transform_loop ===");
>   
>     /* Peel the loop if there are data refs with unknown alignment.
>        Only one data ref with unknown store is allowed.  */
>   
>     if (LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo))
>       vect_do_peeling_for_alignment (loop_vinfo);
> + 
> +   do_peeling_for_loop_bound
> +     = (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> +        || (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> + 	   && LOOP_VINFO_INT_NITERS (loop_vinfo) % vectorization_factor != 0));
> + 
> +   if (VEC_length (gimple, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo))
> +       || VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo))
> +       || !bitmap_empty_p (LOOP_VINFO_VARIABLE_STRIDES (loop_vinfo)))
> +     vect_loop_versioning (loop_vinfo,
> + 			  !do_peeling_for_loop_bound,
> + 			  &cond_expr, &cond_expr_stmt_list);
> + 
> +   /* CHECKME: we wouldn't need this if we called update_ssa once
> +      for all loops.  */
> +   bitmap_zero (vect_memsyms_to_rename);
>     
>     /* If the loop has a symbolic number of iterations 'n' (i.e. it's not a
>        compile time constant), or it is a constant that doesn't divide by the
> *************** vect_transform_loop (loop_vec_info loop_
> *** 8346,8355 ****
>        will remain scalar and will compute the remaining (n%VF) iterations.
>        (VF is the vectorization factor).  */
>   
> !   if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> !       || (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> !           && LOOP_VINFO_INT_NITERS (loop_vinfo) % vectorization_factor != 0))
> !     vect_do_peeling_for_loop_bound (loop_vinfo, &ratio);
>     else
>       ratio = build_int_cst (TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo)),
>   		LOOP_VINFO_INT_NITERS (loop_vinfo) / vectorization_factor);
> --- 8431,8439 ----
>        will remain scalar and will compute the remaining (n%VF) iterations.
>        (VF is the vectorization factor).  */
>   
> !   if (do_peeling_for_loop_bound)
> !     vect_do_peeling_for_loop_bound (loop_vinfo, &ratio,
> ! 				    cond_expr, cond_expr_stmt_list);
>     else
>       ratio = build_int_cst (TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo)),
>   		LOOP_VINFO_INT_NITERS (loop_vinfo) / vectorization_factor);
> Index: trunk/gcc/testsuite/gfortran.dg/vect/fast-math-vect-stride-1.f90
> ===================================================================
> *** /dev/null	1970-01-01 00:00:00.000000000 +0000
> --- trunk/gcc/testsuite/gfortran.dg/vect/fast-math-vect-stride-1.f90	2009-01-23 16:48:50.000000000 +0100
> ***************
> *** 0 ****
> --- 1,17 ----
> + ! { dg-do compile }
> + 
> + subroutine to_product_of(self,a,b)
> +   real(kind=8), dimension(:,:) :: self
> +   real(kind=8), dimension(:,:), intent(in) :: a, b
> +   integer(kind=kind(1)) :: dim1, dim2
> +   dim1 = size(self,1)
> +   dim2 = size(self,2)
> +   do i = 1,dim1
> +     do j = 1,dim2
> +       self(i,j) = sum(a(i,:)*b(:,j))
> +     end do
> +   end do
> + end subroutine
> + 
> + ! { dg-final { scan-tree-dump "vectorized 1 loop" "vect" } }
> + ! { dg-final { cleanup-tree-dump "vect" } }
> Index: trunk/gcc/Makefile.in
> ===================================================================
> *** trunk.orig/gcc/Makefile.in	2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/Makefile.in	2009-01-23 16:48:50.000000000 +0100
> *************** tree-nrv.o : tree-nrv.c $(CONFIG_H) $(SY
> *** 2135,2141 ****
>   tree-ssa-copy.o : tree-ssa-copy.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
>      $(RTL_H) $(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) output.h $(DIAGNOSTIC_H) \
>      $(FUNCTION_H) $(TIMEVAR_H) $(TM_H) coretypes.h $(TREE_DUMP_H) \
> !    $(BASIC_BLOCK_H) tree-pass.h langhooks.h tree-ssa-propagate.h $(FLAGS_H)
>   tree-ssa-propagate.o : tree-ssa-propagate.c $(TREE_FLOW_H) $(CONFIG_H) \
>      $(SYSTEM_H) $(RTL_H) $(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) output.h \
>      $(DIAGNOSTIC_H) $(FUNCTION_H) $(TIMEVAR_H) $(TM_H) coretypes.h \
> --- 2135,2142 ----
>   tree-ssa-copy.o : tree-ssa-copy.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
>      $(RTL_H) $(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) output.h $(DIAGNOSTIC_H) \
>      $(FUNCTION_H) $(TIMEVAR_H) $(TM_H) coretypes.h $(TREE_DUMP_H) \
> !    $(BASIC_BLOCK_H) tree-pass.h langhooks.h tree-ssa-propagate.h $(FLAGS_H) \
> !    $(CFGLOOP_H)
>   tree-ssa-propagate.o : tree-ssa-propagate.c $(TREE_FLOW_H) $(CONFIG_H) \
>      $(SYSTEM_H) $(RTL_H) $(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) output.h \
>      $(DIAGNOSTIC_H) $(FUNCTION_H) $(TIMEVAR_H) $(TM_H) coretypes.h \
> Index: trunk/gcc/tree-ssa-copy.c
> ===================================================================
> *** trunk.orig/gcc/tree-ssa-copy.c	2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/tree-ssa-copy.c	2009-01-23 16:48:50.000000000 +0100
> *************** along with GCC; see the file COPYING3.
> *** 37,42 ****
> --- 37,43 ----
>   #include "tree-pass.h"
>   #include "tree-ssa-propagate.h"
>   #include "langhooks.h"
> + #include "cfgloop.h"
>   
>   /* This file implements the copy propagation pass and provides a
>      handful of interfaces for performing const/copy propagation and
> *************** init_copy_prop (void)
> *** 991,997 ****
>             tree def;
>   
>   	  def = gimple_phi_result (phi);
> ! 	  if (!is_gimple_reg (def))
>               prop_set_simulate_again (phi, false);
>   	  else
>               prop_set_simulate_again (phi, true);
> --- 992,1004 ----
>             tree def;
>   
>   	  def = gimple_phi_result (phi);
> ! 	  if (!is_gimple_reg (def)
> ! 	      /* In loop-closed SSA form do not copy-propagate through
> ! 	         PHI nodes.  Technically this is only needed for loop
> ! 		 exit PHIs, but this is difficult to query.  */
> ! 	      || (current_loops
> ! 		  && gimple_phi_num_args (phi) == 1
> ! 		  && loops_state_satisfies_p (LOOP_CLOSED_SSA)))
>               prop_set_simulate_again (phi, false);
>   	  else
>               prop_set_simulate_again (phi, true);
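
For anyone skimming the thread, the effect of the stride versioning in
vect_create_cond_for_stride_checks can be pictured at the source level
roughly as below. This is only a sketch under assumptions, not GCC output:
the function dot_versioned and its parameters are hypothetical, the strides
stand in for the values loaded from the Fortran array descriptors, and the
checked constant is fixed at 1 rather than read from the
PARAM_VECT_VERSION_FOR_STRIDE_VALUE param.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration, not GCC output: a dot product over two
   arrays whose element strides are only known at run time, much like
   strides loaded from passed array descriptors.  */
double
dot_versioned (const double *a, ptrdiff_t stride_a,
               const double *b, ptrdiff_t stride_b, size_t n)
{
  double sum = 0.0;
  size_t i;

  assert (a != NULL && b != NULL);

  /* Versioning condition: every checked stride equals the chosen
     constant (assumed to be 1 here), chained with logical AND.  */
  if (stride_a == 1 && stride_b == 1)
    {
      /* Unit-stride version: contiguous accesses, so this copy of the
         loop is a candidate for vectorization.  */
      for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    }
  else
    {
      /* Generic version: stays scalar and handles any stride.  */
      for (i = 0; i < n; i++)
        sum += a[(ptrdiff_t) i * stride_a] * b[(ptrdiff_t) i * stride_b];
    }

  return sum;
}
```

Both branches compute the same result; the point of the split is only that
the first loop has compile-time constant strides. In the patch, when an
epilogue loop is peeled for the iteration count, that scalar epilogue takes
the place of the else branch instead of a separate loop copy.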

Richard,
    Do you have an updated version of this patch that applies against
current GCC trunk? Also, did it ever go into any of the branches?
              Jack


