This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH][RFC] Add versioning for constant strides for vectorization


On Fri, Jan 23, 2009 at 05:08:43PM +0100, Richard Guenther wrote:
> 
> This patch adds the capability to the vectorizer to perform versioning
> for the case of a constant (suitable) stride.  For example for
> 
> subroutine to_product_of(self,a,b,a1,a2)
>   complex(kind=8) :: self (:)
>   complex(kind=8), intent(in) :: a(:,:)
>   complex(kind=8), intent(in) :: b(:)
>   integer a1,a2
>   do i = 1,a1
>     do j = 1,a2
>       self(i) = self(i) + a(j,i)*b(j)
>     end do
>   end do
> end subroutine
> 
> we can only apply vectorization if the strides of the fastest dimension
> of self, a and b are one (they are loaded from the passed array
> descriptors and thus appear as (loop invariant) variables).
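
To illustrate the idea in plain C: the sketch below is a hand-written
picture of what versioning for a stride of one amounts to. It is not
code from the patch. The function name strided_dot and its parameters
are made up for illustration, and double stands in for complex(kind=8)
to keep it short.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the patch: computes the sum of
   a[j*sa] * b[j*sb] for j in [0, n), where the strides sa and sb are
   loop invariant but only known at run time (as if loaded from an
   array descriptor).  */
double
strided_dot (const double *a, ptrdiff_t sa,
	     const double *b, ptrdiff_t sb, size_t n)
{
  double sum = 0.0;
  size_t j;

  if (sa == 1 && sb == 1)
    {
      /* Versioned copy: both strides are known to be one here, so the
	 accesses are contiguous and a vectorizer can handle the loop.  */
      for (j = 0; j < n; j++)
	sum += a[j] * b[j];
    }
  else
    {
      /* Fallback copy: the original loop with variable strides.  */
      for (j = 0; j < n; j++)
	sum += a[j * sa] * b[j * sb];
    }

  return sum;
}
```

The run-time checks are chained with a logical AND, one per distinct
stride variable, which is why the number of checks is worth bounding.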
> 
> During the implementation of this I noticed that peeling for
> number of iterations (we have to unroll the above loop twice, and so
> for an odd number of iterations have an epilogue loop for the remaining
> iteration(s)) does not play well with versioning and we end up
> vectorizing the wrong loop.  So I just disabled versioning if we
> apply peeling with an epilogue loop and instead attach the versioning
> condition to the pre-condition of the main loop that skips directly
> to the epilogue if the number of iterations is too small.  We obviously
> can use the epilogue loop as the non-vectorized version.
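
A rough C picture of the resulting control flow, assuming a
vectorization factor of two: the stride check is folded into the guard
in front of the main loop, and the epilogue loop doubles as the
non-vectorized version. This is a hand-written sketch, not compiler
output. The name strided_dot_peeled and the constant vf are
assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the patch: same computation as a
   plain strided dot product, structured like a loop peeled for the
   number of iterations with the stride check merged into the guard.  */
double
strided_dot_peeled (const double *a, ptrdiff_t sa,
		    const double *b, ptrdiff_t sb, size_t n)
{
  const size_t vf = 2;		/* assumed vectorization factor */
  double sum0 = 0.0, sum1 = 0.0, sum;
  size_t j = 0;
  size_t main_n;

  /* Guard of the main loop: skip straight to the epilogue if there are
     too few iterations OR the strides are not the versioned value.  */
  if (!(n < vf || !(sa == 1 && sb == 1)))
    {
      main_n = n - n % vf;
      /* Main loop: stands in for the vectorized loop, two elements
	 per iteration with one partial sum per lane.  */
      for (; j < main_n; j += vf)
	{
	  sum0 += a[j] * b[j];
	  sum1 += a[j + 1] * b[j + 1];
	}
    }

  /* Epilogue: runs the leftover iterations, or all of them if the
     guard skipped the main loop.  It keeps the variable strides, so
     it is correct in both cases.  */
  sum = sum0 + sum1;
  for (; j < n; j++)
    sum += a[j * sa] * b[j * sb];

  return sum;
}
```

Splitting the sum into two partial sums reassociates the floating-point
additions, which is one reason the testcases carry the fast-math prefix.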
> 
> This patch also inserts an extra copyprop and dce pass before the
> vectorizer so it can recognize the reduction in the above testcase
> (LIM has made that reduction non-obvious).  In doing so I noticed that
> copyprop does not preserve loop-closed SSA form and fixed that as well.
> 
> An earlier version bootstrapped and tested OK on
> x86_64-unknown-linux-gnu; a final run is still in progress.
> 
> I have not yet performance-tested this extensively, but it might need
> cost-model adjustments and/or need to wait until we have profile
> feedback to properly seed vectorizer analysis here.  A micro-benchmark
> based on the above loop shows around 15% improvement on AMD K10.
> 
> Feedback (and ppc testing) is still welcome of course.
> 
> Thanks,
> Richard.
> 
> 2009-01-23  Richard Guenther  <rguenther@suse.de>
> 
> 	* passes.c (init_optimization_passes): Add copy-prop and dce
> 	before vectorization.
> 	* Makefile.in (tree-ssa-copy.o): Add $(CFGLOOP_H) dependency.
> 	* tree-ssa-copy.c (init_copy_prop): Do not propagate through
> 	single-argument PHIs if we are in loop-closed SSA form.
> 	* tree-data-ref.c (dr_analyze_innermost): Allow affine offsets.
> 	* tree-vect-analyze.c (vect_check_interleaving): Check that
> 	DR_STEP is constant.
> 	(vect_enhance_data_refs_alignment): If versioning for strides
> 	is required do not peel.
> 	(vect_analyze_data_ref_access): Allow non-constant step of
> 	a specific form, remember them for versioning.
> 	* params.def (vect-max-version-for-stride-checks): New param.
> 	(vect-version-for-stride-value): Likewise.
> 	* tree-vectorizer.c (slpeel_add_loop_guard): Pass extra guards
> 	for the pre-condition.
> 	(slpeel_tree_peel_loop_to_edge): Likewise.
> 	(new_loop_vec_info): Allocate stride versioning data.
> 	(destroy_loop_vec_info): Free stride versioning data.
> 	* tree-vectorizer.h (struct _loop_vec_info): Add variable_strides
> 	field.
> 	(LOOP_VINFO_VARIABLE_STRIDES): Define.
> 	(slpeel_tree_peel_loop_to_edge): Adjust declaration.
> 	* tree-vect-transform.c (vect_build_loop_niters): Take an
> 	optional sequence to append stmts.
> 	(vect_generate_tmps_on_preheader): Likewise.
> 	(vect_do_peeling_for_loop_bound): Take extra guards for the
> 	pre-condition.
> 	(vect_do_peeling_for_alignment): Adjust.
> 	(vect_create_cond_for_stride_checks): New function.
> 	(vect_loop_versioning): Take stmt and stmt list to put pre-condition
> 	guards if we are going to peel.  Do not apply versioning in that
> 	case.
> 	(vect_transform_loop): If we are peeling for loop bound only
> 	record extra pre-conditions, do not apply loop versioning.
> 
> 	* gcc.dg/vect/fast-math-vect-complex-5.c: New testcase.
> 	* gfortran.dg/vect/fast-math-vect-complex-1.f90: Likewise.
> 	* gfortran.dg/vect/fast-math-vect-stride-1.f90: Likewise.
> 
> Index: trunk/gcc/passes.c
> ===================================================================
> *** trunk.orig/gcc/passes.c	2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/passes.c	2009-01-23 16:48:50.000000000 +0100
> *************** init_optimization_passes (void)
> *** 659,664 ****
> --- 659,666 ----
>   	  NEXT_PASS (pass_graphite_transforms);
>   	  NEXT_PASS (pass_iv_canon);
>   	  NEXT_PASS (pass_if_conversion);
> + 	  NEXT_PASS (pass_copy_prop);
> + 	  NEXT_PASS (pass_dce_loop);
>   	  NEXT_PASS (pass_vectorize);
>   	    {
>   	      struct opt_pass **p = &pass_vectorize.pass.sub;
> Index: trunk/gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-5.c
> ===================================================================
> *** /dev/null	1970-01-01 00:00:00.000000000 +0000
> --- trunk/gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-5.c	2009-01-23 16:48:50.000000000 +0100
> ***************
> *** 0 ****
> --- 1,18 ----
> + /* { dg-do compile } */
> + /* { dg-require-effective-target vect_double } */
> + 
> + #define NUM 64
> + _Complex double ad[NUM], bd[NUM], cd[NUM];
> + 
> + void testd(void)
> + {
> +   int i;
> +   int j;
> + 
> +   for (i = 0; i < NUM; i++)
> +     for (j = 0; j < NUM; j++)
> +       cd[i] = cd[i] + ad[j] * bd[j];
> + }
> + 
> + /* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
> + /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: trunk/gcc/testsuite/gfortran.dg/vect/fast-math-vect-complex-1.f90
> ===================================================================
> *** /dev/null	1970-01-01 00:00:00.000000000 +0000
> --- trunk/gcc/testsuite/gfortran.dg/vect/fast-math-vect-complex-1.f90	2009-01-23 16:48:50.000000000 +0100
> ***************
> *** 0 ****
> --- 1,16 ----
> + ! { dg-do compile }
> + 
> + subroutine to_product_of(self,a,b,a1,a2)
> +   complex(kind=8) :: self (:)
> +   complex(kind=8), intent(in) :: a(:,:)
> +   complex(kind=8), intent(in) :: b(:)
> +   integer a1,a2
> +   do i = 1,a1
> +     do j = 1,a2
> +       self(i) = self(i) + a(i,j)*b(j)
> +     end do
> +   end do
> + end subroutine
> + 
> + ! { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } }
> + ! { dg-final { cleanup-tree-dump "vect" } }
> Index: trunk/gcc/tree-data-ref.c
> ===================================================================
> *** trunk.orig/gcc/tree-data-ref.c	2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/tree-data-ref.c	2009-01-23 16:48:50.000000000 +0100
> *************** dr_analyze_innermost (struct data_refere
> *** 708,714 ****
>         offset_iv.base = ssize_int (0);
>         offset_iv.step = ssize_int (0);
>       }
> !   else if (!simple_iv (loop, stmt, poffset, &offset_iv, false))
>       {
>         if (dump_file && (dump_flags & TDF_DETAILS))
>   	fprintf (dump_file, "failed: evolution of offset is not affine.\n");
> --- 708,714 ----
>         offset_iv.base = ssize_int (0);
>         offset_iv.step = ssize_int (0);
>       }
> !   else if (!simple_iv (loop, stmt, poffset, &offset_iv, true))
>       {
>         if (dump_file && (dump_flags & TDF_DETAILS))
>   	fprintf (dump_file, "failed: evolution of offset is not affine.\n");
> Index: trunk/gcc/tree-vect-analyze.c
> ===================================================================
> *** trunk.orig/gcc/tree-vect-analyze.c	2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/tree-vect-analyze.c	2009-01-23 16:48:50.000000000 +0100
> *************** vect_check_interleaving (struct data_ref
> *** 1109,1114 ****
> --- 1109,1116 ----
>     type_size_b = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (drb))));
>   
>     if (type_size_a != type_size_b
> +       || TREE_CODE (DR_STEP (dra)) != INTEGER_CST
> +       || TREE_CODE (DR_STEP (drb)) != INTEGER_CST
>         || tree_int_cst_compare (DR_STEP (dra), DR_STEP (drb))
>         || !types_compatible_p (TREE_TYPE (DR_REF (dra)), 
>                                 TREE_TYPE (DR_REF (drb))))
> *************** vect_enhance_data_refs_alignment (loop_v
> *** 1825,1830 ****
> --- 1827,1833 ----
>     gimple stmt;
>     stmt_vec_info stmt_info;
>     int vect_versioning_for_alias_required;
> +   int vect_versioning_for_strides_required;
>   
>     if (vect_print_dump_info (REPORT_DETAILS))
>       fprintf (vect_dump, "=== vect_enhance_data_refs_alignment ===");
> *************** vect_enhance_data_refs_alignment (loop_v
> *** 1892,1904 ****
>   
>     vect_versioning_for_alias_required =
>       (VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo)) > 0);
>   
>     /* Temporarily, if versioning for alias is required, we disable peeling
>        until we support peeling and versioning.  Often peeling for alignment
>        will require peeling for loop-bound, which in turn requires that we
>        know how to adjust the loop ivs after the loop.  */
>     if (vect_versioning_for_alias_required
> !        || !vect_can_advance_ivs_p (loop_vinfo)
>         || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
>       do_peeling = false;
>   
> --- 1895,1910 ----
>   
>     vect_versioning_for_alias_required =
>       (VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo)) > 0);
> +   vect_versioning_for_strides_required =
> +     !bitmap_empty_p (LOOP_VINFO_VARIABLE_STRIDES (loop_vinfo));
>   
>     /* Temporarily, if versioning for alias is required, we disable peeling
>        until we support peeling and versioning.  Often peeling for alignment
>        will require peeling for loop-bound, which in turn requires that we
>        know how to adjust the loop ivs after the loop.  */
>     if (vect_versioning_for_alias_required
> !       || vect_versioning_for_strides_required
> !       || !vect_can_advance_ivs_p (loop_vinfo)
>         || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
>       do_peeling = false;
>   
> *************** vect_analyze_data_ref_access (struct dat
> *** 2349,2357 ****
>     stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>     loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>     struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> !   HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step);
>   
> !   if (!step)
>       {
>         if (vect_print_dump_info (REPORT_DETAILS))
>   	fprintf (vect_dump, "bad data-ref access");
> --- 2355,2364 ----
>     stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>     loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>     struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> !   HOST_WIDE_INT dr_step;
>   
> !   if (!step
> !       || TREE_CODE (step) != INTEGER_CST)
>       {
>         if (vect_print_dump_info (REPORT_DETAILS))
>   	fprintf (vect_dump, "bad data-ref access");
> *************** vect_analyze_data_ref_access (struct dat
> *** 2359,2364 ****
> --- 2366,2372 ----
>       }
>   
>     /* Don't allow invariant accesses.  */
> +   dr_step = TREE_INT_CST_LOW (step);
>     if (dr_step == 0)
>       return false; 
>   
> *************** vect_analyze_data_refs (loop_vec_info lo
> *** 3563,3568 ****
> --- 3571,3620 ----
>             return false;
>           }
>   
> +       /* If the non-constant (but loop invariant) step is of the
> + 	 form NAME or NAME * CST where CST is the element size mark
> + 	 this ddr for versioning for strides and re-set DR_STEP
> + 	 to the value we will version for.  Otherwise reject
> + 	 non-constant steps.  */
> +       if (TREE_CODE (DR_STEP (dr)) != INTEGER_CST)
> + 	{
> + 	  tree step = DR_STEP (dr);
> + 
> + 	  STRIP_NOPS (step);
> + 	  if (flag_tree_vect_loop_version
> + 	      && (TREE_CODE (step) == SSA_NAME
> + 		  || (TREE_CODE (step) == MULT_EXPR
> + 		      && TREE_CODE (TREE_OPERAND (step, 1)) == INTEGER_CST)))
> + 	    {
> + 	      tree stride;
> + 	      tree newstep;
> + 
> + 	      stride = step;
> + 	      if (TREE_CODE (step) == MULT_EXPR)
> + 		stride = TREE_OPERAND (step, 0);
> + 	      STRIP_NOPS (stride);
> + 	      if (TREE_CODE (stride) != SSA_NAME)
> + 		return false;
> + 
> + 	      bitmap_set_bit (LOOP_VINFO_VARIABLE_STRIDES (loop_vinfo),
> + 			      SSA_NAME_VERSION (stride));
> + 	      if (bitmap_count_bits (LOOP_VINFO_VARIABLE_STRIDES (loop_vinfo))
> + 		  > (unsigned)PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_STRIDE_CHECKS))
> + 		return false;
> + 
> + 	      /* ???  Delay this change until after versioning or
> + 	         preserve the original step somewhere.  */
> + 	      newstep = build_int_cst (TREE_TYPE (step),
> + 		       PARAM_VALUE (PARAM_VECT_VERSION_FOR_STRIDE_VALUE));
> + 	      if (TREE_CODE (step) == MULT_EXPR)
> + 		newstep = int_const_binop (MULT_EXPR, newstep,
> + 					   TREE_OPERAND (step, 1), false);
> + 	      DR_STEP (dr) = newstep;
> + 	    }
> + 	  else
> + 	    return false;
> + 	}
> + 
>         if (!DR_SYMBOL_TAG (dr))
>           {
>             if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
> Index: trunk/gcc/params.def
> ===================================================================
> *** trunk.orig/gcc/params.def	2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/params.def	2009-01-23 16:48:50.000000000 +0100
> *************** DEFPARAM(PARAM_VECT_MAX_VERSION_FOR_ALIA
> *** 506,511 ****
> --- 506,521 ----
>            "Bound on number of runtime checks inserted by the vectorizer's loop versioning for alias check",
>            10, 0, 0)
>   
> + DEFPARAM(PARAM_VECT_MAX_VERSION_FOR_STRIDE_CHECKS,
> +          "vect-max-version-for-stride-checks",
> +          "Bound on number of runtime checks inserted by the vectorizer's loop versioning for stride check",
> +          4, 0, 0)
> + 
> + DEFPARAM(PARAM_VECT_VERSION_FOR_STRIDE_VALUE,
> +          "vect-version-for-stride-value",
> +          "The constant stride in elements the vectorizer uses for loop versioning",
> +          1, 0, 0)
> + 
>   DEFPARAM(PARAM_MAX_CSELIB_MEMORY_LOCATIONS,
>   	 "max-cselib-memory-locations",
>   	 "The maximum memory locations recorded by cselib",
> Index: trunk/gcc/tree-vectorizer.c
> ===================================================================
> *** trunk.orig/gcc/tree-vectorizer.c	2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/tree-vectorizer.c	2009-01-23 16:48:50.000000000 +0100
> *************** slpeel_tree_duplicate_loop_to_edge_cfg (
> *** 927,934 ****
>      Returns the skip edge.  */
>   
>   static edge
> ! slpeel_add_loop_guard (basic_block guard_bb, tree cond, basic_block exit_bb,
> ! 		       basic_block dom_bb)
>   {
>     gimple_stmt_iterator gsi;
>     edge new_e, enter_e;
> --- 927,935 ----
>      Returns the skip edge.  */
>   
>   static edge
> ! slpeel_add_loop_guard (basic_block guard_bb, tree cond,
> ! 		       gimple_seq cond_expr_stmt_list,
> ! 		       basic_block exit_bb, basic_block dom_bb)
>   {
>     gimple_stmt_iterator gsi;
>     edge new_e, enter_e;
> *************** slpeel_add_loop_guard (basic_block guard
> *** 941,951 ****
>     gsi = gsi_last_bb (guard_bb);
>   
>     cond = force_gimple_operand (cond, &gimplify_stmt_list, true, NULL_TREE);
>     cond_stmt = gimple_build_cond (NE_EXPR,
>   				 cond, build_int_cst (TREE_TYPE (cond), 0),
>   				 NULL_TREE, NULL_TREE);
> !   if (gimplify_stmt_list)
> !     gsi_insert_seq_after (&gsi, gimplify_stmt_list, GSI_NEW_STMT);
>   
>     gsi = gsi_last_bb (guard_bb);
>     gsi_insert_after (&gsi, cond_stmt, GSI_NEW_STMT);
> --- 942,954 ----
>     gsi = gsi_last_bb (guard_bb);
>   
>     cond = force_gimple_operand (cond, &gimplify_stmt_list, true, NULL_TREE);
> +   if (gimplify_stmt_list)
> +     gimple_seq_add_seq (&cond_expr_stmt_list, gimplify_stmt_list);
>     cond_stmt = gimple_build_cond (NE_EXPR,
>   				 cond, build_int_cst (TREE_TYPE (cond), 0),
>   				 NULL_TREE, NULL_TREE);
> !   if (cond_expr_stmt_list)
> !     gsi_insert_seq_after (&gsi, cond_expr_stmt_list, GSI_NEW_STMT);
>   
>     gsi = gsi_last_bb (guard_bb);
>     gsi_insert_after (&gsi, cond_stmt, GSI_NEW_STMT);
> *************** struct loop*
> *** 1151,1157 ****
>   slpeel_tree_peel_loop_to_edge (struct loop *loop, 
>   			       edge e, tree first_niters, 
>   			       tree niters, bool update_first_loop_count,
> ! 			       unsigned int th, bool check_profitability)
>   {
>     struct loop *new_loop = NULL, *first_loop, *second_loop;
>     edge skip_e;
> --- 1154,1161 ----
>   slpeel_tree_peel_loop_to_edge (struct loop *loop, 
>   			       edge e, tree first_niters, 
>   			       tree niters, bool update_first_loop_count,
> ! 			       unsigned int th, bool check_profitability,
> ! 			       tree cond_expr, gimple_seq cond_expr_stmt_list)
>   {
>     struct loop *new_loop = NULL, *first_loop, *second_loop;
>     edge skip_e;
> *************** slpeel_tree_peel_loop_to_edge (struct lo
> *** 1325,1330 ****
> --- 1329,1342 ----
>   	  pre_condition = fold_build2 (TRUTH_OR_EXPR, boolean_type_node,
>   				       cost_pre_condition, pre_condition);
>   	}
> +       if (cond_expr)
> + 	{
> + 	  pre_condition =
> + 	    fold_build2 (TRUTH_OR_EXPR, boolean_type_node,
> + 			 pre_condition,
> + 			 fold_build1 (TRUTH_NOT_EXPR, boolean_type_node,
> + 				      cond_expr));
> + 	}
>       }
>   
>     /* Prologue peeling.  */  
> *************** slpeel_tree_peel_loop_to_edge (struct lo
> *** 1340,1345 ****
> --- 1352,1358 ----
>       }
>   
>     skip_e = slpeel_add_loop_guard (bb_before_first_loop, pre_condition,
> + 				  cond_expr_stmt_list,
>                                     bb_before_second_loop, bb_before_first_loop);
>     slpeel_update_phi_nodes_for_guard1 (skip_e, first_loop,
>   				      first_loop == new_loop,
> *************** slpeel_tree_peel_loop_to_edge (struct lo
> *** 1377,1383 ****
>   
>     pre_condition = 
>   	fold_build2 (EQ_EXPR, boolean_type_node, first_niters, niters);
> !   skip_e = slpeel_add_loop_guard (bb_between_loops, pre_condition,
>                                     bb_after_second_loop, bb_before_first_loop);
>     slpeel_update_phi_nodes_for_guard2 (skip_e, second_loop,
>                                        second_loop == new_loop, &new_exit_bb);
> --- 1390,1396 ----
>   
>     pre_condition = 
>   	fold_build2 (EQ_EXPR, boolean_type_node, first_niters, niters);
> !   skip_e = slpeel_add_loop_guard (bb_between_loops, pre_condition, NULL,
>                                     bb_after_second_loop, bb_before_first_loop);
>     slpeel_update_phi_nodes_for_guard2 (skip_e, second_loop,
>                                        second_loop == new_loop, &new_exit_bb);
> *************** new_loop_vec_info (struct loop *loop)
> *** 1714,1719 ****
> --- 1727,1733 ----
>     LOOP_VINFO_MAY_ALIAS_DDRS (res) =
>       VEC_alloc (ddr_p, heap,
>   	       PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS));
> +   LOOP_VINFO_VARIABLE_STRIDES (res) = BITMAP_ALLOC (NULL);
>     LOOP_VINFO_STRIDED_STORES (res) = VEC_alloc (gimple, heap, 10);
>     LOOP_VINFO_SLP_INSTANCES (res) = VEC_alloc (slp_instance, heap, 10);
>     LOOP_VINFO_SLP_UNROLLING_FACTOR (res) = 1;
> *************** destroy_loop_vec_info (loop_vec_info loo
> *** 1800,1805 ****
> --- 1814,1820 ----
>     free_dependence_relations (LOOP_VINFO_DDRS (loop_vinfo));
>     VEC_free (gimple, heap, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo));
>     VEC_free (ddr_p, heap, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo));
> +   BITMAP_FREE (LOOP_VINFO_VARIABLE_STRIDES (loop_vinfo));
>     slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
>     for (j = 0; VEC_iterate (slp_instance, slp_instances, j, instance); j++)
>       vect_free_slp_instance (instance);
> Index: trunk/gcc/tree-vectorizer.h
> ===================================================================
> *** trunk.orig/gcc/tree-vectorizer.h	2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/tree-vectorizer.h	2009-01-23 16:48:50.000000000 +0100
> *************** typedef struct _loop_vec_info {
> *** 210,215 ****
> --- 210,219 ----
>     /* All data dependences in the loop.  */
>     VEC (ddr_p, heap) *ddrs;
>   
> +   /* SSA_NAMEs representing variable strides in data references.
> +      Candidates for a run-time stride check.  */
> +   bitmap variable_strides;
> + 
>     /* Data Dependence Relations defining address ranges that are candidates
>        for a run-time aliasing check.  */
>     VEC (ddr_p, heap) *may_alias_ddrs;
> *************** typedef struct _loop_vec_info {
> *** 254,259 ****
> --- 258,264 ----
>   #define LOOP_VINFO_LOC(L)             (L)->loop_line_number
>   #define LOOP_VINFO_MAY_ALIAS_DDRS(L)  (L)->may_alias_ddrs
>   #define LOOP_VINFO_STRIDED_STORES(L)  (L)->strided_stores
> + #define LOOP_VINFO_VARIABLE_STRIDES(L) (L)->variable_strides
>   #define LOOP_VINFO_SLP_INSTANCES(L)   (L)->slp_instances
>   #define LOOP_VINFO_SLP_UNROLLING_FACTOR(L) (L)->slp_unrolling_factor
>   
> *************** extern bitmap vect_memsyms_to_rename;
> *** 707,713 ****
>      divide by the vectorization factor, and to peel the first few iterations
>      to force the alignment of data references in the loop.  */
>   extern struct loop *slpeel_tree_peel_loop_to_edge 
> !   (struct loop *, edge, tree, tree, bool, unsigned int, bool);
>   extern void set_prologue_iterations (basic_block, tree,
>   				     struct loop *, unsigned int);
>   struct loop *tree_duplicate_loop_on_edge (struct loop *, edge);
> --- 712,718 ----
>      divide by the vectorization factor, and to peel the first few iterations
>      to force the alignment of data references in the loop.  */
>   extern struct loop *slpeel_tree_peel_loop_to_edge 
> !   (struct loop *, edge, tree, tree, bool, unsigned int, bool, tree, gimple_seq);
>   extern void set_prologue_iterations (basic_block, tree,
>   				     struct loop *, unsigned int);
>   struct loop *tree_duplicate_loop_on_edge (struct loop *, edge);
> Index: trunk/gcc/tree-vect-transform.c
> ===================================================================
> *** trunk.orig/gcc/tree-vect-transform.c	2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/tree-vect-transform.c	2009-01-23 16:48:50.000000000 +0100
> *************** static tree get_initial_def_for_reductio
> *** 65,72 ****
>   
>   /* Utility function dealing with loop peeling (not peeling itself).  */
>   static void vect_generate_tmps_on_preheader 
> !   (loop_vec_info, tree *, tree *, tree *);
> ! static tree vect_build_loop_niters (loop_vec_info);
>   static void vect_update_ivs_after_vectorizer (loop_vec_info, tree, edge); 
>   static tree vect_gen_niters_for_prolog_loop (loop_vec_info, tree);
>   static void vect_update_init_of_dr (struct data_reference *, tree niters);
> --- 65,72 ----
>   
>   /* Utility function dealing with loop peeling (not peeling itself).  */
>   static void vect_generate_tmps_on_preheader 
> !   (loop_vec_info, tree *, tree *, tree *, gimple_seq);
> ! static tree vect_build_loop_niters (loop_vec_info, gimple_seq);
>   static void vect_update_ivs_after_vectorizer (loop_vec_info, tree, edge); 
>   static tree vect_gen_niters_for_prolog_loop (loop_vec_info, tree);
>   static void vect_update_init_of_dr (struct data_reference *, tree niters);
> *************** vect_transform_stmt (gimple stmt, gimple
> *** 7199,7205 ****
>      on the loop preheader.  */
>   
>   static tree
> ! vect_build_loop_niters (loop_vec_info loop_vinfo)
>   {
>     tree ni_name, var;
>     gimple_seq stmts = NULL;
> --- 7199,7205 ----
>      on the loop preheader.  */
>   
>   static tree
> ! vect_build_loop_niters (loop_vec_info loop_vinfo, gimple_seq seq)
>   {
>     tree ni_name, var;
>     gimple_seq stmts = NULL;
> *************** vect_build_loop_niters (loop_vec_info lo
> *** 7214,7221 ****
>     pe = loop_preheader_edge (loop);
>     if (stmts)
>       {
> !       basic_block new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
> !       gcc_assert (!new_bb);
>       }
>         
>     return ni_name;
> --- 7214,7226 ----
>     pe = loop_preheader_edge (loop);
>     if (stmts)
>       {
> !       if (seq)
> ! 	gimple_seq_add_seq (&seq, stmts);
> !       else
> ! 	{
> ! 	  basic_block new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
> ! 	  gcc_assert (!new_bb);
> ! 	}
>       }
>         
>     return ni_name;
> *************** static void
> *** 7234,7240 ****
>   vect_generate_tmps_on_preheader (loop_vec_info loop_vinfo, 
>   				 tree *ni_name_ptr,
>   				 tree *ratio_mult_vf_name_ptr, 
> ! 				 tree *ratio_name_ptr)
>   {
>   
>     edge pe;
> --- 7239,7246 ----
>   vect_generate_tmps_on_preheader (loop_vec_info loop_vinfo, 
>   				 tree *ni_name_ptr,
>   				 tree *ratio_mult_vf_name_ptr, 
> ! 				 tree *ratio_name_ptr,
> ! 				 gimple_seq cond_expr_stmt_list)
>   {
>   
>     edge pe;
> *************** vect_generate_tmps_on_preheader (loop_ve
> *** 7254,7260 ****
>     /* Generate temporary variable that contains 
>        number of iterations loop executes.  */
>   
> !   ni_name = vect_build_loop_niters (loop_vinfo);
>     log_vf = build_int_cst (TREE_TYPE (ni), exact_log2 (vf));
>   
>     /* Create: ratio = ni >> log2(vf) */
> --- 7260,7266 ----
>     /* Generate temporary variable that contains 
>        number of iterations loop executes.  */
>   
> !   ni_name = vect_build_loop_niters (loop_vinfo, cond_expr_stmt_list);
>     log_vf = build_int_cst (TREE_TYPE (ni), exact_log2 (vf));
>   
>     /* Create: ratio = ni >> log2(vf) */
> *************** vect_generate_tmps_on_preheader (loop_ve
> *** 7267,7275 ****
>   
>         stmts = NULL;
>         ratio_name = force_gimple_operand (ratio_name, &stmts, true, var);
> !       pe = loop_preheader_edge (loop);
> !       new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
> !       gcc_assert (!new_bb);
>       }
>          
>     /* Create: ratio_mult_vf = ratio << log2 (vf).  */
> --- 7273,7286 ----
>   
>         stmts = NULL;
>         ratio_name = force_gimple_operand (ratio_name, &stmts, true, var);
> !       if (cond_expr_stmt_list)
> ! 	gimple_seq_add_seq (&cond_expr_stmt_list, stmts);
> !       else
> ! 	{
> ! 	  pe = loop_preheader_edge (loop);
> ! 	  new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
> ! 	  gcc_assert (!new_bb);
> ! 	}
>       }
>          
>     /* Create: ratio_mult_vf = ratio << log2 (vf).  */
> *************** vect_generate_tmps_on_preheader (loop_ve
> *** 7284,7292 ****
>         stmts = NULL;
>         ratio_mult_vf_name = force_gimple_operand (ratio_mult_vf_name, &stmts,
>   						 true, var);
> !       pe = loop_preheader_edge (loop);
> !       new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
> !       gcc_assert (!new_bb);
>       }
>   
>     *ni_name_ptr = ni_name;
> --- 7295,7308 ----
>         stmts = NULL;
>         ratio_mult_vf_name = force_gimple_operand (ratio_mult_vf_name, &stmts,
>   						 true, var);
> !       if (cond_expr_stmt_list)
> ! 	gimple_seq_add_seq (&cond_expr_stmt_list, stmts);
> !       else
> ! 	{
> ! 	  pe = loop_preheader_edge (loop);
> ! 	  new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
> ! 	  gcc_assert (!new_bb);
> ! 	}
>       }
>   
>     *ni_name_ptr = ni_name;
> *************** conservative_cost_threshold (loop_vec_in
> *** 7470,7476 ****
>      NITERS / VECTORIZATION_FACTOR times (this value is placed into RATIO).  */
>   
>   static void 
> ! vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, tree *ratio)
>   {
>     tree ni_name, ratio_mult_vf_name;
>     struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> --- 7486,7493 ----
>      NITERS / VECTORIZATION_FACTOR times (this value is placed into RATIO).  */
>   
>   static void 
> ! vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, tree *ratio,
> ! 				tree cond_expr, gimple_seq cond_expr_stmt_list)
>   {
>     tree ni_name, ratio_mult_vf_name;
>     struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> *************** vect_do_peeling_for_loop_bound (loop_vec
> *** 7493,7499 ****
>        ratio = ni_name / vf
>        ratio_mult_vf_name = ratio * vf  */
>     vect_generate_tmps_on_preheader (loop_vinfo, &ni_name,
> ! 				   &ratio_mult_vf_name, ratio);
>   
>     loop_num  = loop->num; 
>   
> --- 7510,7517 ----
>        ratio = ni_name / vf
>        ratio_mult_vf_name = ratio * vf  */
>     vect_generate_tmps_on_preheader (loop_vinfo, &ni_name,
> ! 				   &ratio_mult_vf_name, ratio,
> ! 				   cond_expr_stmt_list);
>   
>     loop_num  = loop->num; 
>   
> *************** vect_do_peeling_for_loop_bound (loop_vec
> *** 7501,7507 ****
>        peeling for alignment.  */
>     if (!VEC_length (gimple, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo))
>         && !VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo))
> !       && !LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo))
>       {
>         check_profitability = true;
>   
> --- 7519,7526 ----
>        peeling for alignment.  */
>     if (!VEC_length (gimple, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo))
>         && !VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo))
> !       && !LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo)
> !       && !cond_expr)
>       {
>         check_profitability = true;
>   
> *************** vect_do_peeling_for_loop_bound (loop_vec
> *** 7514,7520 ****
>   
>     new_loop = slpeel_tree_peel_loop_to_edge (loop, single_exit (loop),
>                                               ratio_mult_vf_name, ni_name, false,
> !                                             th, check_profitability);
>     gcc_assert (new_loop);
>     gcc_assert (loop_num == loop->num);
>   #ifdef ENABLE_CHECKING
> --- 7533,7540 ----
>   
>     new_loop = slpeel_tree_peel_loop_to_edge (loop, single_exit (loop),
>                                               ratio_mult_vf_name, ni_name, false,
> !                                             th, check_profitability,
> ! 					    cond_expr, cond_expr_stmt_list);
>     gcc_assert (new_loop);
>     gcc_assert (loop_num == loop->num);
>   #ifdef ENABLE_CHECKING
> *************** vect_do_peeling_for_alignment (loop_vec_
> *** 7738,7744 ****
>   
>     initialize_original_copy_tables ();
>   
> !   ni_name = vect_build_loop_niters (loop_vinfo);
>     niters_of_prolog_loop = vect_gen_niters_for_prolog_loop (loop_vinfo, ni_name);
>     
>   
> --- 7758,7764 ----
>   
>     initialize_original_copy_tables ();
>   
> !   ni_name = vect_build_loop_niters (loop_vinfo, NULL);
>     niters_of_prolog_loop = vect_gen_niters_for_prolog_loop (loop_vinfo, ni_name);
>     
>   
> *************** vect_do_peeling_for_alignment (loop_vec_
> *** 7759,7765 ****
>     new_loop =
>       slpeel_tree_peel_loop_to_edge (loop, loop_preheader_edge (loop),
>   				   niters_of_prolog_loop, ni_name, true,
> ! 				   th, check_profitability);
>   
>     gcc_assert (new_loop);
>   #ifdef ENABLE_CHECKING
> --- 7779,7785 ----
>     new_loop =
>       slpeel_tree_peel_loop_to_edge (loop, loop_preheader_edge (loop),
>   				   niters_of_prolog_loop, ni_name, true,
> ! 				   th, check_profitability, NULL_TREE, NULL);
>   
>     gcc_assert (new_loop);
>   #ifdef ENABLE_CHECKING
> *************** vect_create_cond_for_align_checks (loop_
> *** 7909,7914 ****
> --- 7929,7981 ----
>       *cond_expr = part_cond_expr;
>   }
>   
> + /* Function vect_create_cond_for_stride_checks.
> + 
> +    Create a conditional expression that represents the stride checks for
> +    all of the stride SSA_NAMEs used in data references (array element
> +    references) whose stride must be checked at runtime.
> + 
> +    Input:
> +    COND_EXPR  - input conditional expression.  New conditions will be chained
> +                 with logical AND operation.
> +    LOOP_VINFO - one field of the loop information is used.
> +                 LOOP_VINFO_VARIABLE_STRIDES is a bitmap of SSA_NAMEs to be
> + 		checked.
> + 
> +    Output:
> +    COND_EXPR_STMT_LIST - statements needed to construct the conditional
> +                          expression.
> +    The returned value is the conditional expression to be used in the if
> +    statement that controls which version of the loop gets executed at runtime.
> + 
> +    The stride we do versioning for is currently specified by a compile-time
> +    param.  In future the stride should be chosen by information from
> +    profile-feedback.  */
> + 
> + static void
> + vect_create_cond_for_stride_checks (loop_vec_info loop_vinfo,
> + 				    tree *cond_expr)
> + {
> +   bitmap_iterator bi;
> +   unsigned int i;
> +   HOST_WIDE_INT stride;
> + 
> +   stride = PARAM_VALUE (PARAM_VECT_VERSION_FOR_STRIDE_VALUE);
> + 
> +   EXECUTE_IF_SET_IN_BITMAP (LOOP_VINFO_VARIABLE_STRIDES (loop_vinfo), 0, i, bi)
> +     {
> +       tree name = ssa_name (i);
> +       tree cond = fold_build2 (EQ_EXPR, boolean_type_node,
> + 			       name,
> + 			       build_int_cst (TREE_TYPE (name), stride));
> +       if (*cond_expr)
> + 	*cond_expr = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
> + 				  *cond_expr, cond);
> +       else
> + 	*cond_expr = cond;
> +     }
> + }
> + 
>   /* Function vect_vfa_segment_size.
>   
>      Create an expression that computes the size of segment
> *************** vect_create_cond_for_alias_checks (loop_
> *** 8076,8087 ****
>      cost model initially.  */
>   
>   static void
> ! vect_loop_versioning (loop_vec_info loop_vinfo)
>   {
>     struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>     struct loop *nloop;
> -   tree cond_expr = NULL_TREE;
> -   gimple_seq cond_expr_stmt_list = NULL;
>     basic_block condition_bb;
>     gimple_stmt_iterator gsi, cond_exp_gsi;
>     basic_block merge_bb;
> --- 8143,8153 ----
>      cost model initially.  */
>   
>   static void
> ! vect_loop_versioning (loop_vec_info loop_vinfo, bool do_versioning,
> ! 		      tree *cond_expr, gimple_seq *cond_expr_stmt_list)
>   {
>     struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>     struct loop *nloop;
>     basic_block condition_bb;
>     gimple_stmt_iterator gsi, cond_exp_gsi;
>     basic_block merge_bb;
> *************** vect_loop_versioning (loop_vec_info loop
> *** 8101,8129 ****
>     th = conservative_cost_threshold (loop_vinfo,
>   				    min_profitable_iters);
>   
> !   cond_expr =
> !     build2 (GT_EXPR, boolean_type_node, scalar_loop_iters, 
> ! 	    build_int_cst (TREE_TYPE (scalar_loop_iters), th));
>   
> !   cond_expr = force_gimple_operand (cond_expr, &cond_expr_stmt_list,
> ! 				    false, NULL_TREE);
>   
>     if (VEC_length (gimple, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo)))
> !       vect_create_cond_for_align_checks (loop_vinfo, &cond_expr,
> ! 					 &cond_expr_stmt_list);
>   
>     if (VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo)))
> !     vect_create_cond_for_alias_checks (loop_vinfo, &cond_expr, 
> ! 				       &cond_expr_stmt_list);
>   
> !   cond_expr =
> !     fold_build2 (NE_EXPR, boolean_type_node, cond_expr, integer_zero_node);
> !   cond_expr =
> !     force_gimple_operand (cond_expr, &gimplify_stmt_list, true, NULL_TREE);
> !   gimple_seq_add_seq (&cond_expr_stmt_list, gimplify_stmt_list);
>   
>     initialize_original_copy_tables ();
> !   nloop = loop_version (loop, cond_expr, &condition_bb,
>   			prob, prob, REG_BR_PROB_BASE - prob, true);
>     free_original_copy_tables();
>   
> --- 8167,8200 ----
>     th = conservative_cost_threshold (loop_vinfo,
>   				    min_profitable_iters);
>   
> !   *cond_expr =
> !     fold_build2 (GT_EXPR, boolean_type_node, scalar_loop_iters,
> ! 		 build_int_cst (TREE_TYPE (scalar_loop_iters), th));
>   
> !   *cond_expr = force_gimple_operand (*cond_expr, cond_expr_stmt_list,
> ! 				     false, NULL_TREE);
>   
>     if (VEC_length (gimple, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo)))
> !       vect_create_cond_for_align_checks (loop_vinfo, cond_expr,
> ! 					 cond_expr_stmt_list);
>   
>     if (VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo)))
> !     vect_create_cond_for_alias_checks (loop_vinfo, cond_expr,
> ! 				       cond_expr_stmt_list);
>   
> !   if (!bitmap_empty_p (LOOP_VINFO_VARIABLE_STRIDES (loop_vinfo)))
> !     vect_create_cond_for_stride_checks (loop_vinfo, cond_expr);
> ! 
> !   *cond_expr =
> !     fold_build2 (NE_EXPR, boolean_type_node, *cond_expr, integer_zero_node);
> !   *cond_expr =
> !     force_gimple_operand (*cond_expr, &gimplify_stmt_list, true, NULL_TREE);
> !   gimple_seq_add_seq (cond_expr_stmt_list, gimplify_stmt_list);
> !   if (!do_versioning)
> !     return;
>   
>     initialize_original_copy_tables ();
> !   nloop = loop_version (loop, *cond_expr, &condition_bb,
>   			prob, prob, REG_BR_PROB_BASE - prob, true);
>     free_original_copy_tables();
>   
> *************** vect_loop_versioning (loop_vec_info loop
> *** 8154,8164 ****
>     /* End loop-exit-fixes after versioning.  */
>   
>     update_ssa (TODO_update_ssa);
> !   if (cond_expr_stmt_list)
>       {
>         cond_exp_gsi = gsi_last_bb (condition_bb);
> !       gsi_insert_seq_before (&cond_exp_gsi, cond_expr_stmt_list, GSI_SAME_STMT);
>       }
>   }
>   
>   /* Remove a group of stores (for SLP or interleaving), free their 
> --- 8225,8238 ----
>     /* End loop-exit-fixes after versioning.  */
>   
>     update_ssa (TODO_update_ssa);
> !   if (*cond_expr_stmt_list)
>       {
>         cond_exp_gsi = gsi_last_bb (condition_bb);
> !       gsi_insert_seq_before (&cond_exp_gsi, *cond_expr_stmt_list,
> ! 			     GSI_SAME_STMT);
> !       *cond_expr_stmt_list = NULL;
>       }
> +   *cond_expr = NULL_TREE;
>   }
>   
>   /* Remove a group of stores (for SLP or interleaving), free their 
> *************** vect_transform_loop (loop_vec_info loop_
> *** 8320,8342 ****
>     bool strided_store;
>     bool slp_scheduled = false;
>     unsigned int nunits;
>   
>     if (vect_print_dump_info (REPORT_DETAILS))
>       fprintf (vect_dump, "=== vec_transform_loop ===");
>   
> -   if (VEC_length (gimple, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo))
> -       || VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo)))
> -     vect_loop_versioning (loop_vinfo);
> - 
> -   /* CHECKME: we wouldn't need this if we called update_ssa once
> -      for all loops.  */
> -   bitmap_zero (vect_memsyms_to_rename);
> - 
>     /* Peel the loop if there are data refs with unknown alignment.
>        Only one data ref with unknown store is allowed.  */
>   
>     if (LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo))
>       vect_do_peeling_for_alignment (loop_vinfo);
>     
>     /* If the loop has a symbolic number of iterations 'n' (i.e. it's not a
>        compile time constant), or it is a constant that doesn't divide by the
> --- 8394,8427 ----
>     bool strided_store;
>     bool slp_scheduled = false;
>     unsigned int nunits;
> +   tree cond_expr = NULL_TREE;
> +   gimple_seq cond_expr_stmt_list = NULL;
> +   bool do_peeling_for_loop_bound;
>   
>     if (vect_print_dump_info (REPORT_DETAILS))
>       fprintf (vect_dump, "=== vec_transform_loop ===");
>   
>     /* Peel the loop if there are data refs with unknown alignment.
>        Only one data ref with unknown store is allowed.  */
>   
>     if (LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo))
>       vect_do_peeling_for_alignment (loop_vinfo);
> + 
> +   do_peeling_for_loop_bound
> +     = (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> +        || (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> + 	   && LOOP_VINFO_INT_NITERS (loop_vinfo) % vectorization_factor != 0));
> + 
> +   if (VEC_length (gimple, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo))
> +       || VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo))
> +       || !bitmap_empty_p (LOOP_VINFO_VARIABLE_STRIDES (loop_vinfo)))
> +     vect_loop_versioning (loop_vinfo,
> + 			  !do_peeling_for_loop_bound,
> + 			  &cond_expr, &cond_expr_stmt_list);
> + 
> +   /* CHECKME: we wouldn't need this if we called update_ssa once
> +      for all loops.  */
> +   bitmap_zero (vect_memsyms_to_rename);
>     
>     /* If the loop has a symbolic number of iterations 'n' (i.e. it's not a
>        compile time constant), or it is a constant that doesn't divide by the
> *************** vect_transform_loop (loop_vec_info loop_
> *** 8346,8355 ****
>        will remain scalar and will compute the remaining (n%VF) iterations.
>        (VF is the vectorization factor).  */
>   
> !   if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> !       || (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> !           && LOOP_VINFO_INT_NITERS (loop_vinfo) % vectorization_factor != 0))
> !     vect_do_peeling_for_loop_bound (loop_vinfo, &ratio);
>     else
>       ratio = build_int_cst (TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo)),
>   		LOOP_VINFO_INT_NITERS (loop_vinfo) / vectorization_factor);
> --- 8431,8439 ----
>        will remain scalar and will compute the remaining (n%VF) iterations.
>        (VF is the vectorization factor).  */
>   
> !   if (do_peeling_for_loop_bound)
> !     vect_do_peeling_for_loop_bound (loop_vinfo, &ratio,
> ! 				    cond_expr, cond_expr_stmt_list);
>     else
>       ratio = build_int_cst (TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo)),
>   		LOOP_VINFO_INT_NITERS (loop_vinfo) / vectorization_factor);
> Index: trunk/gcc/testsuite/gfortran.dg/vect/fast-math-vect-stride-1.f90
> ===================================================================
> *** /dev/null	1970-01-01 00:00:00.000000000 +0000
> --- trunk/gcc/testsuite/gfortran.dg/vect/fast-math-vect-stride-1.f90	2009-01-23 16:48:50.000000000 +0100
> ***************
> *** 0 ****
> --- 1,17 ----
> + ! { dg-do compile }
> + 
> + subroutine to_product_of(self,a,b)
> +   real(kind=8), dimension(:,:) :: self
> +   real(kind=8), dimension(:,:), intent(in) :: a, b
> +   integer(kind=kind(1)) :: dim1, dim2
> +   dim1 = size(self,1)
> +   dim2 = size(self,2)
> +   do i = 1,dim1
> +     do j = 1,dim2
> +       self(i,j) = sum(a(i,:)*b(:,j))
> +     end do
> +   end do
> + end subroutine
> + 
> + ! { dg-final { scan-tree-dump "vectorized 1 loop" "vect" } }
> + ! { dg-final { cleanup-tree-dump "vect" } }
> Index: trunk/gcc/Makefile.in
> ===================================================================
> *** trunk.orig/gcc/Makefile.in	2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/Makefile.in	2009-01-23 16:48:50.000000000 +0100
> *************** tree-nrv.o : tree-nrv.c $(CONFIG_H) $(SY
> *** 2135,2141 ****
>   tree-ssa-copy.o : tree-ssa-copy.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
>      $(RTL_H) $(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) output.h $(DIAGNOSTIC_H) \
>      $(FUNCTION_H) $(TIMEVAR_H) $(TM_H) coretypes.h $(TREE_DUMP_H) \
> !    $(BASIC_BLOCK_H) tree-pass.h langhooks.h tree-ssa-propagate.h $(FLAGS_H)
>   tree-ssa-propagate.o : tree-ssa-propagate.c $(TREE_FLOW_H) $(CONFIG_H) \
>      $(SYSTEM_H) $(RTL_H) $(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) output.h \
>      $(DIAGNOSTIC_H) $(FUNCTION_H) $(TIMEVAR_H) $(TM_H) coretypes.h \
> --- 2135,2142 ----
>   tree-ssa-copy.o : tree-ssa-copy.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
>      $(RTL_H) $(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) output.h $(DIAGNOSTIC_H) \
>      $(FUNCTION_H) $(TIMEVAR_H) $(TM_H) coretypes.h $(TREE_DUMP_H) \
> !    $(BASIC_BLOCK_H) tree-pass.h langhooks.h tree-ssa-propagate.h $(FLAGS_H) \
> !    $(CFGLOOP_H)
>   tree-ssa-propagate.o : tree-ssa-propagate.c $(TREE_FLOW_H) $(CONFIG_H) \
>      $(SYSTEM_H) $(RTL_H) $(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) output.h \
>      $(DIAGNOSTIC_H) $(FUNCTION_H) $(TIMEVAR_H) $(TM_H) coretypes.h \
> Index: trunk/gcc/tree-ssa-copy.c
> ===================================================================
> *** trunk.orig/gcc/tree-ssa-copy.c	2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/tree-ssa-copy.c	2009-01-23 16:48:50.000000000 +0100
> *************** along with GCC; see the file COPYING3.
> *** 37,42 ****
> --- 37,43 ----
>   #include "tree-pass.h"
>   #include "tree-ssa-propagate.h"
>   #include "langhooks.h"
> + #include "cfgloop.h"
>   
>   /* This file implements the copy propagation pass and provides a
>      handful of interfaces for performing const/copy propagation and
> *************** init_copy_prop (void)
> *** 991,997 ****
>             tree def;
>   
>   	  def = gimple_phi_result (phi);
> ! 	  if (!is_gimple_reg (def))
>               prop_set_simulate_again (phi, false);
>   	  else
>               prop_set_simulate_again (phi, true);
> --- 992,1004 ----
>             tree def;
>   
>   	  def = gimple_phi_result (phi);
> ! 	  if (!is_gimple_reg (def)
> ! 	      /* In loop-closed SSA form do not copy-propagate through
> ! 	         PHI nodes.  Technically this is only needed for loop
> ! 		 exit PHIs, but this is difficult to query.  */
> ! 	      || (current_loops
> ! 		  && gimple_phi_num_args (phi) == 1
> ! 		  && loops_state_satisfies_p (LOOP_CLOSED_SSA)))
>               prop_set_simulate_again (phi, false);
>   	  else
>               prop_set_simulate_again (phi, true);

Richard,
    Do you have an updated version of this patch that applies against
current GCC trunk? Also, did this ever go into any of the branches?
              Jack
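
For anyone following the thread, the effect of the quoted stride
versioning can be sketched at the source level. This is only a minimal
illustration in plain C, not GCC code: the function strided_dot and its
parameters are hypothetical, the chosen stride value of 1 and the
unroll factor of 2 are assumptions standing in for the param value and
the vectorization factor, and the unrolled loop merely stands in for a
real vector loop. It shows the stride checks chained with logical AND,
and the epilogue loop doubling as the non-vectorized version, as the
patch description says.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC: dot product over two strided
   arrays, versioned on the strides by hand.  */
double
strided_dot (const double *a, ptrdiff_t stride_a,
             const double *b, ptrdiff_t stride_b, size_t n)
{
  double sum = 0.0;
  size_t i = 0;

  assert (n == 0 || (a != NULL && b != NULL));

  /* Versioning condition: every checked stride must equal the chosen
     constant (assumed to be 1 here); the checks are ANDed together.  */
  if (stride_a == 1 && stride_b == 1)
    {
      /* Unit-stride version.  The loop is unrolled by two, which
         stands in for a vector loop with a vectorization factor of 2.
         Splitting the sum reassociates the reduction, which is why
         such a loop needs fast-math style flags in practice.  */
      double s0 = 0.0, s1 = 0.0;
      for (; i + 1 < n; i += 2)
        {
          s0 += a[i] * b[i];
          s1 += a[i + 1] * b[i + 1];
        }
      sum = s0 + s1;
    }

  /* Scalar loop.  It is the epilogue for the leftover iteration when
     the condition held, and the complete non-vectorized version when
     it did not (i is still 0 in that case).  */
  for (; i < n; i++)
    sum += a[(ptrdiff_t) i * stride_a] * b[(ptrdiff_t) i * stride_b];

  return sum;
}
```

With this shape no separate copy of the loop is needed when an epilogue
exists anyway, which matches the reason given for attaching the
versioning condition to the pre-condition of the main loop.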

