This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH][RFC] Add versioning for constant strides for vectorization
- From: Jack Howarth <howarth at bromo dot med dot uc dot edu>
- To: Richard Guenther <rguenther at suse dot de>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Sat, 13 Mar 2010 14:01:11 -0500
- Subject: Re: [PATCH][RFC] Add versioning for constant strides for vectorization
- References: <alpine.LNX.2.00.0901231659050.24314@zhemvz.fhfr.qr>
On Fri, Jan 23, 2009 at 05:08:43PM +0100, Richard Guenther wrote:
>
> This patch adds the capability to the vectorizer to perform versioning
> for the case of a constant (suitable) stride. For example for
>
> subroutine to_product_of(self,a,b,a1,a2)
> complex(kind=8) :: self (:)
> complex(kind=8), intent(in) :: a(:,:)
> complex(kind=8), intent(in) :: b(:)
> integer a1,a2
> do i = 1,a1
> do j = 1,a2
> self(i) = self(i) + a(j,i)*b(j)
> end do
> end do
> end subroutine
>
> we can only apply vectorization if the strides of the fastest dimension
> of self, a and b are one (they are loaded from the passed array
> descriptors and thus appear as (loop invariant) variables).
>
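For readers unfamiliar with the idea, versioning for a constant stride can be sketched in plain C roughly as follows. This is only an illustration, not code from the patch: the helper name dot_versioned is hypothetical, plain double stands in for complex(kind=8), and the strides are passed explicitly where gfortran would load them from the array descriptors.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the patch: computes the sum of
   a[j*sa] * b[j*sb] for j in [0, n), selecting one of two copies of
   the loop with a run-time stride check.  */
double
dot_versioned (const double *a, ptrdiff_t sa,
               const double *b, ptrdiff_t sb, size_t n)
{
  double sum = 0.0;
  size_t j;

  /* Invariant accesses (stride zero) are out of scope for this sketch.  */
  assert (sa != 0 && sb != 0);

  if (sa == 1 && sb == 1)
    {
      /* Versioned copy: both strides are known to be one here, so the
         accesses are contiguous and a vectorizer can handle the loop.  */
      for (j = 0; j < n; j++)
        sum += a[j] * b[j];
    }
  else
    {
      /* Fallback copy: arbitrary loop-invariant strides, left scalar.  */
      for (j = 0; j < n; j++)
        sum += a[(ptrdiff_t) j * sa] * b[(ptrdiff_t) j * sb];
    }

  return sum;
}
```

Both copies compute the same value; the check only decides which one runs, so the cost of versioning is one comparison per stride plus the duplicated loop body.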
> During the implementation of this I noticed that peeling for the
> number of iterations (we have to unroll the above loop twice, so
> for an odd number of iterations we have an epilogue loop for the
> remaining iteration(s)) does not play well with versioning: we end up
> vectorizing the wrong loop. So I just disabled versioning if we
> apply peeling with an epilogue loop and instead attach the versioning
> condition to the pre-condition of the main loop, which skips directly
> to the epilogue if the number of iterations is too small. We obviously
> can use the epilogue loop as the non-vectorized version.
>
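As I read it, the resulting control flow looks roughly like the C sketch below: the stride check is folded into the guard of the main (unrolled) loop, and the epilogue loop doubles as the scalar version. The function name dot_peeled is hypothetical, double again stands in for complex(kind=8), and the two partial sums only mimic a vectorization factor of two (reassociating the reduction like this is why the testcases are fast-math ones).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the patch: same reduction as the
   testcase's inner loop, with the stride check attached to the
   pre-condition of the main loop instead of a separate loop version.  */
double
dot_peeled (const double *a, ptrdiff_t sa,
            const double *b, ptrdiff_t sb, size_t n)
{
  double sum0 = 0.0, sum1 = 0.0;
  size_t j = 0;

  /* Invariant accesses (stride zero) are out of scope for this sketch.  */
  assert (sa != 0 && sb != 0);

  /* Guard: skip straight to the epilogue if there are too few
     iterations OR the stride check fails.  */
  if (!(n < 2 || !(sa == 1 && sb == 1)))
    {
      /* Main loop, unrolled twice; stands in for the vectorized loop.  */
      size_t main_iters = n & ~(size_t) 1;

      for (; j < main_iters; j += 2)
        {
          sum0 += a[j] * b[j];
          sum1 += a[j + 1] * b[j + 1];
        }
    }

  /* Epilogue: handles the leftover iteration, or every iteration when
     the guard skipped the main loop.  It uses the general strides, so
     it is also the non-vectorized version.  */
  for (; j < n; j++)
    sum0 += a[(ptrdiff_t) j * sa] * b[(ptrdiff_t) j * sb];

  return sum0 + sum1;
}
```

Compared with full loop versioning, this keeps a single scalar copy of the loop body around instead of two.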
> This patch also inserts an extra copyprop and dce pass before the
> vectorizer so it can recognize the reduction in the above testcase
> (LIM has made that reduction non-obvious). In doing so I noticed that
> copyprop does not preserve loop-closed SSA form and fixed that as well.
>
> An earlier version bootstrapped and tested OK on
> x86_64-unknown-linux-gnu; a final run is still in progress.
>
> I haven't performance-tested this extensively yet; it might need
> cost-model adjustments and/or need to wait until we have profile
> feedback to properly seed vectorizer analysis here. A micro-benchmark
> based on the above loop shows around 15% improvement on AMD K10.
>
> Feedback (and ppc testing) is still welcome of course.
>
> Thanks,
> Richard.
>
> 2009-01-23 Richard Guenther <rguenther@suse.de>
>
> * passes.c (init_optimization_passes): Add copy-prop and dce
> before vectorization.
> * Makefile.in (tree-ssa-copy.o): Add $(CFGLOOP_H) dependency.
> * tree-ssa-copy.c (init_copy_prop): Do not propagate through
> single-argument PHIs if we are in loop-closed SSA form.
> * tree-data-ref.c (dr_analyze_innermost): Allow affine offsets.
> * tree-vect-analyze.c (vect_check_interleaving): Check that
> DR_STEP is constant.
> (vect_enhance_data_refs_alignment): If versioning for strides
> is required do not peel.
> (vect_analyze_data_ref_access): Allow non-constant step of
> a specific form, remember them for versioning.
> * params.def (vect-max-version-for-stride-checks): New param.
> (vect-version-for-stride-value): Likewise.
> * tree-vectorizer.c (slpeel_add_loop_guard): Pass extra guards
> for the pre-condition.
> (slpeel_tree_peel_loop_to_edge): Likewise.
> (new_loop_vec_info): Allocate stride versioning data.
> (destroy_loop_vec_info): Free stride versioning data.
> * tree-vectorizer.h (struct _loop_vec_info): Add variable_strides
> field.
> (LOOP_VINFO_VARIABLE_STRIDES): Define.
> (slpeel_tree_peel_loop_to_edge): Adjust declaration.
> * tree-vect-transform.c (vect_build_loop_niters): Take an
> optional sequence to append stmts.
> (vect_generate_tmps_on_preheader): Likewise.
> (vect_do_peeling_for_loop_bound): Take extra guards for the
> pre-condition.
> (vect_do_peeling_for_alignment): Adjust.
> (vect_create_cond_for_stride_checks): New function.
> (vect_loop_versioning): Take stmt and stmt list to put pre-condition
> guards if we are going to peel. Do not apply versioning in that
> case.
> (vect_transform_loop): If we are peeling for loop bound only
> record extra pre-conditions, do not apply loop versioning.
>
> * gcc.dg/vect/fast-math-vect-complex-5.c: New testcase.
> * gfortran.dg/vect/fast-math-vect-complex-1.f90: Likewise.
> * gfortran.dg/vect/fast-math-vect-stride-1.f90: Likewise.
>
> Index: trunk/gcc/passes.c
> ===================================================================
> *** trunk.orig/gcc/passes.c 2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/passes.c 2009-01-23 16:48:50.000000000 +0100
> *************** init_optimization_passes (void)
> *** 659,664 ****
> --- 659,666 ----
> NEXT_PASS (pass_graphite_transforms);
> NEXT_PASS (pass_iv_canon);
> NEXT_PASS (pass_if_conversion);
> + NEXT_PASS (pass_copy_prop);
> + NEXT_PASS (pass_dce_loop);
> NEXT_PASS (pass_vectorize);
> {
> struct opt_pass **p = &pass_vectorize.pass.sub;
> Index: trunk/gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-5.c
> ===================================================================
> *** /dev/null 1970-01-01 00:00:00.000000000 +0000
> --- trunk/gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-5.c 2009-01-23 16:48:50.000000000 +0100
> ***************
> *** 0 ****
> --- 1,18 ----
> + /* { dg-do compile } */
> + /* { dg-require-effective-target vect_double } */
> +
> + #define NUM 64
> + _Complex double ad[NUM], bd[NUM], cd[NUM];
> +
> + void testd(void)
> + {
> + int i;
> + int j;
> +
> + for (i = 0; i < NUM; i++)
> + for (j = 0; j < NUM; j++)
> + cd[i] = cd[i] + ad[j] * bd[j];
> + }
> +
> + /* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
> + /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: trunk/gcc/testsuite/gfortran.dg/vect/fast-math-vect-complex-1.f90
> ===================================================================
> *** /dev/null 1970-01-01 00:00:00.000000000 +0000
> --- trunk/gcc/testsuite/gfortran.dg/vect/fast-math-vect-complex-1.f90 2009-01-23 16:48:50.000000000 +0100
> ***************
> *** 0 ****
> --- 1,16 ----
> + ! { dg-do compile }
> +
> + subroutine to_product_of(self,a,b,a1,a2)
> + complex(kind=8) :: self (:)
> + complex(kind=8), intent(in) :: a(:,:)
> + complex(kind=8), intent(in) :: b(:)
> + integer a1,a2
> + do i = 1,a1
> + do j = 1,a2
> + self(i) = self(i) + a(i,j)*b(j)
> + end do
> + end do
> + end subroutine
> +
> + ! { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } }
> + ! { dg-final { cleanup-tree-dump "vect" } }
> Index: trunk/gcc/tree-data-ref.c
> ===================================================================
> *** trunk.orig/gcc/tree-data-ref.c 2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/tree-data-ref.c 2009-01-23 16:48:50.000000000 +0100
> *************** dr_analyze_innermost (struct data_refere
> *** 708,714 ****
> offset_iv.base = ssize_int (0);
> offset_iv.step = ssize_int (0);
> }
> ! else if (!simple_iv (loop, stmt, poffset, &offset_iv, false))
> {
> if (dump_file && (dump_flags & TDF_DETAILS))
> fprintf (dump_file, "failed: evolution of offset is not affine.\n");
> --- 708,714 ----
> offset_iv.base = ssize_int (0);
> offset_iv.step = ssize_int (0);
> }
> ! else if (!simple_iv (loop, stmt, poffset, &offset_iv, true))
> {
> if (dump_file && (dump_flags & TDF_DETAILS))
> fprintf (dump_file, "failed: evolution of offset is not affine.\n");
> Index: trunk/gcc/tree-vect-analyze.c
> ===================================================================
> *** trunk.orig/gcc/tree-vect-analyze.c 2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/tree-vect-analyze.c 2009-01-23 16:48:50.000000000 +0100
> *************** vect_check_interleaving (struct data_ref
> *** 1109,1114 ****
> --- 1109,1116 ----
> type_size_b = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (drb))));
>
> if (type_size_a != type_size_b
> + || TREE_CODE (DR_STEP (dra)) != INTEGER_CST
> + || TREE_CODE (DR_STEP (drb)) != INTEGER_CST
> || tree_int_cst_compare (DR_STEP (dra), DR_STEP (drb))
> || !types_compatible_p (TREE_TYPE (DR_REF (dra)),
> TREE_TYPE (DR_REF (drb))))
> *************** vect_enhance_data_refs_alignment (loop_v
> *** 1825,1830 ****
> --- 1827,1833 ----
> gimple stmt;
> stmt_vec_info stmt_info;
> int vect_versioning_for_alias_required;
> + int vect_versioning_for_strides_required;
>
> if (vect_print_dump_info (REPORT_DETAILS))
> fprintf (vect_dump, "=== vect_enhance_data_refs_alignment ===");
> *************** vect_enhance_data_refs_alignment (loop_v
> *** 1892,1904 ****
>
> vect_versioning_for_alias_required =
> (VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo)) > 0);
>
> /* Temporarily, if versioning for alias is required, we disable peeling
> until we support peeling and versioning. Often peeling for alignment
> will require peeling for loop-bound, which in turn requires that we
> know how to adjust the loop ivs after the loop. */
> if (vect_versioning_for_alias_required
> ! || !vect_can_advance_ivs_p (loop_vinfo)
> || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
> do_peeling = false;
>
> --- 1895,1910 ----
>
> vect_versioning_for_alias_required =
> (VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo)) > 0);
> + vect_versioning_for_strides_required =
> + !bitmap_empty_p (LOOP_VINFO_VARIABLE_STRIDES (loop_vinfo));
>
> /* Temporarily, if versioning for alias is required, we disable peeling
> until we support peeling and versioning. Often peeling for alignment
> will require peeling for loop-bound, which in turn requires that we
> know how to adjust the loop ivs after the loop. */
> if (vect_versioning_for_alias_required
> ! || vect_versioning_for_strides_required
> ! || !vect_can_advance_ivs_p (loop_vinfo)
> || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
> do_peeling = false;
>
> *************** vect_analyze_data_ref_access (struct dat
> *** 2349,2357 ****
> stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> ! HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step);
>
> ! if (!step)
> {
> if (vect_print_dump_info (REPORT_DETAILS))
> fprintf (vect_dump, "bad data-ref access");
> --- 2355,2364 ----
> stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> ! HOST_WIDE_INT dr_step;
>
> ! if (!step
> ! || TREE_CODE (step) != INTEGER_CST)
> {
> if (vect_print_dump_info (REPORT_DETAILS))
> fprintf (vect_dump, "bad data-ref access");
> *************** vect_analyze_data_ref_access (struct dat
> *** 2359,2364 ****
> --- 2366,2372 ----
> }
>
> /* Don't allow invariant accesses. */
> + dr_step = TREE_INT_CST_LOW (step);
> if (dr_step == 0)
> return false;
>
> *************** vect_analyze_data_refs (loop_vec_info lo
> *** 3563,3568 ****
> --- 3571,3620 ----
> return false;
> }
>
> + /* If the non-constant (but loop invariant) step is of the
> + form NAME or NAME * CST where CST is the element size, mark
> + this ddr for versioning for strides and re-set DR_STEP
> + to the value we will version for. Otherwise reject
> + non-constant steps. */
> + if (TREE_CODE (DR_STEP (dr)) != INTEGER_CST)
> + {
> + tree step = DR_STEP (dr);
> +
> + STRIP_NOPS (step);
> + if (flag_tree_vect_loop_version
> + && (TREE_CODE (step) == SSA_NAME
> + || (TREE_CODE (step) == MULT_EXPR
> + && TREE_CODE (TREE_OPERAND (step, 1)) == INTEGER_CST)))
> + {
> + tree stride;
> + tree newstep;
> +
> + stride = step;
> + if (TREE_CODE (step) == MULT_EXPR)
> + stride = TREE_OPERAND (step, 0);
> + STRIP_NOPS (stride);
> + if (TREE_CODE (stride) != SSA_NAME)
> + return false;
> +
> + bitmap_set_bit (LOOP_VINFO_VARIABLE_STRIDES (loop_vinfo),
> + SSA_NAME_VERSION (stride));
> + if (bitmap_count_bits (LOOP_VINFO_VARIABLE_STRIDES (loop_vinfo))
> + > (unsigned)PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_STRIDE_CHECKS))
> + return false;
> +
> + /* ??? Delay this change until after versioning or
> + preserve the original step somewhere. */
> + newstep = build_int_cst (TREE_TYPE (step),
> + PARAM_VALUE (PARAM_VECT_VERSION_FOR_STRIDE_VALUE));
> + if (TREE_CODE (step) == MULT_EXPR)
> + newstep = int_const_binop (MULT_EXPR, newstep,
> + TREE_OPERAND (step, 1), false);
> + DR_STEP (dr) = newstep;
> + }
> + else
> + return false;
> + }
> +
> if (!DR_SYMBOL_TAG (dr))
> {
> if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
> Index: trunk/gcc/params.def
> ===================================================================
> *** trunk.orig/gcc/params.def 2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/params.def 2009-01-23 16:48:50.000000000 +0100
> *************** DEFPARAM(PARAM_VECT_MAX_VERSION_FOR_ALIA
> *** 506,511 ****
> --- 506,521 ----
> "Bound on number of runtime checks inserted by the vectorizer's loop versioning for alias check",
> 10, 0, 0)
>
> + DEFPARAM(PARAM_VECT_MAX_VERSION_FOR_STRIDE_CHECKS,
> + "vect-max-version-for-stride-checks",
> + "Bound on number of runtime checks inserted by the vectorizer's loop versioning for stride check",
> + 4, 0, 0)
> +
> + DEFPARAM(PARAM_VECT_VERSION_FOR_STRIDE_VALUE,
> + "vect-version-for-stride-value",
> + "The constant stride in elements the vectorizer uses for loop versioning",
> + 1, 0, 0)
> +
> DEFPARAM(PARAM_MAX_CSELIB_MEMORY_LOCATIONS,
> "max-cselib-memory-locations",
> "The maximum memory locations recorded by cselib",
> Index: trunk/gcc/tree-vectorizer.c
> ===================================================================
> *** trunk.orig/gcc/tree-vectorizer.c 2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/tree-vectorizer.c 2009-01-23 16:48:50.000000000 +0100
> *************** slpeel_tree_duplicate_loop_to_edge_cfg (
> *** 927,934 ****
> Returns the skip edge. */
>
> static edge
> ! slpeel_add_loop_guard (basic_block guard_bb, tree cond, basic_block exit_bb,
> ! basic_block dom_bb)
> {
> gimple_stmt_iterator gsi;
> edge new_e, enter_e;
> --- 927,935 ----
> Returns the skip edge. */
>
> static edge
> ! slpeel_add_loop_guard (basic_block guard_bb, tree cond,
> ! gimple_seq cond_expr_stmt_list,
> ! basic_block exit_bb, basic_block dom_bb)
> {
> gimple_stmt_iterator gsi;
> edge new_e, enter_e;
> *************** slpeel_add_loop_guard (basic_block guard
> *** 941,951 ****
> gsi = gsi_last_bb (guard_bb);
>
> cond = force_gimple_operand (cond, &gimplify_stmt_list, true, NULL_TREE);
> cond_stmt = gimple_build_cond (NE_EXPR,
> cond, build_int_cst (TREE_TYPE (cond), 0),
> NULL_TREE, NULL_TREE);
> ! if (gimplify_stmt_list)
> ! gsi_insert_seq_after (&gsi, gimplify_stmt_list, GSI_NEW_STMT);
>
> gsi = gsi_last_bb (guard_bb);
> gsi_insert_after (&gsi, cond_stmt, GSI_NEW_STMT);
> --- 942,954 ----
> gsi = gsi_last_bb (guard_bb);
>
> cond = force_gimple_operand (cond, &gimplify_stmt_list, true, NULL_TREE);
> + if (gimplify_stmt_list)
> + gimple_seq_add_seq (&cond_expr_stmt_list, gimplify_stmt_list);
> cond_stmt = gimple_build_cond (NE_EXPR,
> cond, build_int_cst (TREE_TYPE (cond), 0),
> NULL_TREE, NULL_TREE);
> ! if (cond_expr_stmt_list)
> ! gsi_insert_seq_after (&gsi, cond_expr_stmt_list, GSI_NEW_STMT);
>
> gsi = gsi_last_bb (guard_bb);
> gsi_insert_after (&gsi, cond_stmt, GSI_NEW_STMT);
> *************** struct loop*
> *** 1151,1157 ****
> slpeel_tree_peel_loop_to_edge (struct loop *loop,
> edge e, tree first_niters,
> tree niters, bool update_first_loop_count,
> ! unsigned int th, bool check_profitability)
> {
> struct loop *new_loop = NULL, *first_loop, *second_loop;
> edge skip_e;
> --- 1154,1161 ----
> slpeel_tree_peel_loop_to_edge (struct loop *loop,
> edge e, tree first_niters,
> tree niters, bool update_first_loop_count,
> ! unsigned int th, bool check_profitability,
> ! tree cond_expr, gimple_seq cond_expr_stmt_list)
> {
> struct loop *new_loop = NULL, *first_loop, *second_loop;
> edge skip_e;
> *************** slpeel_tree_peel_loop_to_edge (struct lo
> *** 1325,1330 ****
> --- 1329,1342 ----
> pre_condition = fold_build2 (TRUTH_OR_EXPR, boolean_type_node,
> cost_pre_condition, pre_condition);
> }
> + if (cond_expr)
> + {
> + pre_condition =
> + fold_build2 (TRUTH_OR_EXPR, boolean_type_node,
> + pre_condition,
> + fold_build1 (TRUTH_NOT_EXPR, boolean_type_node,
> + cond_expr));
> + }
> }
>
> /* Prologue peeling. */
> *************** slpeel_tree_peel_loop_to_edge (struct lo
> *** 1340,1345 ****
> --- 1352,1358 ----
> }
>
> skip_e = slpeel_add_loop_guard (bb_before_first_loop, pre_condition,
> + cond_expr_stmt_list,
> bb_before_second_loop, bb_before_first_loop);
> slpeel_update_phi_nodes_for_guard1 (skip_e, first_loop,
> first_loop == new_loop,
> *************** slpeel_tree_peel_loop_to_edge (struct lo
> *** 1377,1383 ****
>
> pre_condition =
> fold_build2 (EQ_EXPR, boolean_type_node, first_niters, niters);
> ! skip_e = slpeel_add_loop_guard (bb_between_loops, pre_condition,
> bb_after_second_loop, bb_before_first_loop);
> slpeel_update_phi_nodes_for_guard2 (skip_e, second_loop,
> second_loop == new_loop, &new_exit_bb);
> --- 1390,1396 ----
>
> pre_condition =
> fold_build2 (EQ_EXPR, boolean_type_node, first_niters, niters);
> ! skip_e = slpeel_add_loop_guard (bb_between_loops, pre_condition, NULL,
> bb_after_second_loop, bb_before_first_loop);
> slpeel_update_phi_nodes_for_guard2 (skip_e, second_loop,
> second_loop == new_loop, &new_exit_bb);
> *************** new_loop_vec_info (struct loop *loop)
> *** 1714,1719 ****
> --- 1727,1733 ----
> LOOP_VINFO_MAY_ALIAS_DDRS (res) =
> VEC_alloc (ddr_p, heap,
> PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS));
> + LOOP_VINFO_VARIABLE_STRIDES (res) = BITMAP_ALLOC (NULL);
> LOOP_VINFO_STRIDED_STORES (res) = VEC_alloc (gimple, heap, 10);
> LOOP_VINFO_SLP_INSTANCES (res) = VEC_alloc (slp_instance, heap, 10);
> LOOP_VINFO_SLP_UNROLLING_FACTOR (res) = 1;
> *************** destroy_loop_vec_info (loop_vec_info loo
> *** 1800,1805 ****
> --- 1814,1820 ----
> free_dependence_relations (LOOP_VINFO_DDRS (loop_vinfo));
> VEC_free (gimple, heap, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo));
> VEC_free (ddr_p, heap, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo));
> + BITMAP_FREE (LOOP_VINFO_VARIABLE_STRIDES (loop_vinfo));
> slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
> for (j = 0; VEC_iterate (slp_instance, slp_instances, j, instance); j++)
> vect_free_slp_instance (instance);
> Index: trunk/gcc/tree-vectorizer.h
> ===================================================================
> *** trunk.orig/gcc/tree-vectorizer.h 2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/tree-vectorizer.h 2009-01-23 16:48:50.000000000 +0100
> *************** typedef struct _loop_vec_info {
> *** 210,215 ****
> --- 210,219 ----
> /* All data dependences in the loop. */
> VEC (ddr_p, heap) *ddrs;
>
> + /* SSA_NAMEs representing variable strides in data references.
> + Candidates for a run-time stride check. */
> + bitmap variable_strides;
> +
> /* Data Dependence Relations defining address ranges that are candidates
> for a run-time aliasing check. */
> VEC (ddr_p, heap) *may_alias_ddrs;
> *************** typedef struct _loop_vec_info {
> *** 254,259 ****
> --- 258,264 ----
> #define LOOP_VINFO_LOC(L) (L)->loop_line_number
> #define LOOP_VINFO_MAY_ALIAS_DDRS(L) (L)->may_alias_ddrs
> #define LOOP_VINFO_STRIDED_STORES(L) (L)->strided_stores
> + #define LOOP_VINFO_VARIABLE_STRIDES(L) (L)->variable_strides
> #define LOOP_VINFO_SLP_INSTANCES(L) (L)->slp_instances
> #define LOOP_VINFO_SLP_UNROLLING_FACTOR(L) (L)->slp_unrolling_factor
>
> *************** extern bitmap vect_memsyms_to_rename;
> *** 707,713 ****
> divide by the vectorization factor, and to peel the first few iterations
> to force the alignment of data references in the loop. */
> extern struct loop *slpeel_tree_peel_loop_to_edge
> ! (struct loop *, edge, tree, tree, bool, unsigned int, bool);
> extern void set_prologue_iterations (basic_block, tree,
> struct loop *, unsigned int);
> struct loop *tree_duplicate_loop_on_edge (struct loop *, edge);
> --- 712,718 ----
> divide by the vectorization factor, and to peel the first few iterations
> to force the alignment of data references in the loop. */
> extern struct loop *slpeel_tree_peel_loop_to_edge
> ! (struct loop *, edge, tree, tree, bool, unsigned int, bool, tree, gimple_seq);
> extern void set_prologue_iterations (basic_block, tree,
> struct loop *, unsigned int);
> struct loop *tree_duplicate_loop_on_edge (struct loop *, edge);
> Index: trunk/gcc/tree-vect-transform.c
> ===================================================================
> *** trunk.orig/gcc/tree-vect-transform.c 2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/tree-vect-transform.c 2009-01-23 16:48:50.000000000 +0100
> *************** static tree get_initial_def_for_reductio
> *** 65,72 ****
>
> /* Utility function dealing with loop peeling (not peeling itself). */
> static void vect_generate_tmps_on_preheader
> ! (loop_vec_info, tree *, tree *, tree *);
> ! static tree vect_build_loop_niters (loop_vec_info);
> static void vect_update_ivs_after_vectorizer (loop_vec_info, tree, edge);
> static tree vect_gen_niters_for_prolog_loop (loop_vec_info, tree);
> static void vect_update_init_of_dr (struct data_reference *, tree niters);
> --- 65,72 ----
>
> /* Utility function dealing with loop peeling (not peeling itself). */
> static void vect_generate_tmps_on_preheader
> ! (loop_vec_info, tree *, tree *, tree *, gimple_seq);
> ! static tree vect_build_loop_niters (loop_vec_info, gimple_seq);
> static void vect_update_ivs_after_vectorizer (loop_vec_info, tree, edge);
> static tree vect_gen_niters_for_prolog_loop (loop_vec_info, tree);
> static void vect_update_init_of_dr (struct data_reference *, tree niters);
> *************** vect_transform_stmt (gimple stmt, gimple
> *** 7199,7205 ****
> on the loop preheader. */
>
> static tree
> ! vect_build_loop_niters (loop_vec_info loop_vinfo)
> {
> tree ni_name, var;
> gimple_seq stmts = NULL;
> --- 7199,7205 ----
> on the loop preheader. */
>
> static tree
> ! vect_build_loop_niters (loop_vec_info loop_vinfo, gimple_seq seq)
> {
> tree ni_name, var;
> gimple_seq stmts = NULL;
> *************** vect_build_loop_niters (loop_vec_info lo
> *** 7214,7221 ****
> pe = loop_preheader_edge (loop);
> if (stmts)
> {
> ! basic_block new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
> ! gcc_assert (!new_bb);
> }
>
> return ni_name;
> --- 7214,7226 ----
> pe = loop_preheader_edge (loop);
> if (stmts)
> {
> ! if (seq)
> ! gimple_seq_add_seq (&seq, stmts);
> ! else
> ! {
> ! basic_block new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
> ! gcc_assert (!new_bb);
> ! }
> }
>
> return ni_name;
> *************** static void
> *** 7234,7240 ****
> vect_generate_tmps_on_preheader (loop_vec_info loop_vinfo,
> tree *ni_name_ptr,
> tree *ratio_mult_vf_name_ptr,
> ! tree *ratio_name_ptr)
> {
>
> edge pe;
> --- 7239,7246 ----
> vect_generate_tmps_on_preheader (loop_vec_info loop_vinfo,
> tree *ni_name_ptr,
> tree *ratio_mult_vf_name_ptr,
> ! tree *ratio_name_ptr,
> ! gimple_seq cond_expr_stmt_list)
> {
>
> edge pe;
> *************** vect_generate_tmps_on_preheader (loop_ve
> *** 7254,7260 ****
> /* Generate temporary variable that contains
> number of iterations loop executes. */
>
> ! ni_name = vect_build_loop_niters (loop_vinfo);
> log_vf = build_int_cst (TREE_TYPE (ni), exact_log2 (vf));
>
> /* Create: ratio = ni >> log2(vf) */
> --- 7260,7266 ----
> /* Generate temporary variable that contains
> number of iterations loop executes. */
>
> ! ni_name = vect_build_loop_niters (loop_vinfo, cond_expr_stmt_list);
> log_vf = build_int_cst (TREE_TYPE (ni), exact_log2 (vf));
>
> /* Create: ratio = ni >> log2(vf) */
> *************** vect_generate_tmps_on_preheader (loop_ve
> *** 7267,7275 ****
>
> stmts = NULL;
> ratio_name = force_gimple_operand (ratio_name, &stmts, true, var);
> ! pe = loop_preheader_edge (loop);
> ! new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
> ! gcc_assert (!new_bb);
> }
>
> /* Create: ratio_mult_vf = ratio << log2 (vf). */
> --- 7273,7286 ----
>
> stmts = NULL;
> ratio_name = force_gimple_operand (ratio_name, &stmts, true, var);
> ! if (cond_expr_stmt_list)
> ! gimple_seq_add_seq (&cond_expr_stmt_list, stmts);
> ! else
> ! {
> ! pe = loop_preheader_edge (loop);
> ! new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
> ! gcc_assert (!new_bb);
> ! }
> }
>
> /* Create: ratio_mult_vf = ratio << log2 (vf). */
> *************** vect_generate_tmps_on_preheader (loop_ve
> *** 7284,7292 ****
> stmts = NULL;
> ratio_mult_vf_name = force_gimple_operand (ratio_mult_vf_name, &stmts,
> true, var);
> ! pe = loop_preheader_edge (loop);
> ! new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
> ! gcc_assert (!new_bb);
> }
>
> *ni_name_ptr = ni_name;
> --- 7295,7308 ----
> stmts = NULL;
> ratio_mult_vf_name = force_gimple_operand (ratio_mult_vf_name, &stmts,
> true, var);
> ! if (cond_expr_stmt_list)
> ! gimple_seq_add_seq (&cond_expr_stmt_list, stmts);
> ! else
> ! {
> ! pe = loop_preheader_edge (loop);
> ! new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
> ! gcc_assert (!new_bb);
> ! }
> }
>
> *ni_name_ptr = ni_name;
> *************** conservative_cost_threshold (loop_vec_in
> *** 7470,7476 ****
> NITERS / VECTORIZATION_FACTOR times (this value is placed into RATIO). */
>
> static void
> ! vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, tree *ratio)
> {
> tree ni_name, ratio_mult_vf_name;
> struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> --- 7486,7493 ----
> NITERS / VECTORIZATION_FACTOR times (this value is placed into RATIO). */
>
> static void
> ! vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, tree *ratio,
> ! tree cond_expr, gimple_seq cond_expr_stmt_list)
> {
> tree ni_name, ratio_mult_vf_name;
> struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> *************** vect_do_peeling_for_loop_bound (loop_vec
> *** 7493,7499 ****
> ratio = ni_name / vf
> ratio_mult_vf_name = ratio * vf */
> vect_generate_tmps_on_preheader (loop_vinfo, &ni_name,
> ! &ratio_mult_vf_name, ratio);
>
> loop_num = loop->num;
>
> --- 7510,7517 ----
> ratio = ni_name / vf
> ratio_mult_vf_name = ratio * vf */
> vect_generate_tmps_on_preheader (loop_vinfo, &ni_name,
> ! &ratio_mult_vf_name, ratio,
> ! cond_expr_stmt_list);
>
> loop_num = loop->num;
>
> *************** vect_do_peeling_for_loop_bound (loop_vec
> *** 7501,7507 ****
> peeling for alignment. */
> if (!VEC_length (gimple, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo))
> && !VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo))
> ! && !LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo))
> {
> check_profitability = true;
>
> --- 7519,7526 ----
> peeling for alignment. */
> if (!VEC_length (gimple, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo))
> && !VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo))
> ! && !LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo)
> ! && !cond_expr)
> {
> check_profitability = true;
>
> *************** vect_do_peeling_for_loop_bound (loop_vec
> *** 7514,7520 ****
>
> new_loop = slpeel_tree_peel_loop_to_edge (loop, single_exit (loop),
> ratio_mult_vf_name, ni_name, false,
> ! th, check_profitability);
> gcc_assert (new_loop);
> gcc_assert (loop_num == loop->num);
> #ifdef ENABLE_CHECKING
> --- 7533,7540 ----
>
> new_loop = slpeel_tree_peel_loop_to_edge (loop, single_exit (loop),
> ratio_mult_vf_name, ni_name, false,
> ! th, check_profitability,
> ! cond_expr, cond_expr_stmt_list);
> gcc_assert (new_loop);
> gcc_assert (loop_num == loop->num);
> #ifdef ENABLE_CHECKING
> *************** vect_do_peeling_for_alignment (loop_vec_
> *** 7738,7744 ****
>
> initialize_original_copy_tables ();
>
> ! ni_name = vect_build_loop_niters (loop_vinfo);
> niters_of_prolog_loop = vect_gen_niters_for_prolog_loop (loop_vinfo, ni_name);
>
>
> --- 7758,7764 ----
>
> initialize_original_copy_tables ();
>
> ! ni_name = vect_build_loop_niters (loop_vinfo, NULL);
> niters_of_prolog_loop = vect_gen_niters_for_prolog_loop (loop_vinfo, ni_name);
>
>
> *************** vect_do_peeling_for_alignment (loop_vec_
> *** 7759,7765 ****
> new_loop =
> slpeel_tree_peel_loop_to_edge (loop, loop_preheader_edge (loop),
> niters_of_prolog_loop, ni_name, true,
> ! th, check_profitability);
>
> gcc_assert (new_loop);
> #ifdef ENABLE_CHECKING
> --- 7779,7785 ----
> new_loop =
> slpeel_tree_peel_loop_to_edge (loop, loop_preheader_edge (loop),
> niters_of_prolog_loop, ni_name, true,
> ! th, check_profitability, NULL_TREE, NULL);
>
> gcc_assert (new_loop);
> #ifdef ENABLE_CHECKING
> *************** vect_create_cond_for_align_checks (loop_
> *** 7909,7914 ****
> --- 7929,7981 ----
> *cond_expr = part_cond_expr;
> }
>
> + /* Function vect_create_cond_for_stride_checks.
> +
> + Create a conditional expression that represents the stride checks for
> + all of the stride SSA_NAMEs used in data references (array element
> + references) whose stride must be checked at runtime.
> +
> + Input:
> + COND_EXPR - input conditional expression. New conditions will be chained
> + with logical AND operation.
> + LOOP_VINFO - one field of the loop information is used.
> + LOOP_VINFO_VARIABLE_STRIDES is a bitmap of SSA_NAMEs to be
> + checked.
> +
> + Output:
> + COND_EXPR_STMT_LIST - statements needed to construct the conditional
> + expression.
> + The returned value is the conditional expression to be used in the if
> + statement that controls which version of the loop gets executed at runtime.
> +
> + The stride we do versioning for is currently specified by a compile-time
> + param. In future the stride should be chosen by information from
> + profile-feedback. */
> +
> + static void
> + vect_create_cond_for_stride_checks (loop_vec_info loop_vinfo,
> + tree *cond_expr)
> + {
> + bitmap_iterator bi;
> + unsigned int i;
> + HOST_WIDE_INT stride;
> +
> + stride = PARAM_VALUE (PARAM_VECT_VERSION_FOR_STRIDE_VALUE);
> +
> + EXECUTE_IF_SET_IN_BITMAP (LOOP_VINFO_VARIABLE_STRIDES (loop_vinfo), 0, i, bi)
> + {
> + tree name = ssa_name (i);
> + tree cond = fold_build2 (EQ_EXPR, boolean_type_node,
> + name,
> + build_int_cst (TREE_TYPE (name), stride));
> + if (*cond_expr)
> + *cond_expr = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
> + *cond_expr, cond);
> + else
> + *cond_expr = cond;
> + }
> + }
> +
> /* Function vect_vfa_segment_size.
>
> Create an expression that computes the size of segment
> *************** vect_create_cond_for_alias_checks (loop_
> *** 8076,8087 ****
> cost model initially. */
>
> static void
> ! vect_loop_versioning (loop_vec_info loop_vinfo)
> {
> struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> struct loop *nloop;
> - tree cond_expr = NULL_TREE;
> - gimple_seq cond_expr_stmt_list = NULL;
> basic_block condition_bb;
> gimple_stmt_iterator gsi, cond_exp_gsi;
> basic_block merge_bb;
> --- 8143,8153 ----
> cost model initially. */
>
> static void
> ! vect_loop_versioning (loop_vec_info loop_vinfo, bool do_versioning,
> ! tree *cond_expr, gimple_seq *cond_expr_stmt_list)
> {
> struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> struct loop *nloop;
> basic_block condition_bb;
> gimple_stmt_iterator gsi, cond_exp_gsi;
> basic_block merge_bb;
> *************** vect_loop_versioning (loop_vec_info loop
> *** 8101,8129 ****
> th = conservative_cost_threshold (loop_vinfo,
> min_profitable_iters);
>
> ! cond_expr =
> ! build2 (GT_EXPR, boolean_type_node, scalar_loop_iters,
> ! build_int_cst (TREE_TYPE (scalar_loop_iters), th));
>
> ! cond_expr = force_gimple_operand (cond_expr, &cond_expr_stmt_list,
> ! false, NULL_TREE);
>
> if (VEC_length (gimple, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo)))
> ! vect_create_cond_for_align_checks (loop_vinfo, &cond_expr,
> ! &cond_expr_stmt_list);
>
> if (VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo)))
> ! vect_create_cond_for_alias_checks (loop_vinfo, &cond_expr,
> ! &cond_expr_stmt_list);
>
> ! cond_expr =
> ! fold_build2 (NE_EXPR, boolean_type_node, cond_expr, integer_zero_node);
> ! cond_expr =
> ! force_gimple_operand (cond_expr, &gimplify_stmt_list, true, NULL_TREE);
> ! gimple_seq_add_seq (&cond_expr_stmt_list, gimplify_stmt_list);
>
> initialize_original_copy_tables ();
> ! nloop = loop_version (loop, cond_expr, &condition_bb,
> prob, prob, REG_BR_PROB_BASE - prob, true);
> free_original_copy_tables();
>
> --- 8167,8200 ----
> th = conservative_cost_threshold (loop_vinfo,
> min_profitable_iters);
>
> ! *cond_expr =
> ! fold_build2 (GT_EXPR, boolean_type_node, scalar_loop_iters,
> ! build_int_cst (TREE_TYPE (scalar_loop_iters), th));
>
> ! *cond_expr = force_gimple_operand (*cond_expr, cond_expr_stmt_list,
> ! false, NULL_TREE);
>
> if (VEC_length (gimple, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo)))
> ! vect_create_cond_for_align_checks (loop_vinfo, cond_expr,
> ! cond_expr_stmt_list);
>
> if (VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo)))
> ! vect_create_cond_for_alias_checks (loop_vinfo, cond_expr,
> ! cond_expr_stmt_list);
>
> ! if (!bitmap_empty_p (LOOP_VINFO_VARIABLE_STRIDES (loop_vinfo)))
> ! vect_create_cond_for_stride_checks (loop_vinfo, cond_expr);
> !
> ! *cond_expr =
> ! fold_build2 (NE_EXPR, boolean_type_node, *cond_expr, integer_zero_node);
> ! *cond_expr =
> ! force_gimple_operand (*cond_expr, &gimplify_stmt_list, true, NULL_TREE);
> ! gimple_seq_add_seq (cond_expr_stmt_list, gimplify_stmt_list);
> ! if (!do_versioning)
> ! return;
>
> initialize_original_copy_tables ();
> ! nloop = loop_version (loop, *cond_expr, &condition_bb,
> prob, prob, REG_BR_PROB_BASE - prob, true);
> free_original_copy_tables();
>
> *************** vect_loop_versioning (loop_vec_info loop
> *** 8154,8164 ****
> /* End loop-exit-fixes after versioning. */
>
> update_ssa (TODO_update_ssa);
> ! if (cond_expr_stmt_list)
> {
> cond_exp_gsi = gsi_last_bb (condition_bb);
> ! gsi_insert_seq_before (&cond_exp_gsi, cond_expr_stmt_list, GSI_SAME_STMT);
> }
> }
>
> /* Remove a group of stores (for SLP or interleaving), free their
> --- 8225,8238 ----
> /* End loop-exit-fixes after versioning. */
>
> update_ssa (TODO_update_ssa);
> ! if (*cond_expr_stmt_list)
> {
> cond_exp_gsi = gsi_last_bb (condition_bb);
> ! gsi_insert_seq_before (&cond_exp_gsi, *cond_expr_stmt_list,
> ! GSI_SAME_STMT);
> ! *cond_expr_stmt_list = NULL;
> }
> + *cond_expr = NULL_TREE;
> }
>
> /* Remove a group of stores (for SLP or interleaving), free their
> *************** vect_transform_loop (loop_vec_info loop_
> *** 8320,8342 ****
> bool strided_store;
> bool slp_scheduled = false;
> unsigned int nunits;
>
> if (vect_print_dump_info (REPORT_DETAILS))
> fprintf (vect_dump, "=== vec_transform_loop ===");
>
> - if (VEC_length (gimple, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo))
> - || VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo)))
> - vect_loop_versioning (loop_vinfo);
> -
> - /* CHECKME: we wouldn't need this if we called update_ssa once
> - for all loops. */
> - bitmap_zero (vect_memsyms_to_rename);
> -
> /* Peel the loop if there are data refs with unknown alignment.
> Only one data ref with unknown store is allowed. */
>
> if (LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo))
> vect_do_peeling_for_alignment (loop_vinfo);
>
> /* If the loop has a symbolic number of iterations 'n' (i.e. it's not a
> compile time constant), or it is a constant that doesn't divide by the
> --- 8394,8427 ----
> bool strided_store;
> bool slp_scheduled = false;
> unsigned int nunits;
> + tree cond_expr = NULL_TREE;
> + gimple_seq cond_expr_stmt_list = NULL;
> + bool do_peeling_for_loop_bound;
>
> if (vect_print_dump_info (REPORT_DETAILS))
> fprintf (vect_dump, "=== vec_transform_loop ===");
>
> /* Peel the loop if there are data refs with unknown alignment.
> Only one data ref with unknown store is allowed. */
>
> if (LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo))
> vect_do_peeling_for_alignment (loop_vinfo);
> +
> + do_peeling_for_loop_bound
> + = (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> + || (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> + && LOOP_VINFO_INT_NITERS (loop_vinfo) % vectorization_factor != 0));
> +
> + if (VEC_length (gimple, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo))
> + || VEC_length (ddr_p, LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo))
> + || !bitmap_empty_p (LOOP_VINFO_VARIABLE_STRIDES (loop_vinfo)))
> + vect_loop_versioning (loop_vinfo,
> + !do_peeling_for_loop_bound,
> + &cond_expr, &cond_expr_stmt_list);
> +
> + /* CHECKME: we wouldn't need this if we called update_ssa once
> + for all loops. */
> + bitmap_zero (vect_memsyms_to_rename);
>
> /* If the loop has a symbolic number of iterations 'n' (i.e. it's not a
> compile time constant), or it is a constant that doesn't divide by the
> *************** vect_transform_loop (loop_vec_info loop_
> *** 8346,8355 ****
> will remain scalar and will compute the remaining (n%VF) iterations.
> (VF is the vectorization factor). */
>
> ! if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> ! || (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> ! && LOOP_VINFO_INT_NITERS (loop_vinfo) % vectorization_factor != 0))
> ! vect_do_peeling_for_loop_bound (loop_vinfo, &ratio);
> else
> ratio = build_int_cst (TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo)),
> LOOP_VINFO_INT_NITERS (loop_vinfo) / vectorization_factor);
> --- 8431,8439 ----
> will remain scalar and will compute the remaining (n%VF) iterations.
> (VF is the vectorization factor). */
>
> ! if (do_peeling_for_loop_bound)
> ! vect_do_peeling_for_loop_bound (loop_vinfo, &ratio,
> ! cond_expr, cond_expr_stmt_list);
> else
> ratio = build_int_cst (TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo)),
> LOOP_VINFO_INT_NITERS (loop_vinfo) / vectorization_factor);
> Index: trunk/gcc/testsuite/gfortran.dg/vect/fast-math-vect-stride-1.f90
> ===================================================================
> *** /dev/null 1970-01-01 00:00:00.000000000 +0000
> --- trunk/gcc/testsuite/gfortran.dg/vect/fast-math-vect-stride-1.f90 2009-01-23 16:48:50.000000000 +0100
> ***************
> *** 0 ****
> --- 1,17 ----
> + ! { dg-do compile }
> +
> + subroutine to_product_of(self,a,b)
> + real(kind=8), dimension(:,:) :: self
> + real(kind=8), dimension(:,:), intent(in) :: a, b
> + integer(kind=kind(1)) :: dim1, dim2
> + dim1 = size(self,1)
> + dim2 = size(self,2)
> + do i = 1,dim1
> + do j = 1,dim2
> + self(i,j) = sum(a(i,:)*b(:,j))
> + end do
> + end do
> + end subroutine
> +
> + ! { dg-final { scan-tree-dump "vectorized 1 loop" "vect" } }
> + ! { dg-final { cleanup-tree-dump "vect" } }
> Index: trunk/gcc/Makefile.in
> ===================================================================
> *** trunk.orig/gcc/Makefile.in 2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/Makefile.in 2009-01-23 16:48:50.000000000 +0100
> *************** tree-nrv.o : tree-nrv.c $(CONFIG_H) $(SY
> *** 2135,2141 ****
> tree-ssa-copy.o : tree-ssa-copy.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
> $(RTL_H) $(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) output.h $(DIAGNOSTIC_H) \
> $(FUNCTION_H) $(TIMEVAR_H) $(TM_H) coretypes.h $(TREE_DUMP_H) \
> ! $(BASIC_BLOCK_H) tree-pass.h langhooks.h tree-ssa-propagate.h $(FLAGS_H)
> tree-ssa-propagate.o : tree-ssa-propagate.c $(TREE_FLOW_H) $(CONFIG_H) \
> $(SYSTEM_H) $(RTL_H) $(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) output.h \
> $(DIAGNOSTIC_H) $(FUNCTION_H) $(TIMEVAR_H) $(TM_H) coretypes.h \
> --- 2135,2142 ----
> tree-ssa-copy.o : tree-ssa-copy.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
> $(RTL_H) $(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) output.h $(DIAGNOSTIC_H) \
> $(FUNCTION_H) $(TIMEVAR_H) $(TM_H) coretypes.h $(TREE_DUMP_H) \
> ! $(BASIC_BLOCK_H) tree-pass.h langhooks.h tree-ssa-propagate.h $(FLAGS_H) \
> ! $(CFGLOOP_H)
> tree-ssa-propagate.o : tree-ssa-propagate.c $(TREE_FLOW_H) $(CONFIG_H) \
> $(SYSTEM_H) $(RTL_H) $(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) output.h \
> $(DIAGNOSTIC_H) $(FUNCTION_H) $(TIMEVAR_H) $(TM_H) coretypes.h \
> Index: trunk/gcc/tree-ssa-copy.c
> ===================================================================
> *** trunk.orig/gcc/tree-ssa-copy.c 2009-01-23 16:47:53.000000000 +0100
> --- trunk/gcc/tree-ssa-copy.c 2009-01-23 16:48:50.000000000 +0100
> *************** along with GCC; see the file COPYING3.
> *** 37,42 ****
> --- 37,43 ----
> #include "tree-pass.h"
> #include "tree-ssa-propagate.h"
> #include "langhooks.h"
> + #include "cfgloop.h"
>
> /* This file implements the copy propagation pass and provides a
> handful of interfaces for performing const/copy propagation and
> *************** init_copy_prop (void)
> *** 991,997 ****
> tree def;
>
> def = gimple_phi_result (phi);
> ! if (!is_gimple_reg (def))
> prop_set_simulate_again (phi, false);
> else
> prop_set_simulate_again (phi, true);
> --- 992,1004 ----
> tree def;
>
> def = gimple_phi_result (phi);
> ! if (!is_gimple_reg (def)
> ! /* In loop-closed SSA form do not copy-propagate through
> ! PHI nodes. Technically this is only needed for loop
> ! exit PHIs, but this is difficult to query. */
> ! || (current_loops
> ! && gimple_phi_num_args (phi) == 1
> ! && loops_state_satisfies_p (LOOP_CLOSED_SSA)))
> prop_set_simulate_again (phi, false);
> else
> prop_set_simulate_again (phi, true);
Richard,
Do you have an updated version of this patch that applies against
current GCC trunk? Also, did this ever go into any of the branches?
Jack
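
P.S. For anyone following the thread without the vectorizer sources at hand, here is a rough source-level picture of what the versioning condition in the quoted patch amounts to. This is only a sketch in plain C, not what the vectorizer emits: the function name strided_sum and the constant MIN_PROFITABLE_ITERS are made up for illustration (the patch derives its threshold from conservative_cost_threshold), and the real transformation works on GIMPLE, not on source code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the cost-model threshold `th` in the patch. */
#define MIN_PROFITABLE_ITERS 4

/* Sum n elements of `a`, where consecutive elements are `stride`
   doubles apart, as with a stride loaded from an array descriptor. */
double strided_sum(const double *a, size_t n, ptrdiff_t stride)
{
    double sum = 0.0;

    assert(stride > 0);

    /* Versioning condition: the iteration-count check ANDed with the
       stride check, in the same spirit as the patch's TRUTH_AND_EXPR chain. */
    if (n > MIN_PROFITABLE_ITERS && stride == 1) {
        /* Unit-stride version: contiguous accesses, so this loop is a
           candidate for vectorization. */
        for (size_t i = 0; i < n; i++)
            sum += a[i];
    } else {
        /* Generic version: keeps the variable stride and stays scalar. */
        for (size_t i = 0; i < n; i++)
            sum += a[(ptrdiff_t)i * stride];
    }
    return sum;
}
```

Both branches compute the same result; the point is only that the compiler can treat the stride as the constant 1 inside the first loop. When peeling for the loop bound is needed, the patch does not create a second loop copy but attaches this condition to the guard that skips to the epilogue loop, which then plays the role of the else branch above.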