This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH, RFC] Fix PR71915, PR71490: Handle casts on strides consistently
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Bill Schmidt <wschmidt at linux dot vnet dot ibm dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Mon, 24 Oct 2016 12:55:41 +0200
- Subject: Re: [PATCH, RFC] Fix PR71915, PR71490: Handle casts on strides consistently
- Authentication-results: sourceware.org; auth=none
- References: <1477086370.5636.18.camel@oc8801110288.ibm.com>
On Fri, Oct 21, 2016 at 11:46 PM, Bill Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> Hi,
>
> I've been meaning for some time now to do a better job handling strength
> reduction candidates where the stride of the candidate and its basis
> involves a cast (usually widening) from another type. The two PRs
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71490 and
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71915 show that we miss
> strength reduction opportunities in such cases. Essentially, the way I
> was handling this before required a cast back to the original type,
> which is illegal when that cast is a narrowing operation or a cast
> from a wrapping type to a non-wrapping type.
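[As a reduced sketch (illustrative only, not taken from the PR test cases) of why that cast back is illegal: multiplying in the narrow type and then widening can wrap, whereas the original code multiplies in the widened type exactly.]

```c
#include <assert.h>

/* Sketch only: what the original code computes vs. what an illegal
   "cast back to the original type" would compute.  */

int widened_mult (short s)
{
  return (int) s * 5;           /* multiply in the widened type: exact */
}

int narrowed_mult (short s)
{
  short t = (short) (s * 5);    /* narrowing the product back to short */
  return (int) t;               /* wraps for large s on common targets */
}
```

[On a typical target with 16-bit short, widened_mult (30000) yields 150000 while narrowed_mult (30000) does not, which is exactly the precision loss the pass must avoid introducing.]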
>
> This patch improves matters by tracking the effective type of the stride
> so that we perform arithmetic in that type directly. This allows us to
> strength-reduce some cases missed before. We still have to be careful
> never to perform a narrowing or wrap-to-nonwrap conversion, but with
> this patch such cases come up far less often in practice.
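[A reduced sketch (illustrative only, hypothetical function names) of the shape the patch now handles: a stride that reaches both candidates through a widening cast.]

```c
/* Sketch: both additions share the widened stride (long) n.  By
   recording "long" as the stride type, SLSR can rewrite b as
   a + 5 * (long) n directly, performing the arithmetic in the
   widened type with no illegal cast back to int.  */
long f (long x, int n)
{
  long w = (long) n;    /* widening cast on the stride */
  long a = x + 10 * w;  /* basis:     x + (10 * (long) n) */
  long b = x + 15 * w;  /* candidate: x + (15 * (long) n) */
  return a + b;
}
```

[This mirrors the example in the updated comment in analyze_increments below, where the stride recorded for the candidates is the widened (int)_1 rather than the original short int _1.]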
>
> The patch is pretty straightforward but rather large, so I'd like to
> leave it up for review for a week or so before committing it -- extra
> eyes welcome!
>
> There is an existing test case that checks for the narrowing problem,
> and it now fails with this patch, as it should. That is, it's now legal
> to insert the initializer that we formerly knew to be bad for this test.
> Since the test no longer serves any purpose, I'm deleting it.
>
> gcc.dg/tree-ssa/slsr-8.c generates quite different intermediate code now
> than when I first added it to the test suite. As a result of various
> optimization changes, it's no longer maintained as a single block;
> instead, the optimizable computations are sunk into two legs of a
> conditional. This exposed another similar case, leading to the bug
> reports. This test now passes, but I had to adjust it for the new code
> generation we get. I also added some commentary to indicate what we
> expect to happen in that test, since it isn't too obvious.
>
> I've bootstrapped this and tested it on powerpc64le-unknown-linux-gnu
> with no regressions. To avoid the Wrath of Octoploid ;), I've also
> tested it against ffmpeg using an x86_64-pc-linux-gnu cross with -O3
> -march=amdfam10, also with no failures. I've also verified that we hit
> this code about 35 times in the test suite, so it looks like we have
> some decent additional test coverage.
>
> Thanks in advance to anyone willing to take a look.
LGTM.
Richard.
> Bill
>
>
> [gcc]
>
> 2016-10-21 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> PR tree-optimization/71915
> * gimple-ssa-strength-reduction.c (struct slsr_cand_d): Add
> stride_type field.
> (find_basis_for_base_expr): Require stride types to match when
> seeking a basis.
> (alloc_cand_and_find_basis): Record the stride type.
> (slsr_process_phi): Pass stride type to alloc_cand_and_find_basis.
> (backtrace_base_for_ref): Pass types to legal_cast_p_1 rather than
> the expressions having those types.
> (slsr_process_ref): Pass stride type to alloc_cand_and_find_basis.
> (create_mul_ssa_cand): Likewise.
> (create_mul_imm_cand): Likewise.
> (create_add_ssa_cand): Likewise.
> (create_add_imm_cand): Likewise.
> (legal_cast_p_1): Change interface to accept types rather than the
> expressions having those types.
> (legal_cast_p): Pass types to legal_cast_p_1.
> (slsr_process_cast): Pass stride type to
> alloc_cand_and_find_basis.
> (slsr_process_copy): Likewise.
> (dump_candidate): Display stride type when a cast exists.
> (create_add_on_incoming_edge): Introduce a cast when necessary for
> the stride type.
> (analyze_increments): Change the code checking for invalid casts
> to rely on the stride type, and update the documentation and
> example. Change the code checking for pointer multiplies to rely
> on the stride type.
> (insert_initializers): Introduce a cast when necessary for the
> stride type. Use the stride type for the type of the initializer.
>
> [gcc/testsuite]
>
> 2016-10-21 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
>
> PR tree-optimization/71915
> * gcc.dg/tree-ssa/pr54245.c: Delete.
> * gcc.dg/tree-ssa/slsr-8.c: Adjust for new optimization and
> document why.
>
>
> Index: gcc/gimple-ssa-strength-reduction.c
> ===================================================================
> --- gcc/gimple-ssa-strength-reduction.c (revision 241379)
> +++ gcc/gimple-ssa-strength-reduction.c (working copy)
> @@ -246,6 +246,13 @@ struct slsr_cand_d
> replacement MEM_REF.) */
> tree cand_type;
>
> + /* The type to be used to interpret the stride field when the stride
> + is not a constant. Normally the same as the type of the recorded
> + stride, but when the stride has been cast we need to maintain that
> + knowledge in order to make legal substitutions without losing
> + precision. When the stride is a constant, this will be sizetype. */
> + tree stride_type;
> +
> /* The kind of candidate (CAND_MULT, etc.). */
> enum cand_kind kind;
>
> @@ -502,6 +509,7 @@ find_basis_for_base_expr (slsr_cand_t c, tree base
> || one_basis->cand_stmt == c->cand_stmt
> || !operand_equal_p (one_basis->stride, c->stride, 0)
> || !types_compatible_p (one_basis->cand_type, c->cand_type)
> + || !types_compatible_p (one_basis->stride_type, c->stride_type)
> || !dominated_by_p (CDI_DOMINATORS,
> gimple_bb (c->cand_stmt),
> gimple_bb (one_basis->cand_stmt)))
> @@ -615,7 +623,7 @@ record_potential_basis (slsr_cand_t c, tree base)
> static slsr_cand_t
> alloc_cand_and_find_basis (enum cand_kind kind, gimple *gs, tree base,
> const widest_int &index, tree stride, tree ctype,
> - unsigned savings)
> + tree stype, unsigned savings)
> {
> slsr_cand_t c = (slsr_cand_t) obstack_alloc (&cand_obstack,
> sizeof (slsr_cand));
> @@ -624,6 +632,7 @@ alloc_cand_and_find_basis (enum cand_kind kind, gi
> c->stride = stride;
> c->index = index;
> c->cand_type = ctype;
> + c->stride_type = stype;
> c->kind = kind;
> c->cand_num = cand_vec.length () + 1;
> c->next_interp = 0;
> @@ -809,7 +818,8 @@ slsr_process_phi (gphi *phi, bool speed)
> base_type = TREE_TYPE (arg0_base);
>
> c = alloc_cand_and_find_basis (CAND_PHI, phi, arg0_base,
> - 0, integer_one_node, base_type, savings);
> + 0, integer_one_node, base_type,
> + sizetype, savings);
>
> /* Add the candidate to the statement-candidate mapping. */
> add_cand_for_stmt (phi, c);
> @@ -838,7 +848,8 @@ backtrace_base_for_ref (tree *pbase)
> e.g. 'B' is widened from an 'int' in order to calculate
> a 64-bit address. */
> if (CONVERT_EXPR_P (base_in)
> - && legal_cast_p_1 (base_in, TREE_OPERAND (base_in, 0)))
> + && legal_cast_p_1 (TREE_TYPE (base_in),
> + TREE_TYPE (TREE_OPERAND (base_in, 0))))
> base_in = get_unwidened (base_in, NULL_TREE);
>
> if (TREE_CODE (base_in) != SSA_NAME)
> @@ -995,7 +1006,7 @@ slsr_process_ref (gimple *gs)
> return;
>
> c = alloc_cand_and_find_basis (CAND_REF, gs, base, index, offset,
> - type, 0);
> + type, sizetype, 0);
>
> /* Add the candidate to the statement-candidate mapping. */
> add_cand_for_stmt (gs, c);
> @@ -1010,6 +1021,7 @@ static slsr_cand_t
> create_mul_ssa_cand (gimple *gs, tree base_in, tree stride_in, bool speed)
> {
> tree base = NULL_TREE, stride = NULL_TREE, ctype = NULL_TREE;
> + tree stype = NULL_TREE;
> widest_int index;
> unsigned savings = 0;
> slsr_cand_t c;
> @@ -1030,6 +1042,7 @@ create_mul_ssa_cand (gimple *gs, tree base_in, tre
> index = base_cand->index;
> stride = stride_in;
> ctype = base_cand->cand_type;
> + stype = TREE_TYPE (stride_in);
> if (has_single_use (base_in))
> savings = (base_cand->dead_savings
> + stmt_cost (base_cand->cand_stmt, speed));
> @@ -1045,6 +1058,7 @@ create_mul_ssa_cand (gimple *gs, tree base_in, tre
> index = base_cand->index * wi::to_widest (base_cand->stride);
> stride = stride_in;
> ctype = base_cand->cand_type;
> + stype = TREE_TYPE (stride_in);
> if (has_single_use (base_in))
> savings = (base_cand->dead_savings
> + stmt_cost (base_cand->cand_stmt, speed));
> @@ -1064,10 +1078,11 @@ create_mul_ssa_cand (gimple *gs, tree base_in, tre
> index = 0;
> stride = stride_in;
> ctype = TREE_TYPE (base_in);
> + stype = TREE_TYPE (stride_in);
> }
>
> c = alloc_cand_and_find_basis (CAND_MULT, gs, base, index, stride,
> - ctype, savings);
> + ctype, stype, savings);
> return c;
> }
>
> @@ -1156,7 +1171,7 @@ create_mul_imm_cand (gimple *gs, tree base_in, tre
> }
>
> c = alloc_cand_and_find_basis (CAND_MULT, gs, base, index, stride,
> - ctype, savings);
> + ctype, sizetype, savings);
> return c;
> }
>
> @@ -1212,7 +1227,8 @@ static slsr_cand_t
> create_add_ssa_cand (gimple *gs, tree base_in, tree addend_in,
> bool subtract_p, bool speed)
> {
> - tree base = NULL_TREE, stride = NULL_TREE, ctype = NULL;
> + tree base = NULL_TREE, stride = NULL_TREE, ctype = NULL_TREE;
> + tree stype = NULL_TREE;
> widest_int index;
> unsigned savings = 0;
> slsr_cand_t c;
> @@ -1237,6 +1253,7 @@ create_add_ssa_cand (gimple *gs, tree base_in, tre
> index = -index;
> stride = addend_cand->base_expr;
> ctype = TREE_TYPE (base_in);
> + stype = addend_cand->cand_type;
> if (has_single_use (addend_in))
> savings = (addend_cand->dead_savings
> + stmt_cost (addend_cand->cand_stmt, speed));
> @@ -1263,6 +1280,8 @@ create_add_ssa_cand (gimple *gs, tree base_in, tre
> index = subtract_p ? -1 : 1;
> stride = addend_in;
> ctype = base_cand->cand_type;
> + stype = (TREE_CODE (addend_in) == INTEGER_CST ? sizetype
> + : TREE_TYPE (addend_in));
> if (has_single_use (base_in))
> savings = (base_cand->dead_savings
> + stmt_cost (base_cand->cand_stmt, speed));
> @@ -1286,6 +1305,7 @@ create_add_ssa_cand (gimple *gs, tree base_in, tre
> index = -index;
> stride = subtrahend_cand->base_expr;
> ctype = TREE_TYPE (base_in);
> + stype = subtrahend_cand->cand_type;
> if (has_single_use (addend_in))
> savings = (subtrahend_cand->dead_savings
> + stmt_cost (subtrahend_cand->cand_stmt, speed));
> @@ -1312,10 +1332,12 @@ create_add_ssa_cand (gimple *gs, tree base_in, tre
> index = subtract_p ? -1 : 1;
> stride = addend_in;
> ctype = TREE_TYPE (base_in);
> + stype = (TREE_CODE (addend_in) == INTEGER_CST ? sizetype
> + : TREE_TYPE (addend_in));
> }
>
> c = alloc_cand_and_find_basis (CAND_ADD, gs, base, index, stride,
> - ctype, savings);
> + ctype, stype, savings);
> return c;
> }
>
> @@ -1329,6 +1351,7 @@ create_add_imm_cand (gimple *gs, tree base_in, con
> {
> enum cand_kind kind = CAND_ADD;
> tree base = NULL_TREE, stride = NULL_TREE, ctype = NULL_TREE;
> + tree stype = NULL_TREE;
> widest_int index, multiple;
> unsigned savings = 0;
> slsr_cand_t c;
> @@ -1356,6 +1379,7 @@ create_add_imm_cand (gimple *gs, tree base_in, con
> index = base_cand->index + multiple;
> stride = base_cand->stride;
> ctype = base_cand->cand_type;
> + stype = base_cand->stride_type;
> if (has_single_use (base_in))
> savings = (base_cand->dead_savings
> + stmt_cost (base_cand->cand_stmt, speed));
> @@ -1376,10 +1400,11 @@ create_add_imm_cand (gimple *gs, tree base_in, con
> index = index_in;
> stride = integer_one_node;
> ctype = TREE_TYPE (base_in);
> + stype = sizetype;
> }
>
> c = alloc_cand_and_find_basis (kind, gs, base, index, stride,
> - ctype, savings);
> + ctype, stype, savings);
> return c;
> }
>
> @@ -1456,14 +1481,11 @@ slsr_process_neg (gimple *gs, tree rhs1, bool spee
> for more details. */
>
> static bool
> -legal_cast_p_1 (tree lhs, tree rhs)
> +legal_cast_p_1 (tree lhs_type, tree rhs_type)
> {
> - tree lhs_type, rhs_type;
> unsigned lhs_size, rhs_size;
> bool lhs_wraps, rhs_wraps;
>
> - lhs_type = TREE_TYPE (lhs);
> - rhs_type = TREE_TYPE (rhs);
> lhs_size = TYPE_PRECISION (lhs_type);
> rhs_size = TYPE_PRECISION (rhs_type);
> lhs_wraps = ANY_INTEGRAL_TYPE_P (lhs_type) && TYPE_OVERFLOW_WRAPS (lhs_type);
> @@ -1521,7 +1543,7 @@ legal_cast_p (gimple *gs, tree rhs)
> || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (gs)))
> return false;
>
> - return legal_cast_p_1 (gimple_assign_lhs (gs), rhs);
> + return legal_cast_p_1 (TREE_TYPE (gimple_assign_lhs (gs)), TREE_TYPE (rhs));
> }
>
> /* Given GS which is a cast to a scalar integer type, determine whether
> @@ -1556,7 +1578,8 @@ slsr_process_cast (gimple *gs, tree rhs1, bool spe
> c = alloc_cand_and_find_basis (base_cand->kind, gs,
> base_cand->base_expr,
> base_cand->index, base_cand->stride,
> - ctype, savings);
> + ctype, base_cand->stride_type,
> + savings);
> if (base_cand->next_interp)
> base_cand = lookup_cand (base_cand->next_interp);
> else
> @@ -1574,10 +1597,10 @@ slsr_process_cast (gimple *gs, tree rhs1, bool spe
> The first of these is somewhat arbitrary, but the choice of
> 1 for the stride simplifies the logic for propagating casts
> into their uses. */
> - c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1,
> - 0, integer_one_node, ctype, 0);
> - c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1,
> - 0, integer_one_node, ctype, 0);
> + c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1, 0,
> + integer_one_node, ctype, sizetype, 0);
> + c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1, 0,
> + integer_one_node, ctype, sizetype, 0);
> c->next_interp = c2->cand_num;
> }
>
> @@ -1613,7 +1636,8 @@ slsr_process_copy (gimple *gs, tree rhs1, bool spe
> c = alloc_cand_and_find_basis (base_cand->kind, gs,
> base_cand->base_expr,
> base_cand->index, base_cand->stride,
> - base_cand->cand_type, savings);
> + base_cand->cand_type,
> + base_cand->stride_type, savings);
> if (base_cand->next_interp)
> base_cand = lookup_cand (base_cand->next_interp);
> else
> @@ -1631,10 +1655,12 @@ slsr_process_copy (gimple *gs, tree rhs1, bool spe
> The first of these is somewhat arbitrary, but the choice of
> 1 for the stride simplifies the logic for propagating casts
> into their uses. */
> - c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1,
> - 0, integer_one_node, TREE_TYPE (rhs1), 0);
> - c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1,
> - 0, integer_one_node, TREE_TYPE (rhs1), 0);
> + c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1, 0,
> + integer_one_node, TREE_TYPE (rhs1),
> + sizetype, 0);
> + c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1, 0,
> + integer_one_node, TREE_TYPE (rhs1),
> + sizetype, 0);
> c->next_interp = c2->cand_num;
> }
>
> @@ -1755,6 +1781,13 @@ dump_candidate (slsr_cand_t c)
> fputs (" + ", dump_file);
> print_decs (c->index, dump_file);
> fputs (") * ", dump_file);
> + if (TREE_CODE (c->stride) != INTEGER_CST
> + && c->stride_type != TREE_TYPE (c->stride))
> + {
> + fputs ("(", dump_file);
> + print_generic_expr (dump_file, c->stride_type, 0);
> + fputs (")", dump_file);
> + }
> print_generic_expr (dump_file, c->stride, 0);
> fputs (" : ", dump_file);
> break;
> @@ -1764,6 +1797,13 @@ dump_candidate (slsr_cand_t c)
> fputs (" + (", dump_file);
> print_decs (c->index, dump_file);
> fputs (" * ", dump_file);
> + if (TREE_CODE (c->stride) != INTEGER_CST
> + && c->stride_type != TREE_TYPE (c->stride))
> + {
> + fputs ("(", dump_file);
> + print_generic_expr (dump_file, c->stride_type, 0);
> + fputs (")", dump_file);
> + }
> print_generic_expr (dump_file, c->stride, 0);
> fputs (") : ", dump_file);
> break;
> @@ -2143,7 +2183,7 @@ create_add_on_incoming_edge (slsr_cand_t c, tree b
> basic_block insert_bb;
> gimple_stmt_iterator gsi;
> tree lhs, basis_type;
> - gassign *new_stmt;
> + gassign *new_stmt, *cast_stmt = NULL;
>
> /* If the add candidate along this incoming edge has the same
> index as C's hidden basis, the hidden basis represents this
> @@ -2187,13 +2227,27 @@ create_add_on_incoming_edge (slsr_cand_t c, tree b
> new_stmt = gimple_build_assign (lhs, code, basis_name,
> incr_vec[i].initializer);
> }
> - else if (increment == 1)
> - new_stmt = gimple_build_assign (lhs, plus_code, basis_name, c->stride);
> - else if (increment == -1)
> - new_stmt = gimple_build_assign (lhs, MINUS_EXPR, basis_name,
> - c->stride);
> - else
> - gcc_unreachable ();
> + else {
> + tree stride;
> +
> + if (!types_compatible_p (TREE_TYPE (c->stride), c->stride_type))
> + {
> + tree cast_stride = make_temp_ssa_name (c->stride_type, NULL,
> + "slsr");
> + cast_stmt = gimple_build_assign (cast_stride, NOP_EXPR,
> + c->stride);
> + stride = cast_stride;
> + }
> + else
> + stride = c->stride;
> +
> + if (increment == 1)
> + new_stmt = gimple_build_assign (lhs, plus_code, basis_name, stride);
> + else if (increment == -1)
> + new_stmt = gimple_build_assign (lhs, MINUS_EXPR, basis_name, stride);
> + else
> + gcc_unreachable ();
> + }
> }
>
> insert_bb = single_succ_p (e->src) ? e->src : split_edge (e);
> @@ -2200,14 +2254,34 @@ create_add_on_incoming_edge (slsr_cand_t c, tree b
> gsi = gsi_last_bb (insert_bb);
>
> if (!gsi_end_p (gsi) && is_ctrl_stmt (gsi_stmt (gsi)))
> - gsi_insert_before (&gsi, new_stmt, GSI_NEW_STMT);
> + {
> + gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
> + if (cast_stmt)
> + {
> + gsi_insert_before (&gsi, cast_stmt, GSI_SAME_STMT);
> + gimple_set_location (cast_stmt, loc);
> + }
> + }
> else
> - gsi_insert_after (&gsi, new_stmt, GSI_NEW_STMT);
> + {
> + if (cast_stmt)
> + {
> + gsi_insert_after (&gsi, cast_stmt, GSI_NEW_STMT);
> + gimple_set_location (cast_stmt, loc);
> + }
> + gsi_insert_after (&gsi, new_stmt, GSI_NEW_STMT);
> + }
>
> gimple_set_location (new_stmt, loc);
>
> if (dump_file && (dump_flags & TDF_DETAILS))
> {
> + if (cast_stmt)
> + {
> + fprintf (dump_file, "Inserting cast in block %d: ",
> + insert_bb->index);
> + print_gimple_stmt (dump_file, cast_stmt, 0, 0);
> + }
> fprintf (dump_file, "Inserting in block %d: ", insert_bb->index);
> print_gimple_stmt (dump_file, new_stmt, 0, 0);
> }
> @@ -2825,40 +2899,35 @@ analyze_increments (slsr_cand_t first_dep, machine
> && !POINTER_TYPE_P (first_dep->cand_type)))
> incr_vec[i].cost = COST_NEUTRAL;
>
> - /* FORNOW: If we need to add an initializer, give up if a cast from
> - the candidate's type to its stride's type can lose precision.
> - This could eventually be handled better by expressly retaining the
> - result of a cast to a wider type in the stride. Example:
> + /* If we need to add an initializer, give up if a cast from the
> + candidate's type to its stride's type can lose precision.
> + Note that this already takes into account that the stride may
> + have been cast to a wider type, in which case this test won't
> + fire. Example:
>
> short int _1;
> _2 = (int) _1;
> _3 = _2 * 10;
> - _4 = x + _3; ADD: x + (10 * _1) : int
> + _4 = x + _3; ADD: x + (10 * (int)_1) : int
> _5 = _2 * 15;
> - _6 = x + _3; ADD: x + (15 * _1) : int
> + _6 = x + _5; ADD: x + (15 * (int)_1) : int
>
> - Right now replacing _6 would cause insertion of an initializer
> - of the form "short int T = _1 * 5;" followed by a cast to
> - int, which could overflow incorrectly. Had we recorded _2 or
> - (int)_1 as the stride, this wouldn't happen. However, doing
> - this breaks other opportunities, so this will require some
> - care. */
> + Although the stride was a short int initially, the stride
> + used in the analysis has been widened to an int, and such
> + widening will be done in the initializer as well. */
> else if (!incr_vec[i].initializer
> && TREE_CODE (first_dep->stride) != INTEGER_CST
> - && !legal_cast_p_1 (first_dep->stride,
> - gimple_assign_lhs (first_dep->cand_stmt)))
> -
> + && !legal_cast_p_1 (first_dep->stride_type,
> + TREE_TYPE (gimple_assign_lhs
> + (first_dep->cand_stmt))))
> incr_vec[i].cost = COST_INFINITE;
>
> /* If we need to add an initializer, make sure we don't introduce
> a multiply by a pointer type, which can happen in certain cast
> - scenarios. FIXME: When cleaning up these cast issues, we can
> - afford to introduce the multiply provided we cast out to an
> - unsigned int of appropriate size. */
> + scenarios. */
> else if (!incr_vec[i].initializer
> && TREE_CODE (first_dep->stride) != INTEGER_CST
> - && POINTER_TYPE_P (TREE_TYPE (first_dep->stride)))
> -
> + && POINTER_TYPE_P (first_dep->stride_type))
> incr_vec[i].cost = COST_INFINITE;
>
> /* For any other increment, if this is a multiply candidate, we
> @@ -3105,7 +3174,8 @@ insert_initializers (slsr_cand_t c)
> basic_block bb;
> slsr_cand_t where = NULL;
> gassign *init_stmt;
> - tree stride_type, new_name, incr_tree;
> + gassign *cast_stmt = NULL;
> + tree new_name, incr_tree, init_stride;
> widest_int incr = incr_vec[i].incr;
>
> if (!profitable_increment_p (i)
> @@ -3134,31 +3204,63 @@ insert_initializers (slsr_cand_t c)
> that block, the earliest one will be returned in WHERE. */
> bb = nearest_common_dominator_for_cands (c, incr, &where);
>
> + /* If the nominal stride has a different type than the recorded
> + stride type, build a cast from the nominal stride to that type. */
> + if (!types_compatible_p (TREE_TYPE (c->stride), c->stride_type))
> + {
> + init_stride = make_temp_ssa_name (c->stride_type, NULL, "slsr");
> + cast_stmt = gimple_build_assign (init_stride, NOP_EXPR, c->stride);
> + }
> + else
> + init_stride = c->stride;
> +
> /* Create a new SSA name to hold the initializer's value. */
> - stride_type = TREE_TYPE (c->stride);
> - new_name = make_temp_ssa_name (stride_type, NULL, "slsr");
> + new_name = make_temp_ssa_name (c->stride_type, NULL, "slsr");
> incr_vec[i].initializer = new_name;
>
> /* Create the initializer and insert it in the latest possible
> dominating position. */
> - incr_tree = wide_int_to_tree (stride_type, incr);
> + incr_tree = wide_int_to_tree (c->stride_type, incr);
> init_stmt = gimple_build_assign (new_name, MULT_EXPR,
> - c->stride, incr_tree);
> + init_stride, incr_tree);
> if (where)
> {
> gimple_stmt_iterator gsi = gsi_for_stmt (where->cand_stmt);
> + location_t loc = gimple_location (where->cand_stmt);
> +
> + if (cast_stmt)
> + {
> + gsi_insert_before (&gsi, cast_stmt, GSI_SAME_STMT);
> + gimple_set_location (cast_stmt, loc);
> + }
> +
> gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT);
> - gimple_set_location (init_stmt, gimple_location (where->cand_stmt));
> + gimple_set_location (init_stmt, loc);
> }
> else
> {
> gimple_stmt_iterator gsi = gsi_last_bb (bb);
> gimple *basis_stmt = lookup_cand (c->basis)->cand_stmt;
> + location_t loc = gimple_location (basis_stmt);
>
> if (!gsi_end_p (gsi) && is_ctrl_stmt (gsi_stmt (gsi)))
> - gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT);
> + {
> + if (cast_stmt)
> + {
> + gsi_insert_before (&gsi, cast_stmt, GSI_SAME_STMT);
> + gimple_set_location (cast_stmt, loc);
> + }
> + gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT);
> + }
> else
> - gsi_insert_after (&gsi, init_stmt, GSI_SAME_STMT);
> + {
> + if (cast_stmt)
> + {
> + gsi_insert_after (&gsi, cast_stmt, GSI_NEW_STMT);
> + gimple_set_location (cast_stmt, loc);
> + }
> + gsi_insert_after (&gsi, init_stmt, GSI_SAME_STMT);
> + }
>
> gimple_set_location (init_stmt, gimple_location (basis_stmt));
> }
> @@ -3165,6 +3267,11 @@ insert_initializers (slsr_cand_t c)
>
> if (dump_file && (dump_flags & TDF_DETAILS))
> {
> + if (cast_stmt)
> + {
> + fputs ("Inserting stride cast: ", dump_file);
> + print_gimple_stmt (dump_file, cast_stmt, 0, 0);
> + }
> fputs ("Inserting initializer: ", dump_file);
> print_gimple_stmt (dump_file, init_stmt, 0, 0);
> }
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr54245.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr54245.c (revision 241379)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr54245.c (working copy)
> @@ -1,48 +0,0 @@
> -/* { dg-do compile } */
> -/* { dg-options "-O1 -fdump-tree-slsr-details" } */
> -
> -#include <stdio.h>
> -
> -#define W1 22725
> -#define W2 21407
> -#define W3 19266
> -#define W6 8867
> -
> -void idct_row(short *row, int *dst)
> -{
> - int a0, a1, b0, b1;
> -
> - a0 = W1 * row[0];
> - a1 = a0;
> -
> - a0 += W2 * row[2];
> - a1 += W6 * row[2];
> -
> - b0 = W1 * row[1];
> - b1 = W3 * row[1];
> -
> - dst[0] = a0 + b0;
> - dst[1] = a0 - b0;
> - dst[2] = a1 + b1;
> - dst[3] = a1 - b1;
> -}
> -
> -static short block[8] = { 1, 2, 3, 4 };
> -
> -int main(void)
> -{
> - int out[4];
> - int i;
> -
> - idct_row(block, out);
> -
> - for (i = 0; i < 4; i++)
> - printf("%d\n", out[i]);
> -
> - return !(out[2] == 87858 && out[3] == 10794);
> -}
> -
> -/* For now, disable inserting an initializer when the multiplication will
> - take place in a smaller type than originally. This test may be deleted
> - in future when this case is handled more precisely. */
> -/* { dg-final { scan-tree-dump-times "Inserting initializer" 0 "slsr" { target { ! int16 } } } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c (revision 241379)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c (working copy)
> @@ -17,7 +17,12 @@ f (int s, int *c)
> return x1 ? x2 : x3;
> }
>
> +/* Note that since some branch prediction heuristics changed, the
> + calculations of x2 and x3 are pushed downward into the legs
> + of the conditional, changing the code presented to SLSR.
> + However, this proves to be a useful test for introducing an
> + initializer with a cast, so we'll keep it as is. */
> +
> /* There are 4 ' * ' instances in the decls (since "int * iftmp.0;" is
> - added), 1 parm, 2 in the code. The second one in the code can be
> - a widening mult. */
> -/* { dg-final { scan-tree-dump-times " w?\\* " 7 "optimized" } } */
> + added), 2 parms, 3 in the code. */
> +/* { dg-final { scan-tree-dump-times " \\* " 9 "optimized" } } */
>
>