This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: [PATCH, RFC] Fix PR71915, PR71490: Handle casts on strides consistently


On 24 October 2016 at 12:55, Richard Biener <richard.guenther@gmail.com> wrote:
> On Fri, Oct 21, 2016 at 11:46 PM, Bill Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:
>> Hi,
>>
>> I've been meaning for some time now to do a better job handling strength
>> reduction candidates where the stride of the candidate and its basis
>> involves a cast (usually widening) from another type.  The two PRs
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71490 and
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71915 show that we miss
>> strength reduction opportunities in such cases.  Essentially, the way
>> I handled this before required a cast back to the original type, which
>> was illegal when that cast was a narrowing operation or a cast from a
>> wrapping type to a non-wrapping type.
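[The two illegality conditions named above can be sketched as a tiny predicate.  This is an editorial sketch mirroring only the conditions stated in this paragraph, not the full `legal_cast_p_1` predicate in gimple-ssa-strength-reduction.c, and `struct itype`/`cast_is_legal` are hypothetical names:]

```c
#include <stdbool.h>

/* Hypothetical reduction of a scalar integer type to the two
   properties the legality test cares about: its bit width
   (TYPE_PRECISION) and whether overflow wraps
   (TYPE_OVERFLOW_WRAPS).  */
struct itype
{
  unsigned precision;
  bool overflow_wraps;
};

/* Sketch of when a cast from RHS type to LHS type may be folded
   into strength-reduction arithmetic, per the two conditions
   above: a narrowing cast loses bits, and a cast from a wrapping
   type to a non-wrapping type can turn well-defined wraparound
   into undefined overflow.  */
bool
cast_is_legal (struct itype lhs, struct itype rhs)
{
  if (lhs.precision < rhs.precision)          /* narrowing */
    return false;
  if (rhs.overflow_wraps && !lhs.overflow_wraps)  /* wrap -> nonwrap */
    return false;
  return true;
}
```
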
>>
>> This patch improves matters by tracking the effective type of the stride
>> so that we perform arithmetic in that type directly.  This allows us to
>> strength-reduce some cases missed before.  We still have to be careful
>> not to ever perform a narrowing or wrap-to-nonwrap conversion, but these
>> cases don't come up nearly so often in practice with this patch.
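[The pattern being discussed can be illustrated with a small function.  This is an editorial sketch of roughly the kind of code involved, not the testcase from either PR: the stride `s` is an `int` that gets widened for 64-bit address arithmetic, so the recorded stride is effectively `(long) s`.]

```c
/* Illustrative only.  Each subscript multiplies a widened copy of
   the int stride 's' by the element size.  With the stride's
   effective (widened) type tracked, x2 and x3 become
   strength-reduction candidates with x1's address computation as
   basis: each can be formed by adding the widened stride times the
   element size, rather than by a fresh multiply.  */
long
f (int s, long *p)
{
  long x1 = p[s];        /* base + (long) s * 8        */
  long x2 = p[2 * s];    /* base + (long) (2 * s) * 8  */
  long x3 = p[3 * s];    /* base + (long) (3 * s) * 8  */
  return x1 + x2 + x3;
}
```
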
>>
>> The patch is pretty straightforward but rather large, so I'd like to
>> leave it up for review for a week or so before committing it -- extra
>> eyes welcome!
>>
>> There is an existing test case that checks for the narrowing problem,
>> and it now fails with this patch, as it should.  That is, it's now
>> legal to insert the initializer that we formerly knew to be bad for
>> this test.  Since the test no longer serves any purpose, I'm deleting it.
>>
>> gcc.dg/tree-ssa/slsr-8.c generates quite different intermediate code now
>> than when I first added it to the test suite.  As a result of various
>> optimization changes, it's no longer maintained as a single block;
>> instead, the optimizable computations are sunk into two legs of a
>> conditional.  This exposed another similar case, leading to the bug
>> reports.  This test now passes, but I had to adjust it for the new code
>> generation we get.  I also added some commentary to indicate what we
>> expect to happen in that test, since it isn't too obvious.
>>
>> I've bootstrapped this and tested it on powerpc64le-unknown-linux-gnu
>> with no regressions.  To avoid the Wrath of Octoploid ;), I've also
>> tested it against ffmpeg using an x86_64-pc-linux-gnu cross with -O3
>> -march=amdfam10, also with no failures.  I've also verified that we hit
>> this code about 35 times in the test suite, so it looks like we have
>> some decent additional test coverage.
>>
>> Thanks in advance to anyone willing to take a look.
>
> LGTM.
>

Hi,

Since this was committed, I'm seeing a failure in
  gcc.dg/tree-ssa/slsr-8.c scan-tree-dump-times optimized " \\* " 9
on aarch64 targets.

Christophe


> Richard.
>
>> Bill
>>
>>
>> [gcc]
>>
>> 2016-10-21  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>>
>>         PR tree-optimization/71915
>>         * gimple-ssa-strength-reduction.c (struct slsr_cand_d): Add
>>         stride_type field.
>>         (find_basis_for_base_expr): Require stride types to match when
>>         seeking a basis.
>>         (alloc_cand_and_find_basis): Record the stride type.
>>         (slsr_process_phi): Pass stride type to alloc_cand_and_find_basis.
>>         (backtrace_base_for_ref): Pass types to legal_cast_p_1 rather than
>>         the expressions having those types.
>>         (slsr_process_ref): Pass stride type to alloc_cand_and_find_basis.
>>         (create_mul_ssa_cand): Likewise.
>>         (create_mul_imm_cand): Likewise.
>>         (create_add_ssa_cand): Likewise.
>>         (create_add_imm_cand): Likewise.
>>         (legal_cast_p_1): Change interface to accept types rather than the
>>         expressions having those types.
>>         (legal_cast_p): Pass types to legal_cast_p_1.
>>         (slsr_process_cast): Pass stride type to
>>         alloc_cand_and_find_basis.
>>         (slsr_process_copy): Likewise.
>>         (dump_candidate): Display stride type when a cast exists.
>>         (create_add_on_incoming_edge): Introduce a cast when necessary for
>>         the stride type.
>>         (analyze_increments): Change the code checking for invalid casts
>>         to rely on the stride type, and update the documentation and
>>         example.  Change the code checking for pointer multiplies to rely
>>         on the stride type.
>>         (insert_initializers): Introduce a cast when necessary for the
>>         stride type.  Use the stride type for the type of the initializer.
>>
>> [gcc/testsuite]
>>
>> 2016-10-21  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>>
>>         PR tree-optimization/71915
>>         * gcc.dg/tree-ssa/pr54245.c: Delete.
>>         * gcc.dg/tree-ssa/slsr-8.c: Adjust for new optimization and
>>         document why.
>>
>>
>> Index: gcc/gimple-ssa-strength-reduction.c
>> ===================================================================
>> --- gcc/gimple-ssa-strength-reduction.c (revision 241379)
>> +++ gcc/gimple-ssa-strength-reduction.c (working copy)
>> @@ -246,6 +246,13 @@ struct slsr_cand_d
>>       replacement MEM_REF.)  */
>>    tree cand_type;
>>
>> +  /* The type to be used to interpret the stride field when the stride
>> +     is not a constant.  Normally the same as the type of the recorded
>> +     stride, but when the stride has been cast we need to maintain that
>> +     knowledge in order to make legal substitutions without losing
>> +     precision.  When the stride is a constant, this will be sizetype.  */
>> +  tree stride_type;
>> +
>>    /* The kind of candidate (CAND_MULT, etc.).  */
>>    enum cand_kind kind;
>>
>> @@ -502,6 +509,7 @@ find_basis_for_base_expr (slsr_cand_t c, tree base
>>           || one_basis->cand_stmt == c->cand_stmt
>>           || !operand_equal_p (one_basis->stride, c->stride, 0)
>>           || !types_compatible_p (one_basis->cand_type, c->cand_type)
>> +         || !types_compatible_p (one_basis->stride_type, c->stride_type)
>>           || !dominated_by_p (CDI_DOMINATORS,
>>                               gimple_bb (c->cand_stmt),
>>                               gimple_bb (one_basis->cand_stmt)))
>> @@ -615,7 +623,7 @@ record_potential_basis (slsr_cand_t c, tree base)
>>  static slsr_cand_t
>>  alloc_cand_and_find_basis (enum cand_kind kind, gimple *gs, tree base,
>>                            const widest_int &index, tree stride, tree ctype,
>> -                          unsigned savings)
>> +                          tree stype, unsigned savings)
>>  {
>>    slsr_cand_t c = (slsr_cand_t) obstack_alloc (&cand_obstack,
>>                                                sizeof (slsr_cand));
>> @@ -624,6 +632,7 @@ alloc_cand_and_find_basis (enum cand_kind kind, gi
>>    c->stride = stride;
>>    c->index = index;
>>    c->cand_type = ctype;
>> +  c->stride_type = stype;
>>    c->kind = kind;
>>    c->cand_num = cand_vec.length () + 1;
>>    c->next_interp = 0;
>> @@ -809,7 +818,8 @@ slsr_process_phi (gphi *phi, bool speed)
>>    base_type = TREE_TYPE (arg0_base);
>>
>>    c = alloc_cand_and_find_basis (CAND_PHI, phi, arg0_base,
>> -                                0, integer_one_node, base_type, savings);
>> +                                0, integer_one_node, base_type,
>> +                                sizetype, savings);
>>
>>    /* Add the candidate to the statement-candidate mapping.  */
>>    add_cand_for_stmt (phi, c);
>> @@ -838,7 +848,8 @@ backtrace_base_for_ref (tree *pbase)
>>       e.g. 'B' is widened from an 'int' in order to calculate
>>       a 64-bit address.  */
>>    if (CONVERT_EXPR_P (base_in)
>> -      && legal_cast_p_1 (base_in, TREE_OPERAND (base_in, 0)))
>> +      && legal_cast_p_1 (TREE_TYPE (base_in),
>> +                        TREE_TYPE (TREE_OPERAND (base_in, 0))))
>>      base_in = get_unwidened (base_in, NULL_TREE);
>>
>>    if (TREE_CODE (base_in) != SSA_NAME)
>> @@ -995,7 +1006,7 @@ slsr_process_ref (gimple *gs)
>>      return;
>>
>>    c = alloc_cand_and_find_basis (CAND_REF, gs, base, index, offset,
>> -                                type, 0);
>> +                                type, sizetype, 0);
>>
>>    /* Add the candidate to the statement-candidate mapping.  */
>>    add_cand_for_stmt (gs, c);
>> @@ -1010,6 +1021,7 @@ static slsr_cand_t
>>  create_mul_ssa_cand (gimple *gs, tree base_in, tree stride_in, bool speed)
>>  {
>>    tree base = NULL_TREE, stride = NULL_TREE, ctype = NULL_TREE;
>> +  tree stype = NULL_TREE;
>>    widest_int index;
>>    unsigned savings = 0;
>>    slsr_cand_t c;
>> @@ -1030,6 +1042,7 @@ create_mul_ssa_cand (gimple *gs, tree base_in, tre
>>           index = base_cand->index;
>>           stride = stride_in;
>>           ctype = base_cand->cand_type;
>> +         stype = TREE_TYPE (stride_in);
>>           if (has_single_use (base_in))
>>             savings = (base_cand->dead_savings
>>                        + stmt_cost (base_cand->cand_stmt, speed));
>> @@ -1045,6 +1058,7 @@ create_mul_ssa_cand (gimple *gs, tree base_in, tre
>>           index = base_cand->index * wi::to_widest (base_cand->stride);
>>           stride = stride_in;
>>           ctype = base_cand->cand_type;
>> +         stype = TREE_TYPE (stride_in);
>>           if (has_single_use (base_in))
>>             savings = (base_cand->dead_savings
>>                        + stmt_cost (base_cand->cand_stmt, speed));
>> @@ -1064,10 +1078,11 @@ create_mul_ssa_cand (gimple *gs, tree base_in, tre
>>        index = 0;
>>        stride = stride_in;
>>        ctype = TREE_TYPE (base_in);
>> +      stype = TREE_TYPE (stride_in);
>>      }
>>
>>    c = alloc_cand_and_find_basis (CAND_MULT, gs, base, index, stride,
>> -                                ctype, savings);
>> +                                ctype, stype, savings);
>>    return c;
>>  }
>>
>> @@ -1156,7 +1171,7 @@ create_mul_imm_cand (gimple *gs, tree base_in, tre
>>      }
>>
>>    c = alloc_cand_and_find_basis (CAND_MULT, gs, base, index, stride,
>> -                                ctype, savings);
>> +                                ctype, sizetype, savings);
>>    return c;
>>  }
>>
>> @@ -1212,7 +1227,8 @@ static slsr_cand_t
>>  create_add_ssa_cand (gimple *gs, tree base_in, tree addend_in,
>>                      bool subtract_p, bool speed)
>>  {
>> -  tree base = NULL_TREE, stride = NULL_TREE, ctype = NULL;
>> +  tree base = NULL_TREE, stride = NULL_TREE, ctype = NULL_TREE;
>> +  tree stype = NULL_TREE;
>>    widest_int index;
>>    unsigned savings = 0;
>>    slsr_cand_t c;
>> @@ -1237,6 +1253,7 @@ create_add_ssa_cand (gimple *gs, tree base_in, tre
>>             index = -index;
>>           stride = addend_cand->base_expr;
>>           ctype = TREE_TYPE (base_in);
>> +         stype = addend_cand->cand_type;
>>           if (has_single_use (addend_in))
>>             savings = (addend_cand->dead_savings
>>                        + stmt_cost (addend_cand->cand_stmt, speed));
>> @@ -1263,6 +1280,8 @@ create_add_ssa_cand (gimple *gs, tree base_in, tre
>>           index = subtract_p ? -1 : 1;
>>           stride = addend_in;
>>           ctype = base_cand->cand_type;
>> +         stype = (TREE_CODE (addend_in) == INTEGER_CST ? sizetype
>> +                  : TREE_TYPE (addend_in));
>>           if (has_single_use (base_in))
>>             savings = (base_cand->dead_savings
>>                        + stmt_cost (base_cand->cand_stmt, speed));
>> @@ -1286,6 +1305,7 @@ create_add_ssa_cand (gimple *gs, tree base_in, tre
>>                   index = -index;
>>                   stride = subtrahend_cand->base_expr;
>>                   ctype = TREE_TYPE (base_in);
>> +                 stype = subtrahend_cand->cand_type;
>>                   if (has_single_use (addend_in))
>>                     savings = (subtrahend_cand->dead_savings
>>                                + stmt_cost (subtrahend_cand->cand_stmt, speed));
>> @@ -1312,10 +1332,12 @@ create_add_ssa_cand (gimple *gs, tree base_in, tre
>>        index = subtract_p ? -1 : 1;
>>        stride = addend_in;
>>        ctype = TREE_TYPE (base_in);
>> +      stype = (TREE_CODE (addend_in) == INTEGER_CST ? sizetype
>> +              : TREE_TYPE (addend_in));
>>      }
>>
>>    c = alloc_cand_and_find_basis (CAND_ADD, gs, base, index, stride,
>> -                                ctype, savings);
>> +                                ctype, stype, savings);
>>    return c;
>>  }
>>
>> @@ -1329,6 +1351,7 @@ create_add_imm_cand (gimple *gs, tree base_in, con
>>  {
>>    enum cand_kind kind = CAND_ADD;
>>    tree base = NULL_TREE, stride = NULL_TREE, ctype = NULL_TREE;
>> +  tree stype = NULL_TREE;
>>    widest_int index, multiple;
>>    unsigned savings = 0;
>>    slsr_cand_t c;
>> @@ -1356,6 +1379,7 @@ create_add_imm_cand (gimple *gs, tree base_in, con
>>           index = base_cand->index + multiple;
>>           stride = base_cand->stride;
>>           ctype = base_cand->cand_type;
>> +         stype = base_cand->stride_type;
>>           if (has_single_use (base_in))
>>             savings = (base_cand->dead_savings
>>                        + stmt_cost (base_cand->cand_stmt, speed));
>> @@ -1376,10 +1400,11 @@ create_add_imm_cand (gimple *gs, tree base_in, con
>>        index = index_in;
>>        stride = integer_one_node;
>>        ctype = TREE_TYPE (base_in);
>> +      stype = sizetype;
>>      }
>>
>>    c = alloc_cand_and_find_basis (kind, gs, base, index, stride,
>> -                                ctype, savings);
>> +                                ctype, stype, savings);
>>    return c;
>>  }
>>
>> @@ -1456,14 +1481,11 @@ slsr_process_neg (gimple *gs, tree rhs1, bool spee
>>     for more details.  */
>>
>>  static bool
>> -legal_cast_p_1 (tree lhs, tree rhs)
>> +legal_cast_p_1 (tree lhs_type, tree rhs_type)
>>  {
>> -  tree lhs_type, rhs_type;
>>    unsigned lhs_size, rhs_size;
>>    bool lhs_wraps, rhs_wraps;
>>
>> -  lhs_type = TREE_TYPE (lhs);
>> -  rhs_type = TREE_TYPE (rhs);
>>    lhs_size = TYPE_PRECISION (lhs_type);
>>    rhs_size = TYPE_PRECISION (rhs_type);
>>    lhs_wraps = ANY_INTEGRAL_TYPE_P (lhs_type) && TYPE_OVERFLOW_WRAPS (lhs_type);
>> @@ -1521,7 +1543,7 @@ legal_cast_p (gimple *gs, tree rhs)
>>        || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (gs)))
>>      return false;
>>
>> -  return legal_cast_p_1 (gimple_assign_lhs (gs), rhs);
>> +  return legal_cast_p_1 (TREE_TYPE (gimple_assign_lhs (gs)), TREE_TYPE (rhs));
>>  }
>>
>>  /* Given GS which is a cast to a scalar integer type, determine whether
>> @@ -1556,7 +1578,8 @@ slsr_process_cast (gimple *gs, tree rhs1, bool spe
>>           c = alloc_cand_and_find_basis (base_cand->kind, gs,
>>                                          base_cand->base_expr,
>>                                          base_cand->index, base_cand->stride,
>> -                                        ctype, savings);
>> +                                        ctype, base_cand->stride_type,
>> +                                        savings);
>>           if (base_cand->next_interp)
>>             base_cand = lookup_cand (base_cand->next_interp);
>>           else
>> @@ -1574,10 +1597,10 @@ slsr_process_cast (gimple *gs, tree rhs1, bool spe
>>          The first of these is somewhat arbitrary, but the choice of
>>          1 for the stride simplifies the logic for propagating casts
>>          into their uses.  */
>> -      c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1,
>> -                                    0, integer_one_node, ctype, 0);
>> -      c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1,
>> -                                     0, integer_one_node, ctype, 0);
>> +      c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1, 0,
>> +                                    integer_one_node, ctype, sizetype, 0);
>> +      c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1, 0,
>> +                                     integer_one_node, ctype, sizetype, 0);
>>        c->next_interp = c2->cand_num;
>>      }
>>
>> @@ -1613,7 +1636,8 @@ slsr_process_copy (gimple *gs, tree rhs1, bool spe
>>           c = alloc_cand_and_find_basis (base_cand->kind, gs,
>>                                          base_cand->base_expr,
>>                                          base_cand->index, base_cand->stride,
>> -                                        base_cand->cand_type, savings);
>> +                                        base_cand->cand_type,
>> +                                        base_cand->stride_type, savings);
>>           if (base_cand->next_interp)
>>             base_cand = lookup_cand (base_cand->next_interp);
>>           else
>> @@ -1631,10 +1655,12 @@ slsr_process_copy (gimple *gs, tree rhs1, bool spe
>>          The first of these is somewhat arbitrary, but the choice of
>>          1 for the stride simplifies the logic for propagating casts
>>          into their uses.  */
>> -      c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1,
>> -                                    0, integer_one_node, TREE_TYPE (rhs1), 0);
>> -      c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1,
>> -                                     0, integer_one_node, TREE_TYPE (rhs1), 0);
>> +      c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1, 0,
>> +                                    integer_one_node, TREE_TYPE (rhs1),
>> +                                    sizetype, 0);
>> +      c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1, 0,
>> +                                     integer_one_node, TREE_TYPE (rhs1),
>> +                                     sizetype, 0);
>>        c->next_interp = c2->cand_num;
>>      }
>>
>> @@ -1755,6 +1781,13 @@ dump_candidate (slsr_cand_t c)
>>        fputs (" + ", dump_file);
>>        print_decs (c->index, dump_file);
>>        fputs (") * ", dump_file);
>> +      if (TREE_CODE (c->stride) != INTEGER_CST
>> +         && c->stride_type != TREE_TYPE (c->stride))
>> +       {
>> +         fputs ("(", dump_file);
>> +         print_generic_expr (dump_file, c->stride_type, 0);
>> +         fputs (")", dump_file);
>> +       }
>>        print_generic_expr (dump_file, c->stride, 0);
>>        fputs (" : ", dump_file);
>>        break;
>> @@ -1764,6 +1797,13 @@ dump_candidate (slsr_cand_t c)
>>        fputs (" + (", dump_file);
>>        print_decs (c->index, dump_file);
>>        fputs (" * ", dump_file);
>> +      if (TREE_CODE (c->stride) != INTEGER_CST
>> +         && c->stride_type != TREE_TYPE (c->stride))
>> +       {
>> +         fputs ("(", dump_file);
>> +         print_generic_expr (dump_file, c->stride_type, 0);
>> +         fputs (")", dump_file);
>> +       }
>>        print_generic_expr (dump_file, c->stride, 0);
>>        fputs (") : ", dump_file);
>>        break;
>> @@ -2143,7 +2183,7 @@ create_add_on_incoming_edge (slsr_cand_t c, tree b
>>    basic_block insert_bb;
>>    gimple_stmt_iterator gsi;
>>    tree lhs, basis_type;
>> -  gassign *new_stmt;
>> +  gassign *new_stmt, *cast_stmt = NULL;
>>
>>    /* If the add candidate along this incoming edge has the same
>>       index as C's hidden basis, the hidden basis represents this
>> @@ -2187,13 +2227,27 @@ create_add_on_incoming_edge (slsr_cand_t c, tree b
>>           new_stmt = gimple_build_assign (lhs, code, basis_name,
>>                                           incr_vec[i].initializer);
>>         }
>> -      else if (increment == 1)
>> -       new_stmt = gimple_build_assign (lhs, plus_code, basis_name, c->stride);
>> -      else if (increment == -1)
>> -       new_stmt = gimple_build_assign (lhs, MINUS_EXPR, basis_name,
>> -                                       c->stride);
>> -      else
>> -       gcc_unreachable ();
>> +      else {
>> +       tree stride;
>> +
>> +       if (!types_compatible_p (TREE_TYPE (c->stride), c->stride_type))
>> +         {
>> +           tree cast_stride = make_temp_ssa_name (c->stride_type, NULL,
>> +                                                  "slsr");
>> +           cast_stmt = gimple_build_assign (cast_stride, NOP_EXPR,
>> +                                            c->stride);
>> +           stride = cast_stride;
>> +         }
>> +       else
>> +         stride = c->stride;
>> +
>> +       if (increment == 1)
>> +         new_stmt = gimple_build_assign (lhs, plus_code, basis_name, stride);
>> +       else if (increment == -1)
>> +         new_stmt = gimple_build_assign (lhs, MINUS_EXPR, basis_name, stride);
>> +       else
>> +         gcc_unreachable ();
>> +      }
>>      }
>>
>>    insert_bb = single_succ_p (e->src) ? e->src : split_edge (e);
>> @@ -2200,14 +2254,34 @@ create_add_on_incoming_edge (slsr_cand_t c, tree b
>>    gsi = gsi_last_bb (insert_bb);
>>
>>    if (!gsi_end_p (gsi) && is_ctrl_stmt (gsi_stmt (gsi)))
>> -    gsi_insert_before (&gsi, new_stmt, GSI_NEW_STMT);
>> +    {
>> +      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
>> +      if (cast_stmt)
>> +       {
>> +         gsi_insert_before (&gsi, cast_stmt, GSI_SAME_STMT);
>> +         gimple_set_location (cast_stmt, loc);
>> +       }
>> +    }
>>    else
>> -    gsi_insert_after (&gsi, new_stmt, GSI_NEW_STMT);
>> +    {
>> +      if (cast_stmt)
>> +       {
>> +         gsi_insert_after (&gsi, cast_stmt, GSI_NEW_STMT);
>> +         gimple_set_location (cast_stmt, loc);
>> +       }
>> +      gsi_insert_after (&gsi, new_stmt, GSI_NEW_STMT);
>> +    }
>>
>>    gimple_set_location (new_stmt, loc);
>>
>>    if (dump_file && (dump_flags & TDF_DETAILS))
>>      {
>> +      if (cast_stmt)
>> +       {
>> +         fprintf (dump_file, "Inserting cast in block %d: ",
>> +                  insert_bb->index);
>> +         print_gimple_stmt (dump_file, cast_stmt, 0, 0);
>> +       }
>>        fprintf (dump_file, "Inserting in block %d: ", insert_bb->index);
>>        print_gimple_stmt (dump_file, new_stmt, 0, 0);
>>      }
>> @@ -2825,40 +2899,35 @@ analyze_increments (slsr_cand_t first_dep, machine
>>                    && !POINTER_TYPE_P (first_dep->cand_type)))
>>         incr_vec[i].cost = COST_NEUTRAL;
>>
>> -      /* FORNOW: If we need to add an initializer, give up if a cast from
>> -        the candidate's type to its stride's type can lose precision.
>> -        This could eventually be handled better by expressly retaining the
>> -        result of a cast to a wider type in the stride.  Example:
>> +      /* If we need to add an initializer, give up if a cast from the
>> +        candidate's type to its stride's type can lose precision.
>> +        Note that this already takes into account that the stride may
>> +        have been cast to a wider type, in which case this test won't
>> +        fire.  Example:
>>
>>             short int _1;
>>            _2 = (int) _1;
>>            _3 = _2 * 10;
>> -          _4 = x + _3;    ADD: x + (10 * _1) : int
>> +          _4 = x + _3;    ADD: x + (10 * (int)_1) : int
>>            _5 = _2 * 15;
>> -          _6 = x + _3;    ADD: x + (15 * _1) : int
>> +          _6 = x + _5;    ADD: x + (15 * (int)_1) : int
>>
>> -         Right now replacing _6 would cause insertion of an initializer
>> -        of the form "short int T = _1 * 5;" followed by a cast to
>> -        int, which could overflow incorrectly.  Had we recorded _2 or
>> -        (int)_1 as the stride, this wouldn't happen.  However, doing
>> -         this breaks other opportunities, so this will require some
>> -        care.  */
>> +        Although the stride was a short int initially, the stride
>> +        used in the analysis has been widened to an int, and such
>> +        widening will be done in the initializer as well.  */
>>        else if (!incr_vec[i].initializer
>>                && TREE_CODE (first_dep->stride) != INTEGER_CST
>> -              && !legal_cast_p_1 (first_dep->stride,
>> -                                  gimple_assign_lhs (first_dep->cand_stmt)))
>> -
>> +              && !legal_cast_p_1 (first_dep->stride_type,
>> +                                  TREE_TYPE (gimple_assign_lhs
>> +                                             (first_dep->cand_stmt))))
>>         incr_vec[i].cost = COST_INFINITE;
>>
>>        /* If we need to add an initializer, make sure we don't introduce
>>          a multiply by a pointer type, which can happen in certain cast
>> -        scenarios.  FIXME: When cleaning up these cast issues, we can
>> -         afford to introduce the multiply provided we cast out to an
>> -         unsigned int of appropriate size.  */
>> +        scenarios.  */
>>        else if (!incr_vec[i].initializer
>>                && TREE_CODE (first_dep->stride) != INTEGER_CST
>> -              && POINTER_TYPE_P (TREE_TYPE (first_dep->stride)))
>> -
>> +              && POINTER_TYPE_P (first_dep->stride_type))
>>         incr_vec[i].cost = COST_INFINITE;
>>
>>        /* For any other increment, if this is a multiply candidate, we
>> @@ -3105,7 +3174,8 @@ insert_initializers (slsr_cand_t c)
>>        basic_block bb;
>>        slsr_cand_t where = NULL;
>>        gassign *init_stmt;
>> -      tree stride_type, new_name, incr_tree;
>> +      gassign *cast_stmt = NULL;
>> +      tree new_name, incr_tree, init_stride;
>>        widest_int incr = incr_vec[i].incr;
>>
>>        if (!profitable_increment_p (i)
>> @@ -3134,31 +3204,63 @@ insert_initializers (slsr_cand_t c)
>>          that block, the earliest one will be returned in WHERE.  */
>>        bb = nearest_common_dominator_for_cands (c, incr, &where);
>>
>> +      /* If the nominal stride has a different type than the recorded
>> +        stride type, build a cast from the nominal stride to that type.  */
>> +      if (!types_compatible_p (TREE_TYPE (c->stride), c->stride_type))
>> +       {
>> +         init_stride = make_temp_ssa_name (c->stride_type, NULL, "slsr");
>> +         cast_stmt = gimple_build_assign (init_stride, NOP_EXPR, c->stride);
>> +       }
>> +      else
>> +       init_stride = c->stride;
>> +
>>        /* Create a new SSA name to hold the initializer's value.  */
>> -      stride_type = TREE_TYPE (c->stride);
>> -      new_name = make_temp_ssa_name (stride_type, NULL, "slsr");
>> +      new_name = make_temp_ssa_name (c->stride_type, NULL, "slsr");
>>        incr_vec[i].initializer = new_name;
>>
>>        /* Create the initializer and insert it in the latest possible
>>          dominating position.  */
>> -      incr_tree = wide_int_to_tree (stride_type, incr);
>> +      incr_tree = wide_int_to_tree (c->stride_type, incr);
>>        init_stmt = gimple_build_assign (new_name, MULT_EXPR,
>> -                                      c->stride, incr_tree);
>> +                                      init_stride, incr_tree);
>>        if (where)
>>         {
>>           gimple_stmt_iterator gsi = gsi_for_stmt (where->cand_stmt);
>> +         location_t loc = gimple_location (where->cand_stmt);
>> +
>> +         if (cast_stmt)
>> +           {
>> +             gsi_insert_before (&gsi, cast_stmt, GSI_SAME_STMT);
>> +             gimple_set_location (cast_stmt, loc);
>> +           }
>> +
>>           gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT);
>> -         gimple_set_location (init_stmt, gimple_location (where->cand_stmt));
>> +         gimple_set_location (init_stmt, loc);
>>         }
>>        else
>>         {
>>           gimple_stmt_iterator gsi = gsi_last_bb (bb);
>>           gimple *basis_stmt = lookup_cand (c->basis)->cand_stmt;
>> +         location_t loc = gimple_location (basis_stmt);
>>
>>           if (!gsi_end_p (gsi) && is_ctrl_stmt (gsi_stmt (gsi)))
>> -           gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT);
>> +           {
>> +             if (cast_stmt)
>> +               {
>> +                 gsi_insert_before (&gsi, cast_stmt, GSI_SAME_STMT);
>> +                 gimple_set_location (cast_stmt, loc);
>> +               }
>> +             gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT);
>> +           }
>>           else
>> -           gsi_insert_after (&gsi, init_stmt, GSI_SAME_STMT);
>> +           {
>> +             if (cast_stmt)
>> +               {
>> +                 gsi_insert_after (&gsi, cast_stmt, GSI_NEW_STMT);
>> +                 gimple_set_location (cast_stmt, loc);
>> +               }
>> +             gsi_insert_after (&gsi, init_stmt, GSI_SAME_STMT);
>> +           }
>>
>>           gimple_set_location (init_stmt, gimple_location (basis_stmt));
>>         }
>> @@ -3165,6 +3267,11 @@ insert_initializers (slsr_cand_t c)
>>
>>        if (dump_file && (dump_flags & TDF_DETAILS))
>>         {
>> +         if (cast_stmt)
>> +           {
>> +             fputs ("Inserting stride cast: ", dump_file);
>> +             print_gimple_stmt (dump_file, cast_stmt, 0, 0);
>> +           }
>>           fputs ("Inserting initializer: ", dump_file);
>>           print_gimple_stmt (dump_file, init_stmt, 0, 0);
>>         }
>> Index: gcc/testsuite/gcc.dg/tree-ssa/pr54245.c
>> ===================================================================
>> --- gcc/testsuite/gcc.dg/tree-ssa/pr54245.c     (revision 241379)
>> +++ gcc/testsuite/gcc.dg/tree-ssa/pr54245.c     (working copy)
>> @@ -1,48 +0,0 @@
>> -/* { dg-do compile } */
>> -/* { dg-options "-O1 -fdump-tree-slsr-details" } */
>> -
>> -#include <stdio.h>
>> -
>> -#define W1  22725
>> -#define W2  21407
>> -#define W3  19266
>> -#define W6  8867
>> -
>> -void idct_row(short *row, int *dst)
>> -{
>> -    int a0, a1, b0, b1;
>> -
>> -    a0 = W1 * row[0];
>> -    a1 = a0;
>> -
>> -    a0 += W2 * row[2];
>> -    a1 += W6 * row[2];
>> -
>> -    b0 = W1 * row[1];
>> -    b1 = W3 * row[1];
>> -
>> -    dst[0] = a0 + b0;
>> -    dst[1] = a0 - b0;
>> -    dst[2] = a1 + b1;
>> -    dst[3] = a1 - b1;
>> -}
>> -
>> -static short block[8] = { 1, 2, 3, 4 };
>> -
>> -int main(void)
>> -{
>> -    int out[4];
>> -    int i;
>> -
>> -    idct_row(block, out);
>> -
>> -    for (i = 0; i < 4; i++)
>> -        printf("%d\n", out[i]);
>> -
>> -    return !(out[2] == 87858 && out[3] == 10794);
>> -}
>> -
>> -/* For now, disable inserting an initializer when the multiplication will
>> -   take place in a smaller type than originally.  This test may be deleted
>> -   in future when this case is handled more precisely.  */
>> -/* { dg-final { scan-tree-dump-times "Inserting initializer" 0 "slsr" { target { ! int16 } } } } */
>> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c
>> ===================================================================
>> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c      (revision 241379)
>> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c      (working copy)
>> @@ -17,7 +17,12 @@ f (int s, int *c)
>>    return x1 ? x2 : x3;
>>  }
>>
>> +/* Note that since some branch prediction heuristics changed, the
>> +   calculations of x2 and x3 are pushed downward into the legs
>> +   of the conditional, changing the code presented to SLSR.
>> +   However, this proves to be a useful test for introducing an
>> +   initializer with a cast, so we'll keep it as is.  */
>> +
>>  /* There are 4 ' * ' instances in the decls (since "int * iftmp.0;" is
>> -   added), 1 parm, 2 in the code.  The second one in the code can be
>> -   a widening mult.  */
>> -/* { dg-final { scan-tree-dump-times " w?\\* " 7 "optimized" } } */
>> +   added), 2 parms, 3 in the code.  */
>> +/* { dg-final { scan-tree-dump-times " \\* " 9 "optimized" } } */
>>
>>

