This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC][PATCH][mid-end] Optimize immediate choice in comparisons.


Hi Vlad,

Thanks for the patch.

Vlad Lazar <vlad.lazar@arm.com> writes:
> Hi.
>
> This patch optimises the choice of immediates in integer comparisons. Integer
> comparisons allow for different choices (e.g. a > b is equivalent to a >= b+1)
> and there are cases where an incremented/decremented immediate can be loaded into a
> register in fewer instructions. The cases are as follows:
>    
> i)   a >  b or a >= b + 1
> ii)  a <= b or a <  b + 1
> iii) a >= b or a >  b - 1
> iv)  a <  b or a <= b - 1
>
> For each comparison we check whether the equivalent can be performed in less instructions.
> This is done on a statement by statement basis, right before the GIMPLE statement is expanded
> to RTL. Therefore, it requires a correct implementation of the TARGET_INSN_COST hook.
> The change is general and it applies to any integer comparison, regardless of types or location.
>
> For example, on AArch64 for the following code:
>
> int foo (int x)
> {
>    return x > 0xfefffffe;
> }
>
> it generates:
>
> mov	w1, -16777217
> cmp	w0, w1
> cset	w0, cs
>
> instead of:
>
> mov	w1, 65534
> movk	w1, 0xfeff, lsl 16
> cmp	w0, w1
> cset	w0, hi
>
> Bootstrapped and regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and there are no regressions.

Looks like a useful feature.  I'm playing devil's advocate to some
extent here, but did you consider instead doing this during the
expansion functions themselves?  In some ways that feels more
natural because we're already operating on rtxes at that point.
It also has the advantage that it would trap comparisons that didn't
exist at the gimple level, such as when computing the carry for a
doubleword add/sub.

I think the two main functions that would need to change are:

- prepare_cmp_insn
- emit_store_flag_1

We could have something like:

  void canonicalize_comparison_for_target (machine_mode, rtx_code *, rtx *);

which changes the comparison code and second operand as in your patch.
In the case of emit_store_flag_1, it should happen after:

  if (swap_commutative_operands_p (op0, op1))
    {
      std::swap (op0, op1);
      code = swap_condition (code);
    }

(In the case of prepare_cmp_insn, the callers have already done this.)

Note that the patch as its stands won't handle comparisons nested
inside the first operand of a COND_EXPR gassign (which unfortunately
are still a thing).  E.g.:

  _1 = _2 > 0xfefffffe ? _3 : _4;

FWIW, I agree this is a target-independent technique that should be
done in target-independent code, rather than something that should
be duplicated in each individual target that benefits from it.

Thanks,
Richard

>
> Thanks,
> Vlad
>
> gcc/testsuite/
> Changelog for gcc/testsuite/Changelog
> 2018-07-30  Vlad Lazar  <vlad.lazar@arm.com>
>
> 	* gcc.target/aarch64/imm_choice_comparison.c: New.
>
> gcc/
> Changelog for gcc/Changelog
> 2018-07-30  Vlad Lazar  <vlad.lazar@arm.com>
>
> 	* cfgexpand.c (optimize_immediate_choice): New.
> 	(can_optimize_immediate_p): Likewise.
> 	(validate_immediate_optimization): Likewise.
> 	(update_immediate): Likewise.
> 	(immediate_optimized_code): Likewise.
>
> ---
>
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index 9b91279282e1c6956c8b3699f13036c401ea1dcd..5b0a2e0cdb23f649336844506c8241428b3e6e7d 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -5423,6 +5423,157 @@ reorder_operands (basic_block bb)
>     XDELETE (lattice);
>   }
>   
> +/* Helper function for update_immediate.  Returns the adjusted conditional
> +   code for the immediate choice optimization.  See optimize_immediate_choice
> +   for cases.  */
> +
> +static enum tree_code
> +immediate_optimized_code (enum tree_code code)
> +{
> +  switch (code)
> +    {
> +    case GT_EXPR:
> +      return GE_EXPR;
> +    case GE_EXPR:
> +      return GT_EXPR;
> +    case LT_EXPR:
> +      return LE_EXPR;
> +    case LE_EXPR:
> +      return LT_EXPR;
> +    default:
> +      return code;
> +    }
> +}
> +
> +/* Helper function for optimize_immediate_choice.  It updates the immediate
> +   and the subcode of the gimple statement.  At the point of calling
> +   the function we know that the optimization leads to fewer instructions.
> +   stmt points to the gimple statement containing the comparison we wish to
> +   optimize and to_add is the amount by which the immediate needs to be
> +   adjusted (1 or -1).  */
> +
> +static void
> +update_immediate (gimple *stmt, int to_add)
> +{
> +  tree inc_dec = to_add == 1 ? build_one_cst (integer_type_node) :
> +			       build_minus_one_cst (integer_type_node);
> +
> +  enum gimple_code code = gimple_code (stmt);
> +  if (code == GIMPLE_COND)
> +    {
> +      gcond *gc = GIMPLE_CHECK2<gcond *> (stmt);
> +      tree rhs = gimple_cond_rhs (gc);
> +
> +      /* Update the statement.  */
> +      tree new_rhs = fold_build2 (PLUS_EXPR, TREE_TYPE (rhs), rhs, inc_dec);
> +      gimple_cond_set_rhs (gc, new_rhs);
> +      gc->subcode = immediate_optimized_code ((enum tree_code) gc->subcode);
> +    }
> +  if (code == GIMPLE_ASSIGN)
> +    {
> +      gassign *ga = GIMPLE_CHECK2<gassign *> (stmt);
> +      tree rhs2 = gimple_assign_rhs2 (ga);
> +
> +      tree new_rhs2 = fold_build2 (PLUS_EXPR, TREE_TYPE (rhs2), rhs2, inc_dec);
> +      gimple_assign_set_rhs2 (ga, new_rhs2);
> +      ga->subcode = immediate_optimized_code ((enum tree_code) ga->subcode);
> +    }
> +}
> +
> +/* Helper function for optimize_immediate_choice.  It checks if the other
> +   possible immediate leads to a lower rtx cost.  to_add is the amount by
> +   which the immediate needs to be adjusted (1 or -1).  The function
> +   returns 0 if there is no improvement and the amount by which the immediate
> +   needs to be adjusted (1 or -1) otherwise.  */
> +
> +static int
> +validate_immediate_optimization (gimple *stmt, int to_add)
> +{
> +  tree tree_imm = gimple_code (stmt) == GIMPLE_COND ? gimple_cond_rhs (stmt)
> +						: gimple_assign_rhs2 (stmt);
> +  const_tree type = TREE_TYPE (tree_imm);
> +  widest_int imm = wi::to_widest (tree_imm);
> +  enum signop sgn = TYPE_UNSIGNED (type) ? UNSIGNED : SIGNED;
> +
> +  /* Check for overflow/underflow in the case of signed values and
> +     wrapping around in the case of unsigned values.  If any occur
> +     cancel the optimization.  */
> +
> +  widest_int max_type = wi::to_widest (TYPE_MAX_VALUE (type));
> +  widest_int min_type = wi::to_widest (TYPE_MIN_VALUE (type));
> +  if ((wi::cmp (imm, max_type, sgn) == 0 && to_add == 1)
> +      || (wi::cmp (imm, min_type, sgn) == 0 && to_add == -1))
> +    return 0;
> +
> +  widest_int imm_modif = wi::add (imm, to_add);
> +
> +  rtx reg = gen_reg_rtx (DImode);
> +  rtx old_imm = GEN_INT (imm.to_shwi ());
> +  rtx new_imm = GEN_INT (imm_modif.to_shwi ());
> +
> +  rtx_insn *old_rtx = gen_move_insn (reg, old_imm);
> +  rtx_insn *new_rtx = gen_move_insn (reg, new_imm);
> +
> +  /* If the change is beneficial we can just propagate to_add further,
> +     otherwise return 0 to cancel the optimization.  */
> +  return insn_cost (old_rtx, true) > insn_cost (new_rtx, true) ? to_add : 0;
> +}
> +
> +/* Helper function for optimize_immediate_choice.  Checks if the gimple
> +   statement has the right shape for the optimization.  */
> +
> +static bool
> +can_optimize_immediate_p (const gimple *stmt)
> +{
> +  enum gimple_code code = gimple_code (stmt);
> +  if (code != GIMPLE_ASSIGN && code != GIMPLE_COND)
> +    return false;
> +
> +  if (code == GIMPLE_ASSIGN
> +      && (gimple_num_ops (stmt) != 3
> +	  || TREE_CODE (gimple_assign_rhs2 (stmt)) != INTEGER_CST))
> +    return false;
> +  if (code == GIMPLE_COND && TREE_CODE (gimple_cond_rhs (stmt)) != INTEGER_CST)
> +    return false;
> +
> +  return true;
> +}
> +
> +/* Entry point for immediate choice optimization.  It aims to choose
> +   the simpler immediate in integer comparisons.  The purpose of this
> +   is to end up with an immediate which can be loaded into a register
> +   in fewer moves, if possible.
> +
> +   For each integer comparison there exists an equivalent choice:
> +     i)   a >  b or a >= b + 1
> +     ii)  a <= b or a <  b + 1
> +     iii) a >= b or a >  b - 1
> +     iv)  a <  b or a <= b - 1
> +
> +   Comparisons can only appear on the rhs of a GIMPLE_ASSIGN
> +   or in a GIMPLE_COND.  */
> +
> +static void
> +optimize_immediate_choice (gimple *stmt)
> +{
> +  if (!can_optimize_immediate_p (stmt))
> +    return;
> +
> +  /* Check if the other immediate available is preferable.  */
> +  int to_add = 0;
> +  if (stmt->subcode == GT_EXPR || stmt->subcode == LE_EXPR)
> +    to_add = validate_immediate_optimization (stmt, 1);
> +
> +  if (stmt->subcode == GE_EXPR || stmt->subcode == LT_EXPR)
> +    to_add = validate_immediate_optimization (stmt, -1);
> +
> +  if (!to_add)
> +    return;
> +
> +  /* Update the current statement.  */
> +  update_immediate (stmt, to_add);
> +}
> +
>   /* Expand basic block BB from GIMPLE trees to RTL.  */
>   
>   static basic_block
> @@ -5515,6 +5666,7 @@ expand_gimple_basic_block (basic_block bb, bool disable_tail_calls)
>         basic_block new_bb;
>   
>         stmt = gsi_stmt (gsi);
> +      optimize_immediate_choice (stmt);
>   
>         /* If this statement is a non-debug one, and we generate debug
>   	 insns, then this one might be the last real use of a TERed
> diff --git a/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c b/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b30dcca88f44ca73fcb8202ea488888b365400c8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
> @@ -0,0 +1,53 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +int
> +foo (int x)
> +{
> +  return x >= 0xfff01;
> +}
> +
> +int
> +GT (unsigned int x)
> +{
> +  return x > 0xfefffffe;
> +}
> +
> +/* Go from four moves to two.  */
> +
> +int
> +baz (long long x)
> +{
> +  return x <= 0x1999999999999998;
> +}
> +
> +int
> +LE (unsigned int x)
> +{
> +  return x <= 0xfefffffe;
> +}
> +
> +int
> +GE (long long x)
> +{
> +  return x >= 0xff000000;
> +}
> +
> +int
> +LT (int x)
> +{
> +  return x < 0xff000000;
> +}
> +
> +/* Optimize the immediate in conditionals.  */
> +
> +int check (int x, int y)
> +{
> +  if (x > y && GT (x))
> +    return 100;
> +
> +  return x;
> +}
> +
> +/* baz produces one movk instruction.  */
> +/* { dg-final { scan-assembler-times "movk" 1 } } */


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]