RFC [1/3] divmod transform v2

Thu Oct 20 09:32:00 GMT 2016

On Wed, 19 Oct 2016, Jeff Law wrote:

> On 10/15/2016 11:59 PM, Prathamesh Kulkarni wrote:
> > Hi,
> > After approval from Bernd Schmidt, I committed the patch to remove
> > optab functions for
> > sdivmod_optab and udivmod_optab in optabs.def, which removes the block
> > for divmod patch.
> > 
> > This patch is mostly the same as previous one, except it drops
> > targeting __udivmoddi4() because
> > it gave undefined reference link error for calling __udivmoddi4() on
> > aarch64-linux-gnu.
> > It appears aarch64 has hardware insn for DImode div, so __udivmoddi4()
> > isn't needed for the target
> > (it was a bug in my patch that called __udivmoddi4() even though
> > aarch64 supported hardware div).
> > 
> > However this makes me wonder if it's guaranteed that __udivmoddi4()
> > will be available for a target if it doesn't have hardware div and
> > divmod insn and doesn't have target-specific libfunc for
> > DImode divmod ? To be conservative, the attached patch doesn't
> > generate call to __udivmoddi4.
> > 
> > Passes bootstrap+test on x86_64-unknown-linux.
> > Cross-tested on arm*-*-*, aarch64*-*-*.
> > Verified that there are no regressions with SPEC2006 on
> > x86_64-unknown-linux-gnu.
> > OK to commit ?
> > 
> > Thanks,
> > Prathamesh
> > 
> > 
> > divmod-v2-3-main.txt
> > 
> > 
> > 2016-10-15  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>
> > 	    Kugan Vivekanandarajah  <kuganv@linaro.org>
> > 	    Jim Wilson  <jim.wilson@linaro.org>
> > 
> > 	    * target.def: New hook expand_divmod_libfunc.
> > 	    * doc/tm.texi.in: Add hook for TARGET_EXPAND_DIVMOD_LIBFUNC
> > 	    * doc/tm.texi: Regenerate.
> > 	    * internal-fn.def: Add new entry for DIVMOD ifn.
> > 	    * internal-fn.c (expand_DIVMOD): New.
> > 	    * tree-ssa-math-opts.c: Include optabs-libfuncs.h, tree-eh.h,
> > 	    targhooks.h.
> > 	    (widen_mul_stats): Add new field divmod_calls_inserted.
> > 	    (target_supports_divmod_p): New.
> > 	    (divmod_candidate_p): Likewise.
> > 	    (convert_to_divmod): Likewise.
> > 	    (pass_optimize_widening_mul::execute): Call
> > 	    calculate_dominance_info(), renumber_gimple_stmt_uids() at
> > 	    beginning of function. Call convert_to_divmod()
> > 	    and record stats for divmod.
> Starting with some high level design comments.  If these conflict with
> comments from others, let me know and we'll work through the issues.
> 
> I don't really like introducing code conditional on the target capabilities
> this early in the gimple optimization pipeline.

It's basically done right before RTL expansion 
(pass_optimize_widening_mul).

> Would it be possible to always do the transformation to divmod in the gimple
> optimizers, regardless of the target capabilities.  Then in the gimple->RTL
> expanders make a final decision about divmod insn, libcall, or using div/mod
> insns?

The issue is that it hoists one or both the division or the modulo and
if we don't do the transform we'd want to undo that code motion.

> That would move all the target dependencies out of the gimple optimizers and
> into the gimple->rtl expansion phase, which is the preferred place to start
> introducing this kind of target dependency.
> 
> With that background, I'm going to focus more on the identification of divmod
> opportunities than the expansion bits.
> 
> 
> > 
> > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> > index a4a8e49..866c368 100644
> > --- a/gcc/doc/tm.texi
> > +++ b/gcc/doc/tm.texi
> > @@ -7078,6 +7078,11 @@ This is firstly introduced on ARM/AArch64 targets,
> > please refer to
> >  the hook implementation for how different fusion types are supported.
> >  @end deftypefn
> > 
> > +@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (rtx
> > @var{libfunc}, machine_mode @var{mode}, rtx @var{op0}, rtx @var{op1}, rtx
> > *@var{quot}, rtx *@var{rem})
> > +Define this hook for enabling divmod transform if the port does not have
> > +hardware divmod insn but defines target-specific divmod libfuncs.
> > +@end deftypefn
> > +
> >  @node Sections
> >  @section Dividing the Output into Sections (Texts, Data, @dots{})
> >  @c the above section title is WAY too long.  maybe cut the part between
> > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> > index 265f1be..c4c387b 100644
> > --- a/gcc/doc/tm.texi.in
> > +++ b/gcc/doc/tm.texi.in
> > @@ -4890,6 +4890,8 @@ them: try the first ones in this list first.
> > 
> >  @hook TARGET_SCHED_FUSION_PRIORITY
> > 
> > +@hook TARGET_EXPAND_DIVMOD_LIBFUNC
> > +
> >  @node Sections
> >  @section Dividing the Output into Sections (Texts, Data, @dots{})
> >  @c the above section title is WAY too long.  maybe cut the part between
> > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> > index 0b32d5f..42c6973 100644
> > --- a/gcc/internal-fn.c
> > +++ b/gcc/internal-fn.c
> > @@ -2207,6 +2207,53 @@ expand_ATOMIC_COMPARE_EXCHANGE (internal_fn, gcall
> > *call)
> >    expand_ifn_atomic_compare_exchange (call);
> >  }
> > 
> > +/* Expand DIVMOD() using:
> In general, we do not use () when referring to objects in comments.
> 
> > + a) optab handler for udivmod/sdivmod if it is available.
> > + b) If optab_handler doesn't exist, generate call to
> > +    target-specific divmod libfunc.  */
> > +
> > +static void
> > +expand_DIVMOD (internal_fn, gcall *call_stmt)
> In general, don't use upper case for function names.  Lower case please.  But
> I believe we should be pushing all the expansion stuff into the gimple->rtl
> expansion code, so I haven't looked at this closely yet on the expectation
> it'll likely move.
> 
> 
> 
> > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
> > index 0cea1a8..4f7bdb2 100644
> > --- a/gcc/tree-ssa-math-opts.c
> > +++ b/gcc/tree-ssa-math-opts.c
> > @@ -3793,6 +3799,226 @@ match_uaddsub_overflow (gimple_stmt_iterator *gsi,
> > gimple *stmt,
> >    return true;
> >  }
> > 
> > +/* Return true if target has support for divmod.  */
> > +
> > +static bool
> > +target_supports_divmod_p (optab divmod_optab, optab div_optab, machine_mode
> > mode)
> So this is the kind of code I expect to avoid in the gimple optimizers.
> Instead the decision about whether to use a divmod insn, divmod libfunc or div
> + synthesized mod, or separate div and mod insns should happen at gimple->rtl
> expansion time.
> 
> 
> > +
> > +/* This function looks for:
> > +   t1 = a TRUNC_DIV_EXPR b;
> > +   t2 = a TRUNC_MOD_EXPR b;
> > +   and transforms it to the following sequence:
> > +   complex_tmp = DIVMOD (a, b);
> > +   t1 = REALPART_EXPR(a);
> > +   t2 = IMAGPART_EXPR(b);
> > +   For conditions enabling the transform see divmod_candidate_p().
> > +
> > +   The pass works in two phases:
> > +   1) Walk through all immediate uses of stmt's operand and find a
> > +      TRUNC_DIV_EXPR with matching operands and if such a stmt is found add
> > +      it to stmts vector.
> > +   2) Insert DIVMOD call before first div/mod stmt in top_bb (basic block
> > that
> > +      dominates other div/mod stmts with same operands) and update entries
> > in
> > +      stmts vector to use return value of DIMOVD (REALEXPR_PART for div,
> > +      IMAGPART_EXPR for mod).  */
> > +
> > +static bool
> > +convert_to_divmod (gassign *stmt)
> > +{
> > +  if (!divmod_candidate_p (stmt))
> > +    return false;
> > +
> > +  tree op1 = gimple_assign_rhs1 (stmt);
> > +  tree op2 = gimple_assign_rhs2 (stmt);
> > +
> > +  imm_use_iterator use_iter;
> > +  gimple *use_stmt;
> > +  auto_vec<gimple *> stmts;
> > +
> > +  gimple *top_stmt = stmt;
> > +  basic_block top_bb = gimple_bb (stmt);
> > +
> > +  /* Try to set top_stmt to "topmost" stmt
> > +     with code TRUNC_DIV_EXPR/TRUNC_MOD_EXPR having same operands as stmt.
> > */
> > +
> > +  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
> > +    {
> > +      if (is_gimple_assign (use_stmt)
> > +	  && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
> > +	      || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR)
> > +	  && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0)
> > +	  && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0))
> So why check for TRUNC_MOD_EXPR here?  ISTM that stmt is always going to be
> TRUNC_MOD_EXPR and thus you're only interested in looking for a matching
> TRUNC_DIV_EXPR statement.
> 
> The only way I could see TRUNC_MOD_EXPR being useful here would be if there is
> a redundant TRUNC_MOD_EXPR in the IL, which would be a sign that some other
> optimizer hasn't done its job.  Have you seen this to be useful in practice?

Note that I've reviewed this parts already and we restructured things
in this way, requiring to look for TRUNC_MOD_EXPR to properly handle
finding a dominating trunc _or_ div and interatively doing so
correctly if we have more than one pair.

> > +	{
> > +	  if (stmt_can_throw_internal (use_stmt))
> > +	    continue;
> > +
> > +	  basic_block bb = gimple_bb (use_stmt);
> > +
> > +	  if (bb == top_bb)
> > +	    {
> > +	      if (gimple_uid (use_stmt) < gimple_uid (top_stmt))
> > +		top_stmt = use_stmt;
> > +	    }
> > +	  else if (dominated_by_p (CDI_DOMINATORS, top_bb, bb))
> > +	    {
> > +	      top_bb = bb;
> > +	      top_stmt = use_stmt;
> > +	    }
> Do you want to break out of the immediate use loop once you've found a
> TRUNC_DIV_EXPR statement with matching operands?
> 
> I could perhaps see value in finding the closest TRUNC_DIV_EXPR to the
> TRUNC_MOD_EXPR, but that would mean that we've got redundant TRUNC_DIV_EXPR
> statements in the IL, which presumably doesn't happen if the rest of the
> optimizers are doing their job.
> 
> 
> 
> > +	}
> > +    }
> > +
> > +  if (top_stmt == stmt && stmt_can_throw_internal (top_stmt))
> > +    return false;
> Don't you just want
> if (stmt_can_throw_internal (stmt))
>   return false;
> 
> Before the loop over the immediate uses, thus avoiding the loop if we've got a
> TRUNC_MOD_EXPR that we can't optimize?
> 
> 
> 
> > +
> > +  tree top_op1 = gimple_assign_rhs1 (top_stmt);
> > +  tree top_op2 = gimple_assign_rhs2 (top_stmt);
> > +
> > +  stmts.safe_push (top_stmt);
> > +  bool div_seen = (gimple_assign_rhs_code (top_stmt) == TRUNC_DIV_EXPR);
> > +
> > +  /* Ensure that gimple_bb (use_stmt) is dominated by top_bb.  */
> ?!?  It's not clear why you'd need to do another immediate use loop here.
> 
> ISTM at this point you've found two statements, one a TRUNC_DIV the other a
> TRUNC_MOD with the same operands.  Given that, ISTM you just need to check the
> dominance relationship of the blocks containing those statements.
> 
> It really feels like you're going out of your way to handle cases where we
> have redundant expressions in the IL.  Maybe I'm wrong. BUt if I'm right, I'd
> rather see that time spent optimizing away those redundant expressions in
> FRE/PRE/DOM.
> 
> > +
> > +  /* Create libcall to internal fn DIVMOD:
> > +     divmod_tmp = DIVMOD (op1, op2).  */
> > +
> > +  gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2);
> > +  tree res = make_temp_ssa_name (
> > +		build_complex_type (TREE_TYPE (op1)),
> > +		call_stmt, "divmod_tmp");
> Please review the the GCC formatting guidelines.  This looks wrong.
> 
> tree res = make_temp_ssa_name (build_complex_type (TREE_TYPE (op1)),
>                                call_stmt, "divmod_tmp");
> 
> Seems like the right formatting to me.
> 
> > +
> > +  /* Update all statements in stmts.
> > +     if stmt is lhs = op1 TRUNC_DIV_EXPR op2, change to lhs =
> > REALPART_EXPR<divmod_tmp>
> > +     if stmt is lhs = op1 TRUNC_MOD_EXPR op2, change to lhs =
> > IMAGPART_EXPR<divmod_tmp>.  */
> I'd just emit a copy from RES to the appropriate lhs operand just after the
> divmod and delete the now unnecessary TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
> statements.

That sounds like a good idea.

> Copy propagation should clean that up just fine and it simplifies your
> implementation.

There should be no copy to propagate that way.

> > +
> > +  bool cfg_changed = false;
> > +  for (unsigned i = 0; stmts.iterate (i, &use_stmt); ++i)
> > +    {
> > +      tree new_rhs;
> > +
> > +      switch (gimple_assign_rhs_code (use_stmt))
> > +	{
> > +	  case TRUNC_DIV_EXPR:
> > +	    new_rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
> > +	    break;
> > +
> > +	  case TRUNC_MOD_EXPR:
> > +	    new_rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op2), res);
> > +	    break;
> > +
> > +	  default:
> > +	    gcc_unreachable ();
> > +	}
> > +
> > +      gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
> > +      gimple_assign_set_rhs_from_tree (&gsi, new_rhs);
> > +      update_stmt (use_stmt);
> > +
> > +      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
> > +	cfg_changed = true;
> Does this ever happen given how you filter out throwing statements earlier?

We filter out throwing stmts that are not "top".  The top one we replace
with the divmod call and we can preserve EH for that (and thus handle
the case where the original div/mod at this place may throw).

> 
> > @@ -3870,6 +4098,10 @@ pass_optimize_widening_mul::execute (function *fun)
> >  		    match_uaddsub_overflow (&gsi, stmt, code);
> >  		  break;
> > 
> > +		case TRUNC_MOD_EXPR:
> > +		  cfg_changed = convert_to_divmod (as_a<gassign *> (stmt));
> > +		  break;
> > +
> >  		default:;
> >  		}
> >  	    }
> If I'm reading this correctly, ISTM that you only search for a divmod pair if
> you first find the mod insn.  That's probably reasonable.  We have to find
> both to have an optimization opportunity and I suspect we have more div than
> mod insns, so you're presumably minimizing the search space with this
> approach.  Can you confirm that was your intention here, or were you looking
> to do something else?

Yes, that was the idea.

Richard.