This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: RFC [1/3] divmod transform v2


On Fri, 21 Oct 2016, Jeff Law wrote:

> On 10/21/2016 04:34 AM, Prathamesh Kulkarni wrote:
> > On 20 October 2016 at 15:02, Richard Biener <rguenther@suse.de> wrote:
> > > On Wed, 19 Oct 2016, Jeff Law wrote:
> > > 
> > > > On 10/15/2016 11:59 PM, Prathamesh Kulkarni wrote:
> > > > > Hi,
> > > > > After approval from Bernd Schmidt, I committed the patch to remove
> > > > > optab functions for
> > > > > sdivmod_optab and udivmod_optab in optabs.def, which removes the block
> > > > > for divmod patch.
> > > > > 
> > > > > This patch is mostly the same as previous one, except it drops
> > > > > targeting __udivmoddi4() because
> > > > > it gave undefined reference link error for calling __udivmoddi4() on
> > > > > aarch64-linux-gnu.
> > > > > It appears aarch64 has hardware insn for DImode div, so __udivmoddi4()
> > > > > isn't needed for the target
> > > > > (it was a bug in my patch that called __udivmoddi4() even though
> > > > > aarch64 supported hardware div).
> > > > > 
> > > > > However this makes me wonder if it's guaranteed that __udivmoddi4()
> > > > > will be available for a target if it doesn't have hardware div and
> > > > > divmod insn and doesn't have target-specific libfunc for
> > > > > DImode divmod ? To be conservative, the attached patch doesn't
> > > > > generate call to __udivmoddi4.
> > > > > 
> > > > > Passes bootstrap+test on x86_64-unknown-linux.
> > > > > Cross-tested on arm*-*-*, aarch64*-*-*.
> > > > > Verified that there are no regressions with SPEC2006 on
> > > > > x86_64-unknown-linux-gnu.
> > > > > OK to commit ?
> > > > > 
> > > > > Thanks,
> > > > > Prathamesh
> > > > > 
> > > > > 
> > > > > divmod-v2-3-main.txt
> > > > > 
> > > > > 
> > > > > 2016-10-15  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>
> > > > >         Kugan Vivekanandarajah  <kuganv@linaro.org>
> > > > >         Jim Wilson  <jim.wilson@linaro.org>
> > > > > 
> > > > >         * target.def: New hook expand_divmod_libfunc.
> > > > >         * doc/tm.texi.in: Add hook for TARGET_EXPAND_DIVMOD_LIBFUNC
> > > > >         * doc/tm.texi: Regenerate.
> > > > >         * internal-fn.def: Add new entry for DIVMOD ifn.
> > > > >         * internal-fn.c (expand_DIVMOD): New.
> > > > >         * tree-ssa-math-opts.c: Include optabs-libfuncs.h, tree-eh.h,
> > > > >         targhooks.h.
> > > > >         (widen_mul_stats): Add new field divmod_calls_inserted.
> > > > >         (target_supports_divmod_p): New.
> > > > >         (divmod_candidate_p): Likewise.
> > > > >         (convert_to_divmod): Likewise.
> > > > >         (pass_optimize_widening_mul::execute): Call
> > > > >         calculate_dominance_info(), renumber_gimple_stmt_uids() at
> > > > >         beginning of function. Call convert_to_divmod()
> > > > >         and record stats for divmod.
> > > > Starting with some high level design comments.  If these conflict with
> > > > comments from others, let me know and we'll work through the issues.
> > > > 
> > > > I don't really like introducing code conditional on the target
> > > > capabilities
> > > > this early in the gimple optimization pipeline.
> > > 
> > > It's basically done right before RTL expansion
> > > (pass_optimize_widening_mul).
> > > 
> > > > Would it be possible to always do the transformation to divmod in the
> > > > gimple
> > > > optimizers, regardless of the target capabilities.  Then in the
> > > > gimple->RTL
> > > > expanders make a final decision about divmod insn, libcall, or using
> > > > div/mod
> > > > insns?
> > > 
> > > The issue is that it hoists one or both the division or the modulo and
> > > if we don't do the transform we'd want to undo that code motion.
> > > 
> > > > That would move all the target dependencies out of the gimple optimizers
> > > > and
> > > > into the gimple->rtl expansion phase, which is the preferred place to
> > > > start
> > > > introducing this kind of target dependency.
> > > > 
> > > > With that background, I'm going to focus more on the identification of
> > > > divmod
> > > > opportunities than the expansion bits.
> > > > 
> > > > 
> > > > > 
> > > > > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> > > > > index a4a8e49..866c368 100644
> > > > > --- a/gcc/doc/tm.texi
> > > > > +++ b/gcc/doc/tm.texi
> > > > > @@ -7078,6 +7078,11 @@ This is firstly introduced on ARM/AArch64
> > > > > targets,
> > > > > please refer to
> > > > >  the hook implementation for how different fusion types are supported.
> > > > >  @end deftypefn
> > > > > 
> > > > > +@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (rtx
> > > > > @var{libfunc}, machine_mode @var{mode}, rtx @var{op0}, rtx @var{op1},
> > > > > rtx
> > > > > *@var{quot}, rtx *@var{rem})
> > > > > +Define this hook for enabling divmod transform if the port does not
> > > > > have
> > > > > +hardware divmod insn but defines target-specific divmod libfuncs.
> > > > > +@end deftypefn
> > > > > +
> > > > >  @node Sections
> > > > >  @section Dividing the Output into Sections (Texts, Data, @dots{})
> > > > >  @c the above section title is WAY too long.  maybe cut the part
> > > > > between
> > > > > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> > > > > index 265f1be..c4c387b 100644
> > > > > --- a/gcc/doc/tm.texi.in
> > > > > +++ b/gcc/doc/tm.texi.in
> > > > > @@ -4890,6 +4890,8 @@ them: try the first ones in this list first.
> > > > > 
> > > > >  @hook TARGET_SCHED_FUSION_PRIORITY
> > > > > 
> > > > > +@hook TARGET_EXPAND_DIVMOD_LIBFUNC
> > > > > +
> > > > >  @node Sections
> > > > >  @section Dividing the Output into Sections (Texts, Data, @dots{})
> > > > >  @c the above section title is WAY too long.  maybe cut the part
> > > > > between
> > > > > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> > > > > index 0b32d5f..42c6973 100644
> > > > > --- a/gcc/internal-fn.c
> > > > > +++ b/gcc/internal-fn.c
> > > > > @@ -2207,6 +2207,53 @@ expand_ATOMIC_COMPARE_EXCHANGE (internal_fn,
> > > > > gcall
> > > > > *call)
> > > > >    expand_ifn_atomic_compare_exchange (call);
> > > > >  }
> > > > > 
> > > > > +/* Expand DIVMOD() using:
> > > > In general, we do not use () when referring to objects in comments.
> > > > 
> > > > > + a) optab handler for udivmod/sdivmod if it is available.
> > > > > + b) If optab_handler doesn't exist, generate call to
> > > > > +    target-specific divmod libfunc.  */
> > > > > +
> > > > > +static void
> > > > > +expand_DIVMOD (internal_fn, gcall *call_stmt)
> > > > In general, don't use upper case for function names.  Lower case please.
> > > > But
> > > > I believe we should be pushing all the expansion stuff into the
> > > > gimple->rtl
> > > > expansion code, so I haven't looked at this closely yet on the
> > > > expectation
> > > > it'll likely move.
> > > > 
> > > > 
> > > > 
> > > > > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
> > > > > index 0cea1a8..4f7bdb2 100644
> > > > > --- a/gcc/tree-ssa-math-opts.c
> > > > > +++ b/gcc/tree-ssa-math-opts.c
> > > > > @@ -3793,6 +3799,226 @@ match_uaddsub_overflow (gimple_stmt_iterator
> > > > > *gsi,
> > > > > gimple *stmt,
> > > > >    return true;
> > > > >  }
> > > > > 
> > > > > +/* Return true if target has support for divmod.  */
> > > > > +
> > > > > +static bool
> > > > > +target_supports_divmod_p (optab divmod_optab, optab div_optab,
> > > > > machine_mode
> > > > > mode)
> > > > So this is the kind of code I expect to avoid in the gimple optimizers.
> > > > Instead the decision about whether to use a divmod insn, divmod libfunc
> > > > or div
> > > > + synthesized mod, or separate div and mod insns should happen at
> > > > gimple->rtl
> > > > expansion time.
> > > > 
> > > > 
> > > > > +
> > > > > +/* This function looks for:
> > > > > +   t1 = a TRUNC_DIV_EXPR b;
> > > > > +   t2 = a TRUNC_MOD_EXPR b;
> > > > > +   and transforms it to the following sequence:
> > > > > +   complex_tmp = DIVMOD (a, b);
> > > > > +   t1 = REALPART_EXPR(a);
> > > > > +   t2 = IMAGPART_EXPR(b);
> > > > > +   For conditions enabling the transform see divmod_candidate_p().
> > > > > +
> > > > > +   The pass works in two phases:
> > > > > +   1) Walk through all immediate uses of stmt's operand and find a
> > > > > +      TRUNC_DIV_EXPR with matching operands and if such a stmt is
> > > > > found add
> > > > > +      it to stmts vector.
> > > > > +   2) Insert DIVMOD call before first div/mod stmt in top_bb (basic
> > > > > block
> > > > > that
> > > > > +      dominates other div/mod stmts with same operands) and update
> > > > > entries
> > > > > in
> > > > > +      stmts vector to use return value of DIMOVD (REALEXPR_PART for
> > > > > div,
> > > > > +      IMAGPART_EXPR for mod).  */
> > > > > +
> > > > > +static bool
> > > > > +convert_to_divmod (gassign *stmt)
> > > > > +{
> > > > > +  if (!divmod_candidate_p (stmt))
> > > > > +    return false;
> > > > > +
> > > > > +  tree op1 = gimple_assign_rhs1 (stmt);
> > > > > +  tree op2 = gimple_assign_rhs2 (stmt);
> > > > > +
> > > > > +  imm_use_iterator use_iter;
> > > > > +  gimple *use_stmt;
> > > > > +  auto_vec<gimple *> stmts;
> > > > > +
> > > > > +  gimple *top_stmt = stmt;
> > > > > +  basic_block top_bb = gimple_bb (stmt);
> > > > > +
> > > > > +  /* Try to set top_stmt to "topmost" stmt
> > > > > +     with code TRUNC_DIV_EXPR/TRUNC_MOD_EXPR having same operands as
> > > > > stmt.
> > > > > */
> > > > > +
> > > > > +  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
> > > > > +    {
> > > > > +      if (is_gimple_assign (use_stmt)
> > > > > +     && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
> > > > > +         || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR)
> > > > > +     && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0)
> > > > > +     && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0))
> > > > So why check for TRUNC_MOD_EXPR here?  ISTM that stmt is always going to
> > > > be
> > > > TRUNC_MOD_EXPR and thus you're only interested in looking for a matching
> > > > TRUNC_DIV_EXPR statement.
> > > > 
> > > > The only way I could see TRUNC_MOD_EXPR being useful here would be if
> > > > there is
> > > > a redundant TRUNC_MOD_EXPR in the IL, which would be a sign that some
> > > > other
> > > > optimizer hasn't done its job.  Have you seen this to be useful in
> > > > practice?
> > > 
> > > Note that I've reviewed this parts already and we restructured things
> > > in this way, requiring to look for TRUNC_MOD_EXPR to properly handle
> > > finding a dominating trunc _or_ div and interatively doing so
> > > correctly if we have more than one pair.
> > > 
> > > > > +   {
> > > > > +     if (stmt_can_throw_internal (use_stmt))
> > > > > +       continue;
> > > > > +
> > > > > +     basic_block bb = gimple_bb (use_stmt);
> > > > > +
> > > > > +     if (bb == top_bb)
> > > > > +       {
> > > > > +         if (gimple_uid (use_stmt) < gimple_uid (top_stmt))
> > > > > +           top_stmt = use_stmt;
> > > > > +       }
> > > > > +     else if (dominated_by_p (CDI_DOMINATORS, top_bb, bb))
> > > > > +       {
> > > > > +         top_bb = bb;
> > > > > +         top_stmt = use_stmt;
> > > > > +       }
> > > > Do you want to break out of the immediate use loop once you've found a
> > > > TRUNC_DIV_EXPR statement with matching operands?
> > > > 
> > > > I could perhaps see value in finding the closest TRUNC_DIV_EXPR to the
> > > > TRUNC_MOD_EXPR, but that would mean that we've got redundant
> > > > TRUNC_DIV_EXPR
> > > > statements in the IL, which presumably doesn't happen if the rest of the
> > > > optimizers are doing their job.
> > > > 
> > > > 
> > > > 
> > > > > +   }
> > > > > +    }
> > > > > +
> > > > > +  if (top_stmt == stmt && stmt_can_throw_internal (top_stmt))
> > > > > +    return false;
> > > > Don't you just want
> > > > if (stmt_can_throw_internal (stmt))
> > > >   return false;
> > > > 
> > > > Before the loop over the immediate uses, thus avoiding the loop if we've
> > > > got a
> > > > TRUNC_MOD_EXPR that we can't optimize?
> > > > 
> > > > 
> > > > 
> > > > > +
> > > > > +  tree top_op1 = gimple_assign_rhs1 (top_stmt);
> > > > > +  tree top_op2 = gimple_assign_rhs2 (top_stmt);
> > > > > +
> > > > > +  stmts.safe_push (top_stmt);
> > > > > +  bool div_seen = (gimple_assign_rhs_code (top_stmt) ==
> > > > > TRUNC_DIV_EXPR);
> > > > > +
> > > > > +  /* Ensure that gimple_bb (use_stmt) is dominated by top_bb.  */
> > > > ?!?  It's not clear why you'd need to do another immediate use loop
> > > > here.
> > > > 
> > > > ISTM at this point you've found two statements, one a TRUNC_DIV the
> > > > other a
> > > > TRUNC_MOD with the same operands.  Given that, ISTM you just need to
> > > > check the
> > > > dominance relationship of the blocks containing those statements.
> > > > 
> > > > It really feels like you're going out of your way to handle cases where
> > > > we
> > > > have redundant expressions in the IL.  Maybe I'm wrong. BUt if I'm
> > > > right, I'd
> > > > rather see that time spent optimizing away those redundant expressions
> > > > in
> > > > FRE/PRE/DOM.
> > > > 
> > > > > +
> > > > > +  /* Create libcall to internal fn DIVMOD:
> > > > > +     divmod_tmp = DIVMOD (op1, op2).  */
> > > > > +
> > > > > +  gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1,
> > > > > op2);
> > > > > +  tree res = make_temp_ssa_name (
> > > > > +           build_complex_type (TREE_TYPE (op1)),
> > > > > +           call_stmt, "divmod_tmp");
> > > > Please review the the GCC formatting guidelines.  This looks wrong.
> > > > 
> > > > tree res = make_temp_ssa_name (build_complex_type (TREE_TYPE (op1)),
> > > >                                call_stmt, "divmod_tmp");
> > > > 
> > > > Seems like the right formatting to me.
> > > > 
> > > > > +
> > > > > +  /* Update all statements in stmts.
> > > > > +     if stmt is lhs = op1 TRUNC_DIV_EXPR op2, change to lhs =
> > > > > REALPART_EXPR<divmod_tmp>
> > > > > +     if stmt is lhs = op1 TRUNC_MOD_EXPR op2, change to lhs =
> > > > > IMAGPART_EXPR<divmod_tmp>.  */
> > > > I'd just emit a copy from RES to the appropriate lhs operand just after
> > > > the
> > > > divmod and delete the now unnecessary TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
> > > > statements.
> > > 
> > > That sounds like a good idea.
> > Um sorry, not sure if I understood this part.
> > For eg:
> > t1 = x / y;
> > t2 = x % y;
> > 
> > do we want to transform it to:
> > complex_tmp = DIVMOD (x, y);
> > div_tmp = REALPART_EXPR<complex_tmp>
> > mod_tmp = IMAGPART_EXPR<complex_tmp>
> 
> complex_tmp = DIVMOD (x, y)
> t1 = REALPART_EXPR (complex_tmp)
> t2 = IMAGPART_EXPR (compex_tmp)
> 
> Then remove the
> t1 = x/y
> t2 = x%y
> 
>  statements

OTOH implementation-wise that's more complicated.

Richard.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]