This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: RFC [1/3] divmod transform v2
On 20 October 2016 at 15:02, Richard Biener <rguenther@suse.de> wrote:
> On Wed, 19 Oct 2016, Jeff Law wrote:
>
>> On 10/15/2016 11:59 PM, Prathamesh Kulkarni wrote:
>> > Hi,
>> > After approval from Bernd Schmidt, I committed the patch to remove
>> > optab functions for
>> > sdivmod_optab and udivmod_optab in optabs.def, which removes the block
>> > for divmod patch.
>> >
>> > This patch is mostly the same as previous one, except it drops
>> > targeting __udivmoddi4() because
>> > it gave undefined reference link error for calling __udivmoddi4() on
>> > aarch64-linux-gnu.
>> > It appears aarch64 has hardware insn for DImode div, so __udivmoddi4()
>> > isn't needed for the target
>> > (it was a bug in my patch that called __udivmoddi4() even though
>> > aarch64 supported hardware div).
>> >
>> > However this makes me wonder if it's guaranteed that __udivmoddi4()
>> > will be available for a target if it doesn't have hardware div and
>> > divmod insn and doesn't have target-specific libfunc for
>> > DImode divmod ? To be conservative, the attached patch doesn't
>> > generate call to __udivmoddi4.
>> >
>> > Passes bootstrap+test on x86_64-unknown-linux.
>> > Cross-tested on arm*-*-*, aarch64*-*-*.
>> > Verified that there are no regressions with SPEC2006 on
>> > x86_64-unknown-linux-gnu.
>> > OK to commit ?
>> >
>> > Thanks,
>> > Prathamesh
>> >
>> >
>> > divmod-v2-3-main.txt
>> >
>> >
>> > 2016-10-15 Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
>> > Kugan Vivekanandarajah <kuganv@linaro.org>
>> > Jim Wilson <jim.wilson@linaro.org>
>> >
>> > * target.def: New hook expand_divmod_libfunc.
>> > * doc/tm.texi.in: Add hook for TARGET_EXPAND_DIVMOD_LIBFUNC
>> > * doc/tm.texi: Regenerate.
>> > * internal-fn.def: Add new entry for DIVMOD ifn.
>> > * internal-fn.c (expand_DIVMOD): New.
>> > * tree-ssa-math-opts.c: Include optabs-libfuncs.h, tree-eh.h,
>> > targhooks.h.
>> > (widen_mul_stats): Add new field divmod_calls_inserted.
>> > (target_supports_divmod_p): New.
>> > (divmod_candidate_p): Likewise.
>> > (convert_to_divmod): Likewise.
>> > (pass_optimize_widening_mul::execute): Call
>> > calculate_dominance_info(), renumber_gimple_stmt_uids() at
>> > beginning of function. Call convert_to_divmod()
>> > and record stats for divmod.
>> Starting with some high level design comments. If these conflict with
>> comments from others, let me know and we'll work through the issues.
>>
>> I don't really like introducing code conditional on the target capabilities
>> this early in the gimple optimization pipeline.
>
> It's basically done right before RTL expansion
> (pass_optimize_widening_mul).
>
>> Would it be possible to always do the transformation to divmod in the gimple
>> optimizers, regardless of the target capabilities. Then in the gimple->RTL
>> expanders make a final decision about divmod insn, libcall, or using div/mod
>> insns?
>
> The issue is that it hoists one or both the division or the modulo and
> if we don't do the transform we'd want to undo that code motion.
>
>> That would move all the target dependencies out of the gimple optimizers and
>> into the gimple->rtl expansion phase, which is the preferred place to start
>> introducing this kind of target dependency.
>>
>> With that background, I'm going to focus more on the identification of divmod
>> opportunities than the expansion bits.
>>
>>
>> >
>> > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>> > index a4a8e49..866c368 100644
>> > --- a/gcc/doc/tm.texi
>> > +++ b/gcc/doc/tm.texi
>> > @@ -7078,6 +7078,11 @@ This is firstly introduced on ARM/AArch64 targets,
>> > please refer to
>> > the hook implementation for how different fusion types are supported.
>> > @end deftypefn
>> >
>> > +@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (rtx
>> > @var{libfunc}, machine_mode @var{mode}, rtx @var{op0}, rtx @var{op1}, rtx
>> > *@var{quot}, rtx *@var{rem})
>> > +Define this hook for enabling divmod transform if the port does not have
>> > +hardware divmod insn but defines target-specific divmod libfuncs.
>> > +@end deftypefn
>> > +
>> > @node Sections
>> > @section Dividing the Output into Sections (Texts, Data, @dots{})
>> > @c the above section title is WAY too long. maybe cut the part between
>> > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
>> > index 265f1be..c4c387b 100644
>> > --- a/gcc/doc/tm.texi.in
>> > +++ b/gcc/doc/tm.texi.in
>> > @@ -4890,6 +4890,8 @@ them: try the first ones in this list first.
>> >
>> > @hook TARGET_SCHED_FUSION_PRIORITY
>> >
>> > +@hook TARGET_EXPAND_DIVMOD_LIBFUNC
>> > +
>> > @node Sections
>> > @section Dividing the Output into Sections (Texts, Data, @dots{})
>> > @c the above section title is WAY too long. maybe cut the part between
>> > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
>> > index 0b32d5f..42c6973 100644
>> > --- a/gcc/internal-fn.c
>> > +++ b/gcc/internal-fn.c
>> > @@ -2207,6 +2207,53 @@ expand_ATOMIC_COMPARE_EXCHANGE (internal_fn, gcall
>> > *call)
>> > expand_ifn_atomic_compare_exchange (call);
>> > }
>> >
>> > +/* Expand DIVMOD() using:
>> In general, we do not use () when referring to objects in comments.
>>
>> > + a) optab handler for udivmod/sdivmod if it is available.
>> > + b) If optab_handler doesn't exist, generate call to
>> > + target-specific divmod libfunc. */
>> > +
>> > +static void
>> > +expand_DIVMOD (internal_fn, gcall *call_stmt)
>> In general, don't use upper case for function names. Lower case please. But
>> I believe we should be pushing all the expansion stuff into the gimple->rtl
>> expansion code, so I haven't looked at this closely yet on the expectation
>> it'll likely move.
>>
>>
>>
>> > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
>> > index 0cea1a8..4f7bdb2 100644
>> > --- a/gcc/tree-ssa-math-opts.c
>> > +++ b/gcc/tree-ssa-math-opts.c
>> > @@ -3793,6 +3799,226 @@ match_uaddsub_overflow (gimple_stmt_iterator *gsi,
>> > gimple *stmt,
>> > return true;
>> > }
>> >
>> > +/* Return true if target has support for divmod. */
>> > +
>> > +static bool
>> > +target_supports_divmod_p (optab divmod_optab, optab div_optab, machine_mode
>> > mode)
>> So this is the kind of code I expect to avoid in the gimple optimizers.
>> Instead the decision about whether to use a divmod insn, divmod libfunc or div
>> + synthesized mod, or separate div and mod insns should happen at gimple->rtl
>> expansion time.
>>
>>
>> > +
>> > +/* This function looks for:
>> > + t1 = a TRUNC_DIV_EXPR b;
>> > + t2 = a TRUNC_MOD_EXPR b;
>> > + and transforms it to the following sequence:
>> > + complex_tmp = DIVMOD (a, b);
>> > + t1 = REALPART_EXPR(a);
>> > + t2 = IMAGPART_EXPR(b);
>> > + For conditions enabling the transform see divmod_candidate_p().
>> > +
>> > + The pass works in two phases:
>> > + 1) Walk through all immediate uses of stmt's operand and find a
>> > + TRUNC_DIV_EXPR with matching operands and if such a stmt is found add
>> > + it to stmts vector.
>> > + 2) Insert DIVMOD call before first div/mod stmt in top_bb (basic block
>> > that
>> > + dominates other div/mod stmts with same operands) and update entries
>> > in
>> > + stmts vector to use return value of DIMOVD (REALEXPR_PART for div,
>> > + IMAGPART_EXPR for mod). */
>> > +
>> > +static bool
>> > +convert_to_divmod (gassign *stmt)
>> > +{
>> > + if (!divmod_candidate_p (stmt))
>> > + return false;
>> > +
>> > + tree op1 = gimple_assign_rhs1 (stmt);
>> > + tree op2 = gimple_assign_rhs2 (stmt);
>> > +
>> > + imm_use_iterator use_iter;
>> > + gimple *use_stmt;
>> > + auto_vec<gimple *> stmts;
>> > +
>> > + gimple *top_stmt = stmt;
>> > + basic_block top_bb = gimple_bb (stmt);
>> > +
>> > + /* Try to set top_stmt to "topmost" stmt
>> > + with code TRUNC_DIV_EXPR/TRUNC_MOD_EXPR having same operands as stmt.
>> > */
>> > +
>> > + FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
>> > + {
>> > + if (is_gimple_assign (use_stmt)
>> > + && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
>> > + || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR)
>> > + && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0)
>> > + && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0))
>> So why check for TRUNC_MOD_EXPR here? ISTM that stmt is always going to be
>> TRUNC_MOD_EXPR and thus you're only interested in looking for a matching
>> TRUNC_DIV_EXPR statement.
>>
>> The only way I could see TRUNC_MOD_EXPR being useful here would be if there is
>> a redundant TRUNC_MOD_EXPR in the IL, which would be a sign that some other
>> optimizer hasn't done its job. Have you seen this to be useful in practice?
>
> Note that I've reviewed this parts already and we restructured things
> in this way, requiring to look for TRUNC_MOD_EXPR to properly handle
> finding a dominating trunc _or_ div and interatively doing so
> correctly if we have more than one pair.
>
>> > + {
>> > + if (stmt_can_throw_internal (use_stmt))
>> > + continue;
>> > +
>> > + basic_block bb = gimple_bb (use_stmt);
>> > +
>> > + if (bb == top_bb)
>> > + {
>> > + if (gimple_uid (use_stmt) < gimple_uid (top_stmt))
>> > + top_stmt = use_stmt;
>> > + }
>> > + else if (dominated_by_p (CDI_DOMINATORS, top_bb, bb))
>> > + {
>> > + top_bb = bb;
>> > + top_stmt = use_stmt;
>> > + }
>> Do you want to break out of the immediate use loop once you've found a
>> TRUNC_DIV_EXPR statement with matching operands?
>>
>> I could perhaps see value in finding the closest TRUNC_DIV_EXPR to the
>> TRUNC_MOD_EXPR, but that would mean that we've got redundant TRUNC_DIV_EXPR
>> statements in the IL, which presumably doesn't happen if the rest of the
>> optimizers are doing their job.
>>
>>
>>
>> > + }
>> > + }
>> > +
>> > + if (top_stmt == stmt && stmt_can_throw_internal (top_stmt))
>> > + return false;
>> Don't you just want
>> if (stmt_can_throw_internal (stmt))
>> return false;
>>
>> Before the loop over the immediate uses, thus avoiding the loop if we've got a
>> TRUNC_MOD_EXPR that we can't optimize?
>>
>>
>>
>> > +
>> > + tree top_op1 = gimple_assign_rhs1 (top_stmt);
>> > + tree top_op2 = gimple_assign_rhs2 (top_stmt);
>> > +
>> > + stmts.safe_push (top_stmt);
>> > + bool div_seen = (gimple_assign_rhs_code (top_stmt) == TRUNC_DIV_EXPR);
>> > +
>> > + /* Ensure that gimple_bb (use_stmt) is dominated by top_bb. */
>> ?!? It's not clear why you'd need to do another immediate use loop here.
>>
>> ISTM at this point you've found two statements, one a TRUNC_DIV the other a
>> TRUNC_MOD with the same operands. Given that, ISTM you just need to check the
>> dominance relationship of the blocks containing those statements.
>>
>> It really feels like you're going out of your way to handle cases where we
>> have redundant expressions in the IL. Maybe I'm wrong. BUt if I'm right, I'd
>> rather see that time spent optimizing away those redundant expressions in
>> FRE/PRE/DOM.
>>
>> > +
>> > + /* Create libcall to internal fn DIVMOD:
>> > + divmod_tmp = DIVMOD (op1, op2). */
>> > +
>> > + gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2);
>> > + tree res = make_temp_ssa_name (
>> > + build_complex_type (TREE_TYPE (op1)),
>> > + call_stmt, "divmod_tmp");
>> Please review the the GCC formatting guidelines. This looks wrong.
>>
>> tree res = make_temp_ssa_name (build_complex_type (TREE_TYPE (op1)),
>> call_stmt, "divmod_tmp");
>>
>> Seems like the right formatting to me.
>>
>> > +
>> > + /* Update all statements in stmts.
>> > + if stmt is lhs = op1 TRUNC_DIV_EXPR op2, change to lhs =
>> > REALPART_EXPR<divmod_tmp>
>> > + if stmt is lhs = op1 TRUNC_MOD_EXPR op2, change to lhs =
>> > IMAGPART_EXPR<divmod_tmp>. */
>> I'd just emit a copy from RES to the appropriate lhs operand just after the
>> divmod and delete the now unnecessary TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
>> statements.
>
> That sounds like a good idea.
Um sorry, not sure if I understood this part.
For eg:
t1 = x / y;
t2 = x % y;
do we want to transform it to:
complex_tmp = DIVMOD (x, y);
div_tmp = REALPART_EXPR<complex_tmp>
mod_tmp = IMAGPART_EXPR<complex_tmp>
and then replace each use of lhs (t1, t2 etc.) by either div_tmp
or mod_tmp (depending if the stmt was trunc_div_expr or trunc_mod_expr) ?
Thanks,
Prathamesh
>
>> Copy propagation should clean that up just fine and it simplifies your
>> implementation.
>
> There should be no copy to propagate that way.
>
>> > +
>> > + bool cfg_changed = false;
>> > + for (unsigned i = 0; stmts.iterate (i, &use_stmt); ++i)
>> > + {
>> > + tree new_rhs;
>> > +
>> > + switch (gimple_assign_rhs_code (use_stmt))
>> > + {
>> > + case TRUNC_DIV_EXPR:
>> > + new_rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
>> > + break;
>> > +
>> > + case TRUNC_MOD_EXPR:
>> > + new_rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op2), res);
>> > + break;
>> > +
>> > + default:
>> > + gcc_unreachable ();
>> > + }
>> > +
>> > + gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
>> > + gimple_assign_set_rhs_from_tree (&gsi, new_rhs);
>> > + update_stmt (use_stmt);
>> > +
>> > + if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
>> > + cfg_changed = true;
>> Does this ever happen given how you filter out throwing statements earlier?
>
> We filter out throwing stmts that are not "top". The top one we replace
> with the divmod call and we can preserve EH for that (and thus handle
> the case where the original div/mod at this place may throw).
>
>>
>> > @@ -3870,6 +4098,10 @@ pass_optimize_widening_mul::execute (function *fun)
>> > match_uaddsub_overflow (&gsi, stmt, code);
>> > break;
>> >
>> > + case TRUNC_MOD_EXPR:
>> > + cfg_changed = convert_to_divmod (as_a<gassign *> (stmt));
>> > + break;
>> > +
>> > default:;
>> > }
>> > }
>> If I'm reading this correctly, ISTM that you only search for a divmod pair if
>> you first find the mod insn. That's probably reasonable. We have to find
>> both to have an optimization opportunity and I suspect we have more div than
>> mod insns, so you're presumably minimizing the search space with this
>> approach. Can you confirm that was your intention here, or were you looking
>> to do something else?
>
> Yes, that was the idea.
>
> Richard.