This is the mail archive of the
mailing list for the GCC project.
Re: [RFC PR43721] Optimize a/b and a%b to single divmod call
- From: Richard Biener <rguenther at suse dot de>
- To: Prathamesh Kulkarni <prathamesh dot kulkarni at linaro dot org>
- Cc: Ramana Radhakrishnan <ramana dot radhakrishnan at arm dot com>, "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>, richard dot sandiford at arm dot com
- Date: Tue, 10 Nov 2015 15:41:28 +0100 (CET)
- Subject: Re: [RFC PR43721] Optimize a/b and a%b to single divmod call
- Authentication-results: sourceware.org; auth=none
- References: <CAAgBjMmSzFUwa8W2Tc6GzGcA3nTZ_7oOfyB-UPmRLtfKXioUMQ at mail dot gmail dot com> <CAFiYyc1OVk2L76pTjhRFzgco-D7YbA=uK+xG95Xn-whuFh292Q at mail dot gmail dot com> <CAAgBjM=XtqUa60TOdRXnpMkMcGO7fAupbuYWGmTPbt_qkkOrgQ at mail dot gmail dot com> <CAAgBjMmF4kUK2Uovw=rxYF1vpjAmfzFTsXQaRj9yj7v+ep1-Ag at mail dot gmail dot com> <alpine dot LSU dot 2 dot 11 dot 1511021342240 dot 10078 at zhemvz dot fhfr dot qr> <CAAgBjMmB=ow24szF+qW01+ordKGcOEHRLcx5fLyQnzvHND0fow at mail dot gmail dot com> <alpine dot LSU dot 2 dot 11 dot 1511041546420 dot 10078 at zhemvz dot fhfr dot qr> <CAAgBjMnL__ZhkLZG=kOL3xrf-s6gGvW6EGBvqeizrE0WDdQ8Zg at mail dot gmail dot com>
On Mon, 9 Nov 2015, Prathamesh Kulkarni wrote:
> On 4 November 2015 at 20:35, Richard Biener <firstname.lastname@example.org> wrote:
> > Btw, did you investigate code gen differences on x86_64/i586? That
> > target expands all divisions/modulo ops via divmod, relying on CSE
> > solely as the HW always computes both div and mod (IIRC).
> x86_64 has optab_handler for divmod defined, so the transform won't
> take place on x86.
> > +
> > + gassign *assign_stmt = gimple_build_assign (gimple_assign_lhs
> > (use_stmt), rhs);
> > + gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
> > Ick. Please use
> > gimple_set_rhs_from_tree (use_stmt, res);
> Um there doesn't seem to be gimple_set_rhs_from_tree.
> I used gimple_assign_set_rhs_from_tree which requires gsi for use_stmt.
> Is that OK ?
> > update_stmt (use_stmt);
> > if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
> > cfg_changed = true;
> > + free_dominance_info (CDI_DOMINATORS);
> > do not free dominators.
> I have done the suggested changes in the attached patch.
> I have a few questions:
> a) Does the change to insert DIVMOD call before topmost div or mod
> stmt with matching operands
> look correct ?
+ /* Insert call-stmt just before the topmost div/mod stmt.
+ top_bb dominates all other basic blocks containing div/mod stms
+ so, the topmost stmt would be the first div/mod stmt with matching
+ in top_bb. */
+ gcc_assert (top_bb != 0);
+ gimple_stmt_iterator gsi;
+ for (gsi = gsi_after_labels (top_bb); !gsi_end_p (gsi); gsi_next
+ gimple *g = gsi_stmt (gsi);
+ if (is_gimple_assign (g)
+ && (gimple_assign_rhs_code (g) == TRUNC_DIV_EXPR
+ || gimple_assign_rhs_code (g) == TRUNC_MOD_EXPR)
+ && operand_equal_p (op1, gimple_assign_rhs1 (g), 0)
+ && operand_equal_p (op2, gimple_assign_rhs2 (g), 0))
Looks overly complicated to me. Just remember "topmost" use_stmt
alongside top_bb (looks like you'll no longer need top_bb if you
retail top_stmt). And then do
gsi = gsi_for_stmt (top_stmt);
and insert before that.
> b) Handling constants - I dropped handling constants in the attached
> patch. IIUC we don't want
> to enable this transform if there's a specialized expansion for some
> constants for div or mod ?
See expand_divmod which has lots of special cases for constant operands
not requiring target support for div or mod.
> I suppose this would also be target dependent and require a target hook ?
> For instance arm defines modsi3 pattern to expand mod when 2nd operand
> is constant and <= 0 or power of 2,
> while for other cases goes the expand_divmod() route to generate call
> to __aeabi_idivmod libcall.
Ok, so it lacks a signed mod instruction.
> c) Gating the divmod transform -
> I tried gating it on checks for optab_handlers on div and mod, however
> this doesn't enable transform for arm cortex-a9
> anymore (cortex-a9 doesn't have hardware instructions for integer div and mod).
> IIUC for cortex-a9,
> optab_handler (sdivmod_optab, SImode) returns CODE_FOR_nothing because
> HAVE_divsi3 is 0.
> However optab_handler (smod_optab, SImode) matches since optab_handler
> only checks for existence of pattern
> (and not whether the pattern gets matched).
> I suppose we should enable the transform only if the divmod, div, and
> mod pattern do not match rather than checking
> if the patterns exist via optab_handler ? For a general x % y, modsi3
> would fail to match but optab_handler(smod_optab, SImode ) still
> says it's matched.
Ah, of course. Querying for an optab handler is just a cheap
guesstimate... Not sure how to circumvent this best (sub-target
enablement of patterns). RTL expansion just goes ahead (of course)
and sees if expansion eventually fails. Richard?
> Should we define a new target hook combine_divmod, which returns true
> if transforming to divmod is desirable for that
> target ?
> The default definition could be:
> bool default_combine_divmod (enum machine_mode mode, tree op1, tree op2)
> // check for optab_handlers for div/mod/divmod and libfunc for divmod
> And for arm, it could be over-ridden to return false if op2 is
> constant and <= 0 or power of 2.
> I am not really sure if this is a good idea since I am replicating
> information from modsi3 pattern.
> Any change to the pattern may require corresponding change to the hook :/
Yeah, I don't think that is desirable. Ideally we'd have a way
to query HAVE_* for CODE_FOR_* which would mean target-insns.def
support for all div/mod/divmod patterns(?) and queries...
Not sure if what would be enough though.
Note that the divmod check is equally flawed.
I think with the above I'd enable the transform when
+ if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing
+ || (optab_libfunc (divmod_optab, mode) != NULL_RTX
&& optab_handler ([su]div_optab, mode) == CODE_FOR_nothing))
+ return false;
so we either will have a divmod instruction (hopefully not sub-target
disabled for us) or a libfunc for divmod and for sure no HW divide
instruction (HW mod can be emulated by HW divide but not the other
> d) Adding effective-target-check for divmod: I just enabled it for
> arm*-*-* for now. I could additionally append more targets,
> not sure if this is the right approach.
Looks good to me.