This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC PR43721] Optimize a/b and a%b to single divmod call


On 10 November 2015 at 20:11, Richard Biener <rguenther@suse.de> wrote:
> On Mon, 9 Nov 2015, Prathamesh Kulkarni wrote:
>
>> On 4 November 2015 at 20:35, Richard Biener <rguenther@suse.de> wrote:
>> >
>> > Btw, did you investigate code gen differences on x86_64/i586?  That
>> > target expands all divisions/modulo ops via divmod, relying on CSE
>> > solely as the HW always computes both div and mod (IIRC).
>> x86_64 has optab_handler for divmod defined, so the transform won't
>> take place on x86.
>
> Ok.
>
>> > +
>> > +        gassign *assign_stmt = gimple_build_assign (gimple_assign_lhs
>> > (use_stmt), rhs);
>> > +        gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
>> >
>> > Ick.  Please use
>> >
>> >     gimple_set_rhs_from_tree (use_stmt, res);
>> Um there doesn't seem to be gimple_set_rhs_from_tree.
>> I used gimple_assign_set_rhs_from_tree which requires gsi for use_stmt.
>> Is that OK ?
>
> Yes.
>
>> >     update_stmt (use_stmt);
>> >     if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
>> >       cfg_changed = true;
>> >
>> > +  free_dominance_info (CDI_DOMINATORS);
>> >
>> > do not free dominators.
>>
>> I have done the suggested changes in the attached patch.
>> I have a few questions:
>>
>> a) Does the change to insert DIVMOD call before topmost div or mod
>> stmt with matching operands
>> look correct ?
>
> +  /* Insert call-stmt just before the topmost div/mod stmt.
> +     top_bb dominates all other basic blocks containing div/mod stms
> +     so, the topmost stmt would be the first div/mod stmt with matching
> operands
> +     in top_bb.  */
> +
> +  gcc_assert (top_bb != 0);
> +  gimple_stmt_iterator gsi;
> +  for (gsi = gsi_after_labels (top_bb); !gsi_end_p (gsi); gsi_next
> (&gsi))
> +    {
> +      gimple *g = gsi_stmt (gsi);
> +      if (is_gimple_assign (g)
> +         && (gimple_assign_rhs_code (g) == TRUNC_DIV_EXPR
> +            || gimple_assign_rhs_code (g) == TRUNC_MOD_EXPR)
> +         && operand_equal_p (op1, gimple_assign_rhs1 (g), 0)
> +         && operand_equal_p (op2, gimple_assign_rhs2 (g), 0))
> +       break;
>
> Looks overly complicated to me.  Just remember "topmost" use_stmt
> alongside top_bb (looks like you'll no longer need top_bb if you
> retail top_stmt).  And then do
>
>    gsi = gsi_for_stmt (top_stmt);
>
> and insert before that.
Thanks, done in this patch. Does it look OK ?
IIUC gimple_uid (stmt1) < gimple_uid (stmt2) can be used to check if
stmt1 occurs before stmt2
only if stmt1 and stmt2 are in the same basic block ?
>
>> b) Handling constants - I dropped handling constants in the attached
>> patch. IIUC we don't want
>> to enable this transform if there's a specialized expansion for some
>> constants for div or mod ?
>
> See expand_divmod which has lots of special cases for constant operands
> not requiring target support for div or mod.
Thanks, would it be OK if I do this in follow up patch ?
>
>> I suppose this would also be target dependent and require a target hook ?
>> For instance arm defines modsi3 pattern to expand mod when 2nd operand
>> is constant and <= 0 or power of 2,
>> while for other cases goes the expand_divmod() route to generate call
>> to __aeabi_idivmod libcall.
>
> Ok, so it lacks a signed mod instruction.
>
>> c) Gating the divmod transform -
>> I tried gating it on checks for optab_handlers on div and mod, however
>> this doesn't enable transform for arm cortex-a9
>> anymore (cortex-a9 doesn't have hardware instructions for integer div and mod).
>> IIUC for cortex-a9,
>> optab_handler (sdivmod_optab, SImode) returns CODE_FOR_nothing because
>> HAVE_divsi3 is 0.
>> However optab_handler (smod_optab, SImode) matches since optab_handler
>> only checks for existence of pattern
>> (and not whether the pattern gets matched).
>> I suppose we should enable the transform only if the divmod, div, and
>> mod pattern do not match rather than checking
>> if the patterns exist via optab_handler ? For a general x % y, modsi3
>> would fail to match but optab_handler(smod_optab, SImode ) still
>> says it's matched.
>
> Ah, of course.  Querying for an optab handler is just a cheap
> guesstimate...  Not sure how to circumvent this best (sub-target
> enablement of patterns).  RTL expansion just goes ahead (of course)
> and sees if expansion eventually fails.  Richard?
>
>> Should we define a new target hook combine_divmod, which returns true
>> if transforming to divmod is desirable for that
>> target ?
>> The default definition could be:
>> bool default_combine_divmod (enum machine_mode mode, tree op1, tree op2)
>> {
>>   // check for optab_handlers for div/mod/divmod and libfunc for divmod
>> }
>>
>> And for arm, it could be over-ridden to return false if op2 is
>> constant and <= 0 or power of 2.
>> I am not really sure if this is a good idea since I am replicating
>> information from modsi3 pattern.
>> Any change to the pattern may require corresponding change to the hook :/
>
> Yeah, I don't think that is desirable.  Ideally we'd have a way
> to query HAVE_* for CODE_FOR_* which would mean target-insns.def
> support for all div/mod/divmod patterns(?) and queries...
>
> Not sure if what would be enough though.
>
> Note that the divmod check is equally flawed.
>
> I think with the above I'd enable the transform when
>
> +  if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing
> +      || (optab_libfunc (divmod_optab, mode) != NULL_RTX
>            && optab_handler ([su]div_optab, mode) == CODE_FOR_nothing))
> +    return false;
Um this fails for the arm backend (for cortex-a9) because
optab_handler (divmod_optab, mode) != CODE_FOR_nothing is false
optab_libfunc (divmod_optab, mode) != NULL_RTX is true.
optab_handler (div_optab, mode) == CODE_FOR_nothing is true.
which comes down to false || (true && true) which is true and we hit
return false.
AFAIU, we want the transform to be disabled if:
a) optab_handler exists for divmod.
b) optab_handler exists for div.
c) optab_libfunc does not exist for divmod.  */

+  if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing
+      || optab_handler (div_optab, mode) != CODE_FOR_nothing
+      || optab_libfunc (divmod_optab, mode) == NULL_RTX)
+    return false;
Does that look correct ?
>
> so we either will have a divmod instruction (hopefully not sub-target
> disabled for us) or a libfunc for divmod and for sure no HW divide
> instruction (HW mod can be emulated by HW divide but not the other
> way around).
>
>> d) Adding effective-target-check for divmod: I just enabled it for
>> arm*-*-* for now. I could additionally append more targets,
>> not sure if this is the right approach.
>
> Looks good to me.
Is this version OK if bootstrap/testing passes ?

Thanks,
Prathamesh
>
> Thanks,
> Richard.

Attachment: ChangeLog.txt
Description: Text document

Attachment: patch-v6.diff
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]