This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC PR43721] Optimize a/b and a%b to single divmod call


On Mon, 2 Nov 2015, Prathamesh Kulkarni wrote:

> On 2 November 2015 at 13:20, Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
> > On 30 October 2015 at 15:57, Richard Biener <richard.guenther@gmail.com> wrote:
> >> On Fri, Oct 30, 2015 at 8:39 AM, Prathamesh Kulkarni
> >> <prathamesh.kulkarni@linaro.org> wrote:
> >>> Hi,
> >>> I have attached revamped version of Kugan's patch
> >>> (https://gcc.gnu.org/ml/gcc/2013-06/msg00100.html) for PR43721.
> >>> Test-case: http://pastebin.com/QMfpXLD9
> >>> divmod pass dump: http://pastebin.com/yMY1ikCp
> >>> Assembly: http://pastebin.com/kk2HZpvA
> >>>
> >>> The approach I took is similar to sincos pass, which involves two parts:
> >>> a) divmod pass that transforms:
> >>> t1 = a / b;
> >>> t2 = a % b;
> >>> to the following sequence:
> >>> complex_tmp = DIVMOD (a, b);
> >>> t1 = REALPART_EXPR(a);
> >>> t2 = IMAGPART_EXPR(b);
> >>>
> >>> b) DIVMOD(a, b) is represented as an internal-fn and is expanded by
> >>> expand_DIVMOD().
> >>> I am not sure if I have done this correctly. I was somehow looking to
> >>> reuse expand_divmod() but am not able to figure out how to do that
> >>> (AFAIU expand_divmod() strictly returns either the quotient or
> >>> remainder but never both which is what I want), so ended up with
> >>> manually expanding to rtx.
> >>>
> >>> While going through the sincos pass in execute_cse_sincos_1, I didn't
> >>> understand the following:
> >>>  if (gimple_purge_dead_eh_edges (gimple_bb (stmt)))
> >>>           cfg_changed = true;
> >>> Um why is the call to gimple_purge_dead_eh_edges necessary here?
> >>
> >> The call replacement might no longer throw.
> >>
> >>> A silly question, what options to pass to gcc to print statistics ? I
> >>> added statistics_counter_event to keep track of number of calls
> >>> inserted/used but not sure how to print them :/
> >>
> >> -fdump-tree-XXX-stats or -fdump-statistics-stats
> > Thanks, I can see it now -;)
> >>
> >>> I would be grateful for suggestions for improving the patch.
> >>
> >> First of all new functions need comments (expand_DIVMOD).
> >>
> >> Second - what's the reasoning for the pass placement seen?  I think
> >> the transform is a lowering one, improving RTL expansion.  The
> >> DIVMOD representation is harmful for GIMPLE optimizers.
> >>
> >> Third - I'd have integrated this with an existing pass - we have another
> >> lowering / RTL expansion kind pass which is pass_optimize_widening_mul.
> >> Please piggy-back on it.
> >>
> >> You seem to do the transform unconditionally even if the target has
> >> instructions for division and modulus but no instruction for divmod
> >> in which case you'll end up emitting a library call in the end instead
> >> of a div and mod instruction.  I think the transform should be gated on
> >> missing div/mod instructions or the availability of a divmod one.
> >>
> >> +         if (is_gimple_assign (stmt)
> >> +             && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_binary)
> >> +           {
> >> +             if (gimple_assign_rhs_code (stmt) == TRUNC_DIV_EXPR)
> >> +               cfg_changed |= execute_divmod_1 (stmt);
> >>
> >> you can directly check gimple_assign_rhs_code.  I'd check for TRUNC_MOD_EXPR
> >> which seems to be less common and thus should cut down compile-time.
> >>
> >> +  /* Case 2: When both are in top_bb */
> >> +  else if (op1_def_in_bb && op2_def_in_bb)
> >> +    {
> >> +      /* FIXME: better way to compare iterators?.  */
> >> +      gimple_stmt_iterator gsi = get_later_stmt (top_bb,
> >> def_stmt_op1, def_stmt_op2);
> >>
> >> You can't use get_later_stmt, it's a vectorizer speciality.  To do
> >> this test efficiently
> >> you need stmt UIDs computed.
> >>
> >> I miss a testcase (or two).
> > I have tried to address your comments in the attached patch.
> oops, unsigned uid_no = 0; should be outside the loop in
> pass_optimize_widening_mul::execute.
> Done in this version of the patch.
> 
> Thanks,
> Prathamesh
> > Could you please review it for me ?

For the testcases you need a new target check for divmod.

> > I have a few questions:
> >
> > a) Is the check for availability of divmod correct ?

You simply want optab_handler (tab, mode) != CODE_FOR_nothing

The libfunc is always available.

When looking at TRUNC_DIV_EXPR you should also exclude
the case where TYPE_OVERFLOW_TRAPS (type) as that should
expand using the [su]divv optabs (no trapping overflow
divmod optab exists).

> > b) Assume TRUNC_DIV_EXPR stmt is in bb0 and TRUNC_DIV_MOD in bb1.
> > I choose to transform if bb0 dominates bb1 or bb1 dominates bb0 (or bb0 == bb1).
> > However I wonder if we should also check that if bb0 dominates bb1
> > then bb1 should
> > post-dominate bb0 ?

Depends on how expensive divmod is compared to div or mod.  An alternative
transform would be to look whether [us]mod optabs are available and if not
implement modulo via div and subtract, CSEing the division itself.  Not
sure if any target that provides div patterns does not also provide
mod ones.

> > For instance the transform gets applied to test-case in pr43721-2.c.
> > bb containing rem = x % y doesn't post-dominate bb containing quot = x / y;
> > I wonder if we want to perform the transform for this case since control flow
> > may not always reach from quot = x / y to rem = x % y ?

See above, it's a cost question.  We do for sincos as computing sin or
cos if the other is already computed is cheap.  I would suggest the
same is true for div/mod.

> > c) A silly queston, I wonder if vec<gimple *> stmts  in convert_to_divmod() will
> > have at most 2 entries (assuming TRUNC_MOD_EXPR stmt is also added in
> > the vector) ?
> > I assume that cse will take place before widening_mul pass executes
> > (since pass_fre/pass_fre executes earlier), and there will be no
> > duplicate gimple stmts ?

There are passes inbetween widen-mult and the last CSE that (while
not very likely) makes it possible (VRP jump threading).

> > So vec<gimple *> stmts would contain at most 2 entries - gimple stmt
> > with subcode  TRUNC_MOD_EXPR
> > and gimple stmt with subcode TRUNC_DIV_EXPR having same operands.
> > There won't be another gimple stmt with subcode TRUNC_DIV_EXPR with
> > same operands ?

I wouldn't say so, see above.

> > d) I am not sure if I have correctly done expansion of divmod to rtx
> > in expand_DIVMOD() (I fear I may be missing
> > many checks that are in expmed.c:expand_divmod).

I'm not sure either, esp. about the libcall path and you choosing
a wider mode eventually (is this necessary?).  Did you make sure
this path works?

+  /* Compute stmt uids.  */
+  unsigned uid_no = 0;
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p 
(gsi); gsi_next (&gsi))
+       {
+         gimple *stmt = gsi_stmt (gsi);
+         gimple_set_uid (stmt, uid_no++);
+       }
+    }

renumber_gimple_stmt_uids ()

Richard.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]