This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC PR43721] Optimize a/b and a%b to single divmod call


On Wed, 4 Nov 2015, Prathamesh Kulkarni wrote:

> On 2 November 2015 at 18:31, Richard Biener <rguenther@suse.de> wrote:
> > On Mon, 2 Nov 2015, Prathamesh Kulkarni wrote:
> >
> >> On 2 November 2015 at 13:20, Prathamesh Kulkarni
> >> <prathamesh.kulkarni@linaro.org> wrote:
> >> > On 30 October 2015 at 15:57, Richard Biener <richard.guenther@gmail.com> wrote:
> >> >> On Fri, Oct 30, 2015 at 8:39 AM, Prathamesh Kulkarni
> >> >> <prathamesh.kulkarni@linaro.org> wrote:
> >> >>> Hi,
> >> >>> I have attached revamped version of Kugan's patch
> >> >>> (https://gcc.gnu.org/ml/gcc/2013-06/msg00100.html) for PR43721.
> >> >>> Test-case: http://pastebin.com/QMfpXLD9
> >> >>> divmod pass dump: http://pastebin.com/yMY1ikCp
> >> >>> Assembly: http://pastebin.com/kk2HZpvA
> >> >>>
> >> >>> The approach I took is similar to sincos pass, which involves two parts:
> >> >>> a) divmod pass that transforms:
> >> >>> t1 = a / b;
> >> >>> t2 = a % b;
> >> >>> to the following sequence:
> >> >>> complex_tmp = DIVMOD (a, b);
> >> >>> t1 = REALPART_EXPR(a);
> >> >>> t2 = IMAGPART_EXPR(b);
> >> >>>
> >> >>> b) DIVMOD(a, b) is represented as an internal-fn and is expanded by
> >> >>> expand_DIVMOD().
> >> >>> I am not sure if I have done this correctly. I was somehow looking to
> >> >>> reuse expand_divmod() but am not able to figure out how to do that
> >> >>> (AFAIU expand_divmod() strictly returns either the quotient or
> >> >>> remainder but never both which is what I want), so ended up with
> >> >>> manually expanding to rtx.
> >> >>>
> >> >>> While going through the sincos pass in execute_cse_sincos_1, I didn't
> >> >>> understand the following:
> >> >>>  if (gimple_purge_dead_eh_edges (gimple_bb (stmt)))
> >> >>>           cfg_changed = true;
> >> >>> Um why is the call to gimple_purge_dead_eh_edges necessary here?
> >> >>
> >> >> The call replacement might no longer throw.
> >> >>
> >> >>> A silly question, what options to pass to gcc to print statistics ? I
> >> >>> added statistics_counter_event to keep track of number of calls
> >> >>> inserted/used but not sure how to print them :/
> >> >>
> >> >> -fdump-tree-XXX-stats or -fdump-statistics-stats
> >> > Thanks, I can see it now -;)
> >> >>
> >> >>> I would be grateful for suggestions for improving the patch.
> >> >>
> >> >> First of all new functions need comments (expand_DIVMOD).
> >> >>
> >> >> Second - what's the reasoning for the pass placement seen?  I think
> >> >> the transform is a lowering one, improving RTL expansion.  The
> >> >> DIVMOD representation is harmful for GIMPLE optimizers.
> >> >>
> >> >> Third - I'd have integrated this with an existing pass - we have another
> >> >> lowering / RTL expansion kind pass which is pass_optimize_widening_mul.
> >> >> Please piggy-back on it.
> >> >>
> >> >> You seem to do the transform unconditionally even if the target has
> >> >> instructions for division and modulus but no instruction for divmod
> >> >> in which case you'll end up emitting a library call in the end instead
> >> >> of a div and mod instruction.  I think the transform should be gated on
> >> >> missing div/mod instructions or the availability of a divmod one.
> >> >>
> >> >> +         if (is_gimple_assign (stmt)
> >> >> +             && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_binary)
> >> >> +           {
> >> >> +             if (gimple_assign_rhs_code (stmt) == TRUNC_DIV_EXPR)
> >> >> +               cfg_changed |= execute_divmod_1 (stmt);
> >> >>
> >> >> you can directly check gimple_assign_rhs_code.  I'd check for TRUNC_MOD_EXPR
> >> >> which seems to be less common and thus should cut down compile-time.
> >> >>
> >> >> +  /* Case 2: When both are in top_bb */
> >> >> +  else if (op1_def_in_bb && op2_def_in_bb)
> >> >> +    {
> >> >> +      /* FIXME: better way to compare iterators?.  */
> >> >> +      gimple_stmt_iterator gsi = get_later_stmt (top_bb,
> >> >> def_stmt_op1, def_stmt_op2);
> >> >>
> >> >> You can't use get_later_stmt, it's a vectorizer speciality.  To do
> >> >> this test efficiently
> >> >> you need stmt UIDs computed.
> >> >>
> >> >> I miss a testcase (or two).
> >> > I have tried to address your comments in the attached patch.
> >> oops, unsigned uid_no = 0; should be outside the loop in
> >> pass_optimize_widening_mul::execute.
> >> Done in this version of the patch.
> >>
> >> Thanks,
> >> Prathamesh
> >> > Could you please review it for me ?
> >
> > For the testcases you need a new target check for divmod.
> Sorry I didn't understand.
> Should I add sth similar to /* {  dg-require-effective-target divmod } */ ?

Yes.

> >
> >> > I have a few questions:
> >> >
> >> > a) Is the check for availability of divmod correct ?
> >
> > You simply want optab_handler (tab, mode) != CODE_FOR_nothing
> >
> > The libfunc is always available.
> Um I am probably missing something, I thought libfuncs are initialized
> by init_optabs()
> but the target can override/delete that with the target hook
> targetm.init_libfuncs () ?
> On AArch64 optab_libfunc (sdivmod_optab, SImode) returned NULL_RTX.

Hum, I see.  I thought we always have the libfunc if we don't have
the instruction.

> >
> > When looking at TRUNC_DIV_EXPR you should also exclude
> > the case where TYPE_OVERFLOW_TRAPS (type) as that should
> > expand using the [su]divv optabs (no trapping overflow
> > divmod optab exists).
> >
> >> > b) Assume TRUNC_DIV_EXPR stmt is in bb0 and TRUNC_DIV_MOD in bb1.
> >> > I choose to transform if bb0 dominates bb1 or bb1 dominates bb0 (or bb0 == bb1).
> >> > However I wonder if we should also check that if bb0 dominates bb1
> >> > then bb1 should
> >> > post-dominate bb0 ?
> >
> > Depends on how expensive divmod is compared to div or mod.  An alternative
> > transform would be to look whether [us]mod optabs are available and if not
> > implement modulo via div and subtract, CSEing the division itself.  Not
> > sure if any target that provides div patterns does not also provide
> > mod ones.
> >
> >> > For instance the transform gets applied to test-case in pr43721-2.c.
> >> > bb containing rem = x % y doesn't post-dominate bb containing quot = x / y;
> >> > I wonder if we want to perform the transform for this case since control flow
> >> > may not always reach from quot = x / y to rem = x % y ?
> >
> > See above, it's a cost question.  We do for sincos as computing sin or
> > cos if the other is already computed is cheap.  I would suggest the
> > same is true for div/mod.
> >
> >> > c) A silly queston, I wonder if vec<gimple *> stmts  in convert_to_divmod() will
> >> > have at most 2 entries (assuming TRUNC_MOD_EXPR stmt is also added in
> >> > the vector) ?
> >> > I assume that cse will take place before widening_mul pass executes
> >> > (since pass_fre/pass_fre executes earlier), and there will be no
> >> > duplicate gimple stmts ?
> >
> > There are passes inbetween widen-mult and the last CSE that (while
> > not very likely) makes it possible (VRP jump threading).
> >
> >> > So vec<gimple *> stmts would contain at most 2 entries - gimple stmt
> >> > with subcode  TRUNC_MOD_EXPR
> >> > and gimple stmt with subcode TRUNC_DIV_EXPR having same operands.
> >> > There won't be another gimple stmt with subcode TRUNC_DIV_EXPR with
> >> > same operands ?
> >
> > I wouldn't say so, see above.
> Thanks for the explanation.
> >
> >> > d) I am not sure if I have correctly done expansion of divmod to rtx
> >> > in expand_DIVMOD() (I fear I may be missing
> >> > many checks that are in expmed.c:expand_divmod).
> >
> > I'm not sure either, esp. about the libcall path and you choosing
> > a wider mode eventually (is this necessary?).  Did you make sure
> > this path works?
> Removed choosing widening modes in this patch.
> During the widening_mul pass it checks is divmod is available for the given mode
> and only then does the transform. It probably doesn't make sense to check
> for a wider mode during expand_DIVMOD, since optab_libfunc should be
> available for divmod with the given mode ?

Yes.

> >
> > +  /* Compute stmt uids.  */
> > +  unsigned uid_no = 0;
> > +  FOR_EACH_BB_FN (bb, fun)
> > +    {
> > +      for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p
> > (gsi); gsi_next (&gsi))
> > +       {
> > +         gimple *stmt = gsi_stmt (gsi);
> > +         gimple_set_uid (stmt, uid_no++);
> > +       }
> > +    }
> >
> > renumber_gimple_stmt_uids ()
> Thanks.
> 
> The attached patch has following additional changes:
> a) Fixes ICE when one of the operands was constant (unconditionally
> used SSA_NAME_DEF_STMT/ FOR_EACH_IMM_USE_FAST on constant tree
> operands).
> b) Adds use_stmt to vector_stmts only if the operands match and in the
> same order.
> I was incorrectly applying the transform to the following case: t1 = x
> / y; t2 = y % x.

+/* Expand DIVMOD() using:
+ a) optab handler for udivmod/sdivmod if it is available
+ b) If optab_handler doesn't exist, Generate call to optab_libfunc for 
udivmod/sdivmod.  */

long line

+      expand_twoval_binop (tab, op0, op1, quotient, remainder, 
TYPE_UNSIGNED (type));
+
+  

likewise

+  machine_mode libval_mode = smallest_mode_for_size (2 * GET_MODE_BITSIZE 
(mode), MODE_INT);

likewise (please verify elsewhere yourself).  Our column limit is 80.

Testcases still need adjustment with an effective tagret.

Btw, did you investigate code gen differences on x86_64/i586?  That
target expands all divisions/modulo ops via divmod, relying on CSE
solely as the HW always computes both div and mod (IIRC).

+  if ((optab_handler (tab, mode) == CODE_FOR_nothing) && !optab_libfunc 
(tab, mode))
+    return false;

As originally said we shouldn't generate divmod library calls if
the target has non-library div or mod implementations.
Thus the proper check would be to

       && (!optab_libfunc (tab ...)
           || optab_handler (div) || optab_handler (mod))))

+  /* FIXME: Is it safe to assume both operands cannot be constant
+     since constant folding would have taken place ?  */
+  gcc_assert (! (TREE_CONSTANT (op1) && TREE_CONSTANT (op2)));

On trapping ops 1/0 might prevail for example.  So I'd rather
have a runtime check if op is an SSA_NAME

I'm not sure I would even look at a / 3; a % 3 or 3 / a; 3 % a
though (they tend to have special expansions).

+  FOR_EACH_IMM_USE_FAST (use_p, use_iter, op)

use FOR_EACH_IMM_USE_STMT to only visit stmts once.

+    {
+      gimple *use_stmt = USE_STMT (use_p);
+      if (is_gimple_assign (use_stmt)
+         && gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
+        && !TYPE_OVERFLOW_TRAPS (TREE_TYPE (gimple_assign_lhs 
(use_stmt))))

Please check this upfront - you know the type from the stmt you are
starting the visiting from.

+       {
+         tree u_op1 = gimple_assign_rhs1 (use_stmt);
+         tree u_op2 = gimple_assign_rhs2 (use_stmt);
+
+         if ((operand_equal_p (op1, u_op1, 0)) && operand_equal_p (op2, 
u_op2, 0))
+          maybe_record_divmod (stmts, top_bb, use_stmt);

+  /* If either operand is a constant, treat it's "definition" as
+     outside of top_bb.  */
+
+  if (!TREE_CONSTANT (op1))
+    {

check TREE_CODE (op1) == SSA_NAME here and in related places.

You seem to insert the call at the place of the definitions of
the division/modulo operands rather than either at the point
of the division or the modulo.  Don't do that.  Insert at
the topmost division/modulo stmt.

+  for (unsigned i = 0; i < stmts.length (); ++i)
+    {
+      tree rhs;
+      use_stmt = stmts[i];
+
+      switch (gimple_assign_rhs_code (use_stmt))
+       {
+         case TRUNC_DIV_EXPR:
+           rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
+           break;
+
+         case TRUNC_MOD_EXPR:
+           rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op1), res);
+           break;
+
+         default:
+           gcc_unreachable ();
+       }
+
+        gassign *assign_stmt = gimple_build_assign (gimple_assign_lhs 
(use_stmt), rhs);
+        gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);

Ick.  Please use

    gimple_set_rhs_from_tree (use_stmt, res);
    update_stmt (use_stmt);
    if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
      cfg_changed = true;

+  free_dominance_info (CDI_DOMINATORS);

do not free dominators.

Richard.

> Thanks,
> Prathamesh
> >
> > Richard.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]