This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC][PR82479] missing popcount builtin detection


On Thu, Jun 7, 2018 at 2:52 AM Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
>
> Hi Richard,
>
> Thanks for the review.
>
> On 5 June 2018 at 21:25, Richard Biener <richard.guenther@gmail.com> wrote:
> > On Tue, Jun 5, 2018 at 10:59 AM Kugan Vivekanandarajah
> > <kugan.vivekanandarajah@linaro.org> wrote:
> >>
> >> Hi Richard,
> >>
> >> Thanks for the review.
> >>
> >> On 1 June 2018 at 23:12, Richard Biener <richard.guenther@gmail.com> wrote:
> >> > On Fri, Jun 1, 2018 at 12:06 PM Bin.Cheng <amker.cheng@gmail.com> wrote:
> >> >>
> >> >> On Fri, Jun 1, 2018 at 9:56 AM, Kugan Vivekanandarajah
> >> >> <kugan.vivekanandarajah@linaro.org> wrote:
> >> >> > Hi Bin,
> >> >> >
> >> >> > Thanks a lo for the review.
> >> >> >
> >> >> > On 1 June 2018 at 03:45, Bin.Cheng <amker.cheng@gmail.com> wrote:
> >> >> >> On Thu, May 31, 2018 at 3:51 AM, Kugan Vivekanandarajah
> >> >> >> <kugan.vivekanandarajah@linaro.org> wrote:
> >> >> >>> Hi Bin,
> >> >> >>>
> >> >> >>> Thanks for the review. Please find the revised patch based on the
> >> >> >>> review comments.
> >> >> >>>
> >> >> >>> Thanks,
> >> >> >>> Kugan
> >> >> >>>
> >> >> >>> On 17 May 2018 at 19:56, Bin.Cheng <amker.cheng@gmail.com> wrote:
> >> >> >>>> On Thu, May 17, 2018 at 2:39 AM, Kugan Vivekanandarajah
> >> >> >>>> <kugan.vivekanandarajah@linaro.org> wrote:
> >> >> >>>>> Hi Richard,
> >> >> >>>>>
> >> >> >>>>> On 6 March 2018 at 02:24, Richard Biener <richard.guenther@gmail.com> wrote:
> >> >> >>>>>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah
> >> >> >>>>>> <kugan.vivekanandarajah@linaro.org> wrote:
> >> >> >>>>>>> Hi Richard,
> >> >> >>>>>>>
> >> >> >>
> >> >> >> Hi,
> >> >> >> Thanks very much for working.
> >> >> >>
> >> >> >>> +/* Utility function to check if OP is defined by a stmt
> >> >> >>> +   that is a val - 1.  If that is the case, set it to STMT.  */
> >> >> >>> +
> >> >> >>> +static bool
> >> >> >>> +ssa_defined_by_and_minus_one_stmt_p (tree op, tree val, gimple **stmt)
> >> >> >> This is checking if op is defined as val - 1, so name it as
> >> >> >> ssa_defined_by_minus_one_stmt_p?
> >> >> >>
> >> >> >>> +{
> >> >> >>> +  if (TREE_CODE (op) == SSA_NAME
> >> >> >>> +      && (*stmt = SSA_NAME_DEF_STMT (op))
> >> >> >>> +      && is_gimple_assign (*stmt)
> >> >> >>> +      && (gimple_assign_rhs_code (*stmt) == PLUS_EXPR)
> >> >> >>> +      && val == gimple_assign_rhs1 (*stmt)
> >> >> >>> +      && integer_minus_onep (gimple_assign_rhs2 (*stmt)))
> >> >> >>> +    return true;
> >> >> >>> +  else
> >> >> >>> +    return false;
> >> >> >> You can simply return the boolean condition.
> >> >> > Done.
> >> >> >
> >> >> >>
> >> >> >>> +}
> >> >> >>> +
> >> >> >>> +/* See if LOOP is a popcout implementation of the form
> >> >> >> ...
> >> >> >>> +  rhs1 = gimple_assign_rhs1 (and_stmt);
> >> >> >>> +  rhs2 = gimple_assign_rhs2 (and_stmt);
> >> >> >>> +
> >> >> >>> +  if (ssa_defined_by_and_minus_one_stmt_p (rhs1, rhs2, &and_minus_one))
> >> >> >>> +    rhs1 = rhs2;
> >> >> >>> +  else if (ssa_defined_by_and_minus_one_stmt_p (rhs2, rhs1, &and_minus_one))
> >> >> >>> +    ;
> >> >> >>> +  else
> >> >> >>> +    return false;
> >> >> >>> +
> >> >> >>> +  /* Check the recurrence.  */
> >> >> >>> +  phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one));
> >> >> >> So gimple_assign_rhs1 (and_minus_one) == rhs1 is always true?  Please
> >> >> >> use rhs1 directly.
> >> >> >
> >> >> > Done.
> >> >> >>> +  gimple *src_phi = SSA_NAME_DEF_STMT (rhs2);
> >> >> >> I think this is checking wrong thing and is redundant.  Either rhs2
> >> >> >> equals to rhs1 or is defined as (rhs1 - 1).  For (rhs2 == rhs1) case,
> >> >> >> the check duplicates checking on phi; for the latter, it's never a PHI
> >> >> >> stmt and shouldn't be checked.
> >> >> >>
> >> >> >>> +  if (gimple_code (phi) != GIMPLE_PHI
> >> >> >>> +      || gimple_code (src_phi) != GIMPLE_PHI)
> >> >> >>> +    return false;
> >> >> >>> +
> >> >> >>> +  dest = gimple_assign_lhs (count_stmt);
> >> >> >>> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
> >> >> >>> +  tree src = gimple_phi_arg_def (src_phi, loop_preheader_edge (loop)->dest_idx);
> >> >> >>> +  if (adjust)
> >> >> >>> +    iter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest),
> >> >> >>> +            build_call_expr (fn, 1, src),
> >> >> >>> +            build_int_cst (TREE_TYPE (dest), 1));
> >> >> >>> +  else
> >> >> >>> +    iter = build_call_expr (fn, 1, src);
> >> >> >> Note tree-ssa-loop-niters.c always use unsigned_type_for (IV-type) as
> >> >> >> niters type.  Though unsigned type is unnecessary in this case, but
> >> >> >> better to follow existing behavior?
> >> >> >
> >> >> > Done.
> >> >> >>
> >> >> >>> +  max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (dest)));
> >> >> >> As richi suggested, max should be the number of bits in type of IV.
> >> >> >>
> >> >> >>> +
> >> >> >>> +  niter->assumptions = boolean_false_node;
> >> >> >> Redundant.
> >> >> >
> >> >> > Not sure I understand. If I remove this,  I am getting ICE
> >> >> > (segmentation fault). What is the expectation here?
> >> >> Is it a simple typo?  Because assumptions is set to boolean_true_node
> >> >> just 5 lines below?
> >> >> The niters part looks good for me with this change.  You may need
> >> >> richi's approval for other parts?
> >> >
> >> > diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
> >> > index bdf9ba1..4dc156a 100644
> >> > --- a/gcc/ipa-fnsummary.c
> >> > +++ b/gcc/ipa-fnsummary.c
> >> > @@ -1485,6 +1485,10 @@ will_be_nonconstant_expr_predicate (struct
> >> > ipa_node_params *info,
> >> >                                                nonconstant_names);
> >> >        return p2.or_with (summary->conds, p1);
> >> >      }
> >> > +  else if (TREE_CODE (expr) == CALL_EXPR)
> >> > +    {
> >> > +      return false;
> >> > +    }
> >> >    else
> >> >      {
> >> >        debug_tree (expr);
> >> >
> >> > I'd return true here instead - we don't want to track popcount() in
> >> > predicates.  Also unnecessary braces.
> >> >
> >> > @@ -3492,6 +3494,11 @@ expression_expensive_p (tree expr)
> >> >        if (!integer_pow2p (TREE_OPERAND (expr, 1)))
> >> >         return true;
> >> >      }
> >> > +  if (code == CALL_EXPR
> >> > +      && (fndecl = get_callee_fndecl (expr))
> >> > +      && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
> >> > +      && (BUILT_IN_POPCOUNT == DECL_FUNCTION_CODE (fndecl)))
> >> > +    return false;
> >> >
> >> > please use
> >> >
> >> >     if (code == CALL_EXPR)
> >> >      {
> >> >          if (!is_inexpensive_builtin (get_callee_fndecl (expr)))
> >> >            return true;
> >> >          FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)
> >> >              if (expression_expensive_p (arg))
> >> >                 return true;
> >> >          return false;
> >> >      }
> >> >
> >> > instead.
> >> Done.
> >>
> >> >
> >> > +  /* Check "b = b & (b - 1)" is calculated.  */
> >> > +  if (!is_gimple_assign (and_stmt)
> >> > +      || gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR)
> >> > +    return false;
> >> > +
> >> > +  rhs1 = gimple_assign_rhs1 (and_stmt);
> >> > +  rhs2 = gimple_assign_rhs2 (and_stmt);
> >> > +
> >> > +  if (ssa_defined_by_minus_one_stmt_p (rhs1, rhs2, &and_minus_one))
> >> > +    std::swap (rhs1, rhs2);
> >> > +  else if (ssa_defined_by_minus_one_stmt_p (rhs2, rhs1, &and_minus_one))
> >> > +    ;
> >> > +  else
> >> > +    return false;
> >> >
> >> > so the powers of match-and-simplify let you add sth like the following to
> >> > match.pd:
> >> >
> >> > #if GIMPLE
> >> > (match (b_bit_and_b_minus_one @0)
> >> >  (bit_and:c @0 (plus @0 integer_minus_onep)))
> >> > #endif
> >> >
> >> > and then use it like
> >> >
> >> >    extern bool gimple_b_bit_and_b_minus_one (tree, tree *, tree (*)(tree));
> >> >    if (!gimple_b_bit_and_b_minus_one (gimple_assign_lhs (and_stmt),
> >> > &rhs1, NULL))
> >> >       return false;
> >> >
> >> > note you need to explicitely declare the matcher as there's no header
> >> > file generated
> >> > for them.  It would be also the first user for this "feature" and at
> >> > some point I
> >> > considered using in-line source markup (and sth like GTY_FILES for
> >> > what files to scan)
> >> > to gather patterns.
> >> >
> >> > +  /* Check the loop closed SSA definition for just the variable c defined in
> >> > +     loop.  */
> >> > +  basic_block bb = exit->dest;
> >> > +  for (gphi_iterator gpi = gsi_start_phis (bb);
> >> > +       !gsi_end_p (gpi); gsi_next (&gpi))
> >> > +    {
> >> > +      phi = gpi.phi ();
> >> > +      count++;
> >> > +    }
> >> >
> >> > I don't understand why you are checking for this - isn't the number of
> >> > iterations
> >> I am trying to be overly conservative. I have removed it now and adjusted.
> >>
> >> > independent on the rest of the loop-closed PHIs?  That is, even for just
> >> >
> >> >  while (b) { b = b & (b-1); }
> >> >
> >> > the number of iterations is popcount (b).  So it's enough to check
> >> > the exit condition?  In fact you are coming from number_of_iterations_exit
> >> > and thus are asked for a specific exit and thus
> >> >
> >> > +  if (!(exit = single_exit (loop)))
> >> > +    return false;
> >> Ok, changed it.
> >>
> >>
> >> > is premature.  That is, for
> >> >
> >> >   while (b) { b = b & (b - 1); if (--c == 0) break; }
> >> >
> >> > still should have popcount (b) iterations for the exit involving the
> >> > while (b) test.
> >> >
> >> > So please simplify (and thus genericize) the code accordingly.  Do so
> >> > without the match.pd trick for now, I think we can simplify the current
> >> > beast down enough to not need the factored out function.
> >> I am not using match.pd changes as you have asked. Please let me know
> >> if you want that to be included as well.
> >>
> >> >
> >> > You then also can handle replacing
> >> >
> >> > int c = 0;
> >> > while (b) { b = b & (b-1); c+=3; }
> >> >
> >> > with 3 * popcount (b);
> >> Made to handle this too. But, there are still cases we don't handle. I
> >> am not sure if it is worth the complexity handling all the possible
> >> cases.
> >>
> >> Is the attached patch looks better now?
> >
> > +  /* Check loop terminating branch is like
> > +     if (b != 0).  */
> > +  gimple *stmt = last_stmt (loop->header);
> > +  if (!stmt
> >
> > this should now look at exit->src instead of loop->header.
> Done.
>
> >
> > +      || !zerop (gimple_cond_rhs (stmt))
> >
> > use integer_zerop
> Done.
>
> >
> > +  gimple *count_stmt = SSA_NAME_DEF_STMT (rhs1);
> >
> > you still didn't remove the whole count-stmt stuff.  It's unrelated
> > and is not required at all.  Only the GIMPLE_COND lhs def-use
> > cycle is interesting for niter purposes.
> Changed it.
>
> >
> > Please adjust accordingly.
> >
> > @@ -2441,7 +2578,11 @@ number_of_iterations_exit (struct loop *loop, edge exit,
> >    gcond *stmt;
> >    if (!number_of_iterations_exit_assumptions (loop, exit, niter,
> >                                               &stmt, every_iteration))
> > -    return false;
> > +    {
> > +      if (number_of_iterations_popcount (loop, exit, niter))
> > +       return true;
> > +      return false;
> > +    }
> >
> > this should be done in number_of_iterations_exit_assumptions.  I realize
> > you want to do it only if that fails but in that process you have to be
> > sure to properly handle every_iteration & friends which is much easier
> > to do in number_of_iterations_exit_assumptions.  So please put it
> > where that fails for this kind of IVs:
> >
> >   tree iv0_niters = NULL_TREE;
> >   if (!simple_iv_with_niters (loop, loop_containing_stmt (stmt),
> >                               op0, &iv0, safe ? &iv0_niters : NULL, false))
> > ^^^ here
> >     return false;
>
> Done.
>
> Is this OK now. Regression tested and bootstrapped on x86-linux-gnu
> with no new regressions.

Please pass down 'code' from number_of_iterations_exit_assumptions becaue
that correctly does

  /* We want the condition for staying inside loop.  */
  code = gimple_cond_code (stmt);
  if (exit->flags & EDGE_TRUE_VALUE)
    code = invert_tree_comparison (code, false);

alternatively if you want to keep the function more isolated from the actual
caller you'd have to do sth like the above in number_of_iterations_popcount.

+  gphi_iterator gpi = gsi_start_phis (exit->dest);
+  if (gsi_end_p (gpi))
+    return false;
+  rhs1 = gimple_phi_arg_def (gpi.phi (), exit->dest_idx);
+  if (TREE_CODE (rhs1) != SSA_NAME)
+    return false;

drop this, it's not necessary.

+  /* Depending on copy-header is performed, feeding PHI stmts might be in
+     the loop header or loop exit, handle this.  */

the loop header or loop latch

+  if (gimple_code (and_stmt) == GIMPLE_PHI
+      && gimple_bb (and_stmt) == exit->src

this would be loop->header still, not exit->src, otherwise the
gimple_phi_arg_def call on the latch-edge might ICE.

+  /* Check the recurrence.  */
+  gimple *phi = SSA_NAME_DEF_STMT (rhs1);
+  if (gimple_code (phi) != GIMPLE_PHI)
+    return false;

I think you need to verify that the latch edge argument of this
PHI has the proper value which should be the LHS of the
and_stmt.

What would make the code clearer is probably naming
variables after a pattern you put in comments.  For example

 /* We match
      # b_1 = PHI <..., b_2>
      tem_3 = b_1 + -1;
      b_2 = b_1 & tem_3;  */

and then have local vars b1 and b2 instead of rhs1/rhs2, etc.

The overall function comment is now misleading since
it still mentions 'c'.

+  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
+  tree src = gimple_phi_arg_def (phi, loop_preheader_edge (loop)->dest_idx);
+  tree call = build_call_expr (fn, 1, src);

POPCOUNT expects an 'unsigned int' argument so you have to choose
POPCOUNT, POPCOUNTL or POPCOUNTLL based on the type of SRC
and also convert it properly (from say signed int to unsigned int).  The
testcases all use 'long' but no actual value that would make a difference
if you use popcount ((int)long-value).

Note there's an internal functiuon as well but it is only supported if the
target natively supports popcount which isn't what is desired here.
So you have to deal with the type of src, making sure you zero-extent
it if you need extension (for a loop that works on singed char or short)
because sign-extension can change the popcount result (likely you'll
need a GIMPLE testcase to trigger a bogus transform here).

That said, for simplicity and not complicating this even further - heh - I
suggest you do

  if (TYPE_PRECISION (TREE_TYPE (src)) == TYPE_PRECISION (integer_type_node))
    fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
 else if (TYPE_PRECISION (TREE_TYPE (src)) == TYPE_PRECISION
(long_integer_type_node))
   fn = builtin_decl_implicit (BUILT_IN_POPCOUNTL);
 else if
    ... LL case

  /* ???  Support promoting char/short to int.  */
  if (!fn)
    return false;

  src = fold_convert (unsigned_type_for (TREE_TYPE (src)), src);

+  tree utype = unsigned_type_for (TREE_TYPE (call));
+  if (adjust)
+    iter = fold_build2 (MINUS_EXPR, utype,
+                       build_call_expr (fn, 1, src),

you need to convert the call result to utype, thus wrap it
in fold_convert (utype, build_call_expr (...)).

+                       build_int_cst (utype, 1));

so if adjust is true then you have to guard against the initial value
being zero because then your niter is -1U.  You do this by
setting may_be_zero to

  fold_build2 (EQ_EXPR, boolean_type_node, src, build_zero_cst
(TREE_TYPE (src)));

+  else
+    iter = fold_convert (utype, build_call_expr (fn, 1, src));
+  max = int_cst_value (TYPE_MAX_VALUE (utype));

Surely max is not TYPE_MAX_VALUE of utype but instead
TYPE_PRECISION (TREE_TYPE (src)) (of the original
unpromoted src).  In case of adjust == true it is even one less.

Sorry for the many (correctness) issues again :/

Thanks,
Richard.

+
+  niter->assumptions = boolean_false_node;
+  niter->control.base = NULL_TREE;
+  niter->control.step = NULL_TREE;
+  niter->control.no_overflow = false;
+  niter->niter = iter;
+  niter->assumptions = boolean_true_node;
+  niter->may_be_zero = boolean_false_node;
+  niter->max = max;
+  niter->bound = NULL_TREE;
+  niter->cmp = ERROR_MARK;



> Thanks,
> Kugan
> >
> >
> > Thanks,
> > Richard.
> >
> >> Thanks,
> >> Kugan
> >>
> >>
> >> >
> >> > Richard.
> >> >
> >> >> Thanks,
> >> >> bin
> >> >> >
> >> >> >>> +  niter->control.base = NULL_TREE;
> >> >> >>> +  niter->control.step = NULL_TREE;
> >> >> >>> +  niter->control.no_overflow = false;
> >> >> >>> +  niter->niter = iter;
> >> >> >>> +  niter->assumptions = boolean_true_node;
> >> >> >>> +  niter->may_be_zero = boolean_false_node;
> >> >> >>> +  niter->max = max;
> >> >> >>> +  niter->bound = NULL_TREE;
> >> >> >>> +  niter->cmp = ERROR_MARK;
> >> >> >>> +  return true;
> >> >> >>> +}
> >> >> >>> +
> >> >> >>> +
> >> >> >> Appology if these are nitpickings.
> >> >> > Thanks for the review. I am happy to make the changes needed to get it
> >> >> > to how it should be :)
> >> >> >
> >> >> > Thanks,
> >> >> > Kugan
> >> >> >
> >> >> >>
> >> >> >> Thanks,
> >> >> >> bin


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]