This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: RFC [1/2] divmod transform


On 27 May 2016 at 15:45, Richard Biener <rguenther@suse.de> wrote:
> On Wed, 25 May 2016, Prathamesh Kulkarni wrote:
>
>> On 25 May 2016 at 12:52, Richard Biener <rguenther@suse.de> wrote:
>> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 24 May 2016 at 19:39, Richard Biener <rguenther@suse.de> wrote:
>> >> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> On 24 May 2016 at 17:42, Richard Biener <rguenther@suse.de> wrote:
>> >> >> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
>> >> >> >
>> >> >> >> On 23 May 2016 at 17:35, Richard Biener <richard.guenther@gmail.com> wrote:
>> >> >> >> > On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni
>> >> >> >> > <prathamesh.kulkarni@linaro.org> wrote:
>> >> >> >> >> Hi,
>> >> >> >> >> I have updated my patch for divmod (attached), which was originally
>> >> >> >> >> based on Kugan's patch.
>> >> >> >> >> The patch transforms stmts with code TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
>> >> >> >> >> having same operands to divmod representation, so we can cse computation of mod.
>> >> >> >> >>
>> >> >> >> >> t1 = a TRUNC_DIV_EXPR b;
>> >> >> >> >> t2 = a TRUNC_MOD_EXPR b
>> >> >> >> >> is transformed to:
>> >> >> >> >> complex_tmp = DIVMOD (a, b);
>> >> >> >> >> t1 = REALPART_EXPR (complex_tmp);
>> >> >> >> >> t2 = IMAGPART_EXPR (complex_tmp);
>> >> >> >> >>
>> >> >> >> >> * New hook divmod_expand_libfunc
>> >> >> >> >> The rationale for introducing the hook is that different targets have
>> >> >> >> >> incompatible calling conventions for divmod libfunc.
>> >> >> >> >> Currently three ports define divmod libfunc: c6x, spu and arm.
>> >> >> >> >> c6x and spu follow the convention of libgcc2.c:__udivmoddi4:
>> >> >> >> >> return quotient and store remainder in argument passed as pointer,
>> >> >> >> >> while the arm version takes two arguments and returns both
>> >> >> >> >> quotient and remainder having mode double the size of the operand mode.
>> >> >> >> >> The port should hence override the hook expand_divmod_libfunc
>> >> >> >> >> to generate call to target-specific divmod.
>> >> >> >> >> Ports should define this hook if:
>> >> >> >> >> a) The port does not have divmod or div insn for the given mode.
>> >> >> >> >> b) The port defines divmod libfunc for the given mode.
>> >> >> >> >> The default hook default_expand_divmod_libfunc() generates call
>> >> >> >> >> to libgcc2.c:__udivmoddi4 provided the operands are unsigned and
>> >> >> >> >> are of DImode.
>> >> >> >> >>
>> >> >> >> >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and
>> >> >> >> >> cross-tested on arm*-*-*.
>> >> >> >> >> Bootstrap+test in progress on arm-linux-gnueabihf.
>> >> >> >> >> Does this patch look OK ?
>> >> >> >> >
>> >> >> >> > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
>> >> >> >> > index 6b4601b..e4a021a 100644
>> >> >> >> > --- a/gcc/targhooks.c
>> >> >> >> > +++ b/gcc/targhooks.c
>> >> >> >> > @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode,
>> >> >> >> > machine_mode, optimization_type)
>> >> >> >> >    return true;
>> >> >> >> >  }
>> >> >> >> >
>> >> >> >> > +void
>> >> >> >> > +default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
>> >> >> >> > +                              rtx op0, rtx op1,
>> >> >> >> > +                              rtx *quot_p, rtx *rem_p)
>> >> >> >> >
>> >> >> >> > functions need a comment.
>> >> >> >> >
>> >> >> >> > ISTR it was suggested that ARM change to libgcc2.c__udivmoddi4 style?  In that
>> >> >> >> > case we could avoid the target hook.
>> >> >> >> Well I would prefer adding the hook because that's more easier -;)
>> >> >> >> Would it be ok for now to go with the hook ?
>> >> >> >> >
>> >> >> >> > +      /* If target overrides expand_divmod_libfunc hook
>> >> >> >> > +        then perform divmod by generating call to the target-specifc divmod
>> >> >> >> > libfunc.  */
>> >> >> >> > +      if (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc)
>> >> >> >> > +       return true;
>> >> >> >> > +
>> >> >> >> > +      /* Fall back to using libgcc2.c:__udivmoddi4.  */
>> >> >> >> > +      return (mode == DImode && unsignedp);
>> >> >> >> >
>> >> >> >> > I don't understand this - we know optab_libfunc returns non-NULL for 'mode'
>> >> >> >> > but still restrict this to DImode && unsigned?  Also if
>> >> >> >> > targetm.expand_divmod_libfunc
>> >> >> >> > is not the default we expect the target to handle all modes?
>> >> >> >> Ah indeed, the check for DImode is unnecessary.
>> >> >> >> However I suppose the check for unsignedp should be there,
>> >> >> >> since we want to generate call to __udivmoddi4 only if operand is unsigned ?
>> >> >> >
>> >> >> > The optab libfunc for sdivmod should be NULL in that case.
>> >> >> Ah indeed, thanks.
>> >> >> >
>> >> >> >> >
>> >> >> >> > That said - I expected the above piece to be simply a 'return true;' ;)
>> >> >> >> >
>> >> >> >> > Usually we use some can_expand_XXX helper in optabs.c to query if the target
>> >> >> >> > supports a specific operation (for example SImode divmod would use DImode
>> >> >> >> > divmod by means of widening operands - for the unsigned case of course).
>> >> >> >> Thanks for pointing out. So if a target does not support divmod
>> >> >> >> libfunc for a mode
>> >> >> >> but for a wider mode, then we could zero-extend operands to the wider-mode,
>> >> >> >> perform divmod on the wider-mode, and then cast result back to the
>> >> >> >> original mode.
>> >> >> >> I haven't done that in this patch, would it be OK to do that as a follow up ?
>> >> >> >
>> >> >> > I think that you should conservatively handle the div_optab query, thus if
>> >> >> > the target has a HW division in a wider mode don't use the divmod IFN.
>> >> >> > You'd simply iterate over GET_MODE_WIDER_MODE and repeat the
>> >> >> > if (optab_handler (div_optab, mode) != CODE_FOR_nothing) check, bailing
>> >> >> > out if that is available.
>> >> >> Done.
>> >> >> >
>> >> >> >> > +  /* Disable the transform if either is a constant, since
>> >> >> >> > division-by-constant
>> >> >> >> > +     may have specialized expansion.  */
>> >> >> >> > +  if (TREE_CONSTANT (op1) || TREE_CONSTANT (op2))
>> >> >> >> > +    return false;
>> >> >> >> >
>> >> >> >> > please use CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2)
>> >> >> >> >
>> >> >> >> > +  if (TYPE_OVERFLOW_TRAPS (type))
>> >> >> >> > +    return false;
>> >> >> >> >
>> >> >> >> > why's that?  Generally please first test cheap things (trapping, constant-ness)
>> >> >> >> > before checking expensive stuff (target_supports_divmod_p).
>> >> >> >> I added TYPE_OVERFLOW_TRAPS (type) based on your suggestion in:
>> >> >> >> https://www.mail-archive.com/gcc@gcc.gnu.org/msg78534.html
>> >> >> >> "When looking at TRUNC_DIV_EXPR you should also exclude
>> >> >> >> the case where TYPE_OVERFLOW_TRAPS (type) as that should
>> >> >> >> expand using the [su]divv optabs (no trapping overflow
>> >> >> >> divmod optab exists)."
>> >> >> >
>> >> >> > Ok, didn't remember that.
>> >> >> >
>> >> >> >> >
>> >> >> >> > +static bool
>> >> >> >> > +convert_to_divmod (gassign *stmt)
>> >> >> >> > +{
>> >> >> >> > +  if (!divmod_candidate_p (stmt))
>> >> >> >> > +    return false;
>> >> >> >> > +
>> >> >> >> > +  tree op1 = gimple_assign_rhs1 (stmt);
>> >> >> >> > +  tree op2 = gimple_assign_rhs2 (stmt);
>> >> >> >> > +
>> >> >> >> > +  vec<gimple *> stmts = vNULL;
>> >> >> >> >
>> >> >> >> > use an auto_vec <gimple *> - you currently leak it in at least one place.
>> >> >> >> >
>> >> >> >> > +      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
>> >> >> >> > +       cfg_changed = true;
>> >> >> >> >
>> >> >> >> > note that this suggests you should check whether any of the stmts may throw
>> >> >> >> > internally as you don't update / transfer EH info correctly.  So for 'stmt' and
>> >> >> >> > all 'use_stmt' check stmt_can_throw_internal and if so do not add it to
>> >> >> >> > the list of stmts to modify.
>> >> >> >> >
>> >> >> >> > Btw, I think you should not add 'stmt' immediately but when iterating over
>> >> >> >> > all uses also gather uses in TRUNC_MOD_EXPR.
>> >> >> >> >
>> >> >> >> > Otherwise looks ok.
>> >> >> >> Done changes in this version. I am gathering mod uses same time as div uses,
>> >> >> >> so this imposes a constraint that mod dominates mod. I am not sure if
>> >> >> >> that's desirable.
>> >> >> >
>> >> >> > I think you also need a mod_seen variable now that you don't necessarily
>> >> >> > end up with 'stmt' in the vector of stmts.  I don't see how there is a
>> >> >> > constraint that mod dominates mod - it's just that the top_stmt needs
>> >> >> > to dominate all other uses that can be replaced with replacing top_stmt
>> >> >> > with a divmod.  It's just that the actual stmt set we choose may now
>> >> >> > depend on immediate uses order which on a second thought is bad
>> >> >> > as immediate uses order could be affected by debug stmts ... hmm.
>> >> >> >
>> >> >> > To avoid this please re-add the code adding 'stmt' to stmts immediately
>> >> >> > and add a use_stmt != stmt check to the immediate use processing loop
>> >> >> > so that we don't end up adding it twice.
>> >> >> Well I wonder what will happen for the following case:
>> >> >> t1 = x / y;
>> >> >> if (cond)
>> >> >>   t2 = x % y;
>> >> >> else
>> >> >>   t3 = x % y;
>> >> >>
>> >> >> Assuming stmt == top_stmt is "t2 = x % y" and use_stmt is "t3 = x % y",
>> >> >> use_stmt will not get added to stmts vector, since top_stmt and
>> >> >> use_stmt are not in same bb,
>> >> >> and bb's containing top_stmt and use_stmt don't dominate each other.
>> >> >> Not sure if this is practical case (I assume fre will hoist mod
>> >> >> outside if-else?)
>> >> >>
>> >> >> Now that we immediately add stmt to stmts vector, I suppose mod_seen
>> >> >> shall not be required ?
>> >> >
>> >> > In that case mod_seen is not needed.  But the situation you say will
>> >> > still happen so I wonder if we'd need a better way of iterating over
>> >> > immediate uses, like first pushing all candidates into a worklist
>> >> > vector and then iterating over that until we find no more candidates.
>> >> >
>> >> > You can then also handle the case of more than one group of stmts
>> >> > (the pass currently doesn't iterate in any particular useful order
>> >> > over BBs).
>> >> IIUC, we want to perform the transform if:
>> >> i) there exists top_stmt with code trunc_div_expr/trunc_mod_expr and
>> >> having same operands as stmt.
>> >> ii) top_stmt dominates all other stmts with code
>> >> trunc_div_expr/trunc_mod_expr and having same operands as top_stmt.
>> >>
>> >> Firstly, we try to get to top_stmt if it exists, by iterating over uses of stmt,
>> >> and then iterate over all uses of top_stmt and add them to stmts vector
>> >> only if top_stmt dominates all the stmts with same operands as top_stmt
>> >> and have code trunc_div_expr/trunc_mod_expr.
>> >>
>> >> /* Get to top_stmt.  */
>> >> top_stmt = stmt;
>> >> top_bb = gimple_bb (stmt);
>> >>
>> >> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
>> >> {
>> >>   if (use_stmt code is TRUNC_DIV_EXPR or TRUNC_MOD_EXPR
>> >>       && use_stmt has same operands as stmt)
>> >>     {
>> >>       if (gimple_bb (use_stmt) dominates top_bb)
>> >>         {
>> >>           top_bb = gimple_bb (use_stmt);
>> >>           top_stmt = use_stmt;
>> >>         }
>> >>       else if (gimple_bb (use_stmt) == top_stmt
>> >>                && gimple_uid (use_stmt) < top_stmt)
>> >>         top_stmt = use_stmt;
>> >>     }
>> >> }
>> >>
>> >> /* Speculatively consider top_stmt as dominating all other
>> >> div_expr/mod_expr stmts with same operands as stmt.  */
>> >> stmts.safe_push (top_stmt);
>> >>
>> >> /* Walk uses of top_stmt to ensure that all stmts are dominated by top_stmt.  */
>> >> top_op1 = gimple_assign_rhs1 (top_stmt);
>> >> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, top_op1)
>> >> {
>> >>   if (use_stmt code is TRUNC_DIV_EXPR or TRUNC_MOD_EXPR
>> >>       && use_stmt has same operands as top_stmt)
>> >>     {
>> >>       if (use_stmt == top_stmt)
>> >>         continue;
>> >>
>> >>       /* No top_stmt exits, do not proceed with transform  */
>> >>       if (top_bb does not dominate gimple_bb (use_stmt))
>> >>         return false;
>> >>
>> >>       stmts.safe_push (use_stmt);
>> >>     }
>> >> }
>> >>
>> >> For the case:
>> >> t1 = x / y;
>> >> if (cond)
>> >>   t2 = x % y;
>> >> else
>> >>   t3 = x % y;
>> >>
>> >> Assuming stmt is "t2 = x % y", it will walk uses of stmt and set
>> >> top_stmt to "t1 = x / y"
>> >> Then it will walk all uses of top_stmt:
>> >> "t2 = x % y" -> dominated by top_stmt
>> >> "t3 = x % y" -> dominated by top_stmt
>> >> Since all stmts are dominated by top_stmt, we add all three stmts to
>> >> vector of stmts and proceed with transform.
>> >>
>> >> For the case where, top_stmt dominates original stmt but not all stmts:
>> >>
>> >> if (cond)
>> >>   t1 = x / y;
>> >> else
>> >> {
>> >>   t2 = x % y;
>> >>   return;
>> >> }
>> >>
>> >> t3 = x % y;
>> >>
>> >> Assuming stmt is "t3 = x % y",
>> >> Walking stmt uses will set top_stmt to "t1 = x / y";
>> >>
>> >> Walking immediate uses of top_stmt, we find that "t2 = x % y" is not
>> >> dominated by top_stmt,
>> >> and hence don't do the transform.
>> >>
>> >> Does this sound reasonable ?
>> >
>> > Yes, that's reasonable.
>> Thanks, I have attached patch that implements above approach.
>> Does it look OK ?
>
> Please start the top-stmt search with
>
>   top_stmt = stmt;
>   top_bb = gimple_bb (stmt);
>
> this makes sure to process all stmts via the IL walk in case
> the uses have multiple independent "dominated" trees.  This also
> simplifies the loop body (no need to check for NULL).  This also
> makes mod_seen always true and you can compute div_seen in that
> loop as well.
Done.
Um I don't understand why setting top_stmt to NULL won't process all stmts ?
AFAIU it will have one extra iteration compared to initializing top_stmt to stmt
(since first iteration would initialize top_stmt to stmt assuming stmt
does not throw) ?
>
> Otherwise looks ok now.
>
>> The patch does not still handle the following case:
>> int f(int x, int y)
>> {
>>   extern int cond;
>>   int q, r;
>>
>>   if (cond)
>>     q = x % y;
>>   else
>>     q = x % y;
>>
>>   r = x % y;
>>   return q + r;
>> }
>>
>> In above case although the mod stmt is not dominated by either div
>> stmt, I suppose the transform
>> is still possible by inserting DIVMOD (x, y) before if-else ?
>
> Yeah, same for sincos where doing this requires some LCM algorithm.
Well I don't have a good approach for this.
I was thinking, before doing the divmod transform, we could walk
GIMPLE_COND of "diamond" shape
(having both arms), and check "then" bb and "else" bb have same div or
mod stmts and in that case put an artificial
same stmt above GIMPLE_COND.

So the above case would be transformed to:

int tmp = x / y;  // artificial top_stmt
if (cond)
  q = x / y;
else
  q = x / y;

r = x % y;
return q + r;

and then the divmod transform will see "tmp = x / y" as the topmost stmt.
Since top_stmt is artificially introduced, we will replace that with DIVMOD ifn
rather than inserting DIVMOD ifn above top_stmt as in other cases.
>
>> For the following test-case, I am surprised why CSE didn't take place before
>> widening_mul pass ?
>>
>> int
>> f_1 (int x, int y)
>> {
>>   int q = x / y;
>>   int r1 = 0, r2 = 0;
>>   if (cond)
>>     r1 = x % y;
>>   else
>>     r2 = x % y;
>>   return q + r1 + r2;
>> }
>
> This is not CSE but code hoisting which is not implemented on GIMPLE
> (see PR23286)
Ah right, thanks for pointing out the PR.

Thanks,
Prathamesh
>
>> The input to widening_mul pass is:
>> f_1 (int x, int y)
>> {
>>   int r2;
>>   int r1;
>>   int q;
>>   int cond.0_1;
>>   int _2;
>>   int _11;
>>
>>   <bb 2>:
>>   q_7 = x_5(D) / y_6(D);
>>   cond.0_1 = cond;
>>   if (cond.0_1 != 0)
>>     goto <bb 3>;
>>   else
>>     goto <bb 4>;
>>
>>   <bb 3>:
>>   r1_9 = x_5(D) % y_6(D);
>>   goto <bb 5>;
>>
>>   <bb 4>:
>>   r2_10 = x_5(D) % y_6(D);
>>
>>   <bb 5>:
>>   # r1_3 = PHI <r1_9(3), 0(4)>
>>   # r2_4 = PHI <0(3), r2_10(4)>
>>   _2 = r1_3 + q_7;
>>   _11 = _2 + r2_4;
>>   return _11;
>>
>> }
>>
>> Thanks,
>> Prathamesh
>> >
>> > Richard.
>> >
>> >> Thanks,
>> >> Prathamesh
>> >> >
>> >> > Richard.
>> >> >
>> >>
>> >>
>> >
>> > --
>> > Richard Biener <rguenther@suse.de>
>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

Attachment: divmod-part1_5.diff
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]