This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC PR43721] Optimize a/b and a%b to single divmod call


On 2 November 2015 at 18:31, Richard Biener <rguenther@suse.de> wrote:
> On Mon, 2 Nov 2015, Prathamesh Kulkarni wrote:
>
>> On 2 November 2015 at 13:20, Prathamesh Kulkarni
>> <prathamesh.kulkarni@linaro.org> wrote:
>> > On 30 October 2015 at 15:57, Richard Biener <richard.guenther@gmail.com> wrote:
>> >> On Fri, Oct 30, 2015 at 8:39 AM, Prathamesh Kulkarni
>> >> <prathamesh.kulkarni@linaro.org> wrote:
>> >>> Hi,
>> >>> I have attached revamped version of Kugan's patch
>> >>> (https://gcc.gnu.org/ml/gcc/2013-06/msg00100.html) for PR43721.
>> >>> Test-case: http://pastebin.com/QMfpXLD9
>> >>> divmod pass dump: http://pastebin.com/yMY1ikCp
>> >>> Assembly: http://pastebin.com/kk2HZpvA
>> >>>
>> >>> The approach I took is similar to sincos pass, which involves two parts:
>> >>> a) divmod pass that transforms:
>> >>> t1 = a / b;
>> >>> t2 = a % b;
>> >>> to the following sequence:
>> >>> complex_tmp = DIVMOD (a, b);
>> >>> t1 = REALPART_EXPR(a);
>> >>> t2 = IMAGPART_EXPR(b);
>> >>>
>> >>> b) DIVMOD(a, b) is represented as an internal-fn and is expanded by
>> >>> expand_DIVMOD().
>> >>> I am not sure if I have done this correctly. I was somehow looking to
>> >>> reuse expand_divmod() but am not able to figure out how to do that
>> >>> (AFAIU expand_divmod() strictly returns either the quotient or
>> >>> remainder but never both which is what I want), so ended up with
>> >>> manually expanding to rtx.
>> >>>
>> >>> While going through the sincos pass in execute_cse_sincos_1, I didn't
>> >>> understand the following:
>> >>>  if (gimple_purge_dead_eh_edges (gimple_bb (stmt)))
>> >>>           cfg_changed = true;
>> >>> Um why is the call to gimple_purge_dead_eh_edges necessary here?
>> >>
>> >> The call replacement might no longer throw.
>> >>
>> >>> A silly question, what options to pass to gcc to print statistics ? I
>> >>> added statistics_counter_event to keep track of number of calls
>> >>> inserted/used but not sure how to print them :/
>> >>
>> >> -fdump-tree-XXX-stats or -fdump-statistics-stats
>> > Thanks, I can see it now -;)
>> >>
>> >>> I would be grateful for suggestions for improving the patch.
>> >>
>> >> First of all new functions need comments (expand_DIVMOD).
>> >>
>> >> Second - what's the reasoning for the pass placement seen?  I think
>> >> the transform is a lowering one, improving RTL expansion.  The
>> >> DIVMOD representation is harmful for GIMPLE optimizers.
>> >>
>> >> Third - I'd have integrated this with an existing pass - we have another
>> >> lowering / RTL expansion kind pass which is pass_optimize_widening_mul.
>> >> Please piggy-back on it.
>> >>
>> >> You seem to do the transform unconditionally even if the target has
>> >> instructions for division and modulus but no instruction for divmod
>> >> in which case you'll end up emitting a library call in the end instead
>> >> of a div and mod instruction.  I think the transform should be gated on
>> >> missing div/mod instructions or the availability of a divmod one.
>> >>
>> >> +         if (is_gimple_assign (stmt)
>> >> +             && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_binary)
>> >> +           {
>> >> +             if (gimple_assign_rhs_code (stmt) == TRUNC_DIV_EXPR)
>> >> +               cfg_changed |= execute_divmod_1 (stmt);
>> >>
>> >> you can directly check gimple_assign_rhs_code.  I'd check for TRUNC_MOD_EXPR
>> >> which seems to be less common and thus should cut down compile-time.
>> >>
>> >> +  /* Case 2: When both are in top_bb */
>> >> +  else if (op1_def_in_bb && op2_def_in_bb)
>> >> +    {
>> >> +      /* FIXME: better way to compare iterators?.  */
>> >> +      gimple_stmt_iterator gsi = get_later_stmt (top_bb,
>> >> def_stmt_op1, def_stmt_op2);
>> >>
>> >> You can't use get_later_stmt, it's a vectorizer speciality.  To do
>> >> this test efficiently
>> >> you need stmt UIDs computed.
>> >>
>> >> I miss a testcase (or two).
>> > I have tried to address your comments in the attached patch.
>> oops, unsigned uid_no = 0; should be outside the loop in
>> pass_optimize_widening_mul::execute.
>> Done in this version of the patch.
>>
>> Thanks,
>> Prathamesh
>> > Could you please review it for me ?
>
> For the testcases you need a new target check for divmod.
Sorry I didn't understand.
Should I add sth similar to /* {  dg-require-effective-target divmod } */ ?
>
>> > I have a few questions:
>> >
>> > a) Is the check for availability of divmod correct ?
>
> You simply want optab_handler (tab, mode) != CODE_FOR_nothing
>
> The libfunc is always available.
Um I am probably missing something, I thought libfuncs are initialized
by init_optabs()
but the target can override/delete that with the target hook
targetm.init_libfuncs () ?
On AArch64 optab_libfunc (sdivmod_optab, SImode) returned NULL_RTX.
>
> When looking at TRUNC_DIV_EXPR you should also exclude
> the case where TYPE_OVERFLOW_TRAPS (type) as that should
> expand using the [su]divv optabs (no trapping overflow
> divmod optab exists).
>
>> > b) Assume TRUNC_DIV_EXPR stmt is in bb0 and TRUNC_DIV_MOD in bb1.
>> > I choose to transform if bb0 dominates bb1 or bb1 dominates bb0 (or bb0 == bb1).
>> > However I wonder if we should also check that if bb0 dominates bb1
>> > then bb1 should
>> > post-dominate bb0 ?
>
> Depends on how expensive divmod is compared to div or mod.  An alternative
> transform would be to look whether [us]mod optabs are available and if not
> implement modulo via div and subtract, CSEing the division itself.  Not
> sure if any target that provides div patterns does not also provide
> mod ones.
>
>> > For instance the transform gets applied to test-case in pr43721-2.c.
>> > bb containing rem = x % y doesn't post-dominate bb containing quot = x / y;
>> > I wonder if we want to perform the transform for this case since control flow
>> > may not always reach from quot = x / y to rem = x % y ?
>
> See above, it's a cost question.  We do for sincos as computing sin or
> cos if the other is already computed is cheap.  I would suggest the
> same is true for div/mod.
>
>> > c) A silly queston, I wonder if vec<gimple *> stmts  in convert_to_divmod() will
>> > have at most 2 entries (assuming TRUNC_MOD_EXPR stmt is also added in
>> > the vector) ?
>> > I assume that cse will take place before widening_mul pass executes
>> > (since pass_fre/pass_fre executes earlier), and there will be no
>> > duplicate gimple stmts ?
>
> There are passes inbetween widen-mult and the last CSE that (while
> not very likely) makes it possible (VRP jump threading).
>
>> > So vec<gimple *> stmts would contain at most 2 entries - gimple stmt
>> > with subcode  TRUNC_MOD_EXPR
>> > and gimple stmt with subcode TRUNC_DIV_EXPR having same operands.
>> > There won't be another gimple stmt with subcode TRUNC_DIV_EXPR with
>> > same operands ?
>
> I wouldn't say so, see above.
Thanks for the explanation.
>
>> > d) I am not sure if I have correctly done expansion of divmod to rtx
>> > in expand_DIVMOD() (I fear I may be missing
>> > many checks that are in expmed.c:expand_divmod).
>
> I'm not sure either, esp. about the libcall path and you choosing
> a wider mode eventually (is this necessary?).  Did you make sure
> this path works?
Removed choosing widening modes in this patch.
During the widening_mul pass it checks is divmod is available for the given mode
and only then does the transform. It probably doesn't make sense to check
for a wider mode during expand_DIVMOD, since optab_libfunc should be
available for divmod with the given mode ?
>
> +  /* Compute stmt uids.  */
> +  unsigned uid_no = 0;
> +  FOR_EACH_BB_FN (bb, fun)
> +    {
> +      for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p
> (gsi); gsi_next (&gsi))
> +       {
> +         gimple *stmt = gsi_stmt (gsi);
> +         gimple_set_uid (stmt, uid_no++);
> +       }
> +    }
>
> renumber_gimple_stmt_uids ()
Thanks.

The attached patch has following additional changes:
a) Fixes ICE when one of the operands was constant (unconditionally
used SSA_NAME_DEF_STMT/ FOR_EACH_IMM_USE_FAST on constant tree
operands).
b) Adds use_stmt to vector_stmts only if the operands match and in the
same order.
I was incorrectly applying the transform to the following case: t1 = x
/ y; t2 = y % x.

Thanks,
Prathamesh
>
> Richard.

Attachment: patch-v4.diff
Description: Text document

Attachment: ChangeLog.txt
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]