This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC PR43721] Optimize a/b and a%b to single divmod call


On Wed, 27 Jan 2016, Prathamesh Kulkarni wrote:

> On 11 November 2015 at 19:04, Richard Biener <rguenther@suse.de> wrote:
> > On Wed, 11 Nov 2015, Prathamesh Kulkarni wrote:
> >
> >> On 11 November 2015 at 16:03, Richard Biener <rguenther@suse.de> wrote:
> >> > On Wed, 11 Nov 2015, Prathamesh Kulkarni wrote:
> >> >
> >> >> On 10 November 2015 at 20:11, Richard Biener <rguenther@suse.de> wrote:
> >> >> > On Mon, 9 Nov 2015, Prathamesh Kulkarni wrote:
> >> >> >
> >> >> >> On 4 November 2015 at 20:35, Richard Biener <rguenther@suse.de> wrote:
> >> >> >> >
> >> >> >> > Btw, did you investigate code gen differences on x86_64/i586?  That
> >> >> >> > target expands all divisions/modulo ops via divmod, relying on CSE
> >> >> >> > solely as the HW always computes both div and mod (IIRC).
> >> >> >> x86_64 has optab_handler for divmod defined, so the transform won't
> >> >> >> take place on x86.
> >> >> >
> >> >> > Ok.
> >> >> >
> >> >> >> > +
> >> >> >> > +        gassign *assign_stmt = gimple_build_assign (gimple_assign_lhs
> >> >> >> > (use_stmt), rhs);
> >> >> >> > +        gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
> >> >> >> >
> >> >> >> > Ick.  Please use
> >> >> >> >
> >> >> >> >     gimple_set_rhs_from_tree (use_stmt, res);
> >> >> >> Um there doesn't seem to be gimple_set_rhs_from_tree.
> >> >> >> I used gimple_assign_set_rhs_from_tree which requires gsi for use_stmt.
> >> >> >> Is that OK ?
> >> >> >
> >> >> > Yes.
> >> >> >
> >> >> >> >     update_stmt (use_stmt);
> >> >> >> >     if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
> >> >> >> >       cfg_changed = true;
> >> >> >> >
> >> >> >> > +  free_dominance_info (CDI_DOMINATORS);
> >> >> >> >
> >> >> >> > do not free dominators.
> >> >> >>
> >> >> >> I have done the suggested changes in the attached patch.
> >> >> >> I have a few questions:
> >> >> >>
> >> >> >> a) Does the change to insert DIVMOD call before topmost div or mod
> >> >> >> stmt with matching operands
> >> >> >> look correct ?
> >> >> >
> >> >> > +  /* Insert call-stmt just before the topmost div/mod stmt.
> >> >> > +     top_bb dominates all other basic blocks containing div/mod stms
> >> >> > +     so, the topmost stmt would be the first div/mod stmt with matching
> >> >> > operands
> >> >> > +     in top_bb.  */
> >> >> > +
> >> >> > +  gcc_assert (top_bb != 0);
> >> >> > +  gimple_stmt_iterator gsi;
> >> >> > +  for (gsi = gsi_after_labels (top_bb); !gsi_end_p (gsi); gsi_next
> >> >> > (&gsi))
> >> >> > +    {
> >> >> > +      gimple *g = gsi_stmt (gsi);
> >> >> > +      if (is_gimple_assign (g)
> >> >> > +         && (gimple_assign_rhs_code (g) == TRUNC_DIV_EXPR
> >> >> > +            || gimple_assign_rhs_code (g) == TRUNC_MOD_EXPR)
> >> >> > +         && operand_equal_p (op1, gimple_assign_rhs1 (g), 0)
> >> >> > +         && operand_equal_p (op2, gimple_assign_rhs2 (g), 0))
> >> >> > +       break;
> >> >> >
> >> >> > Looks overly complicated to me.  Just remember "topmost" use_stmt
> >> >> > alongside top_bb (looks like you'll no longer need top_bb if you
> >> >> > retail top_stmt).  And then do
> >> >> >
> >> >> >    gsi = gsi_for_stmt (top_stmt);
> >> >> >
> >> >> > and insert before that.
> >> >> Thanks, done in this patch. Does it look OK ?
> >> >> IIUC gimple_uid (stmt1) < gimple_uid (stmt2) can be used to check if
> >> >> stmt1 occurs before stmt2
> >> >> only if stmt1 and stmt2 are in the same basic block ?
> >> >> >
> >> >> >> b) Handling constants - I dropped handling constants in the attached
> >> >> >> patch. IIUC we don't want
> >> >> >> to enable this transform if there's a specialized expansion for some
> >> >> >> constants for div or mod ?
> >> >> >
> >> >> > See expand_divmod which has lots of special cases for constant operands
> >> >> > not requiring target support for div or mod.
> >> >> Thanks, would it be OK if I do this in follow up patch ?
> >> >
> >> > Well, just not handle them like in your patch is fine.
> >> >
> >> >> >
> >> >> >> I suppose this would also be target dependent and require a target hook ?
> >> >> >> For instance arm defines modsi3 pattern to expand mod when 2nd operand
> >> >> >> is constant and <= 0 or power of 2,
> >> >> >> while for other cases goes the expand_divmod() route to generate call
> >> >> >> to __aeabi_idivmod libcall.
> >> >> >
> >> >> > Ok, so it lacks a signed mod instruction.
> >> >> >
> >> >> >> c) Gating the divmod transform -
> >> >> >> I tried gating it on checks for optab_handlers on div and mod, however
> >> >> >> this doesn't enable transform for arm cortex-a9
> >> >> >> anymore (cortex-a9 doesn't have hardware instructions for integer div and mod).
> >> >> >> IIUC for cortex-a9,
> >> >> >> optab_handler (sdivmod_optab, SImode) returns CODE_FOR_nothing because
> >> >> >> HAVE_divsi3 is 0.
> >> >> >> However optab_handler (smod_optab, SImode) matches since optab_handler
> >> >> >> only checks for existence of pattern
> >> >> >> (and not whether the pattern gets matched).
> >> >> >> I suppose we should enable the transform only if the divmod, div, and
> >> >> >> mod pattern do not match rather than checking
> >> >> >> if the patterns exist via optab_handler ? For a general x % y, modsi3
> >> >> >> would fail to match but optab_handler(smod_optab, SImode ) still
> >> >> >> says it's matched.
> >> >> >
> >> >> > Ah, of course.  Querying for an optab handler is just a cheap
> >> >> > guesstimate...  Not sure how to circumvent this best (sub-target
> >> >> > enablement of patterns).  RTL expansion just goes ahead (of course)
> >> >> > and sees if expansion eventually fails.  Richard?
> >> >> >
> >> >> >> Should we define a new target hook combine_divmod, which returns true
> >> >> >> if transforming to divmod is desirable for that
> >> >> >> target ?
> >> >> >> The default definition could be:
> >> >> >> bool default_combine_divmod (enum machine_mode mode, tree op1, tree op2)
> >> >> >> {
> >> >> >>   // check for optab_handlers for div/mod/divmod and libfunc for divmod
> >> >> >> }
> >> >> >>
> >> >> >> And for arm, it could be over-ridden to return false if op2 is
> >> >> >> constant and <= 0 or power of 2.
> >> >> >> I am not really sure if this is a good idea since I am replicating
> >> >> >> information from modsi3 pattern.
> >> >> >> Any change to the pattern may require corresponding change to the hook :/
> >> >> >
> >> >> > Yeah, I don't think that is desirable.  Ideally we'd have a way
> >> >> > to query HAVE_* for CODE_FOR_* which would mean target-insns.def
> >> >> > support for all div/mod/divmod patterns(?) and queries...
> >> >> >
> >> >> > Not sure if what would be enough though.
> >> >> >
> >> >> > Note that the divmod check is equally flawed.
> >> >> >
> >> >> > I think with the above I'd enable the transform when
> >> >> >
> >> >> > +  if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing
> >> >> > +      || (optab_libfunc (divmod_optab, mode) != NULL_RTX
> >> >> >            && optab_handler ([su]div_optab, mode) == CODE_FOR_nothing))
> >> >> > +    return false;
> >> >> Um this fails for the arm backend (for cortex-a9) because
> >> >> optab_handler (divmod_optab, mode) != CODE_FOR_nothing is false
> >> >> optab_libfunc (divmod_optab, mode) != NULL_RTX is true.
> >> >> optab_handler (div_optab, mode) == CODE_FOR_nothing is true.
> >> >> which comes down to false || (true && true) which is true and we hit
> >> >> return false.
> >> >
> >> > Oh, sorry to mess up the test - it was supposed to be inverted.
> >> >
> >> >> AFAIU, we want the transform to be disabled if:
> >> >> a) optab_handler exists for divmod.
> >> >> b) optab_handler exists for div.
> >> >> c) optab_libfunc does not exist for divmod.  */
> >> >>
> >> >> +  if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing
> >> >> +      || optab_handler (div_optab, mode) != CODE_FOR_nothing
> >> >> +      || optab_libfunc (divmod_optab, mode) == NULL_RTX)
> >> >> +    return false;
> >> >> Does that look correct ?
> >> >
> >> > No.  That will disable if we have a divmod optab.  Instead try
> >> >
> >> >  if (! (optab_handler (divmod_optab, mode) != CODE_FOR_nothing
> >> >         || (optab_libfunc (divmod_optab, mode) != NULL_RTX
> >> >             && optab_handler ([su]div_optab, mode) == CODE_FOR_nothing)))
> >> >    return false;
> >> >
> >> > which is what I intended.  If we have a divmod optab go ahead.
> >> > If we have a libfunc and not a div optab then as well.
> >> Oops, I assumed that we only wanted this transform if optab_libfunc existed.
> >> Modified the test in the attached patch.
> >> Well this does affect x86_64 and i?86 -;)
> >>
> >> I added the following hunk back to expand_DIVMOD, since if optab_handler exists
> >> we want to use it for expansion. Does it look OK ?
> >>
> >> +  /* Check if optab handler exists for udivmod/sdivmod.  */
> >> +  if (optab_handler (tab, mode) != CODE_FOR_nothing)
> >> +    {
> >> +      rtx quotient = gen_reg_rtx (mode);
> >> +      rtx remainder = gen_reg_rtx (mode);
> >> +      expand_twoval_binop (tab, op0, op1, quotient, remainder,
> >> TYPE_UNSIGNED (type));
> >> +
> >> +      /* Wrap the return value (quotient, remaineder) within COMPLEX_EXPR */
> >> +      expand_expr (build2 (COMPLEX_EXPR, TREE_TYPE (lhs),
> >> +                  make_tree (TREE_TYPE (arg0), quotient),
> >> +                  make_tree (TREE_TYPE (arg1), remainder)),
> >> +                  target, VOIDmode, EXPAND_NORMAL);
> >> +
> >> +      return;
> >> +    }
> >
> > Ah, sure.
> >
> >> I verified the code generated for x86_64 and i?86 and it's same for my
> >> test-cases.
> >> However during a clean build of gcc for x86_64, I am getting segfault
> >> in bid64_div.c:
> >> In file included from
> >> /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/libgcc/config/libbid/bid_internal.h:27:0,
> >>                  from
> >> /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/libgcc/config/libbid/bid64_div.c:56:
> >> /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/libgcc/config/libbid/bid64_div.c:
> >> In function â__bid64_divâ:
> >> /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/libgcc/config/libbid/bid_conf.h:36:19:
> >> internal compiler error: in expand_DIVMOD, at internal-fn.c:2099
> >>  #define bid64_div __bid64_div
> >>                    ^
> >> /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/libgcc/config/libbid/bid64_div.c:80:1:
> >> note: in expansion of macro âbid64_divâ
> >>  bid64_div (UINT64 x,
> >>  ^
> >> 0x8e101f expand_DIVMOD
> >>         /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/internal-fn.c:2099
> >> 0x705927 expand_call_stmt
> >>         /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/cfgexpand.c:2549
> >> 0x705927 expand_gimple_stmt_1
> >>         /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/cfgexpand.c:3509
> >> 0x705927 expand_gimple_stmt
> >>         /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/cfgexpand.c:3672
> >> 0x708d65 expand_gimple_basic_block
> >>         /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/cfgexpand.c:5676
> >> 0x70e696 execute
> >>         /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/cfgexpand.c:6288
> >>
> >> It looks like in the following code in expand_DIVMOD:
> >>
> >> +  rtx quotient = simplify_gen_subreg (mode, libval, libval_mode, 0);
> >> +  rtx remainder = simplify_gen_subreg (mode, libval, libval_mode,
> >> +                                      GET_MODE_SIZE (mode));
> >>
> >> remainder is (nil) and hence the segfault in make_tree (I added the
> >> asserts for quotient and remainder later).
> >> I am not sure why it's happening  though, investigating it.
> >
> > No idea - as said, RTL expansion isn't my area of expertise.  I would
> > expect that offsetted subregs shouldn't be used for complex components.
> > Maybe gen_lowpart / gen_highpart will work better which abstracts
> > the subregging.
> Hi Richard,
> This is a revamped version of my previous patch for divmod transform.
> The segfault with that patch can be reproduced with following
> test-case on x86_64 with -m32:
> typedef unsigned int DItype __attribute__((mode(DI)));
> 
> DItype f (DItype x, DItype y)
> {
>   DItype quot, rem;
> 
>   quot = x / y;
>   rem = x % y;
>   return quot + rem;
> }
> 
> Jim pointed out to me that happens because target-specific divmod
> libfuncs have different calling conventions, there is no "standard"
> calling convention for divmod in libgcc.
> The divmod libfunc in this case is libgcc2.c:__udivmoddi4() which has
> a different calling convention from arm's __aeabi_divmod() and I was
> assuming
> that all divmod's follow arm's calling convention.
> The arm version expects that op0 and op1 are passed as arguments and
> both div,rem are returned with return value having mode twice that of
> it's arguments.
> whereas libgcc2.c:__udivmoddi4() takes 3 args: op0, op1 of unsigned
> DImode and 3rd arg is pointer used for storing remainder while
> return value contains quotient.
> Similarly spu's divmod libfuncs follow libgcc2.c:__udivmoddi4()'s convention.
> 
> To workaround this, I defined a new hook expand_divmod_libfunc, which
> targets must override for expanding call to target-specific dimovd.
> The "default" hook default_expand_divmod_libfunc() expands call to
> libgcc2.c:__udivmoddi4() since that's the only "generic" divmod
> available.
> Is this a reasonable approach ?

Hum.  How do they get to expand/generate it today?  That said, I'm
no expert in this area.

A simpler solution may be to not do the transform if there is no
instruction for divmod.

> Expansion proceeds as follows:
> expand_DIVMOD checks if optab_handler for udivmod/sdivmod exists and
> if it does, uses expand_twoval_binop() for expansion.
> else it calls the target hook for generating call to divmod libfunc.
> 
> The divmod transform takes place if:
> a) optab_handler exists for divmod or
> b) optab_libfunc (divmod) exists and optab_handler(div) doesn't.
> and target overrides expand_divmod_libfunc or for the default function,
> mode is DImode and TYPE_UNSIGNED (type) is true.
> 
> Regarding test-cases, I added effective-check for divmod for only arm
> However DIVMOD transforms will also take place for targets having
> hardware div instructions.
> Should I need to manually add each configuration to
> check_effective_target_divmod() that have
> hardware div or is there a better way to check that ?

You have to manually add to that.

> The patch passes bootstrap and testing on x86_64-unknown-linux-gnu
> and cross-tested on arm-linux-gnueabihf, arm-none-eabi,
> armeb-none-linux-gnueabihf,

It's now too late for this, please queue it for GCC 7.

Thanks,
Richard.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]