This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC PR43721] Optimize a/b and a%b to single divmod call


On 11 November 2015 at 19:04, Richard Biener <rguenther@suse.de> wrote:
> On Wed, 11 Nov 2015, Prathamesh Kulkarni wrote:
>
>> On 11 November 2015 at 16:03, Richard Biener <rguenther@suse.de> wrote:
>> > On Wed, 11 Nov 2015, Prathamesh Kulkarni wrote:
>> >
>> >> On 10 November 2015 at 20:11, Richard Biener <rguenther@suse.de> wrote:
>> >> > On Mon, 9 Nov 2015, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> On 4 November 2015 at 20:35, Richard Biener <rguenther@suse.de> wrote:
>> >> >> >
>> >> >> > Btw, did you investigate code gen differences on x86_64/i586?  That
>> >> >> > target expands all divisions/modulo ops via divmod, relying on CSE
>> >> >> > solely as the HW always computes both div and mod (IIRC).
>> >> >> x86_64 has optab_handler for divmod defined, so the transform won't
>> >> >> take place on x86.
>> >> >
>> >> > Ok.
>> >> >
>> >> >> > +
>> >> >> > +        gassign *assign_stmt = gimple_build_assign (gimple_assign_lhs
>> >> >> > (use_stmt), rhs);
>> >> >> > +        gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
>> >> >> >
>> >> >> > Ick.  Please use
>> >> >> >
>> >> >> >     gimple_set_rhs_from_tree (use_stmt, res);
>> >> >> Um there doesn't seem to be gimple_set_rhs_from_tree.
>> >> >> I used gimple_assign_set_rhs_from_tree which requires gsi for use_stmt.
>> >> >> Is that OK ?
>> >> >
>> >> > Yes.
>> >> >
>> >> >> >     update_stmt (use_stmt);
>> >> >> >     if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
>> >> >> >       cfg_changed = true;
>> >> >> >
>> >> >> > +  free_dominance_info (CDI_DOMINATORS);
>> >> >> >
>> >> >> > do not free dominators.
>> >> >>
>> >> >> I have done the suggested changes in the attached patch.
>> >> >> I have a few questions:
>> >> >>
>> >> >> a) Does the change to insert DIVMOD call before topmost div or mod
>> >> >> stmt with matching operands
>> >> >> look correct ?
>> >> >
>> >> > +  /* Insert call-stmt just before the topmost div/mod stmt.
>> >> > +     top_bb dominates all other basic blocks containing div/mod stms
>> >> > +     so, the topmost stmt would be the first div/mod stmt with matching
>> >> > operands
>> >> > +     in top_bb.  */
>> >> > +
>> >> > +  gcc_assert (top_bb != 0);
>> >> > +  gimple_stmt_iterator gsi;
>> >> > +  for (gsi = gsi_after_labels (top_bb); !gsi_end_p (gsi); gsi_next
>> >> > (&gsi))
>> >> > +    {
>> >> > +      gimple *g = gsi_stmt (gsi);
>> >> > +      if (is_gimple_assign (g)
>> >> > +         && (gimple_assign_rhs_code (g) == TRUNC_DIV_EXPR
>> >> > +            || gimple_assign_rhs_code (g) == TRUNC_MOD_EXPR)
>> >> > +         && operand_equal_p (op1, gimple_assign_rhs1 (g), 0)
>> >> > +         && operand_equal_p (op2, gimple_assign_rhs2 (g), 0))
>> >> > +       break;
>> >> >
>> >> > Looks overly complicated to me.  Just remember "topmost" use_stmt
>> >> > alongside top_bb (looks like you'll no longer need top_bb if you
>> >> > retail top_stmt).  And then do
>> >> >
>> >> >    gsi = gsi_for_stmt (top_stmt);
>> >> >
>> >> > and insert before that.
>> >> Thanks, done in this patch. Does it look OK ?
>> >> IIUC gimple_uid (stmt1) < gimple_uid (stmt2) can be used to check if
>> >> stmt1 occurs before stmt2
>> >> only if stmt1 and stmt2 are in the same basic block ?
>> >> >
>> >> >> b) Handling constants - I dropped handling constants in the attached
>> >> >> patch. IIUC we don't want
>> >> >> to enable this transform if there's a specialized expansion for some
>> >> >> constants for div or mod ?
>> >> >
>> >> > See expand_divmod which has lots of special cases for constant operands
>> >> > not requiring target support for div or mod.
>> >> Thanks, would it be OK if I do this in follow up patch ?
>> >
>> > Well, just not handle them like in your patch is fine.
>> >
>> >> >
>> >> >> I suppose this would also be target dependent and require a target hook ?
>> >> >> For instance arm defines modsi3 pattern to expand mod when 2nd operand
>> >> >> is constant and <= 0 or power of 2,
>> >> >> while for other cases goes the expand_divmod() route to generate call
>> >> >> to __aeabi_idivmod libcall.
>> >> >
>> >> > Ok, so it lacks a signed mod instruction.
>> >> >
>> >> >> c) Gating the divmod transform -
>> >> >> I tried gating it on checks for optab_handlers on div and mod, however
>> >> >> this doesn't enable transform for arm cortex-a9
>> >> >> anymore (cortex-a9 doesn't have hardware instructions for integer div and mod).
>> >> >> IIUC for cortex-a9,
>> >> >> optab_handler (sdivmod_optab, SImode) returns CODE_FOR_nothing because
>> >> >> HAVE_divsi3 is 0.
>> >> >> However optab_handler (smod_optab, SImode) matches since optab_handler
>> >> >> only checks for existence of pattern
>> >> >> (and not whether the pattern gets matched).
>> >> >> I suppose we should enable the transform only if the divmod, div, and
>> >> >> mod pattern do not match rather than checking
>> >> >> if the patterns exist via optab_handler ? For a general x % y, modsi3
>> >> >> would fail to match but optab_handler(smod_optab, SImode ) still
>> >> >> says it's matched.
>> >> >
>> >> > Ah, of course.  Querying for an optab handler is just a cheap
>> >> > guesstimate...  Not sure how to circumvent this best (sub-target
>> >> > enablement of patterns).  RTL expansion just goes ahead (of course)
>> >> > and sees if expansion eventually fails.  Richard?
>> >> >
>> >> >> Should we define a new target hook combine_divmod, which returns true
>> >> >> if transforming to divmod is desirable for that
>> >> >> target ?
>> >> >> The default definition could be:
>> >> >> bool default_combine_divmod (enum machine_mode mode, tree op1, tree op2)
>> >> >> {
>> >> >>   // check for optab_handlers for div/mod/divmod and libfunc for divmod
>> >> >> }
>> >> >>
>> >> >> And for arm, it could be over-ridden to return false if op2 is
>> >> >> constant and <= 0 or power of 2.
>> >> >> I am not really sure if this is a good idea since I am replicating
>> >> >> information from modsi3 pattern.
>> >> >> Any change to the pattern may require corresponding change to the hook :/
>> >> >
>> >> > Yeah, I don't think that is desirable.  Ideally we'd have a way
>> >> > to query HAVE_* for CODE_FOR_* which would mean target-insns.def
>> >> > support for all div/mod/divmod patterns(?) and queries...
>> >> >
>> >> > Not sure if what would be enough though.
>> >> >
>> >> > Note that the divmod check is equally flawed.
>> >> >
>> >> > I think with the above I'd enable the transform when
>> >> >
>> >> > +  if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing
>> >> > +      || (optab_libfunc (divmod_optab, mode) != NULL_RTX
>> >> >            && optab_handler ([su]div_optab, mode) == CODE_FOR_nothing))
>> >> > +    return false;
>> >> Um this fails for the arm backend (for cortex-a9) because
>> >> optab_handler (divmod_optab, mode) != CODE_FOR_nothing is false
>> >> optab_libfunc (divmod_optab, mode) != NULL_RTX is true.
>> >> optab_handler (div_optab, mode) == CODE_FOR_nothing is true.
>> >> which comes down to false || (true && true) which is true and we hit
>> >> return false.
>> >
>> > Oh, sorry to mess up the test - it was supposed to be inverted.
>> >
>> >> AFAIU, we want the transform to be disabled if:
>> >> a) optab_handler exists for divmod.
>> >> b) optab_handler exists for div.
>> >> c) optab_libfunc does not exist for divmod.  */
>> >>
>> >> +  if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing
>> >> +      || optab_handler (div_optab, mode) != CODE_FOR_nothing
>> >> +      || optab_libfunc (divmod_optab, mode) == NULL_RTX)
>> >> +    return false;
>> >> Does that look correct ?
>> >
>> > No.  That will disable if we have a divmod optab.  Instead try
>> >
>> >  if (! (optab_handler (divmod_optab, mode) != CODE_FOR_nothing
>> >         || (optab_libfunc (divmod_optab, mode) != NULL_RTX
>> >             && optab_handler ([su]div_optab, mode) == CODE_FOR_nothing)))
>> >    return false;
>> >
>> > which is what I intended.  If we have a divmod optab go ahead.
>> > If we have a libfunc and not a div optab then as well.
>> Oops, I assumed that we only wanted this transform if optab_libfunc existed.
>> Modified the test in the attached patch.
>> Well this does affect x86_64 and i?86 -;)
>>
>> I added the following hunk back to expand_DIVMOD, since if optab_handler exists
>> we want to use it for expansion. Does it look OK ?
>>
>> +  /* Check if optab handler exists for udivmod/sdivmod.  */
>> +  if (optab_handler (tab, mode) != CODE_FOR_nothing)
>> +    {
>> +      rtx quotient = gen_reg_rtx (mode);
>> +      rtx remainder = gen_reg_rtx (mode);
>> +      expand_twoval_binop (tab, op0, op1, quotient, remainder,
>> TYPE_UNSIGNED (type));
>> +
>> +      /* Wrap the return value (quotient, remaineder) within COMPLEX_EXPR */
>> +      expand_expr (build2 (COMPLEX_EXPR, TREE_TYPE (lhs),
>> +                  make_tree (TREE_TYPE (arg0), quotient),
>> +                  make_tree (TREE_TYPE (arg1), remainder)),
>> +                  target, VOIDmode, EXPAND_NORMAL);
>> +
>> +      return;
>> +    }
>
> Ah, sure.
>
>> I verified the code generated for x86_64 and i?86 and it's same for my
>> test-cases.
>> However during a clean build of gcc for x86_64, I am getting segfault
>> in bid64_div.c:
>> In file included from
>> /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/libgcc/config/libbid/bid_internal.h:27:0,
>>                  from
>> /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/libgcc/config/libbid/bid64_div.c:56:
>> /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/libgcc/config/libbid/bid64_div.c:
>> In function â__bid64_divâ:
>> /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/libgcc/config/libbid/bid_conf.h:36:19:
>> internal compiler error: in expand_DIVMOD, at internal-fn.c:2099
>>  #define bid64_div __bid64_div
>>                    ^
>> /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/libgcc/config/libbid/bid64_div.c:80:1:
>> note: in expansion of macro âbid64_divâ
>>  bid64_div (UINT64 x,
>>  ^
>> 0x8e101f expand_DIVMOD
>>         /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/internal-fn.c:2099
>> 0x705927 expand_call_stmt
>>         /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/cfgexpand.c:2549
>> 0x705927 expand_gimple_stmt_1
>>         /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/cfgexpand.c:3509
>> 0x705927 expand_gimple_stmt
>>         /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/cfgexpand.c:3672
>> 0x708d65 expand_gimple_basic_block
>>         /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/cfgexpand.c:5676
>> 0x70e696 execute
>>         /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/cfgexpand.c:6288
>>
>> It looks like in the following code in expand_DIVMOD:
>>
>> +  rtx quotient = simplify_gen_subreg (mode, libval, libval_mode, 0);
>> +  rtx remainder = simplify_gen_subreg (mode, libval, libval_mode,
>> +                                      GET_MODE_SIZE (mode));
>>
>> remainder is (nil) and hence the segfault in make_tree (I added the
>> asserts for quotient and remainder later).
>> I am not sure why it's happening  though, investigating it.
>
> No idea - as said, RTL expansion isn't my area of expertise.  I would
> expect that offsetted subregs shouldn't be used for complex components.
> Maybe gen_lowpart / gen_highpart will work better which abstracts
> the subregging.
Hi Richard,
This is a revamped version of my previous patch for divmod transform.
The segfault with that patch can be reproduced with following
test-case on x86_64 with -m32:
typedef unsigned int DItype __attribute__((mode(DI)));

DItype f (DItype x, DItype y)
{
  DItype quot, rem;

  quot = x / y;
  rem = x % y;
  return quot + rem;
}

Jim pointed out to me that happens because target-specific divmod
libfuncs have different calling conventions, there is no "standard"
calling convention for divmod in libgcc.
The divmod libfunc in this case is libgcc2.c:__udivmoddi4() which has
a different calling convention from arm's __aeabi_divmod() and I was
assuming
that all divmod's follow arm's calling convention.
The arm version expects that op0 and op1 are passed as arguments and
both div,rem are returned with return value having mode twice that of
it's arguments.
whereas libgcc2.c:__udivmoddi4() takes 3 args: op0, op1 of unsigned
DImode and 3rd arg is pointer used for storing remainder while
return value contains quotient.
Similarly spu's divmod libfuncs follow libgcc2.c:__udivmoddi4()'s convention.

To workaround this, I defined a new hook expand_divmod_libfunc, which
targets must override for expanding call to target-specific dimovd.
The "default" hook default_expand_divmod_libfunc() expands call to
libgcc2.c:__udivmoddi4() since that's the only "generic" divmod
available.
Is this a reasonable approach ?

Expansion proceeds as follows:
expand_DIVMOD checks if optab_handler for udivmod/sdivmod exists and
if it does, uses expand_twoval_binop() for expansion.
else it calls the target hook for generating call to divmod libfunc.

The divmod transform takes place if:
a) optab_handler exists for divmod or
b) optab_libfunc (divmod) exists and optab_handler(div) doesn't.
and target overrides expand_divmod_libfunc or for the default function,
mode is DImode and TYPE_UNSIGNED (type) is true.

Regarding test-cases, I added effective-check for divmod for only arm
However DIVMOD transforms will also take place for targets having
hardware div instructions.
Should I need to manually add each configuration to
check_effective_target_divmod() that have
hardware div or is there a better way to check that ?

The patch passes bootstrap and testing on x86_64-unknown-linux-gnu
and cross-tested on arm-linux-gnueabihf, arm-none-eabi,
armeb-none-linux-gnueabihf,

Thanks,
Prathamesh
>
> Richard.
>
>> Thanks,
>> Prathamesh
>> >
>> >> >
>> >> > so we either will have a divmod instruction (hopefully not sub-target
>> >> > disabled for us) or a libfunc for divmod and for sure no HW divide
>> >> > instruction (HW mod can be emulated by HW divide but not the other
>> >> > way around).
>> >> >
>> >> >> d) Adding effective-target-check for divmod: I just enabled it for
>> >> >> arm*-*-* for now. I could additionally append more targets,
>> >> >> not sure if this is the right approach.
>> >> >
>> >> > Looks good to me.
>> >> Is this version OK if bootstrap/testing passes ?
>> >
>> > Ok with adjusting the optab check like above.
>> >
>> > Thanks,
>> > Richard.

Attachment: pr43721-3.diff
Description: Text document

Attachment: ChangeLog
Description: Binary data


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]