[PR43721] Failure to optimise (a/b) and (a%b) into single call

Richard Biener richard.guenther@gmail.com
Tue Sep 3 10:57:00 GMT 2013

On Fri, Jun 21, 2013 at 4:12 AM, Kugan
<kugan.vivekanandarajah@linaro.org> wrote:
> On 17/06/13 19:07, Richard Biener wrote:
>> On Mon, 17 Jun 2013, Kugan wrote:
>>> Hi,
>>> I am attempting to fix Bug 43721 - Failure to optimise (a/b) and (a%b)
>>> into
>>> single __aeabi_idivmod call
>>> (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43721)
>>> execute_cse_sincos tree level pass does similar cse so I attempted to use
>>> similar approach here. Div/mod cse is not really using built-in functions
>>> though at this level.
>> The issue with performing the transform at the same time as we
>> transform SINCOS is that the vectorizer will now no longer be able
>> to vectorize these loops.  It would need to be taught how to
>> handle the builtin calls (basically undo the transformation, I don't
>> know of any ISA that can do vectorized combined div/mod).  Which
>> means it should rather be done at the point we CSE reciprocals
>> (which also replaces computes with builtin target function calls).
> Thanks Richard. Since execute_cse_reciprocals is handling reciprocals only,
> I added another pass to do divmod. Is that OK?
>>> For the case of div and mod operations, after CSE is performed, there
>>> isnt a
>>> way to represent the resulting stament in gimple. We will endup with
>>> divmod
>>> taking two arguments and returning double the size of one arguments in
>>> the
>>> three address format (divmod will return reminder and quotient so the
>>> return
>>> type is double the size of argument type).
>>> Since GIMPLE_ASSIGN will result in type checking failure in this case, I
>>> atempted use built-in functions (GIMPLE_CALL to represent the runtime
>>> library
>>> call). Name for the function here  is target specific and can be obtained
>>> from
>>> sdivmod_optab so the builtin function name defined in tree level is not
>>> used.
>>> I am not entirelt sure this is the right approach so I am attaching the
>>> first
>>> cut of the patch to get your feedback and understand the right approach
>>> to
>>> this problem.
>> If we don't want to expose new builtins to the user (I'm not sure we
>> want that), then using "internal functions" is an easier way to
>> avoid these issues (see gimple.h and internal-fn.(def|h)).
> I have now changed to use internal functions. Thanks for that.
>> Generally the transform looks useful to me as it moves forward with
>> the general idea of moving pattern recognition done during RTL expansion
>> to an earlier place.
>> For the use of a larger integer type and shifts to represent
>> the modulo and division result I don't think that's the very best
>> idea.  Instead resorting to a complex integer type as return
>> value looks more appealing (similar to sincos using cexpi here).
>> That way you also avoid the ugly hard-coding of bit-sizes.
> I have changed it to use complex integers now.
>> +  if (HAVE_divsi3
>> +       || (GET_MODE_BITSIZE (TYPE_MODE (type)) != 32)
>> watch out for types whose TYPE_PRECISION is not the bitsize
>> of their mode.  Also it should be GET_MODE_PRECISION here.
>> +       || !optab_libfunc (TYPE_UNSIGNED (type)? udivmod_optab :
>> sdivmod_optab,
>> +                            TYPE_MODE (type)))
>> targets that use a libfunc should also get this optimization, as
>> it always removes computations.  I think the proper test is
>> for whether the target can do division and/or modulus without
>> using a libfunc, not whether there is a divmod optab/libfunc.
> I guess best way to do is by defining a target hook and let the target
> define the required behaviour. Is that what you had in mind?
> I have attached a modified patch with these changes.

+@deftypefn {Target Hook} bool TARGET_COMBINE_DIVMOD (enum
machine_mode @var{mode})

rather than talking about libcalls I'd say the target should indicate whether
it has a way to optimize division and modulo with the same operands for given
mode.  As we have sdivmod_optab and udivmod_optab already you should
check whether for the given mode it is != CODE_for_nothing.  Or even better,
follow some of the vectorizer expansion stuff and separate out a
predicate from expand_divmod () that tells you whether it would do something

       NEXT_PASS (pass_cse_reciprocals);
+      NEXT_PASS (pass_cse_divmod);

of course I've meant that you perform divmod detection at the same
time reciprocals are optimized - thus in the very same pass.

Sorry for taking so long to review this - the patch now needs updating
for the pass-manager C++-ification, too.


>> Others knowing this piece of the compiler better may want to comment
>> here, of course.
>> Thanks,
>> Richard.
> Thanks,
> Kugan

More information about the Gcc mailing list