Bug 43721 - Failure to optimise (a/b) and (a%b) into single __aeabi_idivmod call
Summary: Failure to optimise (a/b) and (a%b) into single __aeabi_idivmod call
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.4.3
: P3 enhancement
Target Milestone: 7.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
: 47777 65668 (view as bug list)
Depends on:
Blocks:
 
Reported: 2010-04-11 20:39 UTC by Mans Rullgard
Modified: 2018-11-19 15:32 UTC (History)
8 users (show)

See Also:
Host: x86_64-pc-linux-gnu
Target: arm-unknown-linux-gnueabi
Build: x86_64-pc-linux-gnu
Known to work:
Known to fail:
Last reconfirmed: 2010-04-14 21:38:25


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Mans Rullgard 2010-04-11 20:39:10 UTC
Consider the following code:

int divmod(int a, int b)
{
    int q = a / b;
    int r = a % b;
    return q + r;
}

For an ARM EABI target, this results in one __aeabi_idivmod() call and one
__aeabi_idiv() call even though the former already calculates the quotient.
Comment 1 astrange+gcc@gmail.com 2010-04-12 03:54:29 UTC
Still the case with 4.5.

> arm-none-linux-gnueabi-gcc -Os -S divmod.c
> cat divmod.s
	.cpu arm10tdmi
	.fpu softvfp
	.eabi_attribute 20, 1
	.eabi_attribute 21, 1
	.eabi_attribute 23, 3
	.eabi_attribute 24, 1
	.eabi_attribute 25, 1
	.eabi_attribute 26, 2
	.eabi_attribute 30, 4
	.eabi_attribute 18, 4
	.file	"divmod.c"
	.global	__aeabi_idivmod
	.global	__aeabi_idiv
	.text
	.align	2
	.global	divmod
	.type	divmod, %function
divmod:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	stmfd	sp!, {r4, r5, r6, lr}
	mov	r6, r0
	mov	r5, r1
	bl	__aeabi_idivmod
	mov	r0, r6
	mov	r4, r1
	mov	r1, r5
	bl	__aeabi_idiv
	add	r0, r4, r0
	ldmfd	sp!, {r4, r5, r6, pc}
	.size	divmod, .-divmod
	.ident	"GCC: (GNU) 4.5.0 20100325 (experimental)"
	.section	.note.GNU-stack,"",%progbits
Comment 2 Richard Earnshaw 2010-04-14 20:50:36 UTC
Before confirming this bug, we need to determine if the following program has well defined behaviour:

int count_div0 = 0;

int __aeabi_div0()
{
  count_div0++;
  return 0;
}

int divmod(int a, int b)
{
    int q = a / b;
    int r = a % b;
    return q + r;
}

int main()
{
  divmod(33, 0);
  printf("%d", count_div0);
  return 0;
}

If the above program should always print "2", then the compiler is correct and this report is invalid.  If it may print any value, then the compiler is wrong and should be fixed to optimize this case.  Note that __aeabi_div0 will be called if either __aeabi_idiv or __aeabi_idivmod are called with 0 as the second parameter.
  
Comment 3 Richard Biener 2010-04-14 21:30:48 UTC
The policy has been AFAIR that we can collapse multiple traps into a single one,
but not remove traps completely (or of course not introduce new ones).
Comment 4 Mans Rullgard 2010-04-14 21:34:36 UTC
The C99 standard says this about division by zero:

  The result of the / operator is the quotient from the division
  of the first operand by the second; the result of the % operator
  is the remainder. In both operations, if the value of the second
  operand is zero, the behavior is undefined.

The ARM ABI states the following about the __aeabi_div0() function:

  The *div0 functions:
  - Return the value passed to them as a parameter.
  - Or, return a fixed value defined by the execution environment (such as 0).
  - Or, raise a signal (often SIGFPE) or throw an exception, and do not return.
  [...]
  The *div and *divmod functions may be inlined by a tool chain. It is
  Q-o-I whether an inlined version calls *div0 out of line or returns the
  values that would have been returned by a particular value-returning
  version of *div0.

I interpret these as saying the application may make no assumptions about
the behaviour after a division (or modulus) by zero.
Comment 5 Richard Earnshaw 2010-04-14 21:38:25 UTC
I think that's enough evidence to confirm.
Comment 6 Andrew Stubbs 2010-10-20 16:53:13 UTC
Here's the problem: expand_divmod always prefers the straight "div" library call over the "divmod" library call, no exceptions.

Yes, "divmodsi4" in a .md is indeed preferred over "divsi4", so the optimization works fine on targets that have those, but ARM doesn't, so those rules are irrelevant.

ARM does not provide a separate library function for "mod", so expand_divmod does use "divmod" for mod operations, but never for div operations.

The reason for the apparent opposite rules appears to be that the divmodsi4 rule can be coded to auto-detect the most optimal kind of underlying operation to use, whereas the library call is fixed, once chosen.

I see two possible ways to fix this:
 1. Teach CSE (or somewhere) that div and divmod library calls have some overlap.
 2. Always prefer divmod, and find some other way to convert it to div, where appropriate.

I don't see any way to generate the RTL with the right call, so I'm pretty sure it does have to be fixed up after the fact.

Or, I could be way off. :)
Comment 7 Andrew Pinski 2011-02-17 15:50:30 UTC
*** Bug 47777 has been marked as a duplicate of this bug. ***
Comment 8 prathamesh3492 2016-08-27 12:17:09 UTC
*** Bug 65668 has been marked as a duplicate of this bug. ***
Comment 9 prathamesh3492 2016-10-28 19:05:44 UTC
Author: prathamesh3492
Date: Fri Oct 28 19:05:12 2016
New Revision: 241660

URL: https://gcc.gnu.org/viewcvs?rev=241660&root=gcc&view=rev
Log:
2016-10-28  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>
	    Kugan Vivekanandarajah  <kuganv@linaro.org>
	    Jim Wilson  <jim.wilson@linaro.org>

	PR tree-optimization/43721
	* target.def: New hook expand_divmod_libfunc.
	* doc/tm.texi.in: Add hook for TARGET_EXPAND_DIVMOD_LIBFUNC
	* doc/tm.texi: Regenerate.
	* internal-fn.def: Add new entry for DIVMOD ifn.
	* internal-fn.c (expand_DIVMOD): New.
	* tree-ssa-math-opts.c: Include optabs-libfuncs.h, tree-eh.h,
	targhooks.h.
	(widen_mul_stats): Add new field divmod_calls_inserted.
	(target_supports_divmod_p): New.
	(divmod_candidate_p): Likewise.
	(convert_to_divmod): Likewise.
	(pass_optimize_widening_mul::execute): Call
	calculate_dominance_info(), renumber_gimple_stmt_uids() at
	beginning of function. Call convert_to_divmod()
	and record stats for divmod.
	* config/arm/arm.c (arm_expand_divmod_libfunc): Override hook
	TARGET_EXPAND_DIVMOD_LIBFUNC.
	* doc/sourcebuild.texi: Add items for arm_divmod_simode, divmod,
	divmod_simode.

testsuite/
	* lib/target-supports.exp (check_effective_target_divmod): New.
	(check_effective_target_divmod_simode): Likewise.
	(check_effective_target_arm_divmod_simode): Likewise.
	* gcc.dg/divmod-1-simode.c: New test.
	* gcc.dg/divmod-1.c: Likewise.
	* gcc.dg/divmod-2-simode.c: Likewise.
	* gcc.dg/divmod-2.c: Likewise.
	* gcc.dg/divmod-3-simode.c: Likewise.
	* gcc.dg/divmod-3.c: Likewise.
	* gcc.dg/divmod-4-simode.c: Likewise.
	* gcc.dg/divmod-4.c: Likewise.
	* gcc.dg/divmod-5.c: Likewise.
	* gcc.dg/divmod-6-simode.c: Likewise.
	* gcc.dg/divmod-6.c: Likewise.
	* gcc.dg/divmod-7.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.dg/divmod-1-simode.c
    trunk/gcc/testsuite/gcc.dg/divmod-1.c
    trunk/gcc/testsuite/gcc.dg/divmod-2-simode.c
    trunk/gcc/testsuite/gcc.dg/divmod-2.c
    trunk/gcc/testsuite/gcc.dg/divmod-3-simode.c
    trunk/gcc/testsuite/gcc.dg/divmod-3.c
    trunk/gcc/testsuite/gcc.dg/divmod-4-simode.c
    trunk/gcc/testsuite/gcc.dg/divmod-4.c
    trunk/gcc/testsuite/gcc.dg/divmod-5.c
    trunk/gcc/testsuite/gcc.dg/divmod-6-simode.c
    trunk/gcc/testsuite/gcc.dg/divmod-6.c
    trunk/gcc/testsuite/gcc.dg/divmod-7.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/arm/arm.c
    trunk/gcc/doc/sourcebuild.texi
    trunk/gcc/doc/tm.texi
    trunk/gcc/doc/tm.texi.in
    trunk/gcc/internal-fn.c
    trunk/gcc/internal-fn.def
    trunk/gcc/target.def
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/lib/target-supports.exp
    trunk/gcc/tree-ssa-math-opts.c
Comment 10 prathamesh3492 2016-10-28 19:06:50 UTC
Fixed on trunk.
Comment 11 Martin Liška 2018-11-19 11:56:57 UTC
Can the bug be marked as resolved?
Comment 12 Ramana Radhakrishnan 2018-11-19 15:32:59 UTC
Fixed in GCC7 then.