Bug 82981 - [7 Regression] unnecessary __multi3 call for mips64r6 linux kernel
Summary: [7 Regression] unnecessary __multi3 call for mips64r6 linux kernel
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 7.2.1
: P2 normal
Target Milestone: 7.5
Assignee: Jakub Jelinek
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2017-11-14 05:08 UTC by Jim Wilson
Modified: 2018-12-06 13:45 UTC (History)
2 users (show)

See Also:
Host:
Target: mips64-linux
Build:
Known to work:
Known to fail:
Last reconfirmed: 2017-11-14 00:00:00


Attachments
gcc8-pr82981.patch (1.58 KB, patch)
2017-11-14 12:07 UTC, Jakub Jelinek
Details | Diff
gcc8-pr82981.patch (2.69 KB, patch)
2017-11-14 15:57 UTC, Jakub Jelinek
Details | Diff
mips64r6 linux kernel .config file (17.16 KB, text/plain)
2017-11-15 19:15 UTC, Jim Wilson
Details
gcc8-pr82981-arm.patch (737 bytes, patch)
2017-11-20 15:46 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jim Wilson 2017-11-14 05:08:49 UTC
Given the testcase
unsigned long func(unsigned long a, unsigned long b)
{
        return a > (~0UL) / b;
}
compiled with -march=mips64r6 -mabi=64 -mexplicit-relocs -O2 we end up with a call to __multi3 which is inefficient and inconvenient.

The testcase gets converted to 
  _1 = MUL_OVERFLOW (a_4(D), b_5(D));
  _2 = IMAGPART_EXPR <_1>;
There are no mulv patterns in the mips port, so it tries mulditi3 which fails, and then calls __multi3.

Mips64r6 does have a widening DImode multiply which should have been used instead.  The problem is that the *mulditi3 pattern is missing mips64r6 support.

Alternatively, the expander should try using a muldi3_highpart pattern when the mulditi3 pattern doesn't work.  Especially when the highpart is the only part we need as in this example.  The mips64r6 multi3_highpart is present.

Or alternatively, a mulvti3 pattern should be added to the mips port.

See for instance the thread at
    https://www.linux-mips.org/archives/linux-mips/2017-08/msg00041.html
Comment 1 Jim Wilson 2017-11-14 05:13:22 UTC
This problem is causing link errors for the mips64r6 linux kernel.  They would like the gcc developers to fix it, instead of adding a definition of the __multi3 function to the linux kernel.
Comment 2 Richard Biener 2017-11-14 09:57:48 UTC
We could also restrict MUL_OVERFLOW pattern matching to archs that can expand it?
Comment 3 Jakub Jelinek 2017-11-14 10:13:41 UTC
That might not be that easy without repeating there big chunks of internal-fn.c stuff.
In any case, what is mips64r6 using?
From the above it seems it doesn't have a corresponding optab, so
  if (icode == CODE_FOR_nothing)
is true.  Thus, is it:
      if (GET_MODE_2XWIDER_MODE (mode).exists (&wmode)
          && targetm.scalar_mode_supported_p (wmode))
or
      else if (int_mode_for_size (prec / 2, 1).exists (&hmode)
               && 2 * GET_MODE_PRECISION (hmode) == prec)
or the fallback case that doesn't report overflow:
      else
        {
          gcc_assert (!is_ubsan);
          ops.code = MULT_EXPR;
          ops.type = type;
          res = expand_expr_real_2 (&ops, NULL_RTX, mode, EXPAND_NORMAL);
          emit_jump (done_label);
        }
?  If it is the first one from these, perhaps we should have some extra checks there whether WIDEN_MULT_EXPR will be emitted as a library call or not.
Though, if mips64r6 has hipart multiplication, I don't see why it couldn't handle the widening multiplication by performing normal DImode multiplication plus highpart DImode multiplication or something similar.
Comment 4 Jakub Jelinek 2017-11-14 12:07:06 UTC
Created attachment 42600 [details]
gcc8-pr82981.patch

Possible untested fix.
Comment 5 Jakub Jelinek 2017-11-14 12:20:50 UTC
Actually, it might be better to verify if the can_widen_mult_without_libcall
fails that hmode exists and is exactly half the size of prec, otherwise we could end up with the worst case fallback that can't do overflow.

And/or, the PR71289 change could be guarded by precision equal to TYPE_MODE precision and umulv4_optab present for that mode, otherwise MUL_OVERFLOW might be more expensive than the division.
Comment 6 Jakub Jelinek 2017-11-14 15:57:14 UTC
Created attachment 42602 [details]
gcc8-pr82981.patch

Updated untested patch.
Comment 7 Jim Wilson 2017-11-14 18:28:03 UTC
      if (GET_MODE_2XWIDER_MODE (mode).exists (&wmode)
          && targetm.scalar_mode_supported_p (wmode))
This test succeeds, and then in expand_expr WIDEN_MULT_EXPR the checks for a mult widen optab entry fails, so it gnerates a TImode multiply, which is a libcall.
Comment 8 Jakub Jelinek 2017-11-15 09:02:14 UTC
Author: jakub
Date: Wed Nov 15 09:01:42 2017
New Revision: 254758

URL: https://gcc.gnu.org/viewcvs?rev=254758&root=gcc&view=rev
Log:
	PR target/82981
	* internal-fn.c: Include gimple-ssa.h, tree-phinodes.h and
	ssa-iterators.h.
	(can_widen_mult_without_libcall): New function.
	(expand_mul_overflow): If only checking unsigned mul overflow,
	not result, and can do efficiently MULT_HIGHPART_EXPR, emit that.
	Don't use WIDEN_MULT_EXPR if it would involve a libcall, unless
	no other way works.  Add MULT_HIGHPART_EXPR + MULT_EXPR support.
	(expand_DIVMOD): Formatting fix.
	* expmed.h (expand_mult): Add NO_LIBCALL argument.
	* expmed.c (expand_mult): Likewise.  Use OPTAB_WIDEN rather
	than OPTAB_LIB_WIDEN if NO_LIBCALL is true, and allow it to fail.

	* gcc.target/mips/pr82981.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/mips/pr82981.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/expmed.c
    trunk/gcc/expmed.h
    trunk/gcc/internal-fn.c
    trunk/gcc/testsuite/ChangeLog
Comment 9 Jim Wilson 2017-11-15 19:13:22 UTC
Jakub's patch fixes the optimization problem with the testcase.

I tried a linux kernel build with the patched compiler and the provided .config file.  I still see some __multi3 calls.  However, looking at the code, I see in lib/mpi/ several files with calls to umul_ppmm from a longlong.h file similar to the one in gcc, that has

#if (defined(__mips) && __mips >= 3) && W_TYPE_SIZE == 64
#if (__GNUC__ >= 5) || (__GNUC__ >= 4 && __GNUC_MINOR__ >= 4)
#define umul_ppmm(w1, w0, u, v) \
do {                                                                    \
        typedef unsigned int __ll_UTItype __attribute__((mode(TI)));    \
        __ll_UTItype __ll = (__ll_UTItype)(u) * (v);                    \
        w1 = __ll >> 64;                                                \
        w0 = __ll;                                                      \
} while (0)
#elif ...

So this is self inflicted damage, as this is an explicit TImode multiply for gcc 5 and later.

This requires either a linux kernel longlong.h fix to use an alternate macro definition for mips targets that don't directly support TImode multiply.  Or else it requires a gcc mips maintainer to fix the mips backend to add multi3 support for mips64r6.  Either one should be possible by emitting two multiply instructions, one for the high part and one for the low part.

Or it requires adding a mips assembly __multi3 function to the linux kernel.  Or maybe the config file can be modified to drop the mpi support?
Comment 10 Jim Wilson 2017-11-15 19:15:35 UTC
Created attachment 42617 [details]
mips64r6 linux kernel .config file

Linux kernel config file to reproduce the linux kernel link failure.
Comment 11 Andrew Pinski 2017-11-15 19:27:32 UTC
(In reply to Jim Wilson from comment #9)
> This requires either a linux kernel longlong.h fix to use an alternate macro
> definition for mips targets that don't directly support TImode multiply.  Or
> else it requires a gcc mips maintainer to fix the mips backend to add multi3
> support for mips64r6.  Either one should be possible by emitting two
> multiply instructions, one for the high part and one for the low part.

The middle-end has support already for mult_highpart optab, low is the same as mult optab (for the smaller mode), emitting this from the expr.c should be easy to add rather than changing most of the back-ends to add a multi3 optab.
Comment 12 Jim Wilson 2017-11-16 00:26:54 UTC
(In reply to Andrew Pinski from comment #11)
> (In reply to Jim Wilson from comment #9)
> > This requires either a linux kernel longlong.h fix to use an alternate macro
> > definition for mips targets that don't directly support TImode multiply.  Or
> > else it requires a gcc mips maintainer to fix the mips backend to add multi3
> > support for mips64r6.  Either one should be possible by emitting two
> > multiply instructions, one for the high part and one for the low part.
> 
> The middle-end has support already for mult_highpart optab, low is the same
> as mult optab (for the smaller mode), emitting this from the expr.c should
> be easy to add rather than changing most of the back-ends to add a multi3
> optab.

This is mips specific code in the kernel to do a multi3 (actually I think mulditi3).  So this isn't a generic gcc problem affecting all targets.  If you don't want the kernel doing this, then fix the kernel.

The mips port already has a mulditi3 pattern, it is just lacking mips64r6 support.  It looks like an oversight that should be fixed.

You are right that expand_expr could synthesize a mulditi3 if muldi3_highpart is available, but it isn't clear if that is the right fix.  Note that if we adopt this solution, then effectively you are forcing backends to implement a muldi3_highpart pattern.  How is this any different than forcing them to implement a mulditi3 pattern?  That doesn't make any sense.

Also, note, that Jakub patch generates muldi3_highpart is mulditi3 fails.  If we fix mulditi3 expansion to call muldi3_highpart, then that appears to make part of Jakub's patch unnecessary.

I think it would be better to fix the mips mulditi3 pattern, or fix the kernel umul_ppmm macro to avoid the problem.
Comment 13 Christophe Lyon 2017-11-20 15:07:41 UTC
I have noticed that this patch (r254758) causes errors on several tests, for instance:
c-c++-common/builtin-arith-overflow-2.c

target: arm-none-linux-gnueabi
--with-mode thumb
--with-cpu cortex-a9
and using -march=armv5t in the runtest flags/target board

spawn -ignore SIGHUP /home/christophe.lyon/src/GCC/builds/gcc-fsf-reg-254758-thumb/obj-arm-none-linux-gnueabi/gcc3/gcc/xgcc -B/home/christophe.lyon/src/GCC/builds/gcc-fsf-reg-254758-thumb/obj-arm-none-linux-gnueabi/gcc3/gcc/ /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/testsuite/c-c++-common/builtin-arith-overflow-2.c -march=armv5t -fno-diagnostics-show-caret -fdiagnostics-color=never -Wc++-compat -Wno-long-long -lm -o ./builtin-arith-overflow-2.exe^M
during RTL pass: expand^M
/home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/testsuite/c-c++-common/builtin-arith-overflow-2.c: In function 'main':^M
/home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/testsuite/c-c++-common/builtin-arith-overflow-2.c:93:23: internal compiler error: Segmentation fault^M
/home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/testsuite/c-c++-common/builtin-arith-overflow-2.c:103:3: note: in expansion of macro 'RuntimeAssert'^M
/home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/testsuite/c-c++-common/builtin-arith-overflow-2.c:250:3: note: in expansion of macro 'G_TEST'^M
0xba522f crash_signal^M
        /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/toplev.c:325^M
0xb3cd90 commutative_operand_precedence(rtx_def*)^M
        /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/rtlanal.c:3405^M
0xb3cedd swap_commutative_operands_p(rtx_def*, rtx_def*)^M
        /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/rtlanal.c:3475^M
0x7baa10 do_compare_rtx_and_jump(rtx_def*, rtx_def*, rtx_code, int, machine_mode, rtx_def*, rtx_code_label*, rtx_code_label*, profile_probability)^M
        /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/dojump.c:996^M
0x9702c3 expand_mul_overflow^M
        /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/internal-fn.c:1841^M
0x970d1b expand_arith_overflow^M
        /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/internal-fn.c:2251^M
0x73749f expand_call_stmt^M
        /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/cfgexpand.c:2584^M
0x73749f expand_gimple_stmt_1^M
        /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/cfgexpand.c:3607^M
0x73749f expand_gimple_stmt^M
        /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/cfgexpand.c:3773^M
0x7393af expand_gimple_basic_block^M
        /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/cfgexpand.c:5774^M
0x73f1f6 execute^M
        /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/cfgexpand.c:6375^M
Please submit a full bug report,^M
Comment 14 Jakub Jelinek 2017-11-20 15:46:25 UTC
Created attachment 42662 [details]
gcc8-pr82981-arm.patch

Untested fix for the arm issue.
Comment 15 Jakub Jelinek 2017-11-21 07:49:46 UTC
Author: jakub
Date: Tue Nov 21 07:49:14 2017
New Revision: 254986

URL: https://gcc.gnu.org/viewcvs?rev=254986&root=gcc&view=rev
Log:
	PR target/82981
	* internal-fn.c (expand_mul_overflow): Use OPTAB_WIDEN instead of
	OPTAB_DIRECT in calls to expand_simple_binop.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/internal-fn.c
Comment 16 Richard Biener 2018-01-25 08:25:40 UTC
GCC 7.3 is being released, adjusting target milestone.
Comment 17 Christophe Lyon 2018-12-06 13:45:01 UTC
The testcase passes on all the arm configs I am testing, on trunk, gcc-8 and gcc-7.