Given the testcase unsigned long func(unsigned long a, unsigned long b) { return a > (~0UL) / b; } compiled with -march=mips64r6 -mabi=64 -mexplicit-relocs -O2 we end up with a call to __multi3 which is inefficient and inconvenient. The testcase gets converted to _1 = MUL_OVERFLOW (a_4(D), b_5(D)); _2 = IMAGPART_EXPR <_1>; There are no mulv patterns in the mips port, so it tries mulditi3 which fails, and then calls __multi3. Mips64r6 does have a widening DImode multiply which should have been used instead. The problem is that the *mulditi3 pattern is missing mips64r6 support. Alternatively, the expander should try using a muldi3_highpart pattern when the mulditi3 pattern doesn't work. Especially when the highpart is the only part we need as in this example. The mips64r6 multi3_highpart is present. Or alternatively, a mulvti3 pattern should be added to the mips port. See for instance the thread at https://www.linux-mips.org/archives/linux-mips/2017-08/msg00041.html
This problem is causing link errors for the mips64r6 linux kernel. They would like the gcc developers to fix it, instead of adding a definition of the __multi3 function to the linux kernel.
We could also restrict MUL_OVERFLOW pattern matching to archs that can expand it?
That might not be that easy without repeating there big chunks of internal-fn.c stuff. In any case, what is mips64r6 using? From the above it seems it doesn't have a corresponding optab, so if (icode == CODE_FOR_nothing) is true. Thus, is it: if (GET_MODE_2XWIDER_MODE (mode).exists (&wmode) && targetm.scalar_mode_supported_p (wmode)) or else if (int_mode_for_size (prec / 2, 1).exists (&hmode) && 2 * GET_MODE_PRECISION (hmode) == prec) or the fallback case that doesn't report overflow: else { gcc_assert (!is_ubsan); ops.code = MULT_EXPR; ops.type = type; res = expand_expr_real_2 (&ops, NULL_RTX, mode, EXPAND_NORMAL); emit_jump (done_label); } ? If it is the first one from these, perhaps we should have some extra checks there whether WIDEN_MULT_EXPR will be emitted as a library call or not. Though, if mips64r6 has hipart multiplication, I don't see why it couldn't handle the widening multiplication by performing normal DImode multiplication plus highpart DImode multiplication or something similar.
Created attachment 42600 [details] gcc8-pr82981.patch Possible untested fix.
Actually, it might be better to verify if the can_widen_mult_without_libcall fails that hmode exists and is exactly half the size of prec, otherwise we could end up with the worst case fallback that can't do overflow. And/or, the PR71289 change could be guarded by precision equal to TYPE_MODE precision and umulv4_optab present for that mode, otherwise MUL_OVERFLOW might be more expensive than the division.
Created attachment 42602 [details] gcc8-pr82981.patch Updated untested patch.
if (GET_MODE_2XWIDER_MODE (mode).exists (&wmode) && targetm.scalar_mode_supported_p (wmode)) This test succeeds, and then in expand_expr WIDEN_MULT_EXPR the checks for a mult widen optab entry fails, so it gnerates a TImode multiply, which is a libcall.
Author: jakub Date: Wed Nov 15 09:01:42 2017 New Revision: 254758 URL: https://gcc.gnu.org/viewcvs?rev=254758&root=gcc&view=rev Log: PR target/82981 * internal-fn.c: Include gimple-ssa.h, tree-phinodes.h and ssa-iterators.h. (can_widen_mult_without_libcall): New function. (expand_mul_overflow): If only checking unsigned mul overflow, not result, and can do efficiently MULT_HIGHPART_EXPR, emit that. Don't use WIDEN_MULT_EXPR if it would involve a libcall, unless no other way works. Add MULT_HIGHPART_EXPR + MULT_EXPR support. (expand_DIVMOD): Formatting fix. * expmed.h (expand_mult): Add NO_LIBCALL argument. * expmed.c (expand_mult): Likewise. Use OPTAB_WIDEN rather than OPTAB_LIB_WIDEN if NO_LIBCALL is true, and allow it to fail. * gcc.target/mips/pr82981.c: New test. Added: trunk/gcc/testsuite/gcc.target/mips/pr82981.c Modified: trunk/gcc/ChangeLog trunk/gcc/expmed.c trunk/gcc/expmed.h trunk/gcc/internal-fn.c trunk/gcc/testsuite/ChangeLog
Jakub's patch fixes the optimization problem with the testcase. I tried a linux kernel build with the patched compiler and the provided .config file. I still see some __multi3 calls. However, looking at the code, I see in lib/mpi/ several files with calls to umul_ppmm from a longlong.h file similar to the one in gcc, that has #if (defined(__mips) && __mips >= 3) && W_TYPE_SIZE == 64 #if (__GNUC__ >= 5) || (__GNUC__ >= 4 && __GNUC_MINOR__ >= 4) #define umul_ppmm(w1, w0, u, v) \ do { \ typedef unsigned int __ll_UTItype __attribute__((mode(TI))); \ __ll_UTItype __ll = (__ll_UTItype)(u) * (v); \ w1 = __ll >> 64; \ w0 = __ll; \ } while (0) #elif ... So this is self inflicted damage, as this is an explicit TImode multiply for gcc 5 and later. This requires either a linux kernel longlong.h fix to use an alternate macro definition for mips targets that don't directly support TImode multiply. Or else it requires a gcc mips maintainer to fix the mips backend to add multi3 support for mips64r6. Either one should be possible by emitting two multiply instructions, one for the high part and one for the low part. Or it requires adding a mips assembly __multi3 function to the linux kernel. Or maybe the config file can be modified to drop the mpi support?
Created attachment 42617 [details] mips64r6 linux kernel .config file Linux kernel config file to reproduce the linux kernel link failure.
(In reply to Jim Wilson from comment #9) > This requires either a linux kernel longlong.h fix to use an alternate macro > definition for mips targets that don't directly support TImode multiply. Or > else it requires a gcc mips maintainer to fix the mips backend to add multi3 > support for mips64r6. Either one should be possible by emitting two > multiply instructions, one for the high part and one for the low part. The middle-end has support already for mult_highpart optab, low is the same as mult optab (for the smaller mode), emitting this from the expr.c should be easy to add rather than changing most of the back-ends to add a multi3 optab.
(In reply to Andrew Pinski from comment #11) > (In reply to Jim Wilson from comment #9) > > This requires either a linux kernel longlong.h fix to use an alternate macro > > definition for mips targets that don't directly support TImode multiply. Or > > else it requires a gcc mips maintainer to fix the mips backend to add multi3 > > support for mips64r6. Either one should be possible by emitting two > > multiply instructions, one for the high part and one for the low part. > > The middle-end has support already for mult_highpart optab, low is the same > as mult optab (for the smaller mode), emitting this from the expr.c should > be easy to add rather than changing most of the back-ends to add a multi3 > optab. This is mips specific code in the kernel to do a multi3 (actually I think mulditi3). So this isn't a generic gcc problem affecting all targets. If you don't want the kernel doing this, then fix the kernel. The mips port already has a mulditi3 pattern, it is just lacking mips64r6 support. It looks like an oversight that should be fixed. You are right that expand_expr could synthesize a mulditi3 if muldi3_highpart is available, but it isn't clear if that is the right fix. Note that if we adopt this solution, then effectively you are forcing backends to implement a muldi3_highpart pattern. How is this any different than forcing them to implement a mulditi3 pattern? That doesn't make any sense. Also, note, that Jakub patch generates muldi3_highpart is mulditi3 fails. If we fix mulditi3 expansion to call muldi3_highpart, then that appears to make part of Jakub's patch unnecessary. I think it would be better to fix the mips mulditi3 pattern, or fix the kernel umul_ppmm macro to avoid the problem.
I have noticed that this patch (r254758) causes errors on several tests, for instance: c-c++-common/builtin-arith-overflow-2.c target: arm-none-linux-gnueabi --with-mode thumb --with-cpu cortex-a9 and using -march=armv5t in the runtest flags/target board spawn -ignore SIGHUP /home/christophe.lyon/src/GCC/builds/gcc-fsf-reg-254758-thumb/obj-arm-none-linux-gnueabi/gcc3/gcc/xgcc -B/home/christophe.lyon/src/GCC/builds/gcc-fsf-reg-254758-thumb/obj-arm-none-linux-gnueabi/gcc3/gcc/ /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/testsuite/c-c++-common/builtin-arith-overflow-2.c -march=armv5t -fno-diagnostics-show-caret -fdiagnostics-color=never -Wc++-compat -Wno-long-long -lm -o ./builtin-arith-overflow-2.exe^M during RTL pass: expand^M /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/testsuite/c-c++-common/builtin-arith-overflow-2.c: In function 'main':^M /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/testsuite/c-c++-common/builtin-arith-overflow-2.c:93:23: internal compiler error: Segmentation fault^M /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/testsuite/c-c++-common/builtin-arith-overflow-2.c:103:3: note: in expansion of macro 'RuntimeAssert'^M /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/testsuite/c-c++-common/builtin-arith-overflow-2.c:250:3: note: in expansion of macro 'G_TEST'^M 0xba522f crash_signal^M /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/toplev.c:325^M 0xb3cd90 commutative_operand_precedence(rtx_def*)^M /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/rtlanal.c:3405^M 0xb3cedd swap_commutative_operands_p(rtx_def*, rtx_def*)^M /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/rtlanal.c:3475^M 0x7baa10 do_compare_rtx_and_jump(rtx_def*, rtx_def*, rtx_code, int, machine_mode, rtx_def*, rtx_code_label*, rtx_code_label*, profile_probability)^M /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/dojump.c:996^M 0x9702c3 expand_mul_overflow^M /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/internal-fn.c:1841^M 0x970d1b expand_arith_overflow^M /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/internal-fn.c:2251^M 0x73749f expand_call_stmt^M /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/cfgexpand.c:2584^M 0x73749f expand_gimple_stmt_1^M /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/cfgexpand.c:3607^M 0x73749f expand_gimple_stmt^M /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/cfgexpand.c:3773^M 0x7393af expand_gimple_basic_block^M /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/cfgexpand.c:5774^M 0x73f1f6 execute^M /home/christophe.lyon/src/GCC/sources/gcc-fsf/reg-254758/gcc/cfgexpand.c:6375^M Please submit a full bug report,^M
Created attachment 42662 [details] gcc8-pr82981-arm.patch Untested fix for the arm issue.
Author: jakub Date: Tue Nov 21 07:49:14 2017 New Revision: 254986 URL: https://gcc.gnu.org/viewcvs?rev=254986&root=gcc&view=rev Log: PR target/82981 * internal-fn.c (expand_mul_overflow): Use OPTAB_WIDEN instead of OPTAB_DIRECT in calls to expand_simple_binop. Modified: trunk/gcc/ChangeLog trunk/gcc/internal-fn.c
GCC 7.3 is being released, adjusting target milestone.
The testcase passes on all the arm configs I am testing, on trunk, gcc-8 and gcc-7.
Author: law Date: Mon Oct 28 19:17:58 2019 New Revision: 277537 URL: https://gcc.gnu.org/viewcvs?rev=277537&root=gcc&view=rev Log: PR target/82981 * config/mips/mips.md (<u>mulditi3): Generate patterns for high doubleword and low doubleword result of multiplication on MIPS64R6. * gcc.target/mips/mips64r6-ti-mult.c: New test. Added: trunk/gcc/testsuite/gcc.target/mips/mips64r6-ti-mult.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/mips/mips.md trunk/gcc/testsuite/ChangeLog
Fixed in GCC 8.