If A and B are both unsigned, then 'A > -1 / B' is a nice predicate for
checking whether A*B would overflow. The compiler should be able to optimize
the otherwise-unused division to a multiplication followed by branch on
overflow.

I've tried to add the corresponding transform to match.pd, but it seems
something else needs to be wired up as well, because it doesn't trigger.
What am I missing? TIA.

diff --git a/gcc/match.pd b/gcc/match.pd
index e511e9a..83c02a8 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2582,6 +2582,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
        && types_match (TREE_TYPE (@0), TREE_TYPE (@1)))
    (out (imagpart @2)
     { build_zero_cst (TREE_TYPE (@0)); }))))
 
+/* Simplify unsigned multiplication overflow check A > -1 / B to a builtin.  */
+(for cmp (gt le)
+     out (ne eq)
+ (simplify
+  (cmp @0 (trunc_div:s integer_all_onesp @1))
+  (if (TYPE_UNSIGNED (TREE_TYPE (@0))
+       && types_match (TREE_TYPE (@0), TREE_TYPE (@1)))
+   (out
+    (imagpart (IFN_MUL_OVERFLOW @0 @1))
+    { build_zero_cst (TREE_TYPE (@0)); }))))
+
 /* Simplification of math builtins.  These rules must all be optimizations
    as well as IL simplifications.  If there is a possibility that the new

(In reply to Alexander Monakov from comment #0)
> I've tried to add the corresponding transform to match.pd, but it seems
> something else needs to be wired up as well, because it doesn't trigger.
> What am I missing? TIA.

What do the dumps look like? Gcc is likely to change things to -1 / B < A,
which you don't handle...

> +/* Simplify unsigned multiplication overflow check A > -1 / B to a builtin.  */
> +(for cmp (gt le)
> +     out (ne eq)
> + (simplify
> +  (cmp @0 (trunc_div:s integer_all_onesp @1))
> +  (if (TYPE_UNSIGNED (TREE_TYPE (@0))
> +       && types_match (TREE_TYPE (@0), TREE_TYPE (@1)))

types_match seems unnecessary here, it is mostly useful when we have some
CONVERT_EXPR, but trunc_div's arguments and result types have to match, same
for cmp arguments, so @0 and @1 should always have matching types.

> +   (out
> +    (imagpart (IFN_MUL_OVERFLOW @0 @1))
> +    { build_zero_cst (TREE_TYPE (@0)); }))))

We probably need to disable such a transformation for vectors.

> What do the dumps look like? Gcc is likely to change things to -1 / B < A,
> which you don't handle...

The dumps didn't help much, but you're right that normally the order is
opposite, thanks (I didn't realize that when looking in the debugger the
first time).

I tried to take into account your comments. I also had to make the output
more complex because emitting bare internal functions doesn't work. It still
ICEs for

typedef unsigned v2si __attribute__((vector_size(8)));
v2si h(v2si a, v2si b)
{
  v2si m = {~0, ~0};
  return (a > m / b);
}

but before debugging that I'd like to know if this is the right approach in
the first place.

diff --git a/gcc/match.pd b/gcc/match.pd
index e511e9a..a7fbcf1 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2582,6 +2582,32 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
        && types_match (TREE_TYPE (@0), TREE_TYPE (@1)))
    (out (imagpart @2)
     { build_zero_cst (TREE_TYPE (@0)); }))))
 
+/* Simplify unsigned multiplication overflow check -1 / A < B to a builtin.  */
+(for cmp (lt ge)
+     out (ne eq)
+ (simplify
+  (cmp (trunc_div:s integer_all_onesp @0) @1)
+  (if (TYPE_UNSIGNED (TREE_TYPE (@0))
+       && !VECTOR_INTEGER_TYPE_P (TREE_TYPE (@0)))
+   (out
+    (imagpart { build_call_expr_internal_loc (UNKNOWN_LOCATION,
+                                              IFN_MUL_OVERFLOW,
+                                              build_complex_type (TREE_TYPE (@0)),
+                                              2, @0, @1); } )
+    { build_zero_cst (TREE_TYPE (@0)); }))))
+
+(for cmp (gt le)
+     out (ne eq)
+ (simplify
+  (cmp @0 (trunc_div:s integer_all_onesp @1))
+  (if (TYPE_UNSIGNED (TREE_TYPE (@0))
+       && !VECTOR_INTEGER_TYPE_P (TREE_TYPE (@0)))
+   (out
+    (imagpart { build_call_expr_internal_loc (UNKNOWN_LOCATION,
+                                              IFN_MUL_OVERFLOW,
+                                              build_complex_type (TREE_TYPE (@0)),
+                                              2, @0, @1); } )
+    { build_zero_cst (TREE_TYPE (@0)); }))))
+
 /* Simplification of math builtins.  These rules must all be optimizations
    as well as IL simplifications.  If there is a possibility that the new

I think genmatch can handle calls to internal functions, the issue is with
guessing the return type. Maybe we could have a specific heuristic for these
functions in the case where the type is not explicitly specified.

(for cmp (lt ge)
     out (ne eq)
 (simplify
  (cmp (trunc_div:s integer_all_onesp @1) @0)
  (if (TYPE_UNSIGNED (TREE_TYPE (@0))
       && !VECTOR_TYPE_P (TREE_TYPE (@0)))
   (with { tree cpx = build_complex_type (TREE_TYPE (@0)); }
    (out
     (imagpart (IFN_MUL_OVERFLOW:cpx @0 @1))
     { build_zero_cst (TREE_TYPE (@0)); })))))

(and the symmetric)

The approach seems fine to me.

Author: amonakov
Date: Mon May 30 14:37:02 2016
New Revision: 236882

URL: https://gcc.gnu.org/viewcvs?rev=236882&root=gcc&view=rev
Log:
match.pd: optimize unsigned mul overflow check

gcc/
2016-05-28  Alexander Monakov  <amonakov@ispras.ru>
	    Marc Glisse  <marc.glisse@inria.fr>

	PR tree-optimization/71289
	* match.pd (-1 / B < A, A > -1 / B): New transformations.

gcc/testsuite/
2016-05-28  Alexander Monakov  <amonakov@ispras.ru>

	PR tree-optimization/71289
	* gcc.dg/pr71289.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/pr71289.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/match.pd
    trunk/gcc/testsuite/ChangeLog

Fixed on the trunk. Thanks again Marc.

Hello. Thank you for implementing the multiplication overflow optimization;
there have been "complaints" on the web [1] about the lack of such an
optimization. However, from what I tested in the current svn source
(r244933), a construct like (b != 0 && -1 / b < a) still does not yield
optimal code. In particular, an unnecessary (b != 0) test is still
generated...

...for a code like this:

----------------------------------------------
#define UINT_MAX (~0U)
void abort(void); /* from stdlib.h */

unsigned int umul_overflow1(unsigned int a, unsigned int b)
{
    if (b != 0 && UINT_MAX / b < a)
        abort();
    return a * b;
}

unsigned int umul_overflow2(unsigned int a, unsigned int b)
{
    unsigned int res;
    if (__builtin_umul_overflow(a, b, &res))
        abort();
    return res;
}
----------------------------------------------

Generated assembly (-O2, gcc configured with --target=x86_64-pc-linux-gnu
--with-arch-64=generic):

umul_overflow1:
.LFB0:
	.cfi_startproc
	testl	%esi, %esi
	je	.L2
	movl	%edi, %eax
	mull	%esi
	jo	.L13
.L2:
	movl	%esi, %eax
	imull	%edi, %eax
	ret
.L13:
	subq	$8, %rsp
	.cfi_def_cfa_offset 16
	call	abort
	.cfi_endproc

umul_overflow2:
.LFB1:
	.cfi_startproc
	movl	%edi, %eax
	mull	%esi
	jo	.L22
	rep ret
.L22:
	subq	$8, %rsp
	.cfi_def_cfa_offset 16
	call	abort
	.cfi_endproc
----------------------------------------------

Ideally, both functions would generate the same machine code.

[1] http://kqueue.org/blog/2012/03/16/fast-integer-overflow-detection/