| Summary: | Failure to optimize division with numerator of 1 | ||
|---|---|---|---|
| Product: | gcc | Reporter: | Gabriel Ravier <gabravier> |
| Component: | tree-optimization | Assignee: | Not yet assigned to anyone <unassigned> |
| Status: | RESOLVED FIXED | ||
| Severity: | enhancement | CC: | gabravier, h13958451065, law, zhaoweiliew |
| Priority: | P3 | Keywords: | missed-optimization |
| Version: | 11.0 | ||
| Target Milestone: | 12.0 | ||
| See Also: | https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113301 | ||
| Host: | Target: | ||
| Build: | Known to work: | ||
| Known to fail: | Last reconfirmed: | 2021-08-14 00:00:00 | |
| Bug Depends on: | |||
| Bug Blocks: | 19987 | ||
| Attachments: |
Tested patch for the case of unsigned integer X
Optimization for both unsigned and signed integer X Optimization patch for both unsigned and signed integer X Optimize 1 / X for integer X |
||
Created attachment 52091 [details]
Tested patch for the case of unsigned integer X
I tried to tackle the unsigned integer X case by adding an optimization in match.pd. I suppose it can be done similarly for the signed integer X case as well, but I'll need more time to look into that.
I've attached the proposed patch. With this patch, GCC generates the same assembly output as Clang.
int f(unsigned int x) {
return 1 / x;
}
.LFB0:
.cfi_startproc
xorl %eax, %eax
cmpl $1, %edi
sete %al
ret
Created attachment 52097 [details]
Optimization for both unsigned and signed integer X
I've attached a new patch which does the optimization for both the unsigned and the signed case.
With this patch, x86-64 gcc -std=c++20 -O3 produces the following assembly:
f(int):
lea eax, [rdi + 1]
cmp eax, 2
mov eax, 0
cmovbe eax, edi
ret
f(unsigned int):
xor eax, eax
cmp edi, 1
sete al
ret
which is virtually identical to the following assembly produced by clang++ -std=c++20 -O3:
f(int):
lea ecx, [rdi + 1]
xor eax, eax
cmp ecx, 3
cmovb eax, edi
ret
f(unsigned int):
xor eax, eax
cmp edi, 1
sete al
ret
Comment on attachment 52097 [details]
Optimization for both unsigned and signed integer X
diff --git a/gcc/match.pd b/gcc/match.pd
index 0d7b8dd1334..6fbf1071a97 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -349,6 +349,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(op @0 integer_onep)
(non_lvalue @0)))
+/* 1 / X -> X == 1 for unsigned integer X
+ 1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X */
+(for div (trunc_div ceil_div floor_div round_div exact_div)
+ (simplify
+ (div integer_onep@0 @1)
+ (switch
+ /* 1 / X -> X == 1 for unsigned integer X */
+ (if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@1))
+ && TYPE_UNSIGNED (TREE_TYPE (@1)))
+ (eq @0 @1))
+ /* 1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X */
+ (if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@1))
+ && !TYPE_UNSIGNED (TREE_TYPE (@1)))
+ (cond (bit_and (ge @1 { build_minus_one_cst (integer_type_node); })
+ (le @1 { build_one_cst (integer_type_node); }))
+ @1 { build_zero_cst (type); })))))
+
/* (A / (1 << B)) -> (A >> B).
Only for unsigned A. For signed A, this would not preserve rounding
toward zero.
Comment on attachment 52097 [details]
Optimization for both unsigned and signed integer X
diff --git a/gcc/match.pd b/gcc/match.pd
index 0d7b8dd1334..6fbf1071a97 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -349,6 +349,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(op @0 integer_onep)
(non_lvalue @0)))
+/* 1 / X -> X == 1 for unsigned integer X
+ 1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X */
+(for div (trunc_div ceil_div floor_div round_div exact_div)
+ (simplify
+ (div integer_onep@0 @1)
+ (switch
+ /* 1 / X -> X == 1 for unsigned integer X */
+ (if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@1))
+ && TYPE_UNSIGNED (TREE_TYPE (@1)))
+ (eq @0 @1))
+ /* 1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X */
+ (if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@1))
+ && !TYPE_UNSIGNED (TREE_TYPE (@1)))
+ (cond (bit_and (ge @1 { build_minus_one_cst (integer_type_node); })
+ (le @1 { build_one_cst (integer_type_node); }))
+ @1 { build_zero_cst (type); })))))
+
/* (A / (1 << B)) -> (A >> B).
Only for unsigned A. For signed A, this would not preserve rounding
toward zero.
Created attachment 52098 [details]
Optimization patch for both unsigned and signed integer X
Oops, I uploaded the wrong patch previously. I apologise for the spam. Here, I've fixed the patch.
Created attachment 52126 [details]
Optimize 1 / X for integer X
Here's a new patch along with test cases.
The master branch has been updated by Jeff Law <law@gcc.gnu.org>: https://gcc.gnu.org/g:c2b610e7c6c89fd422c5c31f01023bcddf3cf4a5 commit r12-6924-gc2b610e7c6c89fd422c5c31f01023bcddf3cf4a5 Author: Zhao Wei Liew <zhaoweiliew@gmail.com> Date: Fri Jan 28 13:36:39 2022 -0500 match.pd: Simplify 1 / X for integer X [PR95424] This patch implements an optimization for the following C++ code: int f(int x) { return 1 / x; } int f(unsigned int x) { return 1 / x; } Before this patch, x86-64 gcc -std=c++20 -O3 produces the following assembly: f(int): xor edx, edx mov eax, 1 idiv edi ret f(unsigned int): xor edx, edx mov eax, 1 div edi ret In comparison, clang++ -std=c++20 -O3 produces the following assembly: f(int): lea ecx, [rdi + 1] xor eax, eax cmp ecx, 3 cmovb eax, edi ret f(unsigned int): xor eax, eax cmp edi, 1 sete al ret Clang's output is more efficient as it avoids expensive div operations. With this patch, GCC now produces the following assembly: f(int): lea eax, [rdi + 1] cmp eax, 2 mov eax, 0 cmovbe eax, edi ret f(unsigned int): xor eax, eax cmp edi, 1 sete al ret which is virtually identical to Clang's assembly output. Any slight differences in the output for f(int) is possibly related to a different missed optimization. v2: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587751.html Changes from v2: 1. Refactor from using a switch statement to using the built-in if-else statement. v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587634.html Changes from v1: 1. Refactor common if conditions. 2. Use build_[minus_]one_cst (type) to get -1/1 of the correct type. 3. Match only for TRUNC_DIV_EXPR and TYPE_PRECISION (type) > 1. gcc/ChangeLog: PR tree-optimization/95424 * match.pd: Simplify 1 / X where X is an integer. Fixed on the trunk. The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:3d41939c8799a51b1cd7f4610c06771bf6d52f15 commit r12-6932-g3d41939c8799a51b1cd7f4610c06771bf6d52f15 Author: Jakub Jelinek <jakub@redhat.com> Date: Sat Jan 29 17:55:51 2022 +0100 testsuite: Fix up tree-ssa/divide-7.c testcase [PR95424] This test fails everywhere, because ? doesn't match literal ?. It should use \\? instead. I've also changed those .s in there. 2022-01-29 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/95424 * gcc.dg/tree-ssa/divide-7.c: Fix up regexps in scan-tree-dump{,-not}. When encountering an expression like 1/X where X is a variable (potentially zero), GCC cannot statically determine whether X is zero. Currently, aside from using -fnon-call-exceptions to disable optimizations, are there other ways to make GCC recognize and handle the zero-check for X? Could GCC provide better user diagnostics to highlight this risky pattern? |
unsigned f(unsigned x) { return 1 / x; } This can be optimized to `return (x == 1);`, and for `int` to `return (unsigned)(x + 1) < 3 ? x : 0;`. This transformation is done by LLVM, but not by GCC.