Bug 95424

Summary: Failure to optimize division with numerator of 1
Product: gcc Reporter: Gabriel Ravier <gabravier>
Component: tree-optimizationAssignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED    
Severity: enhancement CC: gabravier, h13958451065, law, zhaoweiliew
Priority: P3 Keywords: missed-optimization
Version: 11.0   
Target Milestone: 12.0   
See Also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113301
Host: Target:
Build: Known to work:
Known to fail: Last reconfirmed: 2021-08-14 00:00:00
Bug Depends on:    
Bug Blocks: 19987    
Attachments: Tested patch for the case of unsigned integer X
Optimization for both unsigned and signed integer X
Optimization patch for both unsigned and signed integer X
Optimize 1 / X for integer X

Description Gabriel Ravier 2020-05-29 20:21:50 UTC
unsigned f(unsigned x)
{
    return 1 / x;
}

This can be optimized to `return (x == 1);`, and for `int` to `return (unsigned)(x + 1) < 3 ? x : 0;`. This transformation is done by LLVM, but not by GCC.
Comment 1 Zhao Wei Liew 2021-12-30 07:12:45 UTC
Created attachment 52091 [details]
Tested patch for the case of unsigned integer X

I tried to tackle the unsigned integer X case by adding an optimization in match.pd. I suppose it can be done similarly for the signed integer X case as well, but I'll need more time to look into that.

I've attached the proposed patch. With this patch, GCC generates the same assembly output as Clang.

int f(unsigned int x) {
	return 1 / x;
}


.LFB0:
	.cfi_startproc
	xorl	%eax, %eax
	cmpl	$1, %edi
	sete	%al
	ret
Comment 2 Zhao Wei Liew 2021-12-31 01:16:47 UTC
Created attachment 52097 [details]
Optimization for both unsigned and signed integer X

I've attached a new patch which does the optimization for both the unsigned and the signed case.

With this patch, x86-64 gcc -std=c++20 -O3 produces the following assembly:

f(int):
	lea eax, [rdi + 1]
	cmp eax, 2
	mov eax, 0
	cmovbe eax, edi
	ret
f(unsigned int):
	xor eax, eax
	cmp edi, 1
	sete al
	ret

which is virtually identical to the following assembly produced by clang++ -std=c++20 -O3:

f(int):
	lea ecx, [rdi + 1]
	xor eax, eax
	cmp ecx, 3
	cmovb eax, edi
	ret
f(unsigned int):
	xor eax, eax
	cmp edi, 1
	sete al
	ret
Comment 3 Zhao Wei Liew 2021-12-31 01:17:57 UTC
Comment on attachment 52097 [details]
Optimization for both unsigned and signed integer X

diff --git a/gcc/match.pd b/gcc/match.pd
index 0d7b8dd1334..6fbf1071a97 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -349,6 +349,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
     (op @0 integer_onep)
     (non_lvalue @0)))
 
+/* 1 / X -> X == 1 for unsigned integer X
+   1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X */
+(for div (trunc_div ceil_div floor_div round_div exact_div)
+  (simplify
+    (div integer_onep@0 @1)
+    (switch
+      /* 1 / X -> X == 1 for unsigned integer X */
+      (if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@1))
+          && TYPE_UNSIGNED (TREE_TYPE (@1)))
+        (eq @0 @1))
+      /* 1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X */
+      (if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@1))
+          && !TYPE_UNSIGNED (TREE_TYPE (@1)))
+       (cond (bit_and (ge @1 { build_minus_one_cst (integer_type_node); })
+                      (le @1 { build_one_cst (integer_type_node); }))
+              @1 { build_zero_cst (type); })))))
+
 /* (A / (1 << B)) -> (A >> B).
    Only for unsigned A.  For signed A, this would not preserve rounding
    toward zero.
Comment 4 Zhao Wei Liew 2021-12-31 01:18:23 UTC
Comment on attachment 52097 [details]
Optimization for both unsigned and signed integer X

diff --git a/gcc/match.pd b/gcc/match.pd
index 0d7b8dd1334..6fbf1071a97 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -349,6 +349,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
     (op @0 integer_onep)
     (non_lvalue @0)))
 
+/* 1 / X -> X == 1 for unsigned integer X
+   1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X */
+(for div (trunc_div ceil_div floor_div round_div exact_div)
+  (simplify
+    (div integer_onep@0 @1)
+    (switch
+      /* 1 / X -> X == 1 for unsigned integer X */
+      (if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@1))
+          && TYPE_UNSIGNED (TREE_TYPE (@1)))
+        (eq @0 @1))
+      /* 1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X */
+      (if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@1))
+          && !TYPE_UNSIGNED (TREE_TYPE (@1)))
+       (cond (bit_and (ge @1 { build_minus_one_cst (integer_type_node); })
+                      (le @1 { build_one_cst (integer_type_node); }))
+              @1 { build_zero_cst (type); })))))
+
 /* (A / (1 << B)) -> (A >> B).
    Only for unsigned A.  For signed A, this would not preserve rounding
    toward zero.
Comment 5 Zhao Wei Liew 2021-12-31 01:19:53 UTC
Created attachment 52098 [details]
Optimization patch for both unsigned and signed integer X

Oops, I uploaded the wrong patch previously. I apologise for the spam. Here, I've fixed the patch.
Comment 6 Zhao Wei Liew 2022-01-05 03:33:34 UTC
Created attachment 52126 [details]
Optimize 1 / X for integer X

Here's a new patch along with test cases.
Comment 7 GCC Commits 2022-01-28 18:37:31 UTC
The master branch has been updated by Jeff Law <law@gcc.gnu.org>:

https://gcc.gnu.org/g:c2b610e7c6c89fd422c5c31f01023bcddf3cf4a5

commit r12-6924-gc2b610e7c6c89fd422c5c31f01023bcddf3cf4a5
Author: Zhao Wei Liew <zhaoweiliew@gmail.com>
Date:   Fri Jan 28 13:36:39 2022 -0500

    match.pd: Simplify 1 / X for integer X [PR95424]
    
    This patch implements an optimization for the following C++ code:
    
    int f(int x) {
        return 1 / x;
    }
    
    int f(unsigned int x) {
        return 1 / x;
    }
    
    Before this patch, x86-64 gcc -std=c++20 -O3 produces the following assembly:
    
    f(int):
        xor edx, edx
        mov eax, 1
        idiv edi
        ret
    f(unsigned int):
        xor edx, edx
        mov eax, 1
        div edi
        ret
    
    In comparison, clang++ -std=c++20 -O3 produces the following assembly:
    
    f(int):
        lea ecx, [rdi + 1]
        xor eax, eax
        cmp ecx, 3
        cmovb eax, edi
        ret
    f(unsigned int):
        xor eax, eax
        cmp edi, 1
        sete al
        ret
    
    Clang's output is more efficient as it avoids expensive div operations.
    
    With this patch, GCC now produces the following assembly:
    
    f(int):
        lea eax, [rdi + 1]
        cmp eax, 2
        mov eax, 0
        cmovbe eax, edi
        ret
    f(unsigned int):
        xor eax, eax
        cmp edi, 1
        sete al
        ret
    
    which is virtually identical to Clang's assembly output. Any slight differences
    in the output for f(int) is possibly related to a different missed optimization.
    
    v2: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587751.html
    Changes from v2:
    1. Refactor from using a switch statement to using the built-in
    if-else statement.
    
    v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587634.html
    Changes from v1:
    1. Refactor common if conditions.
    2. Use build_[minus_]one_cst (type) to get -1/1 of the correct type.
    3. Match only for TRUNC_DIV_EXPR and TYPE_PRECISION (type) > 1.
    
    gcc/ChangeLog:
    
            PR tree-optimization/95424
            * match.pd: Simplify 1 / X where X is an integer.
Comment 8 Jeffrey A. Law 2022-01-28 18:41:42 UTC
Fixed on the trunk.
Comment 9 GCC Commits 2022-01-29 16:56:28 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:3d41939c8799a51b1cd7f4610c06771bf6d52f15

commit r12-6932-g3d41939c8799a51b1cd7f4610c06771bf6d52f15
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Sat Jan 29 17:55:51 2022 +0100

    testsuite: Fix up tree-ssa/divide-7.c testcase [PR95424]
    
    This test fails everywhere, because ? doesn't match literal ?.
    It should use \\? instead.  I've also changed those .s in there.
    
    2022-01-29  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/95424
            * gcc.dg/tree-ssa/divide-7.c: Fix up regexps in scan-tree-dump{,-not}.
Comment 10 huyubiao 2025-07-04 04:53:37 UTC
When encountering an expression like 1/X where X is a variable (potentially zero), GCC cannot statically determine whether X is zero. 

Currently, aside from using -fnon-call-exceptions to disable optimizations, are there other ways to make GCC recognize and handle the zero-check for X? Could GCC provide better user diagnostics to highlight this risky pattern?