Hello, we noticed that GCC seems to have missed optimizations as stated in the title, and it works as expected on gcc-11.4. After analysis, we found that they (gcc-trunk and gcc-11.4) are different in evrp(tree), and further, the difference is in the calculation of the value range. reduced code: https://godbolt.org/z/Y7v1jsTKf int c; void func(int x){ c=(1/(x+1))/2; } gcc(trunk) -O3 -fwrapv: func(int): lea edx, [rdi+1] add edi, 2 xor eax, eax cmp edi, 2 cmova edx, eax mov eax, edx shr eax, 31 add eax, edx sar eax mov DWORD PTR c[rip], eax ret evrp (tree): Folding statement: _7 = _6 <= 2; Not folded Folding statement: _2 = _7 ? _1 : 0; Possible COND_EXPR adjustment. Range op1 : [irange] int VARYING and Range op2: [irange] int [0, 0] NONZERO 0x0 Not folded Folding statement: _3 = _2 / 2; Global Exported: _3 = [irange] int [-1073741824, 1073741823] Not folded Folding statement: c = _3; Not folded Folding statement: return; Not folded Expected code: gcc(11.4) -O3 -fwrapv: func(int): mov DWORD PTR c[rip], 0 ret evrp (tree): =========== BB 2 ============ <bb 2> : # DEBUG BEGIN_STMT _1 = x_4(D) + 1; _2 = 1 / _1; c = 0; return; _2 : int [-1, 1] Non-varying global ranges: =========================: _2 : int [-1, 1] Also, the following code works as expected (gcc-trunk): void func1(int x){ c=(1/x)/2; } Thank you very much for your time and effort! We look forward to hearing from you.
Confirmed.
Started with r12-6924-gc2b610e7c6c89fd422c5c31f01023bcddf3cf4a5
Thinking about if 1/x or (x+1u) <= 2 ? x : 0 is more conconial for gimple. I suspect 1/x is . Which case this should be move to late gimple. I will look at that tomorrow.
Even then, I wonder why ranger doesn't figure this out. (x+1u) <= 2 ? x : 0 must have a range [-1, 1] and [-1, 1] / [2, 2] range should be [0, 0], no?
(In reply to Jakub Jelinek from comment #4) > Even then, I wonder why ranger doesn't figure this out. > (x+1u) <= 2 ? x : 0 > must have a range [-1, 1] and [-1, 1] / [2, 2] range should be [0, 0], no? its because there is no branch which is what drives ranger. At this point, we aren't quite smart enough to completely evaluate the 2 operands of a conditional as if they were actual branches. ie _1 = x_4(D) + 1; _10 = (unsigned int) x_4(D); _6 = _10 + 2; _7 = _6 <= 2; _2 = _7 ? _1 : 0; if that were: if (_6 <= 2) _2 = _1 we'd recalculate _1 with the condition being (_6 <= 2) and we come upwith [-1, 1] for _1 instead of varying. I'll have to look at whats involved in enhancing the fold code to invoke GORI to reevaluate _1 if _7 is [1,1]. in theory is not too difficult... :-)
(In reply to Andrew Macleod from comment #5) > (In reply to Jakub Jelinek from comment #4) > > Even then, I wonder why ranger doesn't figure this out. > > (x+1u) <= 2 ? x : 0 > > must have a range [-1, 1] and [-1, 1] / [2, 2] range should be [0, 0], no? > > its because there is no branch which is what drives ranger. At this point, > we aren't quite smart enough to completely evaluate the 2 operands of a > conditional as if they were actual branches. > ie > _1 = x_4(D) + 1; > _10 = (unsigned int) x_4(D); > _6 = _10 + 2; > _7 = _6 <= 2; > _2 = _7 ? _1 : 0; > > if that were: > if (_6 <= 2) > _2 = _1 > we'd recalculate _1 with the condition being (_6 <= 2) and we come upwith > [-1, 1] for _1 instead of varying. > > I'll have to look at whats involved in enhancing the fold code to invoke > GORI to reevaluate _1 if _7 is [1,1]. in theory is not too difficult... :-) ah, its more complicated than that. we normally do this evaluation, but the cond_expr is using _1.. if you trace back from _6 in the condition, _1 is not in the dependency chain anywhere, so GORi cannot compute anything for it. it can compute that x_4 is [-2, 0] but it doesnt see any connection between _6 in the condition and _1. the remaining question is whether this can be cheaply identified as a recomputation.. in which case we could recompute _1 usin the [-2, 0] for x_4 and come up with [-1, 1] I'll have a look if we can easily invoke hte recompuation code the edges evaluations use or nor
I have a patch which disables the "simplification" of "1/x" for signed until late which solves this missed optimization. I will post it once it finishes testing. Note we already delayed the simplification of `bool * d` to `bool?A:0` until late for similar reasons, see PR 103257 on that.
Patch posted: https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642582.html
The trunk branch has been updated by Andrew Pinski <pinskia@gcc.gnu.org>: https://gcc.gnu.org/g:7f56a90269b393fcc55ef08e0990fafb7b1c24b4 commit r14-7148-g7f56a90269b393fcc55ef08e0990fafb7b1c24b4 Author: Andrew Pinski <quic_apinski@quicinc.com> Date: Wed Jan 10 14:25:37 2024 -0800 match: Delay folding of 1/x into `(x+1u)<2u?x:0` until late [PR113301] Since currently ranger does not work with the complexity of COND_EXPR in some cases so delaying the simplification of `1/x` for signed types help code generation. tree-ssa/divide-8.c is a new testcase where this can help. Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/113301 gcc/ChangeLog: * match.pd (`1/x`): Delay signed case until late. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/divide-8.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Fixed for GCC 14, not expecting to backport ...