int f(int x, int y) { return ~(x - y); } This can be optimized to `~x + y`. While this isn't necessarily faster on most platforms, it is at least equivalent and on x86 the addition can be optimized to `lea` whereas the subtraction can't. This transformation is done by LLVM, but not by GCC.
It's at least a missed canonicalization as ~x + y isn't transformed to ~(x - y) either. int f(int x, int y) { return ~(x - y) + (~x + y); } should see CSE, with a minus folding to zero (that works already for some reason).
I guess ~x + y as canonical form has another advantage, that + is commutative while - is not.
Created attachment 49742 [details] gcc11-pr96685.patch Untested fix.
Though, there is some canonicalization problem GENERIC vs. GIMPLE: unsigned f1 (unsigned x, unsigned y) { unsigned int r = (x - y); return ~r; } unsigned f2 (unsigned x, unsigned y) { unsigned int r = ~(x - y); return r; } unsigned f3 (unsigned x) { unsigned int r = (x - 23); return ~r; } unsigned f4 (unsigned x) { unsigned int r = ~(x - 23); return r; } int f5 (int x, int y) { int r = (x - y); return ~r; } int f6 (int x, int y) { int r = ~(x - y); return r; } int f7 (int x) { int r = (x - 23); return ~r; } int f8 (int x) { int r = ~(x - 23); return r; } Before the above patch, the above testcase emitted: subl %esi, %edi movl %edi, %eax notl %eax for f1/f2/f5/f6 and leal -23(%rdi), %eax notl %eax for f3/f4/f7/f8. With the patch it emits: notl %edi leal (%rdi,%rsi), %eax for f1/f5/f6, subl %edi, %esi leal -1(%rsi), %eax for f2, notl %edi leal 23(%rdi), %eax for f3/f7, movl $22, %eax subl %edi, %eax for f4/f8.
Ok, so for GENERIC it seems to be the associate: in fold_binary_loc that converts ~x + y created by this patch into (y - x) + 1, and we don't have an equivalent for that in GIMPLE. So, shall I restrict this match.pd optimization to #if GIMPLE only, or shall I canonicalize not to ~x + y but to (y - x) + 1, or should we implement the associate: optimization on GIMPLE? I guess the last one might be too much in stage3. I guess the middle option wouldn't help for the testcase in the patch, because we wouldn't canonicalize ~x + y to (y - x) + 1 and the first option would not consider ~x + y or ~(x - y) equivalent to user-written (y - x) + 1.
Created attachment 49745 [details] gcc11-pr96685.patch Updated patch.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:0bd675183d94e6bca100c3aaaf87ee9676fb3c26 commit r11-5958-g0bd675183d94e6bca100c3aaaf87ee9676fb3c26 Author: Jakub Jelinek <jakub@redhat.com> Date: Sat Dec 12 14:49:57 2020 +0100 match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685] This patch adds the ~(X - Y) -> ~X + Y simplification requested in the PR (plus also ~(X + C) -> ~X + (-C) for constants C that can be safely negated. The first two simplify blocks is what has been requested in the PR and that makes the first testcase pass. Unfortunately, that change also breaks the second testcase, because while the same expressions appearing in the same stmt and split across multiple stmts has been folded (not really) before, with this optimization fold-const.c optimizes ~X + Y further into (Y - X) - 1 in fold_binary_loc associate: code, but we have nothing like that in GIMPLE and so end up with different expressions. The last simplify is an attempt to deal with just this case, had to rule out there the Y == -1U case, because then we reached infinite recursion as ~X + -1U was canonicalized by the pattern into (-1U - X) + -1U but there is a canonicalization -1 - A -> ~A that turns it back. Furthermore, had to make it #if GIMPLE only, because it otherwise resulted in infinite recursion when interacting with the associate: optimization. The end result is that we pass all 3 testcases and thus canonizalize the 3 possible forms of writing the same thing. 2020-12-12 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/96685 * match.pd (~(X - Y) -> ~X + Y): New optimization. (~X + Y -> (Y - X) - 1): Likewise. * gcc.dg/tree-ssa/pr96685-1.c: New test. * gcc.dg/tree-ssa/pr96685-2.c: New test. * gcc.dg/tree-ssa/pr96685-3.c: New test.
Fixed.
*** Bug 37516 has been marked as a duplicate of this bug. ***