Take these two functions: int f(int a,int b, int c) { return a==c?b:0; } int g(int a, int b, int c) { return b & -(a==c); } ---- CUT ---- We used to produce the same code generation for both of them: cmp w0, w2 csel w0, w1, wzr, eq ret But in GCC 10 we produce for g: cmp w0, w2 csetm w0, eq and w0, w0, w1 ret I think this was introduced by: 2019-05-09 Segher Boessenkool <segher@kernel.crashing.org> * combine.c (combine_simplify_rtx): Don't make IF_THEN_ELSE RTL.
Hi, I reverted Segher's commit in my local tree, but am still seeing the same code-gen for g(). Thanks, Prathamesh
(In reply to prathamesh3492 from comment #1) > Hi, > I reverted Segher's commit in my local tree, but am still seeing the same > code-gen for g(). Oops I was modifying wrong branch :-/ I can confirm reverting the commit fixes this issue. Sorry for the noise. Regards, Prathamesh > > Thanks, > Prathamesh
(In reply to prathamesh3492 from comment #2) > Oops I was modifying wrong branch :-/ > I can confirm reverting the commit fixes this issue. > Sorry for the noise. Funny, I had did the same :).
(That is r271047; confirmed). As the commit message says, with that patch we generate better code on average for all targets (except alpha). Which isn't surprising at all, given what it does. Combine (or better, simplify-rtx) does not know how to handle (and:SI (neg:SI (eq:SI x y)) (reg:SI z)) Btw, try int h(int a, int b, int c, int d) { return (c & -(a==b)) | (d & -(a!=b)); } to see we have some way to go here.
So if the AND-based idiom is now preferred, shouldn't the if-then-else variant be transformed into it? Similarly for IOR, when we get (IOR (NEG (<cond-op>)) (reg)) from (IF_THEN_ELSE (<cond-op>) (reg) (const_int -1))
(In reply to Richard Earnshaw from comment #5) > So if the AND-based idiom is now preferred, shouldn't the if-then-else > variant be transformed into it? Similarly for IOR, when we get > > (IOR (NEG (<cond-op>)) (reg)) > > from > > (IF_THEN_ELSE (<cond-op>) > (reg) > (const_int -1)) except that should be (IF_THEN_ELSE (<cond-op>') (reg) (const_int -1)) Where <cond-op>' is the reversed condition.
(In reply to Richard Earnshaw from comment #5) > So if the AND-based idiom is now preferred, shouldn't the if-then-else > variant be transformed into it? Neither form is canonical currently. The form you *really* want tests a CCmode var, but combine will not give you that (unless it started with one), so you'll need md patterns for it. > Similarly for IOR, when we get > > (IOR (NEG (<cond-op>)) (reg)) > > from > > (IF_THEN_ELSE (<cond-op>) > (reg) > (const_int -1)) Yeah. And some more variants. I think the IF_THEN_ELSE version should be canonical, and it should be formed in simplify_rtx, not at random spots in combine.
(In reply to Richard Earnshaw from comment #6) > (In reply to Richard Earnshaw from comment #5) > > So if the AND-based idiom is now preferred, shouldn't the if-then-else > > variant be transformed into it? Similarly for IOR, when we get > > > > (IOR (NEG (<cond-op>)) (reg)) > > > > from > > > > (IF_THEN_ELSE (<cond-op>) > > (reg) > > (const_int -1)) > > except that should be > > (IF_THEN_ELSE (<cond-op>') > (reg) > (const_int -1)) > > Where <cond-op>' is the reversed condition. Or just (if_then_else (<cond-op>) (const_int -1) (reg)) (No order )like constant last) of those two arguments is prescribed, and this makes sense).
(In reply to Segher Boessenkool from comment #7) > I think the IF_THEN_ELSE version should be canonical, and it should be > formed in simplify_rtx, not at random spots in combine. Why? The and/ior variants are more likely to lead to a useful split if combining 3 insns and it doesn't match.
There are a gazillion ways to write this without if_then_else, none obviously better than any other, and it gets much worse if your b,c have special values. I don't think this optimisation should be done by combine at all. combine has some serious problems doing this. It is closely related to ifcvt, in some ways.
Couldn't we just fix this kind of stuff in gimple?
Gimple can help writing silly expressions like this in a more canonical form (whatever we decide to use for that) at least, yeah. But you can not do RTL's job in gimple ;-)
Note I don't remember where I found this originally. I found this code somewhere back in May 2012 but I did not record where I found it though; It was the opposite way around really for Octeon2/3 (MIPS64).
That wouldn't be a big surprise Andrew. My point is I think we could create some match.pd patterns to canonicalize the form in gimple and it wouldn't matter nearly as much what combine did or didn't do to this kind of code.
For this, I agree we should canonicalize the second form to the first one (well, any commutative x & -boolean_range into boolean_range ? x : 0 because at least in 95% of cases it will be written the former way, the latter form looks like a cute trick trying to workaround compiler deficiencies. If the x & -boolean_range form results in better code than boolean_range ? x : 0 on some target, we should make sure to expand the ?: that way somewhere in RTL.
Seems on x86_64-linux we also used to give the same code in 3.4.6 and earlier, but starting already with 4.0 (already r90000) we emit cmpl %edx, %edi movl $0, %eax cmove %esi, %eax for f and xorl %eax, %eax cmpl %edx, %edi sete %al negl %eax andl %esi, %eax for g. We do it even for -Os, where the f version is clearly a win, for -O2 I'm afraid it depends on how soon the cmove result is used. LLVM seems to use cmove in both cases, ICC does roughly what gcc does (honor what user wrote). Not sure if we can do this transformation in match.pd though, because for a ? b : 0; the gimplifier doesn't emit a COND_EXPR, but GIMPLE_COND and PHI, I'm afraid we can't create new bbs from within match.pd.
Tried timing: __attribute__((noipa)) int f(int a,int b, int c) { return a==c?b:0; } __attribute__((noipa)) int g(int a, int b, int c) { return b & -(a==c); } int main (int argc, const char **argv) { unsigned int c = 0; if (argv[1][0] == 'f') { for (int i = 0; i < 1000000000; i++) c += f(1,1,i&1); } else { for (int i = 0; i < 1000000000; i++) c += g(1,1,i&1); } asm volatile ("" : : "r" (c)); return 0; } and the same without the "i&", i.e. alternating results and always same, and both functions are the same speed in both cases, and cmov is shorter. Without cmov, such as with -m32 -O2 -march=i586, g seems to be faster, so we probably should teach ifcvt.c to try it.
GCC 10.1 has been released.
GCC 10.2 is released, adjusting target milestone.
GCC 10.3 is being released, retargeting bugs to GCC 10.4.
*** Bug 100072 has been marked as a duplicate of this bug. ***
Mine, we should do this transformation on the gimple level really. Let me do that.
Created attachment 51846 [details] Patch which I am testing Note it is really two patches, one which fixes up the multiply case and then one which does the same for &- too.
(In reply to Segher Boessenkool from comment #4) > Btw, try > > int h(int a, int b, int c, int d) > { > return (c & -(a==b)) | (d & -(a!=b)); > } > > to see we have some way to go here. I filed that as PR 103354 which I will handle after I submit the patches for this one.
Submitted the sequence here: https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585111.html https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585112.html
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
Revised versions of Andrew's patches from comment #25 were posted here: https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609349.html
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>: https://gcc.gnu.org/g:fd1f5373b8647a5da2f7f4b42282e676a4b04d98 commit r13-5128-gfd1f5373b8647a5da2f7f4b42282e676a4b04d98 Author: Roger Sayle <roger@nextmovesoftware.com> Date: Thu Jan 12 21:47:40 2023 +0000 PR tree-optimization/92342: Optimize b & -(a==c) in match.pd This patch is an update/tweak of Andrew Pinski's two patches for PR tree-optimization/92342, that were originally posted by in November: https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585111.html https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585112.html Technically, the first of those was approved by Richard Biener, though never committed, and my first thought was to simply push it for Andrew, but the review of the second piece expressed concerns over comparisons in non-integral modes, where the result may not be zero-one valued. Indeed both transformations misbehave in the presence of vector mode comparisons (these transformations are already implemented for vec_cond elsewhere in match.pd), so my minor contribution is to limit these new transformations to scalars, by testing that both the operands and results are INTEGRAL_TYPE_P. 2023-01-12 Andrew Pinski <apinski@marvell.com> Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog: PR tree-optimization/92342 * match.pd ((m1 CMP m2) * d -> (m1 CMP m2) ? d : 0): Use tcc_comparison and :c for the multiply. (b & -(a CMP c) -> (a CMP c)?b:0): New pattern. gcc/testsuite/ChangeLog: PR tree-optimization/92342 * gcc.dg/tree-ssa/andnegcmp-1.c: New test. * gcc.dg/tree-ssa/andnegcmp-2.c: New test. * gcc.dg/tree-ssa/multcmp-1.c: New test. * gcc.dg/tree-ssa/multcmp-2.c: New test.
Looks like the patch fixes this bug, unless I'm missing something.
This has now been fixed on mainline (for GCC 13).
Fixed in GCC 13. I don't mind this not being fixed on other branches.