Hello, this is possibly related to bug 28685, or to the bugs about the tracking of flag register values. void f(); enum Sign { NEG=-1, ZERO, POS }; enum Sign sign(double x){ if(x>0) return POS; if(x<0) return NEG; return ZERO; } void g(double x){ if(sign(x)==NEG) f(); } Ideally, the compiler would realize that g is equivalent to: if(x<0) f(); but on x86_64 I get with 4.6 (optimization -O3): xorpd %xmm1, %xmm1 ucomisd %xmm1, %xmm0 ja .L8 ucomisd %xmm0, %xmm1 jbe .L8 xorl %eax, %eax jmp f .L8: rep ret And with 4.7: xorpd %xmm1, %xmm1 ucomisd %xmm1, %xmm0 jbe .L12 .L9: rep ret .L12: ucomisd %xmm0, %xmm1 jbe .L9 xorl %eax, %eax jmp f Other compilers: clang 2.9 does the optimization (it has only one ucomisd and j*), but neither Intel 11.1 nor Oracle 5.12. In my application, this optimization has a 5% impact on the total runtime.
Confirmed. Something for PHI-OPT, recognize <bb 2>: if (x_2(D) > 0.0) goto <bb 5>; else goto <bb 3>; <bb 3>: if (x_2(D) < 0.0) goto <bb 4>; else goto <bb 5>; <bb 4>: <bb 5>: # D.1719_1 = PHI <1(2), -1(4), 0(3)> return D.1719_1; I suspect also worthwhile for integral types. Note that for real types you need -ffinite-math-only - I bet the clang result is wrong for NaNs. Btw, what's the optimal assembly you expect? Not sure what's the best way to represent this on the tree level either, so eventually we have to resort to RTL if-conversion (doesn't work there for integral types either).
(In reply to comment #1) > I suspect also worthwhile for integral types. Note that for real types > you need -ffinite-math-only - I bet the clang result is wrong for NaNs. I hadn't thought about it (this code could indeed use -ffinite-math-only), but it appears I am lucky, the code really is equivalent to if(x<0) f(); as claimed. Indeed, for a NaN, x>0 and x<0 are false, so sign returns ZERO which is not NEG. Comparing sign(x) to ZERO would indeed be different than x==0. On the other hand, ucomisd sets ZF to 1 for QNaN. clang's code appears to be right on all variations I tried. > Btw, what's the optimal assembly you expect? clang generates: pxor %xmm1, %xmm1 ucomisd %xmm0, %xmm1 ja .LBB1_2 ret .LBB1_2: xorb %al, %al jmp f # TAILCALL No idea if that's optimal (it also depends on which branch is most likely), but one pair of ucomisd+ja is certainly better than 2.
Note that if I replace: if(x<0) return NEG; with: if(!(x>=0)) return NEG; then ifcombine recognizes the pattern and optimizes it (the generated code is slightly different since that changes the behavior for NaN). No time to investigate right now, but I wanted to add this note to the PR.
Changing to tree-optimization (doing the optimization at RTL level would require finite-math-only). There is plenty of code that corresponds to A&&B and A||B, but (almost) nothing for A&&!B. Quite a big missing piece... <bb 2>: if (x_2(D) > 0.0) goto <bb 5>; else goto <bb 3>; <bb 3>: if (x_2(D) < 0.0) goto <bb 4>; else goto <bb 5>; The 2 conditions don't share the same then branch or the same else branch (it is a mix), so ifcombine doesn't even try to turn it into if (x_2(D) > 0.0 || !(x_2(D) < 0.0)) goto <bb 5>; else goto <bb 4>; Besides, it doesn't look like the logic is in place to fold that condition into just its second half (but I may have missed it).
The main difficulty is trapping math. It isn't hard to add to tree-ssa-ifcombine an ifcombine_ifnotandif variant, and we could make it call maybe_fold_and_comparisons (as ifcombine_ifandif does) with invert_tree_comparison (gimple_cond_code (outer_cond),...), but unless we pass -fno-trapping-math, it won't help. combine_comparisons (fold-const.c) has relevant code to determine whether the optimization is safe wrt trapping, but it does so with the semantic that UNLT_EXPR never traps, whereas I am (wrongfully) using it for TRUTH_NOT_EXPR(GE_EXPR). So I guess that means two more versions of maybe_fold_and_comparisons/maybe_fold_or_comparisons are needed :-(
Created attachment 27590 [details] incomplete patch I wonder if this is the right direction. At least it passes the testsuite and optimizes the original testcase.
Created attachment 27591 [details] incomplete patch The same with one less typo and more up-to-date comments.
Created attachment 27592 [details] patch Still needs testcases before it can be submitted.
Author: glisse Date: Mon Aug 6 16:38:48 2012 New Revision: 190184 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190184 Log: 2012-08-06 Marc Glisse <marc.glisse@inria.fr> gcc/ PR tree-optimization/51938 PR tree-optimization/52005 * tree-ssa-ifcombine.c (ifcombine_ifandif): New parameters for inverted conditions. (ifcombine_iforif): Remove, merge code into ifcombine_ifandif. (tree_ssa_ifcombine_bb): Update calls to the above. Detect !a&&b and !a||b patterns. gcc/testsuite/ PR tree-optimization/51938 PR tree-optimization/52005 * gcc.dg/tree-ssa/ssa-ifcombine-8.c: New testcase. * gcc.dg/tree-ssa/ssa-ifcombine-9.c: Likewise. * gcc.dg/tree-ssa/ssa-ifcombine-10.c: Likewise. * gcc.dg/tree-ssa/ssa-ifcombine-11.c: Likewise. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-10.c (with props) trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-11.c (with props) trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-8.c (with props) trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-9.c (with props) Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-ifcombine.c Propchange: trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-10.c ('svn:eol-style' added) Propchange: trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-10.c ('svn:keywords' added) Propchange: trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-11.c ('svn:eol-style' added) Propchange: trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-11.c ('svn:keywords' added) Propchange: trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-8.c ('svn:eol-style' added) Propchange: trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-8.c ('svn:keywords' added) Propchange: trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-9.c ('svn:eol-style' added) Propchange: trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-9.c ('svn:keywords' added)
The testcase is now optimized with -fno-trapping-math. Ideally it should also be optimized with -ftrapping-math (which a first unapplied patch did), but that option needs too much work to fix all its issues. Thus closing (feel free to reopen if someone is interested in the trapping case).