Bug 51938 - missed optimization: 2 comparisons
Summary: missed optimization: 2 comparisons
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.7.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2012-01-21 22:34 UTC by Marc Glisse
Modified: 2012-08-07 10:55 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2012-01-23 00:00:00


Attachments
incomplete patch (4.92 KB, patch)
2012-06-08 19:49 UTC, Marc Glisse
Details | Diff
incomplete patch (4.92 KB, patch)
2012-06-08 20:03 UTC, Marc Glisse
Details | Diff
patch (5.17 KB, patch)
2012-06-09 21:36 UTC, Marc Glisse
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Marc Glisse 2012-01-21 22:34:32 UTC
Hello,

this is possibly related to bug 28685, or to the bugs about the tracking of flag register values.

void f();
enum Sign { NEG=-1, ZERO, POS };
enum Sign sign(double x){
	if(x>0) return POS;
	if(x<0) return NEG;
	return ZERO;
}
void g(double x){
	if(sign(x)==NEG) f();
}


Ideally, the compiler would realize that g is equivalent to:
if(x<0) f();

but on x86_64 I get with 4.6 (optimization -O3):

	xorpd	%xmm1, %xmm1
	ucomisd	%xmm1, %xmm0
	ja	.L8
	ucomisd	%xmm0, %xmm1
	jbe	.L8
	xorl	%eax, %eax
	jmp	f
.L8:
	rep
	ret

And with 4.7:

	xorpd	%xmm1, %xmm1
	ucomisd	%xmm1, %xmm0
	jbe	.L12
.L9:
	rep
	ret
.L12:
	ucomisd	%xmm0, %xmm1
	jbe	.L9
	xorl	%eax, %eax
	jmp	f

Other compilers: clang 2.9 does the optimization (it has only one ucomisd and j*), but neither Intel 11.1 nor Oracle 5.12.

In my application, this optimization has a 5% impact on the total runtime.
Comment 1 Richard Biener 2012-01-23 10:23:31 UTC
Confirmed.

Something for PHI-OPT, recognize

<bb 2>:
  if (x_2(D) > 0.0)
    goto <bb 5>;
  else
    goto <bb 3>;

<bb 3>:
  if (x_2(D) < 0.0)
    goto <bb 4>;
  else
    goto <bb 5>;

<bb 4>:

<bb 5>:
  # D.1719_1 = PHI <1(2), -1(4), 0(3)>
  return D.1719_1;

I suspect also worthwhile for integral types.  Note that for real types
you need -ffinite-math-only - I bet the clang result is wrong for NaNs.

Btw, what's the optimal assembly you expect?

Not sure what's the best way to represent this on the tree level either,
so eventually we have to resort to RTL if-conversion (doesn't work
there for integral types either).
Comment 2 Marc Glisse 2012-01-23 12:51:42 UTC
(In reply to comment #1)
> I suspect also worthwhile for integral types.  Note that for real types
> you need -ffinite-math-only - I bet the clang result is wrong for NaNs.

I hadn't thought about it (this code could indeed use -ffinite-math-only), but it appears I am lucky, the code really is equivalent to if(x<0) f(); as claimed. Indeed, for a NaN, x>0 and x<0 are false, so sign returns ZERO which is not NEG. Comparing sign(x) to ZERO would indeed be different than x==0. On the other hand, ucomisd sets ZF to 1 for QNaN. clang's code appears to be right on all variations I tried.

> Btw, what's the optimal assembly you expect?

clang generates:

	pxor	%xmm1, %xmm1
	ucomisd	%xmm0, %xmm1
	ja	.LBB1_2
	ret
.LBB1_2:
	xorb	%al, %al
	jmp	f                       # TAILCALL

No idea if that's optimal (it also depends on which branch is most likely), but one pair of ucomisd+ja is certainly better than 2.
Comment 3 Marc Glisse 2012-06-06 21:22:59 UTC
Note that if I replace:
    if(x<0) return NEG;
with:
    if(!(x>=0)) return NEG;
then ifcombine recognizes the pattern and optimizes it (the generated code is slightly different since that changes the behavior for NaN).

No time to investigate right now, but I wanted to add this note to the PR.
Comment 4 Marc Glisse 2012-06-07 14:54:02 UTC
Changing to tree-optimization (doing the optimization at RTL level would require finite-math-only).

There is plenty of code that corresponds to A&&B and A||B, but (almost) nothing for A&&!B. Quite a big missing piece...

<bb 2>:
  if (x_2(D) > 0.0)
    goto <bb 5>;
  else
    goto <bb 3>;

<bb 3>:
  if (x_2(D) < 0.0)
    goto <bb 4>;
  else
    goto <bb 5>;

The 2 conditions don't share the same then branch or the same else branch (it is a mix), so ifcombine doesn't even try to turn it into

  if (x_2(D) > 0.0 || !(x_2(D) < 0.0))
    goto <bb 5>;
  else
    goto <bb 4>;

Besides, it doesn't look like the logic is in place to fold that condition into just its second half (but I may have missed it).
Comment 5 Marc Glisse 2012-06-08 13:06:52 UTC
The main difficulty is trapping math. It isn't hard to add to tree-ssa-ifcombine an ifcombine_ifnotandif variant, and we could make it call maybe_fold_and_comparisons (as ifcombine_ifandif does) with invert_tree_comparison (gimple_cond_code (outer_cond),...), but unless we pass -fno-trapping-math, it won't help.

combine_comparisons (fold-const.c) has relevant code to determine whether the optimization is safe wrt trapping, but it does so with the semantic that UNLT_EXPR never traps, whereas I am (wrongfully) using it for TRUTH_NOT_EXPR(GE_EXPR). So I guess that means two more versions of maybe_fold_and_comparisons/maybe_fold_or_comparisons are needed :-(
Comment 6 Marc Glisse 2012-06-08 19:49:07 UTC
Created attachment 27590 [details]
incomplete patch

I wonder if this is the right direction. At least it passes the testsuite and optimizes the original testcase.
Comment 7 Marc Glisse 2012-06-08 20:03:35 UTC
Created attachment 27591 [details]
incomplete patch

The same with one less typo and more up-to-date comments.
Comment 8 Marc Glisse 2012-06-09 21:36:04 UTC
Created attachment 27592 [details]
patch

Still needs testcases before it can be submitted.
Comment 9 Marc Glisse 2012-08-06 16:38:52 UTC
Author: glisse
Date: Mon Aug  6 16:38:48 2012
New Revision: 190184

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190184
Log:
2012-08-06 Marc Glisse <marc.glisse@inria.fr>

gcc/
	PR tree-optimization/51938
	PR tree-optimization/52005
	* tree-ssa-ifcombine.c (ifcombine_ifandif): New parameters for
	inverted conditions.
	(ifcombine_iforif): Remove, merge code into ifcombine_ifandif.
	(tree_ssa_ifcombine_bb): Update calls to the above. Detect !a&&b
	and !a||b patterns.

gcc/testsuite/
	PR tree-optimization/51938
	PR tree-optimization/52005
	* gcc.dg/tree-ssa/ssa-ifcombine-8.c: New testcase.
	* gcc.dg/tree-ssa/ssa-ifcombine-9.c: Likewise.
	* gcc.dg/tree-ssa/ssa-ifcombine-10.c: Likewise.
	* gcc.dg/tree-ssa/ssa-ifcombine-11.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-10.c   (with props)
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-11.c   (with props)
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-8.c   (with props)
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-9.c   (with props)
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-ifcombine.c

Propchange: trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-10.c
            ('svn:eol-style' added)

Propchange: trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-10.c
            ('svn:keywords' added)

Propchange: trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-11.c
            ('svn:eol-style' added)

Propchange: trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-11.c
            ('svn:keywords' added)

Propchange: trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-8.c
            ('svn:eol-style' added)

Propchange: trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-8.c
            ('svn:keywords' added)

Propchange: trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-9.c
            ('svn:eol-style' added)

Propchange: trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-9.c
            ('svn:keywords' added)
Comment 10 Marc Glisse 2012-08-06 16:53:19 UTC
The testcase is now optimized with -fno-trapping-math. Ideally it should also be optimized with -ftrapping-math (which a first unapplied patch did), but that option needs too much work to fix all its issues.

Thus closing (feel free to reopen if someone is interested in the trapping case).