Bug 21137 - Convert (a >> 2) & 1 != 0 into a & 4 != 0
Summary: Convert (a >> 2) & 1 != 0 into a & 4 != 0
Status: REOPENED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: unknown
: P2 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization, TREE
Depends on: 29726
Blocks:
  Show dependency treegraph
 
Reported: 2005-04-21 00:47 UTC by Kazu Hirata
Modified: 2016-09-03 21:30 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail: 4.3.0
Last reconfirmed: 2016-09-03 00:00:00


Attachments
Patch (647 bytes, patch)
2005-07-11 04:24 UTC, James A. Morrison
Details | Diff
testcase (223 bytes, text/plain)
2005-07-11 04:30 UTC, James A. Morrison
Details
Patch to fix testcase when int isn't exactly 32 bits (336 bytes, patch)
2007-07-24 15:00 UTC, Rask Ingemann Lambertsen
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Kazu Hirata 2005-04-21 00:47:02 UTC
At tree level, we would like to canonicalize (a >> 2) & 1 != 0 into a & 4 != 0.

Currently,

void bar (void);

void
foo (int a)
{
  if ((a >> 2) & 1)
    bar ();
}

turns into

foo (a)
{
  int D.1235;
  int D.1236;
  _Bool D.1237;

  D.1235 = a >> 2;
  D.1236 = D.1235 & 1;
  D.1237 = (_Bool) D.1236;
  if (D.1237)
    {
      bar ();
    }
  else
    {
      
    }
}
Comment 1 Andrew Pinski 2005-04-21 00:50:52 UTC
Confirmed.
Comment 2 James A. Morrison 2005-07-11 03:56:56 UTC
 This should actually be "Convert (a >> c1) & 2**c2" to a & (c2 << c1)".
Comment 3 James A. Morrison 2005-07-11 04:24:57 UTC
Created attachment 9242 [details]
Patch
Comment 4 James A. Morrison 2005-07-11 04:30:52 UTC
Created attachment 9243 [details]
testcase

 Passes with make check-gcc RUNTESTFLAGS=gcc.dg/dg.exp
Comment 5 roger 2005-08-14 19:17:37 UTC
Hi James,

Unfortunately, there are a few mistakes in your proposed patch for PR21137.

Firstly Kazu's proposed transformation is only valid when the results of the
bitwise-AND are being tested for equality or inequality with zero.  i.e. its
safe to transform
  "((a >> 2) & 1) != 0"  into  "(a & 4) != 0"
but not
  "x = (a >> 2) & 1;"  into "x = (a & 4)".

Your patch is in the general fold path for BIT_AND_EXPR, so you'll transform
both.  It's surprising there are no testsuite checks for the second example
above; it might be worth adding them to prevent anyone making a similar mistake
in future.

Secondly, this transformation is only valid is c1 + c2 < TYPE_PRECISION(type).
Consider the following code:

      signed char c;
      if ((c >> 6) & 64) ...

this is not equivalent to

      if (c & (char)(64<<6)) ...
i.e.  if (c & (char)4096) ...
i.e.  if (c & 0) ...
i.e.  if (0)

Of course, when c1+c2 >= TYPE_PRECISION(type), there are two additional
optimizations that can be performed.  If TYPE_UNSIGNED(type) the result
is always false, and if !TYPE_UNSIGNED(type), the condition is equivalent
to "a < 0".  So in the example of mine above, optimization should produce:
       if (c < 0) ...

Finally, in your patch, you use "break", if the transformation is invalid.
This isn't really the correct "idiom/style" for fold, where if the guard
for a transformation fails, you shouldn't drop out of the switch, but instead
continue onto the following/next transformation "in the list".
So instead of "if (!guard) break; return transform();", this optimization
should be written as "if (guard) return transform();".  I haven't looked
for other examples for "break" in fold_unary/fold_binary/fold_ternary, but
if there are any, they're probably (latent) missed-optimization bugs.

Other than that the patch looks good.  Thanks for looking into this.
Comment 6 James A. Morrison 2005-08-14 21:56:52 UTC
 Actually, doing (a >> c1) & 2**c2 -> (a & (1<<c1+c2)) >> c1 looks worse, but
can be transformed into (a & (1<<c1+c2)) CMP c3 << c1.  However, I haven't
figure out the details of the second transformation.  There is a bug about it
though.
Comment 7 Roger Sayle 2006-02-26 15:36:56 UTC
Subject: Bug 21137

Author: sayle
Date: Sun Feb 26 15:36:52 2006
New Revision: 111453

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=111453
Log:
2006-02-26  Roger Sayle  <roger@eyesopen.com>
	    James A. Morrison  <phython@gcc.gnu.org>

	PR middle-end/21137
	* fold-const.c (fold_binary) <EQ_EXPR>:  Fold ((X>>C1)&C2) eq/ne 0,
	when C2 is a power of two, as either (X&(C2<<C1)) eq/ne 0 if the
	new constant C2<<C1, or as (X<0) or (X,false) depending upon the
	signedness of the shift operation.

	* gcc.dg/fold-eqandshift-1.c: New test case.


Added:
    trunk/gcc/testsuite/gcc.dg/fold-eqandshift-1.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/fold-const.c
    trunk/gcc/testsuite/ChangeLog

Comment 8 roger 2006-02-26 18:39:04 UTC
Fixed on mainline.  I'm still investiating some related optimizations.
Comment 9 Rask Ingemann Lambertsen 2007-07-24 14:56:53 UTC
This is not fully fixed yet. Compile these two functions with -O2 -fdump-tree-original:

void test5_1(int e)
{
  if ((e >> 31) & 64)
    foo();
}

typedef int myint;

void test5_2(myint e)
{
  if ((e >> 31) & 64)
    foo();
}

On x86_64-unknown-linux-gnu, I get (4.2.0, 4.2.1 and mainline):

;; Function test5_1 (test5_1)
;; enabled by -tree-original


{
  if (e < 0)
    {
      foo ();
    }
}



;; Function test5_2 (test5_2)
;; enabled by -tree-original

{
  if (((int) (e >> 31) & 64) != 0)
    {
      foo ();
    }
}


We should get identical dumps for the two functions. I noticed this problem
while trying to fix the original testcase to work targets where an int is
not exactly 32 bits wide. The attached patch to fix the testcase revealed
the problem.
Comment 10 Rask Ingemann Lambertsen 2007-07-24 15:00:27 UTC
Created attachment 13959 [details]
Patch to fix testcase when int isn't exactly 32 bits
Comment 11 Rask Ingemann Lambertsen 2007-07-26 09:33:49 UTC
Reopening since this was only partially fixed.