This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Committed] Optimize clz(x)>>5 as x==0 (where appropriate)


The following small missed optimization patch teaches GCC's RTL
optimizers that on some platforms the (lshiftrt (clz x) C) idiom
is equivalent to (eq X 0) for targets with suitable definitions
of STORE_FLAG_VALUE, CLZ_DEFINED_AT_ZERO and C.

For the testcase,

int bar(int x)
{
  int t = __builtin_clz(x);
  return -(t>>5);
}

On powerpc, we currently generate four instructions

_bar:   cntlzw r3,r3
        srawi r3,r3,5
        neg r3,r3
        blr

with the patch below we now produce three

_bar:   addic r3,r3,-1
        subfe r3,r3,r3
        blr


Although this may seem like an obscure transformation, its related
to PR rtl-optimization/10588 which has been shown to potentially
have a significant performance benefit for GCC on PPC.


On possibly controversial aspect of the patch below, for folks not
familiar with fold-const.c and/or simplify-rtx.c, is the decision
to unfactor the SHIFT and ROTATE cases in simplify_binary_operation_1.
It turns out that complex control flow around these cases in fold and
simplify_rtx is the result of attempting to share one or two common
transformations.  In reality, each of these operators has a very
large number of unique applicable transformations, but the attempts
to factor "x << 0" and "y >>> 0" is akin to additionally unifying
"x + 0" and "x ^ 0", and simply creates complex control flow, numerous
gotos, and slows down the compiler by repeatedly testing an already
switched GET_CODE/TREE_CODE.  Its one aspect of the compiler where
paradoxically a little code duplication makes the code easier to
maintain.  Given that this was scheduled to be cleaned up in 4.3,
I took the liberty cleaning up early, rather apply a patch with an
explicit "code == LSHIFTRT", only to reorganize it in a few months.
See http://gcc.gnu.org/ml/gcc-patches/2006-05/msg01281.html


The following patch has been tested on powerpc-apple-darwin7.9.0
with a full "make bootstrap", all default languages, and regression
tested with a top-level "make -k check" with no new failures.

Committed to mainline as 114239.

This patch includes a new testcase to check that David Edelsohn's
patch for PR rtl-optimization/10588 works OK.  However, there's a
possibility that I may need to add a target specifier to catch
powerpc variants where this fails.  My apologies in advance for
any such temporary failure of the new test case.



2006-05-30  Roger Sayle  <roger@eyesopen.com>

	* simplify-rtx.c (simplify_binary_operation_1): Unfactor the shift
	and rotate cases.
	<LSHIFTRT>: Optimize (lshiftrt (clz X) C) as (eq X 0) where C is
	log2(GET_MODE_BITSIZE(X)) on targets with the appropriate semantics.

	* gcc.target/ppc-eq0-1.c: New test case.
	* gcc.target/ppc-negeq0-1.c: New test case.


Index: simplify-rtx.c
===================================================================
*** simplify-rtx.c	(revision 114163)
--- simplify-rtx.c	(working copy)
*************** simplify_binary_operation_1 (enum rtx_co
*** 2414,2434 ****
      case ROTATERT:
      case ROTATE:
      case ASHIFTRT:
        /* Rotating ~0 always results in ~0.  */
        if (GET_CODE (trueop0) == CONST_INT && width <= HOST_BITS_PER_WIDE_INT
  	  && (unsigned HOST_WIDE_INT) INTVAL (trueop0) == GET_MODE_MASK (mode)
  	  && ! side_effects_p (op1))
  	return op0;
!
!       /* Fall through....  */

      case ASHIFT:
      case SS_ASHIFT:
      case LSHIFTRT:
        if (trueop1 == CONST0_RTX (mode))
  	return op0;
        if (trueop0 == CONST0_RTX (mode) && ! side_effects_p (op1))
  	return op0;
        break;

      case SMIN:
--- 2414,2458 ----
      case ROTATERT:
      case ROTATE:
      case ASHIFTRT:
+       if (trueop1 == CONST0_RTX (mode))
+ 	return op0;
+       if (trueop0 == CONST0_RTX (mode) && ! side_effects_p (op1))
+ 	return op0;
        /* Rotating ~0 always results in ~0.  */
        if (GET_CODE (trueop0) == CONST_INT && width <= HOST_BITS_PER_WIDE_INT
  	  && (unsigned HOST_WIDE_INT) INTVAL (trueop0) == GET_MODE_MASK (mode)
  	  && ! side_effects_p (op1))
  	return op0;
!       break;

      case ASHIFT:
      case SS_ASHIFT:
+       if (trueop1 == CONST0_RTX (mode))
+ 	return op0;
+       if (trueop0 == CONST0_RTX (mode) && ! side_effects_p (op1))
+ 	return op0;
+       break;
+
      case LSHIFTRT:
        if (trueop1 == CONST0_RTX (mode))
  	return op0;
        if (trueop0 == CONST0_RTX (mode) && ! side_effects_p (op1))
  	return op0;
+       /* Optimize (lshiftrt (clz X) C) as (eq X 0).  */
+       if (GET_CODE (op0) == CLZ
+ 	  && GET_CODE (trueop1) == CONST_INT
+ 	  && STORE_FLAG_VALUE == 1
+ 	  && INTVAL (trueop1) < width)
+ 	{
+ 	  enum machine_mode imode = GET_MODE (XEXP (op0, 0));
+ 	  unsigned HOST_WIDE_INT zero_val = 0;
+
+ 	  if (CLZ_DEFINED_VALUE_AT_ZERO (imode, zero_val)
+ 	      && zero_val == GET_MODE_BITSIZE (imode)
+ 	      && INTVAL (trueop1) == exact_log2 (zero_val))
+ 	    return simplify_gen_relational (EQ, mode, imode,
+ 					    XEXP (op0, 0), const0_rtx);
+ 	}
        break;

      case SMIN:


/* PR rtl-optimization/10588 */
/* { dg-do compile } */
/* { dg-options "-O2" } */

int foo(int x)
{
  return x == 0;
}

/* { dg-final { scan-assembler "cntlzw" } } */


/* { dg-do compile } */
/* { dg-options "-O2" } */

int foo(int x)
{
  return -(x == 0);
}

int bar(int x)
{
  int t = __builtin_clz(x);
  return -(t>>5);
}

/* { dg-final { scan-assembler-not "cntlzw" } } */

Roger
--


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]