This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[Committed] Optimize clz(x)>>5 as x==0 (where appropriate)
- From: Roger Sayle <roger at eyesopen dot com>
- To: gcc-patches at gcc dot gnu dot org
- Date: Tue, 30 May 2006 09:15:37 -0600 (MDT)
- Subject: [Committed] Optimize clz(x)>>5 as x==0 (where appropriate)
The following small missed optimization patch teaches GCC's RTL
optimizers that on some platforms the (lshiftrt (clz x) C) idiom
is equivalent to (eq X 0) for targets with suitable definitions
of STORE_FLAG_VALUE, CLZ_DEFINED_AT_ZERO and C.
For the testcase,
int bar(int x)
{
int t = __builtin_clz(x);
return -(t>>5);
}
On powerpc, we currently generate four instructions
_bar: cntlzw r3,r3
srawi r3,r3,5
neg r3,r3
blr
with the patch below we now produce three
_bar: addic r3,r3,-1
subfe r3,r3,r3
blr
Although this may seem like an obscure transformation, its related
to PR rtl-optimization/10588 which has been shown to potentially
have a significant performance benefit for GCC on PPC.
On possibly controversial aspect of the patch below, for folks not
familiar with fold-const.c and/or simplify-rtx.c, is the decision
to unfactor the SHIFT and ROTATE cases in simplify_binary_operation_1.
It turns out that complex control flow around these cases in fold and
simplify_rtx is the result of attempting to share one or two common
transformations. In reality, each of these operators has a very
large number of unique applicable transformations, but the attempts
to factor "x << 0" and "y >>> 0" is akin to additionally unifying
"x + 0" and "x ^ 0", and simply creates complex control flow, numerous
gotos, and slows down the compiler by repeatedly testing an already
switched GET_CODE/TREE_CODE. Its one aspect of the compiler where
paradoxically a little code duplication makes the code easier to
maintain. Given that this was scheduled to be cleaned up in 4.3,
I took the liberty cleaning up early, rather apply a patch with an
explicit "code == LSHIFTRT", only to reorganize it in a few months.
See http://gcc.gnu.org/ml/gcc-patches/2006-05/msg01281.html
The following patch has been tested on powerpc-apple-darwin7.9.0
with a full "make bootstrap", all default languages, and regression
tested with a top-level "make -k check" with no new failures.
Committed to mainline as 114239.
This patch includes a new testcase to check that David Edelsohn's
patch for PR rtl-optimization/10588 works OK. However, there's a
possibility that I may need to add a target specifier to catch
powerpc variants where this fails. My apologies in advance for
any such temporary failure of the new test case.
2006-05-30 Roger Sayle <roger@eyesopen.com>
* simplify-rtx.c (simplify_binary_operation_1): Unfactor the shift
and rotate cases.
<LSHIFTRT>: Optimize (lshiftrt (clz X) C) as (eq X 0) where C is
log2(GET_MODE_BITSIZE(X)) on targets with the appropriate semantics.
* gcc.target/ppc-eq0-1.c: New test case.
* gcc.target/ppc-negeq0-1.c: New test case.
Index: simplify-rtx.c
===================================================================
*** simplify-rtx.c (revision 114163)
--- simplify-rtx.c (working copy)
*************** simplify_binary_operation_1 (enum rtx_co
*** 2414,2434 ****
case ROTATERT:
case ROTATE:
case ASHIFTRT:
/* Rotating ~0 always results in ~0. */
if (GET_CODE (trueop0) == CONST_INT && width <= HOST_BITS_PER_WIDE_INT
&& (unsigned HOST_WIDE_INT) INTVAL (trueop0) == GET_MODE_MASK (mode)
&& ! side_effects_p (op1))
return op0;
!
! /* Fall through.... */
case ASHIFT:
case SS_ASHIFT:
case LSHIFTRT:
if (trueop1 == CONST0_RTX (mode))
return op0;
if (trueop0 == CONST0_RTX (mode) && ! side_effects_p (op1))
return op0;
break;
case SMIN:
--- 2414,2458 ----
case ROTATERT:
case ROTATE:
case ASHIFTRT:
+ if (trueop1 == CONST0_RTX (mode))
+ return op0;
+ if (trueop0 == CONST0_RTX (mode) && ! side_effects_p (op1))
+ return op0;
/* Rotating ~0 always results in ~0. */
if (GET_CODE (trueop0) == CONST_INT && width <= HOST_BITS_PER_WIDE_INT
&& (unsigned HOST_WIDE_INT) INTVAL (trueop0) == GET_MODE_MASK (mode)
&& ! side_effects_p (op1))
return op0;
! break;
case ASHIFT:
case SS_ASHIFT:
+ if (trueop1 == CONST0_RTX (mode))
+ return op0;
+ if (trueop0 == CONST0_RTX (mode) && ! side_effects_p (op1))
+ return op0;
+ break;
+
case LSHIFTRT:
if (trueop1 == CONST0_RTX (mode))
return op0;
if (trueop0 == CONST0_RTX (mode) && ! side_effects_p (op1))
return op0;
+ /* Optimize (lshiftrt (clz X) C) as (eq X 0). */
+ if (GET_CODE (op0) == CLZ
+ && GET_CODE (trueop1) == CONST_INT
+ && STORE_FLAG_VALUE == 1
+ && INTVAL (trueop1) < width)
+ {
+ enum machine_mode imode = GET_MODE (XEXP (op0, 0));
+ unsigned HOST_WIDE_INT zero_val = 0;
+
+ if (CLZ_DEFINED_VALUE_AT_ZERO (imode, zero_val)
+ && zero_val == GET_MODE_BITSIZE (imode)
+ && INTVAL (trueop1) == exact_log2 (zero_val))
+ return simplify_gen_relational (EQ, mode, imode,
+ XEXP (op0, 0), const0_rtx);
+ }
break;
case SMIN:
/* PR rtl-optimization/10588 */
/* { dg-do compile } */
/* { dg-options "-O2" } */
int foo(int x)
{
return x == 0;
}
/* { dg-final { scan-assembler "cntlzw" } } */
/* { dg-do compile } */
/* { dg-options "-O2" } */
int foo(int x)
{
return -(x == 0);
}
int bar(int x)
{
int t = __builtin_clz(x);
return -(t>>5);
}
/* { dg-final { scan-assembler-not "cntlzw" } } */
Roger
--