This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [Patch/rtl-expand] Take tree range info into account to improve LSHIFT_EXP expanding


Jeff Law writes:

>> +	    && ! unsignedp
> Don't you need to check that the conversion is actually a sign 
> extension.  Oh, you're relying on the signedness of ops->type.  That 
> should be sufficient.

  Exactly.

>> +		if (GET_MODE_SIZE (rmode) < GET_MODE_SIZE (mode)
>> +		    && TREE_INT_CST_LOW (treeop1) < GET_MODE_BITSIZE (word_mode)
>> +		    && ((TREE_INT_CST_LOW (treeop1) + GET_MODE_BITSIZE (rmode))
>> +			>= GET_MODE_BITSIZE (word_mode)))
> I keep having trouble with this.  I think the key to why this works is 
> you don't actually use the source of the conversion, but instead the 
> destination of the conversion (held in op0).

  Yes, my thought is given any narrower type, the

  op0 = expand_expr (treeop0, subtarget, VOIDmode, EXPAND_NORMAL);

  Will do necesary sign extension, then finally want we get from op0 is
  a "2 x word_mode_size" register whose high part contains duplication
  of sign bit, and low part contains the source of the conversion if the
  narrower mode is word_mode_size, or contains sign extened source, but
  either way, the low part can be used as our new shift operand. And the
  high part of op0 generated from above expand_expr is actually dead
  which will be removed by later optimization pass.

>
> So this is OK for the trunk with the typo & whitespace nits fixed.

During my final testing of the patch on arm, mips32, mips64, powerpc32,
powerpc64 I do found two new issues:

  * As you have mentioned, this patch do have endianness issue!
    Although the new testcases (wide-shift-128/64.c) under gcc.dg passed
    my manual powerpc32/powerpc64 test which check combine pass rtl
    dump, but the generated assembly looks weird to me, then I found
    simplify_gen_subreg itself is not endianness safe. We need to use
    subreg_highpart/lowpart_offset to get the offset.    

    So I done minor modifications on the patch by using lowpart_reg and
    subreg_highpart_offset I think it's endianess safe now.

    And the instruction sequences generated for powerpc, for example
    powerpc32 looks sane to me now:

    long long load1 (int data) { return (long long) data << 12;}

    load1-before-patch:
	srawi 9,3,31
	mr 4,3
	srwi 3,3,20
	slwi 4,4,12
	rlwimi 3,9,12,0,31-12
	blr

    load1-after-patch:
	mr 4,3
	srawi 3,3,20
	slwi 4,4,12
	blr

   * The other issue is still what you have noticed. For some targets,
     for example arm, there is backend define_expand for double word
     left shift, then because my patch will only do the tranformation
     when "! have_insn_for (ASHIFT, mode)", thus this transformation
     will not happen on these targets.

     While I belive this transformation do generate better instruction
     sequences for some case. As backend don't know high level type
     conversion information, those backends expand logic like
     arm_emit_coreregs_64bit_shift can only assume the shift operand
     may come from any cases thus the expand logic will be very generic
     thus not optimal.

     We should compare the cost to decide whether to go with this
     transformation or use the backend's customized expand logic.

     But anyway, this patch will not cause regression. We need another
     seperate patch to implement the cost comparision then we can let
     those targets benefit from this transformation.

     Thus I removed the arm target testcase. (I recall arm testcase is
     in this patch because my original local patch don't contain
     the "have_insn_for" check.)

 Thanks very much for the review!
 
 Committed attached patch (above minor issues I treat them as obivious)
 as 227018.
 
-- 
Regards,
Jiong

Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog	(revision 227017)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2015-08-19  Jiong Wang  <jiong.wang@arm.com>
+
+	* expr.c (expand_expr_real_2): Check gimple statement during
+	LSHIFT_EXPR expand.
+
 2015-08-19  Magnus Granberg  <zorry@gentoo.org>
 
 	* common.opt (fstack-protector): Initialize to -1.
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 227017)
+++ gcc/expr.c	(working copy)
@@ -8836,24 +8836,113 @@
 
     case LSHIFT_EXPR:
     case RSHIFT_EXPR:
-      /* If this is a fixed-point operation, then we cannot use the code
-	 below because "expand_shift" doesn't support sat/no-sat fixed-point
-         shifts.   */
-      if (ALL_FIXED_POINT_MODE_P (mode))
-	goto binop;
+      {
+	/* If this is a fixed-point operation, then we cannot use the code
+	   below because "expand_shift" doesn't support sat/no-sat fixed-point
+	   shifts.  */
+	if (ALL_FIXED_POINT_MODE_P (mode))
+	  goto binop;
 
-      if (! safe_from_p (subtarget, treeop1, 1))
-	subtarget = 0;
-      if (modifier == EXPAND_STACK_PARM)
-	target = 0;
-      op0 = expand_expr (treeop0, subtarget,
-			 VOIDmode, EXPAND_NORMAL);
-      temp = expand_variable_shift (code, mode, op0, treeop1, target,
-				    unsignedp);
-      if (code == LSHIFT_EXPR)
-	temp = REDUCE_BIT_FIELD (temp);
-      return temp;
+	if (! safe_from_p (subtarget, treeop1, 1))
+	  subtarget = 0;
+	if (modifier == EXPAND_STACK_PARM)
+	  target = 0;
+	op0 = expand_expr (treeop0, subtarget,
+			   VOIDmode, EXPAND_NORMAL);
 
+	/* Left shift optimization when shifting across word_size boundary.
+
+	   If mode == GET_MODE_WIDER_MODE (word_mode), then normally there isn't
+	   native instruction to support this wide mode left shift.  Given below
+	   scenario:
+
+	    Type A = (Type) B  << C
+
+	    |<		 T	    >|
+	    | dest_high  |  dest_low |
+
+			 | word_size |
+
+	   If the shift amount C caused we shift B to across the word size
+	   boundary, i.e part of B shifted into high half of destination
+	   register, and part of B remains in the low half, then GCC will use
+	   the following left shift expand logic:
+
+	   1. Initialize dest_low to B.
+	   2. Initialize every bit of dest_high to the sign bit of B.
+	   3. Logic left shift dest_low by C bit to finalize dest_low.
+	      The value of dest_low before this shift is kept in a temp D.
+	   4. Logic left shift dest_high by C.
+	   5. Logic right shift D by (word_size - C).
+	   6. Or the result of 4 and 5 to finalize dest_high.
+
+	   While, by checking gimple statements, if operand B is coming from
+	   signed extension, then we can simplify above expand logic into:
+
+	      1. dest_high = src_low >> (word_size - C).
+	      2. dest_low = src_low << C.
+
+	   We can use one arithmetic right shift to finish all the purpose of
+	   steps 2, 4, 5, 6, thus we reduce the steps needed from 6 into 2.  */
+
+	temp = NULL_RTX;
+	if (code == LSHIFT_EXPR
+	    && target
+	    && REG_P (target)
+	    && ! unsignedp
+	    && mode == GET_MODE_WIDER_MODE (word_mode)
+	    && GET_MODE_SIZE (mode) == 2 * GET_MODE_SIZE (word_mode)
+	    && ! have_insn_for (ASHIFT, mode)
+	    && TREE_CONSTANT (treeop1)
+	    && TREE_CODE (treeop0) == SSA_NAME)
+	  {
+	    gimple def = SSA_NAME_DEF_STMT (treeop0);
+	    if (is_gimple_assign (def)
+		&& gimple_assign_rhs_code (def) == NOP_EXPR)
+	      {
+		machine_mode rmode = TYPE_MODE
+		  (TREE_TYPE (gimple_assign_rhs1 (def)));
+
+		if (GET_MODE_SIZE (rmode) < GET_MODE_SIZE (mode)
+		    && TREE_INT_CST_LOW (treeop1) < GET_MODE_BITSIZE (word_mode)
+		    && ((TREE_INT_CST_LOW (treeop1) + GET_MODE_BITSIZE (rmode))
+			>= GET_MODE_BITSIZE (word_mode)))
+		  {
+		    unsigned int high_off = subreg_highpart_offset (word_mode,
+								    mode);
+		    rtx low = lowpart_subreg (word_mode, op0, mode);
+		    rtx dest_low = lowpart_subreg (word_mode, target, mode);
+		    rtx dest_high = simplify_gen_subreg (word_mode, target,
+							 mode, high_off);
+		    HOST_WIDE_INT ramount = (BITS_PER_WORD
+					     - TREE_INT_CST_LOW (treeop1));
+		    tree rshift = build_int_cst (TREE_TYPE (treeop1), ramount);
+
+		    /* dest_high = src_low >> (word_size - C).  */
+		    temp = expand_variable_shift (RSHIFT_EXPR, word_mode, low,
+						  rshift, dest_high, unsignedp);
+		    if (temp != dest_high)
+		      emit_move_insn (dest_high, temp);
+
+		    /* dest_low = src_low << C.  */
+		    temp = expand_variable_shift (LSHIFT_EXPR, word_mode, low,
+						  treeop1, dest_low, unsignedp);
+		    if (temp != dest_low)
+		      emit_move_insn (dest_low, temp);
+
+		    temp = target ;
+		  }
+	      }
+	  }
+
+	if (temp == NULL_RTX)
+	  temp = expand_variable_shift (code, mode, op0, treeop1, target,
+					unsignedp);
+	if (code == LSHIFT_EXPR)
+	  temp = REDUCE_BIT_FIELD (temp);
+	return temp;
+      }
+
       /* Could determine the answer when only additive constants differ.  Also,
 	 the addition of one can be handled by changing the condition.  */
     case LT_EXPR:
Index: gcc/testsuite/ChangeLog
===================================================================
--- gcc/testsuite/ChangeLog	(revision 227017)
+++ gcc/testsuite/ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2015-08-19  Jiong Wang  <jiong.wang@arm.com>
+
+	* gcc.dg/wide_shift_64_1.c: New testcase.
+	* gcc.dg/wide_shift_128_1.c: Likewise.
+	* gcc.target/aarch64/ashlti3_1.c: Likewise.
+
 2015-08-19  Magnus Granberg  <zorry@gentoo.org>
 
 	* lib/target-supports.exp
Index: gcc/testsuite/gcc.dg/wide-shift-128.c
===================================================================
--- gcc/testsuite/gcc.dg/wide-shift-128.c	(revision 0)
+++ gcc/testsuite/gcc.dg/wide-shift-128.c	(working copy)
@@ -0,0 +1,11 @@
+/* { dg-do compile { target aarch64*-*-* mips64*-*-* sparc64*-*-* } } */
+/* { dg-require-effective-target int128 } */
+/* { dg-options "-O2 -fdump-rtl-combine" } */
+
+__int128_t
+load2 (int data)
+{
+    return (__int128_t) data << 50;
+}
+
+/* { dg-final { scan-rtl-dump-not "ior" "combine" } } */
Index: gcc/testsuite/gcc.dg/wide-shift-64.c
===================================================================
--- gcc/testsuite/gcc.dg/wide-shift-64.c	(revision 0)
+++ gcc/testsuite/gcc.dg/wide-shift-64.c	(working copy)
@@ -0,0 +1,10 @@
+/* { dg-do compile { target mips*-*-* sparc*-*-* } } */
+/* { dg-options "-O2 -fdump-rtl-combine" } */
+
+long long
+load1 (int data)
+{
+    return (long long) data << 12;
+}
+
+/* { dg-final { scan-rtl-dump-not "ior" "combine" } } */
Index: gcc/testsuite/gcc.target/aarch64/ashltidisi.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/ashltidisi.c	(revision 0)
+++ gcc/testsuite/gcc.target/aarch64/ashltidisi.c	(working copy)
@@ -0,0 +1,49 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -save-temps" } */
+
+extern void abort (void);
+
+#define GEN_TEST_CASE(x, y, z)\
+__uint128_t __attribute__ ((noinline))\
+ushift_##x##_##z (unsigned y data)\
+{\
+  return (__uint128_t) data << x;\
+}\
+__int128_t __attribute__ ((noinline)) \
+shift_##x##_##z (y data) \
+{\
+  return (__int128_t) data << x;\
+}
+
+GEN_TEST_CASE (53, int, i)
+GEN_TEST_CASE (3, long long, ll)
+GEN_TEST_CASE (13, long long, ll)
+GEN_TEST_CASE (53, long long, ll)
+
+int
+main (int argc, char **argv)
+{
+
+#define SHIFT_CHECK(x, y, z, p) \
+	if (ushift_##y##_##p (x)\
+	    != ((__uint128_t) (unsigned z) x << y)) \
+	  abort ();\
+	if (shift_##y##_##p (x)\
+	    != ((__uint128_t) (signed z) x << y)) \
+	  abort ();
+
+  SHIFT_CHECK (0x12345678, 53, int, i)
+  SHIFT_CHECK (0xcafecafe, 53, int, i)
+
+  SHIFT_CHECK (0x1234567890abcdefLL, 3, long long, ll)
+  SHIFT_CHECK (0x1234567890abcdefLL, 13, long long, ll)
+  SHIFT_CHECK (0x1234567890abcdefLL, 53, long long, ll)
+  SHIFT_CHECK (0xcafecafedeaddeadLL, 3, long long, ll)
+  SHIFT_CHECK (0xcafecafedeaddeadLL, 13, long long, ll)
+  SHIFT_CHECK (0xcafecafedeaddeadLL, 53, long long, ll)
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "asr" 4 } } */
+/* { dg-final { scan-assembler-not "extr\t" } } */

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]