This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [Patch/rtl-expand] Take tree range info into account to improve LSHIFT_EXP expanding
- From: Jiong Wang <jiong dot wang at arm dot com>
- To: Jeff Law <law at redhat dot com>
- Cc: "gcc-patches\ at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 19 Aug 2015 22:57:07 +0100
- Subject: Re: [Patch/rtl-expand] Take tree range info into account to improve LSHIFT_EXP expanding
- Authentication-results: sourceware.org; auth=none
- References: <n99sic0uyf2 dot fsf at arm dot com> <5539A922 dot 7060708 at redhat dot com> <n99pp6pwaob dot fsf at arm dot com> <554054E9 dot 4030400 at redhat dot com> <n99vbgebnrr dot fsf at arm dot com> <55415C2E dot 9010001 at redhat dot com> <n99oai9epez dot fsf at arm dot com> <55CE4E7A dot 8000603 at redhat dot com> <n99zj1twngh dot fsf at arm dot com> <n99614cagfa dot fsf at arm dot com> <55D36E59 dot 8090609 at redhat dot com>
Jeff Law writes:
>> + && ! unsignedp
> Don't you need to check that the conversion is actually a sign
> extension. Oh, you're relying on the signedness of ops->type. That
> should be sufficient.
Exactly.
>> + if (GET_MODE_SIZE (rmode) < GET_MODE_SIZE (mode)
>> + && TREE_INT_CST_LOW (treeop1) < GET_MODE_BITSIZE (word_mode)
>> + && ((TREE_INT_CST_LOW (treeop1) + GET_MODE_BITSIZE (rmode))
>> + >= GET_MODE_BITSIZE (word_mode)))
> I keep having trouble with this. I think the key to why this works is
> you don't actually use the source of the conversion, but instead the
> destination of the conversion (held in op0).
Yes, my thought is given any narrower type, the
op0 = expand_expr (treeop0, subtarget, VOIDmode, EXPAND_NORMAL);
Will do necesary sign extension, then finally want we get from op0 is
a "2 x word_mode_size" register whose high part contains duplication
of sign bit, and low part contains the source of the conversion if the
narrower mode is word_mode_size, or contains sign extened source, but
either way, the low part can be used as our new shift operand. And the
high part of op0 generated from above expand_expr is actually dead
which will be removed by later optimization pass.
>
> So this is OK for the trunk with the typo & whitespace nits fixed.
During my final testing of the patch on arm, mips32, mips64, powerpc32,
powerpc64 I do found two new issues:
* As you have mentioned, this patch do have endianness issue!
Although the new testcases (wide-shift-128/64.c) under gcc.dg passed
my manual powerpc32/powerpc64 test which check combine pass rtl
dump, but the generated assembly looks weird to me, then I found
simplify_gen_subreg itself is not endianness safe. We need to use
subreg_highpart/lowpart_offset to get the offset.
So I done minor modifications on the patch by using lowpart_reg and
subreg_highpart_offset I think it's endianess safe now.
And the instruction sequences generated for powerpc, for example
powerpc32 looks sane to me now:
long long load1 (int data) { return (long long) data << 12;}
load1-before-patch:
srawi 9,3,31
mr 4,3
srwi 3,3,20
slwi 4,4,12
rlwimi 3,9,12,0,31-12
blr
load1-after-patch:
mr 4,3
srawi 3,3,20
slwi 4,4,12
blr
* The other issue is still what you have noticed. For some targets,
for example arm, there is backend define_expand for double word
left shift, then because my patch will only do the tranformation
when "! have_insn_for (ASHIFT, mode)", thus this transformation
will not happen on these targets.
While I belive this transformation do generate better instruction
sequences for some case. As backend don't know high level type
conversion information, those backends expand logic like
arm_emit_coreregs_64bit_shift can only assume the shift operand
may come from any cases thus the expand logic will be very generic
thus not optimal.
We should compare the cost to decide whether to go with this
transformation or use the backend's customized expand logic.
But anyway, this patch will not cause regression. We need another
seperate patch to implement the cost comparision then we can let
those targets benefit from this transformation.
Thus I removed the arm target testcase. (I recall arm testcase is
in this patch because my original local patch don't contain
the "have_insn_for" check.)
Thanks very much for the review!
Committed attached patch (above minor issues I treat them as obivious)
as 227018.
--
Regards,
Jiong
Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog (revision 227017)
+++ gcc/ChangeLog (working copy)
@@ -1,3 +1,8 @@
+2015-08-19 Jiong Wang <jiong.wang@arm.com>
+
+ * expr.c (expand_expr_real_2): Check gimple statement during
+ LSHIFT_EXPR expand.
+
2015-08-19 Magnus Granberg <zorry@gentoo.org>
* common.opt (fstack-protector): Initialize to -1.
Index: gcc/expr.c
===================================================================
--- gcc/expr.c (revision 227017)
+++ gcc/expr.c (working copy)
@@ -8836,24 +8836,113 @@
case LSHIFT_EXPR:
case RSHIFT_EXPR:
- /* If this is a fixed-point operation, then we cannot use the code
- below because "expand_shift" doesn't support sat/no-sat fixed-point
- shifts. */
- if (ALL_FIXED_POINT_MODE_P (mode))
- goto binop;
+ {
+ /* If this is a fixed-point operation, then we cannot use the code
+ below because "expand_shift" doesn't support sat/no-sat fixed-point
+ shifts. */
+ if (ALL_FIXED_POINT_MODE_P (mode))
+ goto binop;
- if (! safe_from_p (subtarget, treeop1, 1))
- subtarget = 0;
- if (modifier == EXPAND_STACK_PARM)
- target = 0;
- op0 = expand_expr (treeop0, subtarget,
- VOIDmode, EXPAND_NORMAL);
- temp = expand_variable_shift (code, mode, op0, treeop1, target,
- unsignedp);
- if (code == LSHIFT_EXPR)
- temp = REDUCE_BIT_FIELD (temp);
- return temp;
+ if (! safe_from_p (subtarget, treeop1, 1))
+ subtarget = 0;
+ if (modifier == EXPAND_STACK_PARM)
+ target = 0;
+ op0 = expand_expr (treeop0, subtarget,
+ VOIDmode, EXPAND_NORMAL);
+ /* Left shift optimization when shifting across word_size boundary.
+
+ If mode == GET_MODE_WIDER_MODE (word_mode), then normally there isn't
+ native instruction to support this wide mode left shift. Given below
+ scenario:
+
+ Type A = (Type) B << C
+
+ |< T >|
+ | dest_high | dest_low |
+
+ | word_size |
+
+ If the shift amount C caused we shift B to across the word size
+ boundary, i.e part of B shifted into high half of destination
+ register, and part of B remains in the low half, then GCC will use
+ the following left shift expand logic:
+
+ 1. Initialize dest_low to B.
+ 2. Initialize every bit of dest_high to the sign bit of B.
+ 3. Logic left shift dest_low by C bit to finalize dest_low.
+ The value of dest_low before this shift is kept in a temp D.
+ 4. Logic left shift dest_high by C.
+ 5. Logic right shift D by (word_size - C).
+ 6. Or the result of 4 and 5 to finalize dest_high.
+
+ While, by checking gimple statements, if operand B is coming from
+ signed extension, then we can simplify above expand logic into:
+
+ 1. dest_high = src_low >> (word_size - C).
+ 2. dest_low = src_low << C.
+
+ We can use one arithmetic right shift to finish all the purpose of
+ steps 2, 4, 5, 6, thus we reduce the steps needed from 6 into 2. */
+
+ temp = NULL_RTX;
+ if (code == LSHIFT_EXPR
+ && target
+ && REG_P (target)
+ && ! unsignedp
+ && mode == GET_MODE_WIDER_MODE (word_mode)
+ && GET_MODE_SIZE (mode) == 2 * GET_MODE_SIZE (word_mode)
+ && ! have_insn_for (ASHIFT, mode)
+ && TREE_CONSTANT (treeop1)
+ && TREE_CODE (treeop0) == SSA_NAME)
+ {
+ gimple def = SSA_NAME_DEF_STMT (treeop0);
+ if (is_gimple_assign (def)
+ && gimple_assign_rhs_code (def) == NOP_EXPR)
+ {
+ machine_mode rmode = TYPE_MODE
+ (TREE_TYPE (gimple_assign_rhs1 (def)));
+
+ if (GET_MODE_SIZE (rmode) < GET_MODE_SIZE (mode)
+ && TREE_INT_CST_LOW (treeop1) < GET_MODE_BITSIZE (word_mode)
+ && ((TREE_INT_CST_LOW (treeop1) + GET_MODE_BITSIZE (rmode))
+ >= GET_MODE_BITSIZE (word_mode)))
+ {
+ unsigned int high_off = subreg_highpart_offset (word_mode,
+ mode);
+ rtx low = lowpart_subreg (word_mode, op0, mode);
+ rtx dest_low = lowpart_subreg (word_mode, target, mode);
+ rtx dest_high = simplify_gen_subreg (word_mode, target,
+ mode, high_off);
+ HOST_WIDE_INT ramount = (BITS_PER_WORD
+ - TREE_INT_CST_LOW (treeop1));
+ tree rshift = build_int_cst (TREE_TYPE (treeop1), ramount);
+
+ /* dest_high = src_low >> (word_size - C). */
+ temp = expand_variable_shift (RSHIFT_EXPR, word_mode, low,
+ rshift, dest_high, unsignedp);
+ if (temp != dest_high)
+ emit_move_insn (dest_high, temp);
+
+ /* dest_low = src_low << C. */
+ temp = expand_variable_shift (LSHIFT_EXPR, word_mode, low,
+ treeop1, dest_low, unsignedp);
+ if (temp != dest_low)
+ emit_move_insn (dest_low, temp);
+
+ temp = target ;
+ }
+ }
+ }
+
+ if (temp == NULL_RTX)
+ temp = expand_variable_shift (code, mode, op0, treeop1, target,
+ unsignedp);
+ if (code == LSHIFT_EXPR)
+ temp = REDUCE_BIT_FIELD (temp);
+ return temp;
+ }
+
/* Could determine the answer when only additive constants differ. Also,
the addition of one can be handled by changing the condition. */
case LT_EXPR:
Index: gcc/testsuite/ChangeLog
===================================================================
--- gcc/testsuite/ChangeLog (revision 227017)
+++ gcc/testsuite/ChangeLog (working copy)
@@ -1,3 +1,9 @@
+2015-08-19 Jiong Wang <jiong.wang@arm.com>
+
+ * gcc.dg/wide_shift_64_1.c: New testcase.
+ * gcc.dg/wide_shift_128_1.c: Likewise.
+ * gcc.target/aarch64/ashlti3_1.c: Likewise.
+
2015-08-19 Magnus Granberg <zorry@gentoo.org>
* lib/target-supports.exp
Index: gcc/testsuite/gcc.dg/wide-shift-128.c
===================================================================
--- gcc/testsuite/gcc.dg/wide-shift-128.c (revision 0)
+++ gcc/testsuite/gcc.dg/wide-shift-128.c (working copy)
@@ -0,0 +1,11 @@
+/* { dg-do compile { target aarch64*-*-* mips64*-*-* sparc64*-*-* } } */
+/* { dg-require-effective-target int128 } */
+/* { dg-options "-O2 -fdump-rtl-combine" } */
+
+__int128_t
+load2 (int data)
+{
+ return (__int128_t) data << 50;
+}
+
+/* { dg-final { scan-rtl-dump-not "ior" "combine" } } */
Index: gcc/testsuite/gcc.dg/wide-shift-64.c
===================================================================
--- gcc/testsuite/gcc.dg/wide-shift-64.c (revision 0)
+++ gcc/testsuite/gcc.dg/wide-shift-64.c (working copy)
@@ -0,0 +1,10 @@
+/* { dg-do compile { target mips*-*-* sparc*-*-* } } */
+/* { dg-options "-O2 -fdump-rtl-combine" } */
+
+long long
+load1 (int data)
+{
+ return (long long) data << 12;
+}
+
+/* { dg-final { scan-rtl-dump-not "ior" "combine" } } */
Index: gcc/testsuite/gcc.target/aarch64/ashltidisi.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/ashltidisi.c (revision 0)
+++ gcc/testsuite/gcc.target/aarch64/ashltidisi.c (working copy)
@@ -0,0 +1,49 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -save-temps" } */
+
+extern void abort (void);
+
+#define GEN_TEST_CASE(x, y, z)\
+__uint128_t __attribute__ ((noinline))\
+ushift_##x##_##z (unsigned y data)\
+{\
+ return (__uint128_t) data << x;\
+}\
+__int128_t __attribute__ ((noinline)) \
+shift_##x##_##z (y data) \
+{\
+ return (__int128_t) data << x;\
+}
+
+GEN_TEST_CASE (53, int, i)
+GEN_TEST_CASE (3, long long, ll)
+GEN_TEST_CASE (13, long long, ll)
+GEN_TEST_CASE (53, long long, ll)
+
+int
+main (int argc, char **argv)
+{
+
+#define SHIFT_CHECK(x, y, z, p) \
+ if (ushift_##y##_##p (x)\
+ != ((__uint128_t) (unsigned z) x << y)) \
+ abort ();\
+ if (shift_##y##_##p (x)\
+ != ((__uint128_t) (signed z) x << y)) \
+ abort ();
+
+ SHIFT_CHECK (0x12345678, 53, int, i)
+ SHIFT_CHECK (0xcafecafe, 53, int, i)
+
+ SHIFT_CHECK (0x1234567890abcdefLL, 3, long long, ll)
+ SHIFT_CHECK (0x1234567890abcdefLL, 13, long long, ll)
+ SHIFT_CHECK (0x1234567890abcdefLL, 53, long long, ll)
+ SHIFT_CHECK (0xcafecafedeaddeadLL, 3, long long, ll)
+ SHIFT_CHECK (0xcafecafedeaddeadLL, 13, long long, ll)
+ SHIFT_CHECK (0xcafecafedeaddeadLL, 53, long long, ll)
+
+ return 0;
+}
+
+/* { dg-final { scan-assembler-times "asr" 4 } } */
+/* { dg-final { scan-assembler-not "extr\t" } } */