This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



RFC: ARM 64-bit shifts in NEON


Hi All,

I'm trying to implement DImode shifts using ARM NEON instructions. This wouldn't be difficult in itself, but making it play nice with the existing implementation is causing me problems. I'd like a few suggestions/pointers/comments to help me get this right, please.

The existing shift mechanisms must be kept, partly because the NEON unit is optional, and partly because it does not permit the full range of DImode operations, so it is sometimes more efficient to do 64-bit operations in core registers than to copy all the values over to NEON, do the operation, and move the result back. Which set of patterns is used is determined by the register allocator and its costs mechanism.

The late decision means that the patterns may only use the post-reload splitter, and so cannot rely on many of the usual passes to sort out inefficiencies. In particular, the lack of combine makes it hard to detect and optimize extend-and-copy sequences.

So, I've attached two patches. The first is neon-shifts.patch, and does most of the work. The second is extendsidi2_neon.patch, and is intended to aid moving the shift amount from SImode registers, but doesn't go as far as I'd like.

I've not actually tested any of the output code just yet, so there may be logic errors, but those are easily fixed later, and what I'm trying to get right here is the GCC machine description.

Given this testcase:

   void
   f (long long *a, int b)
   {
     *a = *a << b;
   }

Without any patches, GCC gives this output, using only ARM core registers (in thumb2 mode):

   f:
         ldr     r2, [r0, #0]
         ldr     r3, [r0, #4]
         push    {r4, r5, r6}
         rsb     r6, r1, #32
         sub     r4, r1, #32
         lsrs    r6, r2, r6
         lsls    r5, r2, r4
         lsls    r3, r3, r1
         lsls    r1, r2, r1
         orrs    r3, r3, r6
         str     r1, [r0, #0]
         ands    r4, r3, r4, asr #32
         it      cc
         movcc   r4, r5
         str     r4, [r0, #4]
         pop     {r4, r5, r6}
         bx      lr
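
For reference, this sequence is the classic double-word shift built from 32-bit halves. A minimal C sketch of the same algorithm (my own illustration with hypothetical names, not code from the patches):

```c
#include <stdint.h>

/* Double-word left shift composed from 32-bit halves, mirroring the
   core-register sequence above.  Valid for 0 <= n <= 63.  */
uint64_t
shl64 (uint32_t lo, uint32_t hi, int n)
{
  uint32_t out_lo, out_hi;
  if (n < 32)
    {
      /* Bits that cross the half-word boundary come from lo >> (32-n);
         guard n == 0 because a 32-bit shift by 32 is undefined in C.  */
      out_hi = (hi << n) | (n ? lo >> (32 - n) : 0);
      out_lo = lo << n;
    }
  else
    {
      out_hi = lo << (n - 32);
      out_lo = 0;
    }
  return ((uint64_t) out_hi << 32) | out_lo;
}
```

The branchy n >= 32 case is what the conditional-execution (`it`/`movcc`) trick in the assembly above avoids.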

With just neon-shifts.patch, we get this output, now with NEON shifts:

f:
        fldd    d17, [r0, #0]   @ int
        mov     r2, r1
        movs    r3, #0
        push    {r4, r5}
        fmdrr   d18, r2, r3     @ int
        vshl.i64        d16, d17, d18
        fstd    d16, [r0, #0]   @ int
        pop     {r4, r5}
        bx      lr


As you can see, the shift is much improved, but the shift amount is first extended into two SImode registers, and then moved to a NEON DImode register, which increases core-register pressure unnecessarily.


With both patches, we now get this:

f:
        fldd    d17, [r0, #0]   @ int
        vdup.32 d16, r1
        vshr.u64        d16, d16, #32   <-- still unnecessary
        vshl.i64        d16, d17, d16
        fstd    d16, [r0, #0]   @ int
        bx      lr

Now the value is copied and then extended. I have chosen to use vdup.32 instead of vmov.i32 because the latter can only target half the DImode registers. The right shift is necessary for a general zero-extend, but is not useful in this case as only the bottom 8 bits are interesting, and vdup has already done the right thing.
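
To illustrate, here is a rough C model (my own sketch, hypothetical names) of what the vdup.32/vshr.u64 pair computes, and why the shift is redundant when only the bottom byte of the amount matters:

```c
#include <stdint.h>

/* Model of "vdup.32 Dd, Rm": the 32-bit value lands in both lanes.  */
uint64_t
vdup32 (uint32_t w)
{
  return ((uint64_t) w << 32) | w;
}

/* Model of "vshr.u64 Dd, Dm, #32": shifting the duplicated value right
   by 32 turns it into a proper zero-extension of w.  */
uint64_t
zext_via_vshr (uint32_t w)
{
  return vdup32 (w) >> 32;
}
```

Note that the low byte of `vdup32 (w)` already equals the low byte of `w`, so when the value is only ever consumed as a shift amount, the `vshr` step changes nothing.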

Note that the examples I've given are for left shifts. Right shifts are also implemented, but are a little more complicated (in the shift-by-register case) because the shift must be implemented as a left shift by a negative amount, and so an unspec is used to prevent the compiler doing anything 'clever'. Apart from an extra negation, the end result is much the same, but the patterns look different.
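
The element-wise VSHL behaviour being relied on here (the count is taken from the signed low byte of the register operand; positive counts shift left, negative counts shift right) can be modelled roughly like this; the helper is my own sketch and glosses over out-of-range counts:

```c
#include <stdint.h>

/* Rough model of NEON VSHL (register form) on one 64-bit element.
   The count is the signed low byte of the shift operand.  Positive
   counts shift left; negative counts shift right.  'signed_elems'
   selects vshl.s64 (arithmetic) vs vshl.u64 (logical) right shifts.
   Counts outside -63..63 are not modelled here.  */
uint64_t
vshl64 (uint64_t x, uint64_t count_reg, int signed_elems)
{
  int8_t n = (int8_t) (count_reg & 0xff);
  if (n >= 0)
    return x << n;
  else if (signed_elems)
    return (uint64_t) ((int64_t) x >> -n);
  else
    return x >> -n;
}
```

A variable lshrdi3 then amounts to `vshl64 (x, -(uint64_t) n, 0)` once the expander has emitted the negation.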


All this is a nice improvement, but I'm not happy:


1. The post-reload split means that I've had to add a clobber for CC to all the patterns, even though only some of them really need it. I think I've convinced myself that this is OK, because it doesn't matter before scheduling, and after splitting the clobbers are only retained if they're really needed, but it still feels wrong.

2. The extend optimization is fine for general case extends, but it can be improved for the shift-amount case because we actually only need the bottom 8 bits, as indicated above. The problem is that there's no obvious way to achieve this:
- there's no combine pass after this point, so a pattern that recognises and re-splits the extend, move and shift can't be used.
- I don't believe there can be a pattern that uses SImode for the shift amount because the value needs to be in a DImode register eventually, and that means one needs to have been allocated before it gets split, and that means the extend needs to be separate.


3. The type of the shift-amount is determined by the type used in the ashldi3 pattern, and that uses SImode. This is fine for values already in SImode registers (probably the common case), but means that values already in DImode registers will have to get truncated and then re-extended, and this is not an operation that can generally be optimized away once introduced.
- I've considered using a DImode shift-amount for the ashldi3 pattern, and that would solve this problem - extend and truncate *can* be optimized away, but since it doesn't get split until post reload, the register allocator would already have allocated two SImode registers before we have any chance to make it go away.


4. I'm not sure, but I think the general-case shift in core registers is sufficiently long-winded that it might be worthwhile discarding that option completely (i.e. it might be cheaper to just always use NEON shifts, when NEON is available, of course). I'd keep the shift-by-constant-amount variants, though. Does anybody have any comments on that?

5. The left and right shift patterns couldn't be unified because I couldn't find a way to do match_operand with unspecs, and anyway, the patterns are a slightly different shape.

6. Same with the logical and arithmetic right shifts; I couldn't find a way to unify those patterns either, even though the only difference is the unspec index number.


Any help would be appreciated. I've probably implemented this backwards, or something ...


Thanks a lot

Andrew


--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -3441,7 +3441,13 @@
                    (match_operand:SI 2 "reg_or_int_operand" "")))]
   "TARGET_32BIT"
   "
-  if (GET_CODE (operands[2]) == CONST_INT)
+  if (TARGET_NEON)
+    {
+      rtx reg = convert_to_mode (DImode, operands[2], 1);
+      emit_insn (gen_ashldi3_neon (operands[0], operands[1], reg));
+      DONE;
+    }
+  else if (GET_CODE (operands[2]) == CONST_INT)
     {
       if ((HOST_WIDE_INT) INTVAL (operands[2]) == 1)
         {
@@ -3460,8 +3466,8 @@
 )
 
 (define_insn "arm_ashldi3_1bit"
-  [(set (match_operand:DI            0 "s_register_operand" "=r,&r")
-        (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r")
+  [(set (match_operand:DI            0 "arm_general_register_operand" "=r,&r")
+        (ashift:DI (match_operand:DI 1 "arm_general_register_operand" "0,r")
                    (const_int 1)))
    (clobber (reg:CC CC_REGNUM))]
   "TARGET_32BIT"
@@ -3500,7 +3506,13 @@
                      (match_operand:SI 2 "reg_or_int_operand" "")))]
   "TARGET_32BIT"
   "
-  if (GET_CODE (operands[2]) == CONST_INT)
+  if (TARGET_NEON)
+    {
+      rtx reg = convert_to_mode (DImode, operands[2], 1);
+      emit_insn (gen_ashrdi3_neon (operands[0], operands[1], reg));
+      DONE;
+    }
+  else if (GET_CODE (operands[2]) == CONST_INT)
     {
       if ((HOST_WIDE_INT) INTVAL (operands[2]) == 1)
         {
@@ -3557,7 +3569,13 @@
                      (match_operand:SI 2 "reg_or_int_operand" "")))]
   "TARGET_32BIT"
   "
-  if (GET_CODE (operands[2]) == CONST_INT)
+  if (TARGET_NEON)
+    {
+      rtx reg = convert_to_mode (DImode, operands[2], 1);
+      emit_insn (gen_lshrdi3_neon (operands[0], operands[1], reg));
+      DONE;
+    }
+  else if (GET_CODE (operands[2]) == CONST_INT)
     {
       if ((HOST_WIDE_INT) INTVAL (operands[2]) == 1)
         {
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -29,7 +29,7 @@
 ;; in Thumb-1 state: I, J, K, L, M, N, O
 
 ;; The following multi-letter normal constraints have been used:
-;; in ARM/Thumb-2 state: Da, Db, Dc, Dn, Dl, DL, Dv, Dy, Di, Dz
+;; in ARM/Thumb-2 state: Da, Db, Dc, Dn, Dl, DL, Dv, Dy, Di, Dz, Pe
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd
 ;; in Thumb-2 state: Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py
 
@@ -172,6 +172,11 @@
   (and (match_code "const_int")
        (match_test "TARGET_THUMB1 && ival >= 0 && ival <= 7")))
 
+(define_constraint "Pe"
+  "@internal In ARM/Thumb-2 state, a constant in the range 0 to 63"
+  (and (match_code "const_int")
+       (match_test "TARGET_32BIT && ival >= 0 && ival < 64")))
+
 (define_constraint "Ps"
   "@internal In Thumb-2 state a constant in the range -255 to +255"
   (and (match_code "const_int")
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1090,6 +1090,279 @@
   DONE;
 })
 
+;; 64-bit shifts
+
+(define_insn "ashldi3_neon"
+  [(set (match_operand:DI 0 "s_register_operand"	    "=w, w,?&r,?&r,?w,?w")
+	(ashift:DI (match_operand:DI 1 "s_register_operand" " w, w,  r,  r, w, w")
+		   (match_operand:DI 2 "shift_amount_64"    " w,Pe,  r, Pe, w,Pe")))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_NEON"
+  "@
+   vshl.u64\t%P0, %P1, %P2
+   vshl.u64\t%P0, %P1, %2
+   #
+   #
+   vshl.u64\t%P0, %P1, %P2
+   vshl.u64\t%P0, %P1, %2"
+  [(set_attr "neon_type" "neon_vshl_ddd,neon_vshl_ddd,*,*,neon_vshl_ddd,neon_vshl_ddd")
+   (set_attr "length" "*,*,28,12,*,*")
+   (set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")]
+)
+
+;; Splitter for 64-bit shifts in core-regs.
+;; Register operands only; constant shift amounts are handled below.
+(define_split
+  [(set (match_operand:DI 0 "s_register_operand" "")
+	(ashift:DI (match_operand:DI 1 "s_register_operand" "")
+		   (match_operand:DI 2 "s_register_operand" "")))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_NEON && reload_completed && !(IS_VFP_REGNUM (REGNO (operands[0])))"
+  [(set (match_dup 5) (ashift:SI (match_dup 7) (match_dup 8)))
+   (parallel
+    [(set (reg:CC_NOOV CC_REGNUM) (compare:CC_NOOV (minus:SI (const_int 32) (match_dup 8)) (const_int 0)))
+     (set (match_dup 4) (minus:SI (const_int 32) (match_dup 8)))])
+   (cond_exec (ge:CC (reg:CC CC_REGNUM) (const_int 0))
+	      (set (match_dup 4) (lshiftrt:SI (match_dup 6) (match_dup 4))))
+   (cond_exec (lt:CC (reg:CC CC_REGNUM) (const_int 0))
+	      (set (match_dup 4) (neg:SI (match_dup 4))))
+   (cond_exec (lt:CC (reg:CC CC_REGNUM) (const_int 0))
+	      (set (match_dup 4) (ashift:SI (match_dup 6) (match_dup 4))))
+   (set (match_dup 5) (ior:SI (match_dup 5) (match_dup 4)))
+   (set (match_dup 4) (ashift:SI (match_dup 6) (match_dup 8)))]
+  "
+  {
+    operands[4] = gen_lowpart (SImode, operands[0]);
+    operands[5] = gen_highpart (SImode, operands[0]);
+    operands[6] = gen_lowpart (SImode, operands[1]);
+    operands[7] = gen_highpart (SImode, operands[1]);
+    operands[8] = gen_lowpart (SImode, operands[2]);
+  }")
+
+(define_insn "ashrdi3_neon_imm"
+  [(set (match_operand:DI 0 "s_register_operand"	      "=w,?&r,?w")
+	(ashiftrt:DI (match_operand:DI 1 "s_register_operand" " w,  r, w")
+		     (match_operand:DI 2 "int_0_to_63"	      "Pe, Pe,Pe")))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_NEON"
+  "@
+   vshr.s64\t%P0, %P1, %2
+   #
+   vshr.s64\t%P0, %P1, %2"
+  [(set_attr "neon_type" "neon_vshl_ddd,*,neon_vshl_ddd")
+   (set_attr "length" "*,12,*")
+   (set_attr "arch" "nota8,*,onlya8")]
+)
+
+(define_insn_and_split "ashrdi3_neon_reg"
+  [(set (match_operand:DI 0 "s_register_operand"	     "=w,?&r,?w")
+	(unspec:DI [(match_operand:DI 1 "s_register_operand" " w,  r, w")
+		    (match_operand:DI 2 "s_register_operand" " w,  r, w")]
+		   UNSPEC_ASHIFT_SIGNED))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_NEON"
+  "@
+   vshl.s64\t%P0, %P1, %P2
+   #
+   vshl.s64\t%P0, %P1, %P2"
+  "TARGET_NEON && reload_completed && !(IS_VFP_REGNUM (REGNO (operands[0])))"
+  [(set (match_dup 5) (lshiftrt:SI (match_dup 7) (match_dup 8)))
+   (parallel
+    [(set (reg:CC_NOOV CC_REGNUM) (compare:CC_NOOV (minus:SI (const_int 32) (match_dup 8)) (const_int 0)))
+     (set (match_dup 4) (minus:SI (const_int 32) (match_dup 8)))])
+   (cond_exec (ge:CC (reg:CC CC_REGNUM) (const_int 0))
+	      (set (match_dup 4) (ashift:SI (match_dup 6) (match_dup 4))))
+   (cond_exec (lt:CC (reg:CC CC_REGNUM) (const_int 0))
+	      (set (match_dup 4) (neg:SI (match_dup 4))))
+   (cond_exec (lt:CC (reg:CC CC_REGNUM) (const_int 0))
+	      (set (match_dup 4) (ashiftrt:SI (match_dup 6) (match_dup 4))))
+   (set (match_dup 5) (ior:SI (match_dup 5) (match_dup 4)))
+   (set (match_dup 4) (ashiftrt:SI (match_dup 6) (match_dup 8)))]
+  "
+  {
+    operands[4] = gen_highpart (SImode, operands[0]);
+    operands[5] = gen_lowpart (SImode, operands[0]);
+    operands[6] = gen_highpart (SImode, operands[1]);
+    operands[7] = gen_lowpart (SImode, operands[1]);
+    operands[8] = gen_lowpart (SImode, operands[2]);
+  }"
+  [(set_attr "neon_type" "neon_vshl_ddd,*,neon_vshl_ddd")
+   (set_attr "length" "*,28,*")
+   (set_attr "arch" "nota8,*,onlya8")]
+)
+
+
+(define_expand "ashrdi3_neon"
+  [(match_operand:DI 0 "s_register_operand" "")
+   (match_operand:DI 1 "s_register_operand" "")
+   (match_operand:DI 2 "shift_amount_64" "")]
+  "TARGET_NEON"
+{
+  rtx neg = gen_reg_rtx (DImode);
+  if (REG_P (operands[2]))
+    {
+      emit_insn (gen_negdi2 (neg, operands[2]));
+      emit_insn (gen_ashrdi3_neon_reg (operands[0], operands[1], neg));
+    }
+  else
+    emit_insn (gen_ashrdi3_neon_imm (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_insn "lshrdi3_neon_imm"
+  [(set (match_operand:DI 0 "s_register_operand"	      "=w,?&r,?w")
+	(lshiftrt:DI (match_operand:DI 1 "s_register_operand" " w,  r, w")
+		     (match_operand:DI 2 "int_0_to_63"	      "Pe, Pe,Pe")))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_NEON"
+  "@
+   vshr.u64\t%P0, %P1, %2
+   #
+   vshr.u64\t%P0, %P1, %2"
+  [(set_attr "neon_type" "neon_vshl_ddd,*,neon_vshl_ddd")
+   (set_attr "length" "*,12,*")
+   (set_attr "arch" "nota8,*,onlya8")]
+)
+
+(define_insn_and_split "lshrdi3_neon_reg"
+  [(set (match_operand:DI 0 "s_register_operand"	     "=w,?&r,?w")
+	(unspec:DI [(match_operand:DI 1 "s_register_operand" " w,  r, w")
+		    (match_operand:DI 2 "s_register_operand" " w,  r, w")]
+		   UNSPEC_ASHIFT_UNSIGNED))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_NEON"
+  "@
+   vshl.u64\t%P0, %P1, %P2
+   #
+   vshl.u64\t%P0, %P1, %P2"
+  "TARGET_NEON && reload_completed && !(IS_VFP_REGNUM (REGNO (operands[0])))"
+  [(set (match_dup 5) (lshiftrt:SI (match_dup 7) (match_dup 8)))
+   (parallel
+    [(set (reg:CC_NOOV CC_REGNUM) (compare:CC_NOOV (minus:SI (const_int 32) (match_dup 8)) (const_int 0)))
+     (set (match_dup 4) (minus:SI (const_int 32) (match_dup 8)))])
+   (cond_exec (ge:CC (reg:CC CC_REGNUM) (const_int 0))
+	      (set (match_dup 4) (ashift:SI (match_dup 6) (match_dup 4))))
+   (cond_exec (lt:CC (reg:CC CC_REGNUM) (const_int 0))
+	      (set (match_dup 4) (neg:SI (match_dup 4))))
+   (cond_exec (lt:CC (reg:CC CC_REGNUM) (const_int 0))
+	      (set (match_dup 4) (lshiftrt:SI (match_dup 6) (match_dup 4))))
+   (set (match_dup 5) (ior:SI (match_dup 5) (match_dup 4)))
+   (set (match_dup 4) (lshiftrt:SI (match_dup 6) (match_dup 8)))]
+  "
+  {
+    operands[4] = gen_highpart (SImode, operands[0]);
+    operands[5] = gen_lowpart (SImode, operands[0]);
+    operands[6] = gen_highpart (SImode, operands[1]);
+    operands[7] = gen_lowpart (SImode, operands[1]);
+    operands[8] = gen_lowpart (SImode, operands[2]);
+  }"
+  [(set_attr "neon_type" "neon_vshl_ddd,*,neon_vshl_ddd")
+   (set_attr "length" "*,28,*")
+   (set_attr "arch" "nota8,*,onlya8")]
+)
+
+(define_expand "lshrdi3_neon"
+  [(match_operand:DI 0 "s_register_operand" "")
+   (match_operand:DI 1 "s_register_operand" "")
+   (match_operand:DI 2 "shift_amount_64" "")]
+  "TARGET_NEON"
+{
+  rtx neg = gen_reg_rtx (DImode);
+  if (REG_P (operands[2]))
+    {
+      emit_insn (gen_negdi2 (neg, operands[2]));
+      emit_insn (gen_lshrdi3_neon_reg (operands[0], operands[1], neg));
+    }
+  else
+    emit_insn (gen_lshrdi3_neon_imm (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+;; Split all kinds of constant 64-bit shifts, up to 31 bits
+(define_split
+  [(set (match_operand:DI 0 "s_register_operand" "")
+	(match_operator:DI 3 "neon_shift_operator"
+	  [(match_operand:DI 1 "s_register_operand" "")
+	   (match_operand:DI 2 "int_0_to_31" "")]))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_NEON && reload_completed && !(IS_VFP_REGNUM (REGNO (operands[0])))"
+  [(set (match_dup 4) (match_op_dup 9 [(match_dup 6) (match_dup 2)]))
+   (set (match_dup 4) (ior:SI (match_op_dup 10 [(match_dup 7) (match_dup 8)]) (match_dup 4)))
+   (set (match_dup 5) (match_op_dup 3 [(match_dup 7) (match_dup 2)]))]
+  "
+  {
+    enum rtx_code firstshift;
+    enum rtx_code reverseshift;
+    enum rtx_code lastshift = GET_CODE (operands[3]);
+
+    /* There are patterns in arm.md for 1-bit shifts.  */
+    if (INTVAL (operands[2]) == 1)
+      FAIL;
+
+    switch (lastshift)
+      {
+      case ASHIFT:
+	operands[4] = gen_highpart (SImode, operands[0]);
+	operands[5] = gen_lowpart (SImode, operands[0]);
+	operands[6] = gen_highpart (SImode, operands[1]);
+	operands[7] = gen_lowpart (SImode, operands[1]);
+	firstshift = ASHIFT;
+	reverseshift = LSHIFTRT;
+	break;
+      case ASHIFTRT:
+      case LSHIFTRT:
+	operands[4] = gen_lowpart (SImode, operands[0]);
+	operands[5] = gen_highpart (SImode, operands[0]);
+	operands[6] = gen_lowpart (SImode, operands[1]);
+	operands[7] = gen_highpart (SImode, operands[1]);
+	firstshift = LSHIFTRT;
+	reverseshift = ASHIFT;
+	break;
+      default:
+        gcc_unreachable ();
+      }
+
+    operands[8] = gen_rtx_CONST_INT (VOIDmode, 32 - INTVAL (operands[2]));
+    operands[9] = gen_rtx_fmt_ee (firstshift, SImode, const0_rtx, const0_rtx);
+    operands[10] = gen_rtx_fmt_ee (reverseshift, SImode, const0_rtx, const0_rtx);
+    operands[3] = gen_rtx_fmt_ee (lastshift, SImode, const0_rtx, const0_rtx);
+  }")
+
+;; Split all kinds of constant 64-bit shifts, over 31 bits
+(define_split
+  [(set (match_operand:DI 0 "s_register_operand" "")
+	(match_operator:DI 3 "neon_shift_operator"
+	  [(match_operand:DI 1 "s_register_operand" "")
+	   (match_operand:DI 2 "int_32_to_63" "")]))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_NEON && reload_completed && !(IS_VFP_REGNUM (REGNO (operands[0])))"
+  [(set (match_dup 4) (match_op_dup 3 [(match_dup 6) (match_dup 7)]))
+   (set (match_dup 5) (const_int 0))]
+  "
+  {
+    enum rtx_code code = GET_CODE (operands[3]);
+    operands[3] = gen_rtx_fmt_ee (code, SImode, const0_rtx, const0_rtx);
+
+    switch (code)
+      {
+      case ASHIFT:
+	operands[4] = gen_highpart (SImode, operands[0]);
+	operands[5] = gen_lowpart (SImode, operands[0]);
+	operands[6] = gen_lowpart (SImode, operands[1]);
+	operands[7] = gen_rtx_CONST_INT (VOIDmode, INTVAL (operands[2]) - 32);
+	break;
+      case ASHIFTRT:
+      case LSHIFTRT:
+	operands[4] = gen_lowpart (SImode, operands[0]);
+	operands[5] = gen_highpart (SImode, operands[0]);
+	operands[6] = gen_highpart (SImode, operands[1]);
+	operands[7] = gen_rtx_CONST_INT (VOIDmode, INTVAL (operands[2]) - 32);
+	break;
+      default:
+	gcc_unreachable ();
+      }
+  }")
+
 ;; Widening operations
 
 (define_insn "widen_ssum<mode>3"
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -248,6 +248,12 @@
 		    && ((unsigned HOST_WIDE_INT) INTVAL (XEXP (op, 1)) <= 32)")
        (match_test "mode == GET_MODE (op)")))
 
+;; NEON 64-bit shifts are a little more limited.
+;; This is only used for constant shifts anyway.
+(define_special_predicate "neon_shift_operator"
+  (and (match_code "ashift,ashiftrt,lshiftrt")
+       (match_test "mode == GET_MODE (op)")))
+
 ;; True for MULT, to identify which variant of shift_operator is in use.
 (define_special_predicate "mult_operator"
   (match_code "mult"))
@@ -764,3 +770,19 @@
 
 (define_special_predicate "add_operator"
   (match_code "plus"))
+
+(define_predicate "int_0_to_63"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 0, 63)")))
+
+(define_predicate "int_0_to_31"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 0, 31)")))
+
+(define_predicate "int_32_to_63"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 32, 63)")))
+
+(define_predicate "shift_amount_64"
+  (ior (match_operand 0 "s_register_operand")
+       (match_operand 0 "int_0_to_63")))

--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4403,33 +4403,35 @@
 ;; Zero and sign extension instructions.
 
 (define_insn "zero_extend<mode>di2"
-  [(set (match_operand:DI 0 "s_register_operand" "=r")
+  [(set (match_operand:DI 0 "s_register_operand" "=w, r")
         (zero_extend:DI (match_operand:QHSI 1 "<qhs_zextenddi_op>"
 					    "<qhs_zextenddi_cstr>")))]
   "TARGET_32BIT <qhs_zextenddi_cond>"
   "#"
-  [(set_attr "length" "8")
-   (set_attr "ce_count" "2")
-   (set_attr "predicable" "yes")]
+  [(set_attr "length" "8,8")
+   (set_attr "ce_count" "2,2")
+   (set_attr "predicable" "yes,yes")]
 )
 
 (define_insn "extend<mode>di2"
-  [(set (match_operand:DI 0 "s_register_operand" "=r")
+  [(set (match_operand:DI 0 "s_register_operand" "=w,r")
         (sign_extend:DI (match_operand:QHSI 1 "<qhs_extenddi_op>"
 					    "<qhs_extenddi_cstr>")))]
   "TARGET_32BIT <qhs_sextenddi_cond>"
   "#"
-  [(set_attr "length" "8")
-   (set_attr "ce_count" "2")
-   (set_attr "shift" "1")
-   (set_attr "predicable" "yes")]
+  [(set_attr "length" "8,8")
+   (set_attr "ce_count" "2,2")
+   (set_attr "shift" "1,1")
+   (set_attr "predicable" "yes,yes")]
 )
 
 ;; Splits for all extensions to DImode
 (define_split
   [(set (match_operand:DI 0 "s_register_operand" "")
         (zero_extend:DI (match_operand 1 "nonimmediate_operand" "")))]
-  "TARGET_32BIT"
+  "TARGET_32BIT && (!TARGET_NEON
+		    || (reload_completed
+			&& !(IS_VFP_REGNUM (REGNO (operands[0])))))"
   [(set (match_dup 0) (match_dup 1))]
 {
   rtx lo_part = gen_lowpart (SImode, operands[0]);
@@ -4455,7 +4457,9 @@
 (define_split
   [(set (match_operand:DI 0 "s_register_operand" "")
         (sign_extend:DI (match_operand 1 "nonimmediate_operand" "")))]
-  "TARGET_32BIT"
+  "TARGET_32BIT && (!TARGET_NEON
+		    || (reload_completed
+			&& !(IS_VFP_REGNUM (REGNO (operands[0])))))"
   [(set (match_dup 0) (ashiftrt:SI (match_dup 1) (const_int 31)))]
 {
   rtx lo_part = gen_lowpart (SImode, operands[0]);
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -405,8 +405,8 @@
 (define_mode_attr qhs_extenddi_op [(SI "s_register_operand")
 				   (HI "nonimmediate_operand")
 				   (QI "arm_reg_or_extendqisi_mem_op")])
-(define_mode_attr qhs_extenddi_cstr [(SI "r") (HI "rm") (QI "rUq")])
-(define_mode_attr qhs_zextenddi_cstr [(SI "r") (HI "rm") (QI "rm")])
+(define_mode_attr qhs_extenddi_cstr [(SI "r,r") (HI "r,rm") (QI "r,rUq")])
+(define_mode_attr qhs_zextenddi_cstr [(SI "r,r") (HI "r,rm") (QI "r,rm")])
 
 ;; Mode attributes used for fixed-point support.
 (define_mode_attr qaddsub_suf [(V4UQQ "8") (V2UHQ "16") (UQQ "8") (UHQ "16")
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -5818,3 +5818,25 @@
                                    (const_string "neon_fp_vadd_qqq_vabs_qq"))
                      (const_string "neon_int_5")))]
 )
+
+;; Copy from core-to-neon regs, then extend, not vice-versa
+
+(define_split
+  [(set (match_operand:DI 0 "s_register_operand" "")
+	(sign_extend:DI (match_operand:SI 1 "s_register_operand" "")))]
+  "TARGET_NEON && reload_completed && IS_VFP_REGNUM (REGNO (operands[0]))"
+  [(set (match_dup 0) (vec_duplicate:V2SI (match_dup 1)))
+   (parallel [(set (match_dup 0) (ashiftrt:DI (match_dup 0) (const_int 32)))
+	      (clobber (reg:CC CC_REGNUM))])])
+
+(define_split
+  [(set (match_operand:DI 0 "s_register_operand" "")
+	(zero_extend:DI (match_operand:SI 1 "s_register_operand" "")))]
+  "TARGET_NEON && reload_completed && IS_VFP_REGNUM (REGNO (operands[0]))"
+  [(set (match_dup 2) (vec_duplicate:V2SI (match_dup 1)))
+   (parallel [(set (match_dup 0) (lshiftrt:DI (match_dup 0) (const_int 32)))
+              (clobber (reg:CC CC_REGNUM))])]
+  "
+  {
+    operands[2] = gen_rtx_REG (V2SImode, REGNO (operands[0]));
+  }")
