[RFH] subreg of a vector without going through memory

Marc Glisse marc.glisse@inria.fr
Sun Nov 4 09:31:00 GMT 2012


Hello,

While trying to make some progress on PR 53101, I wrote the attached
patch. Two caveats: it might be completely wrong for big endian (I don't
know), and it is still missing a check that the subreg isn't paradoxical.

 	* simplify-rtx.c (simplify_subreg): For vectors, create a VEC_SELECT.
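
To make the intent concrete, here is a minimal standalone sketch of the
lane computation the patch performs. It is a model in plain C, not GCC
code: the function name subreg_lane_indices and its parameters are
hypothetical, little-endian lane numbering is assumed, and the guard
against paradoxical or out-of-range requests is something the attached
patch does not yet do.

```c
#include <assert.h>
#include <stddef.h>

/* Standalone model (not GCC code) of how the patch picks vector lanes.
   Fills idx[0..n-1] with the element indices that a subreg of
   OUTER_SIZE bytes at offset BYTE selects from a vector of INNER_SIZE
   bytes whose elements are ELEM_SIZE bytes wide.  Returns n, or 0 if
   the request cannot be expressed as a lane selection.
   Assumption: little-endian lane numbering.  */
static size_t
subreg_lane_indices (size_t inner_size, size_t outer_size,
                     size_t elem_size, size_t byte,
                     unsigned *idx, size_t idx_len)
{
  assert (elem_size > 0);

  /* Reject paradoxical subregs (outer wider than inner), offsets or
     sizes that are not a whole number of elements, and ranges that
     run past the end of the inner vector.  */
  if (outer_size > inner_size
      || byte % elem_size != 0
      || outer_size % elem_size != 0
      || byte + outer_size > inner_size)
    return 0;

  size_t n = outer_size / elem_size;   /* number of selected lanes */
  size_t start = byte / elem_size;     /* first selected lane */
  if (n > idx_len)
    return 0;

  for (size_t i = 0; i < n; i++)
    idx[i] = (unsigned) (start + i);
  return n;
}
```

For a V4DF to V2DF subreg at byte 0 (inner size 32, outer size 16,
element size 8) this yields lanes 0 and 1; at byte 16 it yields lanes 2
and 3.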

However, when I compile this code on x86_64:

typedef double v4 __attribute__((vector_size(32)));
typedef double v2 __attribute__((vector_size(16)));
v2 f(v4 x){
   return *(v2*)&x;
}


I see in the *.combine dump:

[...]
Trying 6 -> 7:
Successfully matched this instruction:
(set (reg:V2DF 60 [ <retval> ])
     (vec_select:V2DF (reg/v:V4DF 61 [ x ])
         (parallel [
                 (const_int 0 [0])
                 (const_int 1 [0x1])
             ])))
rejecting combination of insns 6 and 7
original costs 4 + 16 = 20
replacement cost 32
[...]
(note 4 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 2 4 3 2 (set (reg/v:V4DF 61 [ x ])
         (reg:V4DF 21 xmm0 [ x ])) v.cc:3 1123 {*movv4df_internal}
      (expr_list:REG_DEAD (reg:V4DF 21 xmm0 [ x ])
         (nil)))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (set (reg:OI 63 [ x ])
         (subreg:OI (reg/v:V4DF 61 [ x ]) 0)) v.cc:4 60 {*movoi_internal_avx}
      (expr_list:REG_DEAD (reg/v:V4DF 61 [ x ])
         (nil)))
(insn 7 6 11 2 (set (reg:V2DF 60 [ <retval> ])
         (subreg:V2DF (reg:OI 63 [ x ]) 0)) v.cc:4 1124 {*movv2df_internal}
      (expr_list:REG_DEAD (reg:OI 63 [ x ])
         (nil)))
(insn 11 7 14 2 (set (reg/i:V2DF 21 xmm0)
         (reg:V2DF 60 [ <retval> ])) v.cc:5 1124 {*movv2df_internal}
      (expr_list:REG_DEAD (reg:V2DF 60 [ <retval> ])
         (nil)))
(insn 14 11 0 2 (use (reg/i:V2DF 21 xmm0)) v.cc:5 -1
      (nil))



I am surprised by the high replacement cost (32, against 20 for the two
original insns), which is what prevents the change. Is my approach wrong,
or is there an issue with how the costs are evaluated?
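
For reference, my understanding of the rule behind the "rejecting
combination" message is sketched below. This is a simplified model, not
the actual combine.c code: the function name combination_rejected is
hypothetical, and the real check has more bookkeeping around it.

```c
#include <stdbool.h>

/* Simplified model of combine's cost check (an assumption about its
   behaviour, not a copy of the GCC source): a replacement is rejected
   only when both costs are known (greater than zero) and the new cost
   is strictly higher than the summed cost of the original insns.  */
static bool
combination_rejected (int old_cost, int new_cost)
{
  return old_cost > 0 && new_cost > 0 && new_cost > old_cost;
}
```

With the numbers from the dump, combination_rejected (4 + 16, 32) is
true, so the vec_select form is discarded even though it matched an
instruction pattern.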

The approach was suggested by Richard B:
http://gcc.gnu.org/ml/gcc-patches/2012-05/msg00197.html

-- 
Marc Glisse
-------------- next part --------------
Index: simplify-rtx.c
===================================================================
--- simplify-rtx.c	(revision 193127)
+++ simplify-rtx.c	(working copy)
@@ -5884,20 +5884,35 @@ simplify_subreg (enum machine_mode outer
   if (SCALAR_INT_MODE_P (outermode)
       && SCALAR_INT_MODE_P (innermode)
       && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode)
       && byte == subreg_lowpart_offset (outermode, innermode))
     {
       rtx tem = simplify_truncation (outermode, op, innermode);
       if (tem)
 	return tem;
     }
 
+  if (VECTOR_MODE_P (innermode)
+      && GET_MODE_INNER (innermode) == (VECTOR_MODE_P (outermode)
+					? GET_MODE_INNER (outermode)
+					: outermode))
+    {
+      unsigned elem_size = GET_MODE_SIZE (GET_MODE_INNER (innermode));
+      unsigned n = GET_MODE_SIZE (outermode) / elem_size;
+      unsigned start = byte / elem_size;
+      rtvec vec = rtvec_alloc (n);
+      for (unsigned i = 0; i < n; i++)
+	RTVEC_ELT (vec, i) = GEN_INT (start + i);
+      return simplify_gen_binary (VEC_SELECT, outermode, op,
+				  gen_rtx_PARALLEL (VOIDmode, vec));
+    }
+
   return NULL_RTX;
 }
 
 /* Make a SUBREG operation or equivalent if it folds.  */
 
 rtx
 simplify_gen_subreg (enum machine_mode outermode, rtx op,
 		     enum machine_mode innermode, unsigned int byte)
 {
   rtx newx;

