[PATCH 1/4] [MIPS] Add support for MIPS SIMD Architecture (MSA)

Mon Jan 11 13:26:00 GMT 2016

Hi Robert,

Thanks for the update and detailed comments. There are a couple of things
which I think still need addressing based solely on your comments but
generally the changes seem to have simplified things which is nice to see.

I haven't read the patch again yet and given the changes it looks like
a complete re-read is in order so it will be a few days. All being well
there won't be much to say.

> >
> > >+(define_expand "msa_ldi<mode>"
> > >+  [(match_operand:IMSA 0 "register_operand")
> > >+   (match_operand 1 "const_imm10_operand")]
> > >+  "ISA_HAS_MSA"
> > >+{
> > >+  unsigned n_elts = GET_MODE_NUNITS (<MODE>mode);
> > >+  rtvec v = rtvec_alloc (n_elts);
> > >+  HOST_WIDE_INT val = INTVAL (operands[1]);
> > >+  unsigned int i;
> > >+
> > >+  if (<MODE>mode != V16QImode)
> > >+    {
> > >+      unsigned shift = HOST_BITS_PER_WIDE_INT - 10;
> > >+      val = trunc_int_for_mode ((val << shift) >> shift, <UNITMODE>mode);
> > >+    }
> > >+  else
> > >+    val = trunc_int_for_mode (val, <UNITMODE>mode);
> > >+
> > >+  for (i = 0; i < n_elts; i++)
> > >+    RTVEC_ELT (v, i) = GEN_INT (val);
> > >+  emit_move_insn (operands[0],
> > >+		  gen_rtx_CONST_VECTOR (<MODE>mode, v));
> > >+  DONE;
> > >+})
> >
> > This is really weird. We shouldn't be simply discarding bits that don't fit.
> > This needs to accept all immediates and generate the correct code to
> > get a replicated constant of that value into a register. I think it is
> > probably OK to trunc_int_for_mode on the original 'val' for the
> > <UNIT>mode but anything out of range for V*HI/SI/DI needs to be expanded
> > properly.
> >
> > Please do not gen_msa_ldi anywhere other than from MSA builtins. There is
> > no need just emit a move directly.
> 
> AFAICS, the truncation for everything except V16QImode is not needed
> since we have the predicate here. Truncating the immediate for bytes may make
> life easier for users when debugging. Although the extra bits are ignored by
> the hardware, it doesn't stop us from encoding numbers out of range.
> The RTL doesn't seem to have validation of ranges of constants and modes.
> I did a small test and could output any number within the allowable range
> in the predicate.

This is still giving me cause for concern. I think we need to expand this
fully so that constants larger than s10 can be loaded albeit via a longer
sequence. I am not a fan of simplistic builtins that simply map 1:1 with
hardware instructions when that means there is unexpected impact if you
don't know the exact details of the underlying instruction.

> >
> > >+;; Offset load
> > >+(define_expand "msa_ld_<msafmt_f>"
> > >+  [(match_operand:MSA 0 "register_operand")
> > >+   (match_operand 1 "pmode_register_operand")
> > >+   (match_operand 2 "aq10<msafmt>_operand")]
> > >+  "ISA_HAS_MSA"
> > >+{
> > >+  rtx addr = plus_constant (GET_MODE (operands[1]), operands[1],
> > >+				      INTVAL (operands[2]));
> > >+  mips_emit_move (operands[0], gen_rtx_MEM (<MODE>mode, addr));
> > >+  DONE;
> > >+})
> > >+
> > >+;; Offset store
> > >+(define_expand "msa_st_<msafmt_f>"
> > >+  [(match_operand:MSA 0 "register_operand")
> > >+   (match_operand 1 "pmode_register_operand")
> > >+   (match_operand 2 "aq10<msafmt>_operand")]
> > >+  "ISA_HAS_MSA"
> > >+{
> > >+  rtx addr = plus_constant (GET_MODE (operands[1]), operands[1],
> > >+			    INTVAL (operands[2]));
> > >+  mips_emit_move (gen_rtx_MEM (<MODE>mode, addr), operands[0]);
> > >+  DONE;
> > >+})
> >
> > There's no real need to expand these in C code. The patterns can be used
> > to create the RTL. As an aside, I don't really see the point in intrinsics
> > to load and store data the same thing can be done from straight C.
> > The patterns also can't be used for const or volatile data as their
> > builtin prototypes are neither. I suspect they should at least support
> > pointers to const data.
> 
> I was going to remove the expansion in C code but the problem is that
> we need the Pmode. I could duplicate the patterns and override the icode
> e.g for TARGET_64BIT when expanding the built-ins but this doesn't look
> cleaner. Unless I'm missing something here?
> Since the intrinsics have been present for a while it would probably confuse
> if we removed them now. Indeed, the intrinsics are not really needed as we
> can do loads and stores from C. I changed the type in the prototype
> to CVPOINTER so qualifiers like const/volatile are not discarded.

The volatile is however not propagated to the memref so is effectively lost
I believe. The constant in the offset here has a similar problem to msa_ldi
as an out-of-range value will just go wrong. I'll look for other cases when
reading this time.

> >
> > >+  "ISA_HAS_MSA"
> > >+  "andi.b\t%w0,%w1,%E2"
> > >+  [(set_attr "type" "simd_logic")
> > >+   (set_attr "mode" "V16QI")])
> > >+
> >
> > END USEFUL commnets

Oops, thanks for reading on. That was my marker for how far I'd got in one
session!.

> > >+(define_insn "msa_bclr_<msafmt>"
> > >+  [(set (match_operand:IMSA 0 "register_operand" "=f")
> > >+	(unspec:IMSA [(match_operand:IMSA 1 "register_operand" "f")
> > >+		      (match_operand:IMSA 2 "register_operand" "f")]
> > >+		     UNSPEC_MSA_BCLR))]
> > >+  "ISA_HAS_MSA"
> > >+  "bclr.<msafmt>\t%w0,%w1,%w2"
> > >+  [(set_attr "type" "simd_bit")
> > >+   (set_attr "mode" "<MODE>")])
> > >+
> > >+(define_insn "msa_bclri_<msafmt>"
> > >+  [(set (match_operand:IMSA 0 "register_operand" "=f")
> > >+	(unspec:IMSA [(match_operand:IMSA 1 "register_operand" "f")
> > >+		      (match_operand 2 "const_<bitimm>_operand" "")]
> > >+		     UNSPEC_MSA_BCLRI))]
> > >+  "ISA_HAS_MSA"
> > >+  "bclri.<msafmt>\t%w0,%w1,%2"
> > >+  [(set_attr "type" "simd_bit")
> > >+   (set_attr "mode" "<MODE>")])
> > >+
> > >+(define_insn "msa_binsl_<msafmt>"
> > >+  [(set (match_operand:IMSA 0 "register_operand" "=f")
> > >+	(unspec:IMSA [(match_operand:IMSA 1 "register_operand" "0")
> > >+		      (match_operand:IMSA 2 "register_operand" "f")
> > >+		      (match_operand:IMSA 3 "register_operand" "f")]
> > >+		     UNSPEC_MSA_BINSL))]
> > >+  "ISA_HAS_MSA"
> > >+  "binsl.<msafmt>\t%w0,%w2,%w3"
> > >+  [(set_attr "type" "simd_bitins")
> > >+   (set_attr "mode" "<MODE>")])
> > >+
> > >+(define_insn "msa_binsli_<msafmt>"
> > >+  [(set (match_operand:IMSA 0 "register_operand" "=f")
> > >+	(unspec:IMSA [(match_operand:IMSA 1 "register_operand" "0")
> > >+		      (match_operand:IMSA 2 "register_operand" "f")
> > >+		      (match_operand 3 "const_<bitimm>_operand" "")]
> > >+		     UNSPEC_MSA_BINSLI))]
> > >+  "ISA_HAS_MSA"
> > >+  "binsli.<msafmt>\t%w0,%w2,%3"
> > >+  [(set_attr "type" "simd_bitins")
> > >+   (set_attr "mode" "<MODE>")])
> > >+
> > >+(define_insn "msa_binsr_<msafmt>"
> > >+  [(set (match_operand:IMSA 0 "register_operand" "=f")
> > >+	(unspec:IMSA [(match_operand:IMSA 1 "register_operand" "0")
> > >+		      (match_operand:IMSA 2 "register_operand" "f")
> > >+		      (match_operand:IMSA 3 "register_operand" "f")]
> > >+		     UNSPEC_MSA_BINSR))]
> > >+  "ISA_HAS_MSA"
> > >+  "binsr.<msafmt>\t%w0,%w2,%w3"
> > >+  [(set_attr "type" "simd_bitins")
> > >+   (set_attr "mode" "<MODE>")])
> > >+
> > >+(define_insn "msa_binsri_<msafmt>"
> > >+  [(set (match_operand:IMSA 0 "register_operand" "=f")
> > >+	(unspec:IMSA [(match_operand:IMSA 1 "register_operand" "0")
> > >+		      (match_operand:IMSA 2 "register_operand" "f")
> > >+		      (match_operand 3 "const_<bitimm>_operand" "")]
> > >+		     UNSPEC_MSA_BINSRI))]
> > >+  "ISA_HAS_MSA"
> > >+  "binsri.<msafmt>\t%w0,%w2,%3"
> > >+  [(set_attr "type" "simd_bitins")
> > >+   (set_attr "mode" "<MODE>")])
> >
> > As a follow up these instructions may be represenatble with standard
> > RTL.
> 
> The "insvm" appears to be a good candidate but I'm not sure if this
> is going to work out of the box for vector modes. This would apply
> only for instructions with a replicated immediate.

Let's leave it for follow on work.

> > >+(define_insn "msa_<FCC:fcc>_<FMSA:msafmt>"
> > >+  [(set (match_operand:<VIMODE> 0 "register_operand" "=f")
> > >+	(FCC:<VIMODE> (match_operand:FMSA 1 "register_operand" "f")
> > >+		      (match_operand:FMSA 2 "register_operand" "f")))]
> > >+  "ISA_HAS_MSA"
> > >+  "<FCC:fcc>.<FMSA:msafmt>\t%w0,%w1,%w2"
> > >+  [(set_attr "type" "simd_fcmp")
> > >+   (set_attr "mode" "<MODE>")])
> >
> > Can't the msa builtins target these patterns directly? Follow on work should
> > implement "mov<mode>cc" for vector modes.
> >
> > >+
> > >+(define_insn "msa_<fsc>_<FMSA:msafmt>"
> > >+  [(set (match_operand:<VIMODE> 0 "register_operand" "=f")
> > >+	(unspec:<VIMODE> [(match_operand:FMSA 1 "register_operand" "f")
> > >+			   (match_operand:FMSA 2 "register_operand" "f")]
> > >+			 FSC_UNS))]
> > >+  "ISA_HAS_MSA"
> > >+  "<fsc>.<FMSA:msafmt>\t%w0,%w1,%w2"
> > >+  [(set_attr "type" "simd_fcmp")
> > >+   (set_attr "mode" "<MODE>")])
> >
> > i.e. this is not necessary.
> 
> The difference between this and above is the signalling vs quiet NaNs floating
> point comparison instructions. The quiet NaNs are represented as standard RTL and
> signalling as UNSPEC. "mov<mode>cc" for vector modes is questionable as "vcondmn"
> is meant to be used for conditional vector moves unless I misunderstood something.

Ah, my mistake. I didn't check if mov<mode>cc was applicable to vector modes.
Same thing applies that this kind of thing is good for future work.

> >
> > >+(define_insn "msa_max_a_<msafmt>"
> > >+  [(set (match_operand:IMSA 0 "register_operand" "=f")
> > >+	(unspec:IMSA [(match_operand:IMSA 1 "register_operand" "f")
> > >+		      (match_operand:IMSA 2 "register_operand" "f")]
> > >+		     UNSPEC_MSA_MAX_A))]
> > >+  "ISA_HAS_MSA"
> > >+  "max_a.<msafmt>\t%w0,%w1,%w2"
> > >+  [(set_attr "type" "simd_int_arith")
> > >+   (set_attr "mode" "<MODE>")])
> >
> > This could be used as part of mov<mode>cc support.
> 
> AFAUI, mov<mode>cc is used for non-vector modes and the support
> should go into vcond[u]mn SPNs. However, for the above and floating-point
> version I'm not sure if we can add it there as we don't have enough
> information as it appears to be simplified to a basic comparison.
> 
> I haven't confirmed it yet but it would appear that the support
> for these operations would have to be explicitly added to
> the auto-vectorizer to recognize these patterns.

OK. We will have to see if the work Prachi and Sameera are doing will
enable these to be targeted.

> >
> > >+(define_insn "msa_nloc_<msafmt>"
> > >+  [(set (match_operand:IMSA 0 "register_operand" "=f")
> > >+	(unspec:IMSA [(match_operand:IMSA 1 "register_operand" "f")]
> > >+		     UNSPEC_MSA_NLOC))]
> > >+  "ISA_HAS_MSA"
> > >+  "nloc.<msafmt>\t%w0,%w1"
> > >+  [(set_attr "type" "simd_bit")
> > >+   (set_attr "mode" "<MODE>")])
> > >+
> > >+(define_insn "clz<mode>2"
> > >+  [(set (match_operand:IMSA 0 "register_operand" "=f")
> > >+	(clz:IMSA (match_operand:IMSA 1 "register_operand" "f")))]
> > >+  "ISA_HAS_MSA"
> > >+  "nlzc.<msafmt>\t%w0,%w1"
> > >+  [(set_attr "type" "simd_bit")
> > >+   (set_attr "mode" "<MODE>")])
> >
> > Can you confirm that nlzc has a natural value when given an operand of zero?
> > I.e. 8 for B 16 for H 32 for W and 64 for D?
> >
> > Also CLZ_DEFINED_VALUE_AT_ZERO looks like it needs updating to know the
> > bitsize of elements in a vector rather than the whole vector. I.e. I think
> > it will say the result is 128 for any vector mode.
> 
> I'm not so sure about this one.
> 
> The above pattern as it stands doesn't accept the zero operand, would the macro
> would still matter? The built-in also rejects the zero argument.

I guess not then.

> It appears that the code around, where the macro is used, doesn't handle vector modes
> and the modification of this macro may potentially lead to strange errors.
> AFAICS, it's very unlikely that we ever hit those paths so it appears we are safe.
> 
> For the sake of consistency, I think that we still should consider
> vector elements so I updated the macro to use GET_MODE_UNIT_BITSIZE.
> >

OK. I'll take a read when I go through this again.

Thanks,
Matthew