This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH RFC] MIPS add support for MIPS SIMD ARCHITECTURE V1.07


Graham Stott <Graham.Stott@imgtec.com> writes:
> Index: gcc/config/mips/mips-msa.md
> ===================================================================
> --- gcc/config/mips/mips-msa.md	(revision 0)
> +++ gcc/config/mips/mips-msa.md	(working copy)
> @@ -0,0 +1,3717 @@
> +;; Machine Description for MIPS MSA ASE
> +;; Based on the MIPS MSA spec Revision 0.??? 11/5/2012

Would be good to update this now that there's a published spec.

> +;; This attribute gives the integer mode for selection mask in vec_perm.
> +;; vcond also uses MODE128_I for operand 0, 1, and 2.
> +(define_mode_attr MODE128_I
> +  [(V2DF "V2DI")
> +   (V4SF "V4SI")
> +   (V2DI "V2DI")
> +   (V4SI "V4SI")
> +   (V8HI "V8HI")
> +   (V16QI "V16QI")])

I think instead we should extend IMODE in mips.md.

> +;; This attributes gives the element mode for vector mode.
> +(define_mode_attr ELEMMODE
> +  [(V2DF "DF")
> +   (V4SF "SF")
> +   (V2DI "DI")
> +   (V4SI "SI")
> +   (V8HI "HI")
> +   (V16QI "QI")])

Here too I think we should extend UNITMODE.
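Something like this, say (untested sketch; the existing scalar
entries are elided here rather than copied from mips.md):

  (define_mode_attr UNITMODE
    [...existing entries...
     (V2DF "DF") (V4SF "SF")
     (V2DI "DI") (V4SI "SI")
     (V8HI "HI") (V16QI "QI")])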

> +(define_expand "vec_extract<mode>"
> +  [(match_operand:<ELEMMODE> 0 "register_operand")
> +   (match_operand:IMODE128 1 "register_operand")
> +   (match_operand 2 "const_int_operand")]
> +  "ISA_HAS_MSA"
> +{
> +  gcc_assert (INTVAL (operands[2]) <= <max_elem_index>);
> +  enum machine_mode mode0 = GET_MODE (operands[0]);
> +  if (mode0 == QImode || mode0 == HImode)
> +    {
> +      unsigned int byte = 0;
> +      rtx dest1 = gen_reg_rtx (SImode);
> +      emit_insn (gen_msa_copy_s_<msafmt> (dest1, operands[1], operands[2]));
> +      if (TARGET_BIG_ENDIAN)
> +	{
> +	  if (mode0 == QImode)
> +	    byte = 3;
> +	  else if (mode0 == HImode)
> +	    byte = 2;
> +	}
> +      rtx dest2 = simplify_gen_subreg (mode0, dest1, SImode, byte);

Looks like this is:

         rtx dest2 = gen_lowpart (mode0, dest1);

> +(define_expand "vec_set<mode>"
> +  [(match_operand:FMODE128 0 "register_operand")
> +   (match_operand:<ELEMMODE> 1 "register_operand")
> +   (match_operand 2 "const_int_operand")]
> +  "ISA_HAS_MSA"
> +{
> +  int index = INTVAL (operands[2]);
> +  gcc_assert (index <= <max_elem_index>);

The snipped hunks above had:

  gcc_assert (INTVAL (operands[2]) <= <max_elem_index>);

which seems better (INTVAL is a HOST_WIDE_INT rather than an int).

> +(define_insn "msa_insert_b"
> +  [(set (match_operand:V16QI 0 "register_operand" "=f")
> +	(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "0")
> +		       (match_operand 2 "const_uimm4_operand" "")
> +		       (match_operand:SI 3 "reg_or_0_operand" "dJ")]
> +		      UNSPEC_MSA_INSERT))]
> +  "ISA_HAS_MSA"
> +  "insert.b\t%w0[%2],%z3"
> +  [(set_attr "type"	"mtc")
> +   (set_attr "mode"	"TI")])
> +
> +; Similar to msa_insert_b but with QImode for operand 3.

But...

> +(define_insn "*msa_insert_b"
> +  [(set (match_operand:V16QI 0 "register_operand" "=f")
> +	(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "0")
> +		       (match_operand 2 "const_uimm4_operand" "")
> +		       (match_operand:QI 3 "reg_or_0_operand" "dJ")]
> +		      UNSPEC_MSA_INSERT))]
> +  "ISA_HAS_MSA"
> +  "insert.b\t%w0[%2],%z3"
> +  [(set_attr "type"	"mtc")
> +   (set_attr "mode"	"TI")])

...why do you need both?  Why not just create a QImode subreg?
At least the comment should be expanded.

Would be better to use mode iterators for these patterns if possible.
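For example, the two insert.b patterns above could collapse into
something like this (untested sketch; <indeximm> is a made-up mode
attribute that would map each vector mode to the right index
predicate, e.g. V16QI -> const_uimm4_operand, V8HI ->
const_uimm3_operand, and <UNITMODE> assumes the extension suggested
earlier):

  (define_insn "msa_insert_<msafmt>"
    [(set (match_operand:IMODE128 0 "register_operand" "=f")
          (unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "0")
                            (match_operand 2 "<indeximm>" "")
                            (match_operand:<UNITMODE> 3 "reg_or_0_operand" "dJ")]
                           UNSPEC_MSA_INSERT))]
    "ISA_HAS_MSA"
    "insert.<msafmt>\t%w0[%2],%z3"
    [(set_attr "type" "mtc")
     (set_attr "mode" "TI")])

The V2DI case would still need its !TARGET_64BIT split handling on
top of this.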

> +;; Note that insert.d will be splitted later if !TARGET_64BIT.
> [...]
> +;; Note that insert.d will be splitted later if !TARGET_64BIT.

s/splitted/split/.  Several other cases later.  Pedantically you
should force the split by making the define_insn return "#" for the
!TARGET_64BIT case, although I don't know whether that's needed in
practice these days.

> +(define_split
> +  [(set (match_operand:V2DI 0 "register_operand")
> +	(unspec: V2DI [(match_operand:V2DI 1 "register_operand")
> +		       (match_operand 2 "const_0_or_1_operand")
> +		       (match_operand:DI 3 "reg_or_0_operand" "")]
> +		      UNSPEC_MSA_INSERT))]
> [...]
> +(define_split
> +  [(set (match_operand:V2DF 0 "register_operand")
> +	(unspec: V2DF [(match_operand:V2DF 1 "register_operand")
> +		       (match_operand 2 "const_0_or_1_operand")
> +		       (match_operand:DF 3 "register_operand" "")]
> +		      UNSPEC_MSA_INSERT))]

Excess space after "unspec:".  No "" on operand 3.  Mode iterators
would help here too.

Several more examples later.

> +(define_insn "msa_insve_b"
> +  [(set (match_operand:V16QI 0 "register_operand" "=f")
> +	(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "0")
> +		       (match_operand 2 "const_uimm4_operand" "")
> +		       (match_operand:V16QI 3 "register_operand" "f")]
> +		      UNSPEC_MSA_INSVE))]
> +  "ISA_HAS_MSA"
> +  "insve.b\t%w0[%2],%w3[0]"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_insve_h"
> +  [(set (match_operand:V8HI 0 "register_operand" "=f")
> +	(unspec:V8HI [(match_operand:V8HI 1 "register_operand" "0")
> +		      (match_operand 2 "const_uimm3_operand" "")
> +		      (match_operand:V8HI 3 "register_operand" "f")]
> +		      UNSPEC_MSA_INSVE))]
> +  "ISA_HAS_MSA"
> +  "insve.h\t%w0[%2],%w3[0]"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_insve_w"
> +  [(set (match_operand:V4SI 0 "register_operand" "=f")
> +	(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "0")
> +		      (match_operand 2 "const_0_to_3_operand" "")
> +		      (match_operand:V4SI 3 "register_operand" "f")]
> +		      UNSPEC_MSA_INSVE))]
> +  "ISA_HAS_MSA"
> +  "insve.w\t%w0[%2],%w3[0]"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_insve_w_f"
> +  [(set (match_operand:V4SF 0 "register_operand" "=f")
> +	(unspec:V4SF [(match_operand:V4SF 1 "register_operand" "0")
> +		      (match_operand 2 "const_0_to_3_operand" "")
> +		      (match_operand:V4SF 3 "register_operand" "f")]
> +		      UNSPEC_MSA_INSVE))]
> +  "ISA_HAS_MSA"
> +  "insve.w\t%w0[%2],%w3[0]"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])

Looks like these ought to be able to use mode iterators too.

> +(define_insn "msa_copy_s_b"
> +  [(set (match_operand:SI 0 "register_operand" "=d")
> +	(unspec:SI [(match_operand:V16QI 1 "register_operand" "f")
> +		    (match_operand 2 "const_uimm4_operand" "")]
> +		   UNSPEC_MSA_COPY_S))]
> +  "ISA_HAS_MSA"
> +  "copy_s.b\t%0,%w1[%2]"
> +  [(set_attr "type"	"mfc")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_copy_s_h"
> +  [(set (match_operand:SI 0 "register_operand" "=d")
> +	(unspec:SI [(match_operand:V8HI 1 "register_operand" "f")
> +		    (match_operand 2 "const_uimm3_operand" "")]
> +		   UNSPEC_MSA_COPY_S))]
> +  "ISA_HAS_MSA"
> +  "copy_s.h\t%0,%w1[%2]"
> +  [(set_attr "type"	"mfc")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_copy_s_w"
> +  [(set (match_operand:SI 0 "register_operand" "=d")
> +	(unspec:SI [(match_operand:V4SI 1 "register_operand" "f")
> +		    (match_operand 2 "const_0_to_3_operand" "")]
> +		   UNSPEC_MSA_COPY_S))]
> +  "ISA_HAS_MSA"
> +  "copy_s.w\t%0,%w1[%2]"
> +  [(set_attr "type"	"mfc")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_copy_s_w_f"
> +  [(set (match_operand:SF 0 "register_operand" "=d")
> +	(unspec:SF [(match_operand:V4SF 1 "register_operand" "f")
> +		    (match_operand 2 "const_0_to_3_operand" "")]
> +		   UNSPEC_MSA_COPY_S))]
> +  "ISA_HAS_MSA"
> +  "copy_s.w\t%0,%w1[%2]"
> +  [(set_attr "type"	"mfc")
> +   (set_attr "mode"	"TI")])
> +
> +;; Note that copy_s.d will be splitted later if !TARGET_64BIT.
> +(define_insn "msa_copy_s_d"
> +  [(set (match_operand:DI 0 "register_operand" "=d")
> +	(unspec:DI [(match_operand:V2DI 1 "register_operand" "f")
> +		    (match_operand 2 "const_0_or_1_operand" "")]
> +		   UNSPEC_MSA_COPY_S))]
> +  "ISA_HAS_MSA"
> +  "copy_s.d\t%0,%w1[%2]"
> +  [(set_attr "type"	"mfc")
> +   (set_attr "mode"	"TI")])
> +
> +;; Note that copy_s.d will be splitted later if !TARGET_64BIT.
> +(define_insn "msa_copy_s_d_f"
> +  [(set (match_operand:DF 0 "register_operand" "=d")
> +	(unspec:DF [(match_operand:V2DF 1 "register_operand" "f")
> +		    (match_operand 2 "const_0_or_1_operand" "")]
> +		   UNSPEC_MSA_COPY_S))]
> +  "ISA_HAS_MSA"
> +  "copy_s.d\t%0,%w1[%2]"
> +  [(set_attr "type"	"mfc")
> +   (set_attr "mode"	"TI")])

Here too.

> +(define_split
> +  [(set (match_operand:DI 0 "register_operand")
> +	(unspec: DI [(match_operand:V2DI 1 "register_operand")
> +		     (match_operand 2 "const_0_or_1_operand")]
> +		    UNSPEC_MSA_COPY_S))]
> +  "reload_completed && TARGET_MSA && !TARGET_64BIT"
> +  [(const_int 0)]
> +{
> +  mips_split_msa_copy_d (operands[0], operands[1], operands[2], false);
> +  DONE;
> +})
> +
> +(define_split
> +  [(set (match_operand:DF 0 "register_operand")
> +	(unspec: DF [(match_operand:V2DF 1 "register_operand")
> +		     (match_operand 2 "const_0_or_1_operand")]
> +		    UNSPEC_MSA_COPY_S))]
> +  "reload_completed && TARGET_MSA && !TARGET_64BIT"
> +  [(const_int 0)]
> +{
> +  mips_split_msa_copy_d (operands[0], operands[1], operands[2], false);
> +  DONE;
> +})

Same comments as for the insert splitters.

All the copy_s comments apply to copy_u too.

> +(define_expand "vec_perm<mode>"
> +  [(match_operand:MODE128 0 "register_operand" "")
> +   (match_operand:MODE128 1 "register_operand" "")
> +   (match_operand:MODE128 2 "register_operand" "")
> +   (match_operand:<MODE128_I> 3 "register_operand" "")]
> +  "ISA_HAS_MSA"
> +{
> +  /* Note that GCC always uses memory order (as big-endian) in indexing,
> +     and layouts operands[1] frist and then operands[2] next.
> +     However, vshf starts indexes from wt to ws, so so we need to swap
> +     two operands.  MSA loads or stores elements to or from the rightmost
> +     position of vector registers, for both big-endian and little-endian CPUs.
> +     No need to change any index numbers.  */

Oh no, it's like VSX all over again.  Although I think the comment makes
it sound more complicated than it is.  Maybe:

  /* The optab semantics are that index 0 selects the first element
     of operands[1] and the highest index selects the last element
     of operands[2].  This is the opposite order from "vshf.df wd,ws,wt",
     where index 0 selects the first element of wt and the highest index
     selects the last element of ws.  We therefore swap the operands here.  */

> +(define_expand "neg<mode>2"
> +  [(match_operand:MODE128 0 "register_operand" "")
> +   (match_operand:MODE128 1 "register_operand" "")]
> +  "ISA_HAS_MSA"
> +{
> +  rtx reg = gen_reg_rtx (<MODE>mode);
> +  emit_insn (gen_msa_ldi<mode> (reg, GEN_INT (0)));

"const0_rtx" rather than "GEN_INT (0)" (all occurences).

> +;; 128-bit integer/MSA vector registers moves
> +;; Note that we prefer floating-point loads, stores, and moves by adding * to
> +;; other register preferences.
> +;; Note that we combine f and J, so that move_type for J is fmove and its
> +;; instruction length can be 1.

I don't understand the last sentence.  It looks like mips_output_move
uses LDI.x for the f<-J case, and msa_ldi<mode> has type arith.
I think these should be kept as separate alternatives.

Same comment for the TARGET_64BIT pattern.

> +(define_insn "movti_msa"
> +  [(set (match_operand:TI 0 "nonimmediate_operand" "=*d,*d,*d,*R,*d,*f,f,R,f")
> +        (match_operand:TI 1 "move_operand" "*d,*i,*R,*d*J,*f,*d,R,f,fJ"))]
> +  "ISA_HAS_MSA
> +   && !TARGET_64BIT
> +   && (register_operand (operands[0], TImode)
> +       || reg_or_0_operand (operands[1], TImode))"
> +  { return mips_output_move (operands[0], operands[1]); }
> +  [(set_attr "move_type"        "move,const,load,store,mfc,mtc,fpload,fpstore,fmove")
> +   (set_attr "mode"     "TI")])

Why do we need to allow TImode in GPRs for 32-bit mode?  I think we should
try to avoid that if at all possible.

> +(define_expand "mov<mode>"
> +  [(set (match_operand:MODE128 0)
> +        (match_operand:MODE128 1))]
> +  "TARGET_64BIT || TARGET_MSA"
> +{
> +  if (mips_legitimize_move (<MODE>mode, operands[0], operands[1]))
> +    DONE;
> +})

Why the "TARGET_64BIT || "?  Why do we want to allow these modes without
MSA in that case?

> +;; Note that we prefer floating-point loads, stores, and moves by adding * to
> +;; other register preferences.
> +;; Note that we combine f and YG, so that move_type for YG is fmove and its
> +;; instruction length can be 1.
> +(define_insn "mov<mode>_msa"
> +  [(set (match_operand:MODE128 0 "nonimmediate_operand" "=f,f,R,R,*f,*d,*d,*d,*R")
> +        (match_operand:MODE128 1 "move_operand" "fYG,R,f,YG,*d,*f,*d*YG,*R,*d"))]
> +  "ISA_HAS_MSA
> +   && (register_operand (operands[0], <MODE>mode)
> +       || reg_or_0_operand (operands[1], <MODE>mode))"
> +{ return mips_output_move (operands[0], operands[1]); }
> +  [(set_attr "move_type"        "fmove,fpload,fpstore,store,mtc,mfc,move,load,store")
> +   (set_attr "mode"     "TI")])

Same question I suppose: why do we want to allow these vectors into GPRs?

> +(define_split
> +  [(set (match_operand:TI 0 "nonimmediate_operand")
> +        (match_operand:TI 1 "move_operand"))]
> +  "reload_completed && TARGET_MSA
> +   && mips_split_128bit_move_p (operands[0], operands[1])"
> +  [(const_int 0)]
> +{
> +  mips_split_128bit_move (operands[0], operands[1]);
> +  DONE;
> +})
> +
> +(define_split
> +  [(set (match_operand:MODE128 0 "nonimmediate_operand")
> +        (match_operand:MODE128 1 "move_operand"))]
> +  "reload_completed && TARGET_MSA
> +   && mips_split_128bit_move_p (operands[0], operands[1])"
> +  [(const_int 0)]
> +{
> +  mips_split_128bit_move (operands[0], operands[1]);
> +  DONE;
> +})

Please instead add whatever needs to be done to the existing
mips_split_move_insn_p/mips_split_move_insn pair.

> +;; Offset load
> +(define_expand "msa_ld_<msafmt>"
> +  [(match_operand:IMODE128 0 "register_operand")
> +   (match_operand 1 "pmode_register_operand")
> +   (match_operand 2 "const_msa_addr_offset_operand")]
> +  "ISA_HAS_MSA"
> +{
> +  rtx addr = plus_constant (GET_MODE (operands[1]), operands[1], INTVAL (operands[2]));
>[...]
> +  rtx addr = plus_constant (GET_MODE (operands[1]), operands[1], INTVAL (operands[2]));

Long lines.  Same for the stores.

I think these two can be combined into a single mode iterator.
Same for the other IMODE128/FMODE128 pairs.

> +  mips_emit_move (operands[0], gen_rtx_MEM (<MODE>mode, addr));
> +  DONE;
> +})

I'd have expected this to generate poor code in practice, since we have
no aliasing information.  Why not ask people to use normal C accesses
instead?

> +;; Integer operations
> +(define_insn "add<mode>3"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f,f")
> +	(plus:IMODE128 (match_operand:IMODE128 1 "register_operand" "f,f")
> +		       (match_operand:IMODE128 2 "reg_or_vector_uimm5_operand" "f,YC")))]
> +  "ISA_HAS_MSA"
> +  "@
> +   addv.<msafmt>\t%w0,%w1,%w2
> +   addvi.<msafmt>\t%w0,%w1,%D2"
> +  [(set_attr "alu_type"	"add")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "sub<mode>3"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f,f")
> +	(minus:IMODE128 (match_operand:IMODE128 1 "register_operand" "f,f")
> +		        (match_operand:IMODE128 2 "reg_or_vector_uimm5_operand" "f,YC")))]
> +  "ISA_HAS_MSA"
> +  "@
> +   subv.<msafmt>\t%w0,%w1,%w2
> +   subvi.<msafmt>\t%w0,%w1,%D2"
> +  [(set_attr "alu_type"	"sub")
> +   (set_attr "mode"	"TI")])

The constant cases should be handled by the addition pattern.
Subtraction should be register-only.

> +(define_insn "and<mode>3"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(and:IMODE128 (match_operand:IMODE128 1 "register_operand" "f")
> +		      (match_operand:IMODE128 2 "register_operand" "f")))]
> +  "ISA_HAS_MSA"
> +  "and.v\t%w0,%w1,%w2"
> +  [(set_attr "alu_type"	"and")
> +   (set_attr "mode"	"TI")])

Please separate out the V16QI case and add an alternative that uses
ANDI.V for immediate operands.

Please also add alternatives for BCLRI to all forms.

Same for ior<mode>3 with BSETI and xor<mode>3 with BNEGI.

> +(define_expand "one_cmpl<mode>2"
> +  [(match_operand:IMODE128 0 "register_operand" "")
> +   (match_operand:IMODE128 1 "register_operand" "")]
> +  "ISA_HAS_MSA"
> +{
> +  rtx reg = gen_reg_rtx (<MODE>mode);
> +  emit_insn (gen_msa_ldi<mode> (reg, GEN_INT (0)));
> +  emit_insn (gen_msa_nor_v_<msafmt> (operands[0], reg, operands[1]));
> +  DONE;
> +})

Can't this be done with a single instruction using NORI.B?
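I'd expect something like this (untested) to work, since NORI.B is
a pure bitwise operation and so is correct for every element width:

  (define_insn "one_cmpl<mode>2"
    [(set (match_operand:IMODE128 0 "register_operand" "=f")
          (not:IMODE128 (match_operand:IMODE128 1 "register_operand" "f")))]
    "ISA_HAS_MSA"
    "nori.b\t%w0,%w1,0"
    [(set_attr "alu_type" "nor")
     (set_attr "mode" "TI")])

That also avoids the LDI and the scratch register.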

> +;; Built-in functions
> +(define_insn "msa_add_a_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "f")
> +			  (match_operand:IMODE128 2 "register_operand" "f")]
> +			 UNSPEC_MSA_ADD_A))]
> +  "ISA_HAS_MSA"
> +  "add_a.<msafmt>\t%w0,%w1,%w2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])

Not sure about the unspec, isn't this just:

  [(set (match_operand:IMODE128 0 "register_operand" "=f")
	(plus:IMODE128
          (abs:IMODE128 (match_operand:IMODE128 1 "register_operand" "f"))
          (abs:IMODE128 (match_operand:IMODE128 2 "register_operand" "f"))))]

> +(define_insn "msa_adds_a_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "f")
> +			  (match_operand:IMODE128 2 "register_operand" "f")]
> +			 UNSPEC_MSA_ADDS_A))]
> +  "ISA_HAS_MSA"
> +  "adds_a.<msafmt>\t%w0,%w1,%w2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])

Same here with (ss_plus (abs ...) (abs ..)).

> +(define_insn "msa_adds_s_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "f")
> +			  (match_operand:IMODE128 2 "register_operand" "f")]
> +			 UNSPEC_MSA_ADDS_S))]
> +  "ISA_HAS_MSA"
> +  "adds_s.<msafmt>\t%w0,%w1,%w2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])

I think this is "ssaddM3" and should use (ss_plus ...).

> +(define_insn "msa_adds_u_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "f")
> +			  (match_operand:IMODE128 2 "register_operand" "f")]
> +			 UNSPEC_MSA_ADDS_U))]
> +  "ISA_HAS_MSA"
> +  "adds_u.<msafmt>\t%w0,%w1,%w2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])

Likewise "usaddM3"/(us_plus ...).

> +(define_insn "msa_addvi_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "f")
> +			  (match_operand 2 "const_uimm5_operand" "")]
> +			 UNSPEC_MSA_ADDVI))]
> +  "ISA_HAS_MSA"
> +  "addvi.<msafmt>\t%w0,%w1,%2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])

Please expand to a normal addition, duplicating the constant.
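I.e. roughly (untested; mips_gen_const_int_vector is a made-up
helper that would build a CONST_VECTOR with every element equal to
the given value):

  (define_expand "msa_addvi_<msafmt>"
    [(match_operand:IMODE128 0 "register_operand")
     (match_operand:IMODE128 1 "register_operand")
     (match_operand 2 "const_uimm5_operand")]
    "ISA_HAS_MSA"
  {
    rtx cst = mips_gen_const_int_vector (<MODE>mode,
					 INTVAL (operands[2]));
    emit_insn (gen_add<mode>3 (operands[0], operands[1], cst));
    DONE;
  })

The add<mode>3 pattern above already accepts the YC constant
directly, so no force_reg should be needed.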

> +(define_insn "msa_andi_b"
> +  [(set (match_operand:V16QI 0 "register_operand" "=f")
> +	(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "f")
> +		       (match_operand 2 "const_uimm8_operand" "")]
> +		      UNSPEC_MSA_ANDI_B))]
> +  "ISA_HAS_MSA"
> +  "andi.b\t%w0,%w1,%2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])

With the AND change above, this too could be an expand.

> +(define_insn "msa_bclri_b"
> +  [(set (match_operand:V16QI 0 "register_operand" "=f")
> +	(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "f")
> +		       (match_operand 2 "const_uimm3_operand" "")]
> +		      UNSPEC_MSA_BCLRI))]
> +  "ISA_HAS_MSA"
> +  "bclri.b\t%w0,%w1,%2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_bclri_h"
> +  [(set (match_operand:V8HI 0 "register_operand" "=f")
> +	(unspec:V8HI [(match_operand:V8HI 1 "register_operand" "f")
> +		      (match_operand 2 "const_uimm4_operand" "")]
> +		     UNSPEC_MSA_BCLRI))]
> +  "ISA_HAS_MSA"
> +  "bclri.h\t%w0,%w1,%2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_bclri_w"
> +  [(set (match_operand:V4SI 0 "register_operand" "=f")
> +	(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "f")
> +		      (match_operand 2 "const_uimm5_operand" "")]
> +		     UNSPEC_MSA_BCLRI))]
> +  "ISA_HAS_MSA"
> +  "bclri.w\t%w0,%w1,%2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_bclri_d"
> +  [(set (match_operand:V2DI 0 "register_operand" "=f")
> +	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "f")
> +		      (match_operand 2 "const_uimm6_operand" "")]
> +		     UNSPEC_MSA_BCLRI))]
> +  "ISA_HAS_MSA"
> +  "bclri.d\t%w0,%w1,%2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])

With the AND changes above, these could become expands too.
Please use mode iterators.  Same comments for the bneg* and bset* patterns.

> +(define_insn "msa_binsli_b"
> +  [(set (match_operand:V16QI 0 "register_operand" "=f")
> +	(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "0")
> +		       (match_operand:V16QI 2 "register_operand" "f")
> +		       (match_operand 3 "const_uimm3_operand" "")]
> +		      UNSPEC_MSA_BINSLI))]
> +  "ISA_HAS_MSA"
> +  "binsli.b\t%w0,%w2,%3"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_binsli_h"
> +  [(set (match_operand:V8HI 0 "register_operand" "=f")
> +	(unspec:V8HI [(match_operand:V8HI 1 "register_operand" "0")
> +		      (match_operand:V8HI 2 "register_operand" "f")
> +		      (match_operand 3 "const_uimm4_operand" "")]
> +		     UNSPEC_MSA_BINSLI))]
> +  "ISA_HAS_MSA"
> +  "binsli.h\t%w0,%w2,%3"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_binsli_w"
> +  [(set (match_operand:V4SI 0 "register_operand" "=f")
> +	(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "0")
> +		      (match_operand:V4SI 2 "register_operand" "f")
> +		      (match_operand 3 "const_uimm5_operand" "")]
> +		     UNSPEC_MSA_BINSLI))]
> +  "ISA_HAS_MSA"
> +  "binsli.w\t%w0,%w2,%3"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_binsli_d"
> +  [(set (match_operand:V2DI 0 "register_operand" "=f")
> +	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
> +		      (match_operand:V2DI 2 "register_operand" "f")
> +		      (match_operand 3 "const_uimm6_operand" "")]
> +		     UNSPEC_MSA_BINSLI))]
> +  "ISA_HAS_MSA"
> +  "binsli.d\t%w0,%w2,%3"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])

Iterators here too please, and for msa_binsri_*.

> +(define_insn "msa_bmnz_v_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "0")
> +			  (match_operand:IMODE128 2 "register_operand" "f")
> +			  (match_operand:IMODE128 3 "register_operand" "f")]
> +			 UNSPEC_MSA_BMNZ_V))]
> +  "ISA_HAS_MSA"
> +  "bmnz.v\t%w0,%w2,%w3"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_bmnzi_b"
> +  [(set (match_operand:V16QI 0 "register_operand" "=f")
> +	(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "0")
> +		       (match_operand:V16QI 2 "register_operand" "f")
> +		       (match_operand 3 "const_uimm8_operand" "")]
> +		      UNSPEC_MSA_BMNZI_B))]
> +  "ISA_HAS_MSA"
> +  "bmnzi.b\t%w0,%w2,%3"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_bmz_v_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "0")
> +		          (match_operand:IMODE128 2 "register_operand" "f")
> +			  (match_operand:IMODE128 3 "register_operand" "f")]
> +			 UNSPEC_MSA_BMZ_V))]
> +  "ISA_HAS_MSA"
> +  "bmz.v\t%w0,%w2,%w3"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_bmzi_b"
> +  [(set (match_operand:V16QI 0 "register_operand" "=f")
> +	(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "0")
> +		       (match_operand:V16QI 2 "register_operand" "f")
> +		       (match_operand 3 "const_uimm8_operand" "")]
> +		      UNSPEC_MSA_BMZI_B))]
> +  "ISA_HAS_MSA"
> +  "bmzi.b\t%w0,%w2,%3"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])

These and BSEL* can be represented in normal rtl rather than unspecs.
I think BMZ, BMNZ and BSEL should be three alternatives in a combined
pattern, with the choice depending on which source gets tied to the
destination.

Please also add tests to make sure that combine generates these
forms from the individual &s, |s and ~s.

> +(define_insn "msa_ceq_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "f")
> +			  (match_operand:IMODE128 2 "register_operand" "f")]
> +			 UNSPEC_MSA_CEQ))]
> +  "ISA_HAS_MSA"
> +  "ceq.<msafmt>\t%w0,%w1,%w2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_ceqi_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "f")
> +			  (match_operand 2 "const_imm5_operand" "")]
> +			 UNSPEC_MSA_CEQI))]
> +  "ISA_HAS_MSA"
> +  "ceqi.<msafmt>\t%w0,%w1,%2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_cle_s_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "f")
> +			  (match_operand:IMODE128 2 "register_operand" "f")]
> +			 UNSPEC_MSA_CLE_S))]
> +  "ISA_HAS_MSA"
> +  "cle_s.<msafmt>\t%w0,%w1,%w2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_cle_u_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "f")
> +			  (match_operand:IMODE128 2 "register_operand" "f")]
> +			 UNSPEC_MSA_CLE_U))]
> +  "ISA_HAS_MSA"
> +  "cle_u.<msafmt>\t%w0,%w1,%w2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_clei_s_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "f")
> +			  (match_operand 2 "const_imm5_operand" "")]
> +			 UNSPEC_MSA_CLEI_S))]
> +  "ISA_HAS_MSA"
> +  "clei_s.<msafmt>\t%w0,%w1,%2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_clei_u_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "f")
> +			  (match_operand 2 "const_uimm5_operand" "")]
> +			 UNSPEC_MSA_CLEI_U))]
> +  "ISA_HAS_MSA"
> +  "clei_u.<msafmt>\t%w0,%w1,%2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_clt_s_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "f")
> +			  (match_operand:IMODE128 2 "register_operand" "f")]
> +			 UNSPEC_MSA_CLT_S))]
> +  "ISA_HAS_MSA"
> +  "clt_s.<msafmt>\t%w0,%w1,%w2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_clt_u_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "f")
> +			  (match_operand:IMODE128 2 "register_operand" "f")]
> +			 UNSPEC_MSA_CLT_U))]
> +  "ISA_HAS_MSA"
> +  "clt_u.<msafmt>\t%w0,%w1,%w2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_clti_s_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "f")
> +			  (match_operand 2 "const_imm5_operand" "")]
> +			 UNSPEC_MSA_CLTI_S))]
> +  "ISA_HAS_MSA"
> +  "clti_s.<msafmt>\t%w0,%w1,%2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_clti_u_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "f")
> +			  (match_operand 2 "const_uimm5_operand" "")]
> +			 UNSPEC_MSA_CLTI_U))]
> +  "ISA_HAS_MSA"
> +  "clti_u.<msafmt>\t%w0,%w1,%2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])

The nonimmediate forms could use rtl comparison codes directly,
except that I'm not sure how STORE_FLAG_VALUE applies to vectors.
(I'm pretty sure it doesn't, but I couldn't find definite proof.)

Even if we need the unspec, at least let's have a single unspec
that takes a (eq ...), (leu ...), etc. result as an argument.
We can then use code iterators.

Same comments for the floating-point comparisons.

I think the integer immediate and nonimmediate forms should be combined,
with alternatives to choose between them.
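The shape I have in mind is roughly this (untested; the iterator,
attribute and unspec names are made up):

  (define_code_iterator VICMP [eq lt ltu le leu])
  (define_code_attr vicmp
    [(eq "eq") (lt "lt_s") (ltu "lt_u") (le "le_s") (leu "le_u")])

  (define_insn "msa_c<vicmp>_<msafmt>"
    [(set (match_operand:IMODE128 0 "register_operand" "=f")
          (unspec:IMODE128
            [(VICMP:IMODE128
               (match_operand:IMODE128 1 "register_operand" "f")
               (match_operand:IMODE128 2 "register_operand" "f"))]
            UNSPEC_MSA_VCMP))]
    "ISA_HAS_MSA"
    "c<vicmp>.<msafmt>\t%w0,%w1,%w2"
    [(set_attr "type" "arith")
     (set_attr "mode" "TI")])

with the immediate forms becoming a second alternative of the same
pattern rather than separate *i patterns.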

> +(define_insn "msa_ffint_s_w"
> +  [(set (match_operand:V4SF 0 "register_operand" "=f")
> +	(unspec:V4SF [(match_operand:V4SI 1 "register_operand" "f")]
> +		     UNSPEC_MSA_FFINT_S))]
> +  "ISA_HAS_MSA"
> +  "ffint_s.w\t%w0,%w1"
> +  [(set_attr "type"	"fcvt")
> +   (set_attr "cnv_mode"	"I2S")
> +   (set_attr "mode"	"SF")])
> +
> +(define_insn "msa_ffint_s_d"
> +  [(set (match_operand:V2DF 0 "register_operand" "=f")
> +	(unspec:V2DF [(match_operand:V2DI 1 "register_operand" "f")]
> +		     UNSPEC_MSA_FFINT_S))]
> +  "ISA_HAS_MSA"
> +  "ffint_s.d\t%w0,%w1"
> +  [(set_attr "type"	"fcvt")
> +   (set_attr "cnv_mode"	"I2D")
> +   (set_attr "mode"	"DF")])

These are "floatMN2" and (float ...).  Please use mode iterators.

Same for ffint_u and "floatunsMN2"/(unsigned_float ...).
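Something along these lines for the signed case (untested;
<mode128_i> is a hypothetical lower-case twin of your MODE128_I
attribute, needed only to build the standard floatMN2 name):

  (define_insn "float<mode128_i><mode>2"
    [(set (match_operand:FMODE128 0 "register_operand" "=f")
          (float:FMODE128
            (match_operand:<MODE128_I> 1 "register_operand" "f")))]
    "ISA_HAS_MSA"
    "ffint_s.<msafmt>\t%w0,%w1"
    [(set_attr "type" "fcvt")
     (set_attr "mode" "<UNITMODE>")])

The floatuns variant would be identical except for using
(unsigned_float ...) and ffint_u.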

> +(define_insn "msa_ffql_w"
> +  [(set (match_operand:V4SF 0 "register_operand" "=f")
> +	(unspec:V4SF [(match_operand:V8HI 1 "register_operand" "f")]
> +		     UNSPEC_MSA_FFQL))]
> +  "ISA_HAS_MSA"
> +  "ffql.w\t%w0,%w1"
> +  [(set_attr "type"	"fcvt")
> +   (set_attr "cnv_mode"	"I2S")
> +   (set_attr "mode"	"SF")])
> +
> +(define_insn "msa_ffql_d"
> +  [(set (match_operand:V2DF 0 "register_operand" "=f")
> +	(unspec:V2DF [(match_operand:V4SI 1 "register_operand" "f")]
> +		     UNSPEC_MSA_FFQL))]
> +  "ISA_HAS_MSA"
> +  "ffql.d\t%w0,%w1"
> +  [(set_attr "type"	"fcvt")
> +   (set_attr "cnv_mode"	"I2D")
> +   (set_attr "mode"	"DF")])

Mode iterators here too please, and anywhere else that we have
w/d pairs.

> +(define_insn "msa_fill_b"
> +  [(set (match_operand:V16QI 0 "register_operand" "=f")
> +	(unspec:V16QI [(match_operand:SI 1 "reg_or_0_operand" "dJ")]
> +		      UNSPEC_MSA_FILL))]
> +  "ISA_HAS_MSA"
> +  "fill.b\t%w0,%z1"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])

With operand 1 changed to QImode, this is (vec_duplicate ...).
Please use mode iterators.
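I.e. something like (untested; assumes UNITMODE covers the vector
modes, as suggested near the top of this review):

  (define_insn "msa_fill_<msafmt>"
    [(set (match_operand:IMODE128 0 "register_operand" "=f")
          (vec_duplicate:IMODE128
            (match_operand:<UNITMODE> 1 "reg_or_0_operand" "dJ")))]
    "ISA_HAS_MSA"
    "fill.<msafmt>\t%w0,%z1"
    [(set_attr "type" "arith")
     (set_attr "mode" "TI")])

(fill.d for !TARGET_64BIT would of course need the same split
treatment as insert.d.)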

> +(define_insn "msa_fmax_<msafmt>"
> +  [(set (match_operand:FMODE128 0 "register_operand" "=f")
> +	(unspec:FMODE128 [(match_operand:FMODE128 1 "register_operand" "f")
> +			  (match_operand:FMODE128 2 "register_operand" "f")]
> +			 UNSPEC_MSA_FMAX))]
> +  "ISA_HAS_MSA"
> +  "fmax.<msafmt>\t%w0,%w1,%w2"
> +  [(set_attr "type"	"fadd")
> +   (set_attr "mode"	"<UNITMODE>")])

This is "smaxM3".

> +(define_insn "msa_fmin_<msafmt>"
> +  [(set (match_operand:FMODE128 0 "register_operand" "=f")
> +	(unspec:FMODE128 [(match_operand:FMODE128 1 "register_operand" "f")
> +			  (match_operand:FMODE128 2 "register_operand" "f")]
> +			 UNSPEC_MSA_FMIN))]
> +  "ISA_HAS_MSA"
> +  "fmin.<msafmt>\t%w0,%w1,%w2"
> +  [(set_attr "type"	"fadd")
> +   (set_attr "mode"	"<UNITMODE>")])

This is "sminM3".

> +(define_insn "msa_frint_<msafmt>"
> +  [(set (match_operand:FMODE128 0 "register_operand" "=f")
> +	(unspec:FMODE128 [(match_operand:FMODE128 1 "register_operand" "f")]
> +			 UNSPEC_MSA_FRINT))]
> +  "ISA_HAS_MSA"
> +  "frint.<msafmt>\t%w0,%w1"
> +  [(set_attr "type"	"fmul")
> +   (set_attr "mode"	"<UNITMODE>")])

This is "rintM2".

> +(define_insn "msa_max_s_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "f")
> +			  (match_operand:IMODE128 2 "register_operand" "f")]
> +			 UNSPEC_MSA_MAX_S))]
> +  "ISA_HAS_MSA"
> +  "max_s.<msafmt>\t%w0,%w1,%w2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_max_u_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "f")
> +			  (match_operand:IMODE128 2 "register_operand" "f")]
> +			 UNSPEC_MSA_MAX_U))]
> +  "ISA_HAS_MSA"
> +  "max_u.<msafmt>\t%w0,%w1,%w2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_maxi_s_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "f")
> +			  (match_operand 2 "const_imm5_operand" "")]
> +			 UNSPEC_MSA_MAXI_S))]
> +  "ISA_HAS_MSA"
> +  "maxi_s.<msafmt>\t%w0,%w1,%2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_maxi_u_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "f")
> +			  (match_operand 2 "const_uimm5_operand" "")]
> +			 UNSPEC_MSA_MAXI_U))]
> +  "ISA_HAS_MSA"
> +  "maxi_u.<msafmt>\t%w0,%w1,%2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])

Here too please treat the immediate and nonimmediate forms as two
alternatives in a combined pattern.  These are "smaxM3" and "umaxM3".

Same comments for min.
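As a sketch of what the combined pattern could look like (constraint and
predicate names here are assumptions, not something I've tested against
the patch — the immediate form needs a constraint matching a const
vector of identical 5-bit signed values, and an operand modifier to
print one element of it, spelt "%B2" below):

```lisp
;; Hypothetical sketch only: register and immediate forms as two
;; alternatives of one smax<mode>3 pattern.
(define_insn "smax<mode>3"
  [(set (match_operand:IMODE128 0 "register_operand" "=f,f")
	(smax:IMODE128
	  (match_operand:IMODE128 1 "register_operand" "f,f")
	  (match_operand:IMODE128 2 "reg_or_vector_same_simm5_operand"
				    "f,YA")))]
  "ISA_HAS_MSA"
  "@
   max_s.<msafmt>\t%w0,%w1,%w2
   maxi_s.<msafmt>\t%w0,%w1,%B2"
  [(set_attr "type"	"arith")
   (set_attr "mode"	"TI")])
```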

> +(define_insn "msa_nloc_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "f")]
> +			 UNSPEC_MSA_NLOC))]
> +  "ISA_HAS_MSA"
> +  "nloc.<msafmt>\t%w0,%w1"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])

Looks like we could use this to help implement clrsb, but that's
future work...

> +(define_insn "msa_nlzc_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "f")]
> +			 UNSPEC_MSA_NLZC))]
> +  "ISA_HAS_MSA"
> +  "nlzc.<msafmt>\t%w0,%w1"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])

This is "clzM2"/(clz ...).

> +(define_insn "msa_ori_b"
> +  [(set (match_operand:V16QI 0 "register_operand" "=f")
> +	(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "f")
> +		       (match_operand 2 "const_uimm8_operand" "")]
> +		      UNSPEC_MSA_ORI_B))]
> +  "ISA_HAS_MSA"
> +  "ori.b\t%w0,%w1,%2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])

These should expand to normal ORs.

> +(define_insn "msa_pcnt_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "f")]
> +			 UNSPEC_MSA_PCNT))]
> +  "ISA_HAS_MSA"
> +  "pcnt.<msafmt>\t%w0,%w1"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])

"popcountM2"/(popcount ...).

> +(define_insn "msa_slli_b"
> +  [(set (match_operand:V16QI 0 "register_operand" "=f")
> +	(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "f")
> +		       (match_operand 2 "const_uimm3_operand" "")]
> +		      UNSPEC_MSA_SLLI))]
> +  "ISA_HAS_MSA"
> +  "slli.b\t%w0,%w1,%2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])

This and the other simple shifts can expand to normal shifts.

> +(define_insn "msa_subs_s_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "f")
> +			  (match_operand:IMODE128 2 "register_operand" "f")]
> +			 UNSPEC_MSA_SUBS_S))]
> +  "ISA_HAS_MSA"
> +  "subs_s.<msafmt>\t%w0,%w1,%w2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])

"ssubM3"/(ss_minus ...).

> +(define_insn "msa_subs_u_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "f")
> +			  (match_operand:IMODE128 2 "register_operand" "f")]
> +			 UNSPEC_MSA_SUBS_U))]
> +  "ISA_HAS_MSA"
> +  "subs_u.<msafmt>\t%w0,%w1,%w2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])

"usubM3"/(us_minus ...).

> +(define_insn "msa_subvi_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "f")
> +			  (match_operand 2 "const_uimm5_operand" "")]
> +			 UNSPEC_MSA_SUBVI))]
> +  "ISA_HAS_MSA"
> +  "subvi.<msafmt>\t%w0,%w1,%2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])
> +
> +(define_insn "msa_xori_b"
> +  [(set (match_operand:V16QI 0 "register_operand" "=f")
> +	(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "f")
> +		       (match_operand 2 "const_uimm8_operand" "")]
> +		      UNSPEC_MSA_XORI_B))]
> +  "ISA_HAS_MSA"
> +  "xori.b\t%w0,%w1,%2"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])

These should expand to normal operations.

> +(define_insn "msa_sld_<msafmt>"
> +  [(set (match_operand:IMODE128 0 "register_operand" "=f")
> +	(unspec:IMODE128 [(match_operand:IMODE128 1 "register_operand" "0")
> +			  (match_operand:IMODE128 2 "register_operand" "f")
> +			  (match_operand:SI 3 "reg_or_0_operand" "dJ")]
> +			 UNSPEC_MSA_SLD))]
> +  "ISA_HAS_MSA"
> +  "sld.<msafmt>\t%w0,%w2[%z3]"
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])

Things had been pretty much in alphabetical order until now, which was
pretty helpful.  Please reorder this and the following patterns with the
previous ones.

> +(define_insn "msa_bn_v_<msafmt>"
> +  [(set (match_operand:SI 0 "register_operand" "=d")
> +	(unspec:SI [(match_operand:IMODE128 1 "register_operand" "f")]
> +		   UNSPEC_MSA_BN_V))]
> +  "ISA_HAS_MSA"
> +  "%(bnz.v\t%w1,1f; li\t%0,1; li\t%0,0%)\n1:"
> +  [(set_attr "type"	"arith")
> +   (set_attr "length"	"12")
> +   (set_attr "mode"	"TI")])

Please expand these as branches at the rtl level.  That should allow
much better optimisation of the built-in function.

> +;; Note that this instruction treats scalar as vector registers freely.
> +(define_insn "msa_cast_to_vector_<msafmt>_f"
> +  [(set (match_operand:FMODE128 0 "register_operand" "=f")
> +	(unspec:FMODE128 [(match_operand:<ELEMMODE> 1 "register_operand" "f")]
> +			 UNSPEC_MSA_CAST_TO_VECTOR))]
> +  "ISA_HAS_MSA"
> +{
> +  if (REGNO (operands[0]) == REGNO (operands[1]))
> +    return "nop\t# Cast %1 to %w0";
> +  else
> +    return "mov.<msafmt2>\t%w0,%1\t# Cast %1 to %w0";
> +}
> +  [(set_attr "type"	"arith")
> +   (set_attr "mode"	"TI")])

Why not just expand to a subreg move?

> @@ -419,11 +423,17 @@
>    /* The number of words passed in registers, rounded up.  */
>    unsigned int reg_words;
>  
> -  /* For EABI, the offset of the first register from GP_ARG_FIRST or
> -     FP_ARG_FIRST.  For other ABIs, the offset of the first register from
> -     the start of the ABI's argument structure (see the CUMULATIVE_ARGS
> -     comment for details).
> +  /* If msa_reg_p is true, the offset of the first register from
> +     MSA_ARG_FIRST.
>  
> +     The value is MAX_ARGS_IN_MSA_REGISTERS if the argument is passed entirely
> +     on the stack.
> +
> +     If msa_reg_p is false, for EABI, the offset of the first register from
> +     GP_ARG_FIRST or FP_ARG_FIRST.  For other ABIs, the offset of the first
> +     register from the start of the ABI's argument structure
> +     (see the CUMULATIVE_ARGS comment for details).
> +
>       The value is MAX_ARGS_IN_REGISTERS if the argument is passed entirely
>       on the stack.  */
>    unsigned int reg_offset;

So the MSA arguments operate outside the normal "argument structure"
for o32, n32 and n64?  That's OK with me but please write up what the
new ABI is.

> +/* Return true if OP is a constant vector with the number of units in MODE,
> +   and each unit has the same integer value in the range fo LOW and HIGH.  */

s/fo LOW and HIGH/[LOW, HIGH]/ (to emphasise that this is an inclusive range).

> +bool
> +mips_const_vector_same_int_p (rtx op, enum machine_mode mode, HOST_WIDE_INT low, HOST_WIDE_INT high)

Long line.

> +{
> +  HOST_WIDE_INT value, prev_value;
> +  int i, nunits = GET_MODE_NUNITS (mode);
> +  if (GET_CODE (op) != CONST_VECTOR || CONST_VECTOR_NUNITS (op) != nunits)
> +    return false;
> +
> +  for (i = 0; i < nunits; i++)
> +    {
> +      rtx x = CONST_VECTOR_ELT (op, i);
> +      if (!CONST_INT_P (x))
> +	return false;
> +
> +      value = INTVAL (x);
> +
> +      if ((i != 0 && value != prev_value) || value < low || value > high)
> +	return false;
> +
> +      prev_value = value;
> +    }

There must always be at least one element, so it seems easier to handle
outside the loop.
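For what it's worth, a standalone model of the restructure (plain C, not
GCC rtl; the function name is illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for mips_const_vector_same_int_p: peeling
   element 0 out of the loop removes the "i != 0" test and the
   prev_value bookkeeping.  */
static bool
all_same_in_range (const long *elts, size_t nunits, long low, long high)
{
  size_t i;

  if (nunits == 0 || elts[0] < low || elts[0] > high)
    return false;
  for (i = 1; i < nunits; i++)
    if (elts[i] != elts[0])
      return false;
  return true;
}
```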

> @@ -2041,6 +2079,11 @@
>  static int
>  mips_symbol_insns (enum mips_symbol_type type, enum machine_mode mode)
>  {
> +  /* MSA ld.* and st.* cannot support loading symbols via an immediate
> +     operand.  */

Nit, sorry, but LD.df and ST.df, for consistency with the manual.
Same everywhere.

> @@ -2182,6 +2225,17 @@
>        && !SMALL_OPERAND (INTVAL (x) + GET_MODE_SIZE (mode) - UNITS_PER_WORD))
>      return false;
>  
> +  if (TARGET_MSA && MSA_SUPPORTED_MODE_P (mode))
> +    {
> +      int offset = INTVAL (x);
> +      /* MSA ld.* and st.* supports 5-bit signed offsets left-shifted
> +	 by UNITS_PER_MSA_REG.  */
> +      if (offset < -16 * UNITS_PER_MSA_REG
> +	  || offset > 15 * UNITS_PER_MSA_REG
> +	  || (offset & (UNITS_PER_MSA_REG - 1)) != 0)
> +	return false;

mips_signed_immediate_p (INTVAL (x), 5, ...).  I guess you'll need
a new macro for the "...".

Is this from a previous version of the spec though?  The one I'm
looking at has a 10-bit offset.
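In standalone terms, the check being asked for is roughly this (plain C
model, not the GCC function itself; the 10-bit width is my reading of
the published spec, and the shift would come from the new macro):

```c
#include <assert.h>
#include <stdbool.h>

/* Rough model of mips_signed_immediate_p (x, bits, shift): X must be
   SHIFT-aligned and, once shifted down, fit in a signed BITS-bit
   field.  */
static bool
signed_immediate_p (long x, int bits, int shift)
{
  long unit = 1L << shift;
  long min = -(1L << (bits - 1));
  long max = (1L << (bits - 1)) - 1;

  return x % unit == 0 && x / unit >= min && x / unit <= max;
}
```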

> +	if (msa_p)
> +	  {
> +	    int offset = INTVAL (addr.offset);
> +	    /* MSA ld.* and st.* supports 5-bit signed offsets left-shifted
> +	       by UNITS_PER_MSA_REG.  */
> +	    if (offset >= -16 * UNITS_PER_MSA_REG
> +		&& offset <= 15 * UNITS_PER_MSA_REG
> +		&& (offset & (UNITS_PER_MSA_REG - 1)) == 0)
> +	      return 1;
> +	    else
> +	      return 0;
> +	  }

This should be:

     if (msa_p && !mips_signed_immediate_p (...))
       return 0;
     ...

> @@ -2395,13 +2468,13 @@
>  	return factor;
>  
>        case ADDRESS_LO_SUM:
> -	return TARGET_MIPS16 ? factor * 2 : factor;
> +	return msa_p ? 0 : (TARGET_MIPS16 ? factor * 2 : factor);

No brackets needed here.

> @@ -2506,8 +2579,13 @@
>  
>        return mips_build_integer (codes, INTVAL (x));
>  
> +    case CONST_VECTOR:
> +      if (0 && TARGET_MSA
> +	  && mips_const_vector_same_int_p (x, GET_MODE (x), -512, 511))
> +	return 1;
> +      /* Fallthrough */

Why the "0 &&" ?

> +/* Split copy_s.d.  */
> +
> +void
> +mips_split_msa_copy_d (rtx dest, rtx src, rtx index, bool unsigned_p)
> +{
> +  gcc_assert ((GET_MODE (src) == V2DImode && GET_MODE (dest) == DImode)
> +	      || (GET_MODE (src) == V2DFmode && GET_MODE (dest) == DFmode));
> +
> +  /* Note that low is always from the lower index, and high is always
> +     from the higher index.  */
> +  rtx low = mips_subword (dest, 0);
> +  rtx high = mips_subword (dest, 1);
> +  rtx new_src = simplify_gen_subreg (V4SImode, src, GET_MODE (src), 0);
> +  if (unsigned_p)
> +    {
> +      emit_insn (gen_msa_copy_u_w (low, new_src, GEN_INT (INTVAL (index) * 2)));
> +      emit_insn (gen_msa_copy_u_w (high, new_src,
> +				   GEN_INT (INTVAL (index) * 2 + 1)));
> +    }
> +  else
> +    {
> +      emit_insn (gen_msa_copy_s_w (low, new_src, GEN_INT (INTVAL (index) * 2)));
> +      emit_insn (gen_msa_copy_s_w (high, new_src,
> +				   GEN_INT (INTVAL (index) * 2 + 1)));
> +    }
> +}

Please pass in the generator function instead of an unsigned_p.

> +
> +/* Split insert.d.  */
> +
> +void
> +mips_split_msa_insert_d (rtx dest, rtx src1, rtx index, rtx src2)
> +{
> +  gcc_assert (GET_MODE (dest) == GET_MODE (src1));
> +  gcc_assert ((GET_MODE (dest) == V2DImode
> +	       && (GET_MODE (src2) == DImode
> +		   || src2 == CONST0_RTX (GET_MODE (src2))))
> +	      || (GET_MODE (dest) == V2DFmode && GET_MODE (src2) == DFmode));

s/src2 == CONST0_RTX (GET_MODE (src2))/src2 == const0_rtx/,
especially since I assume GET_MODE (src2) will be VOIDmode in this case.

Same for later functions.

> +  /* Note that low is always from the lower index, and high is always
> +     from the higher index.  */
> +  rtx low, high;
> +  if (src2 == CONST0_RTX (GET_MODE (src2)))
> +    {
> +      low = src2;
> +      high = src2;
> +    }
> +  else
> +    {
> +      low = mips_subword (src2, 0);
> +      high = mips_subword (src2, 1);
> +    }

Why the special treatment of 0?  Please use false/true for the
mips_subword.

Same for later functions.

> @@ -4417,8 +4803,25 @@
>  	    }
>  
>  	  if (FP_REG_P (REGNO (dest)))
> -	    return dbl_p ? "dmtc1\t%z1,%0" : "mtc1\t%z1,%0";
> +	    {
> +	      if (msa_p)
> +		gcc_assert (src == CONST0_RTX (GET_MODE (src)));
>  
> +	      switch (GET_MODE (dest))
> +		{
> +		case TImode: return "ldi.b\t%w0,0";
> +		case V16QImode: return "ldi.b\t%w0,0";
> +		case V8HImode: return "ldi.h\t%w0,0";
> +		case V4SImode: return "ldi.w\t%w0,0";
> +		case V2DImode: return "ldi.d\t%w0,0";
> +		case V4SFmode: return "ldi.w\t%w0,0";
> +		case V2DFmode: return "ldi.d\t%w0,0";
> +		default: break;
> +		}
> +
> +	      return dbl_p ? "dmtc1\t%z1,%0" : "mtc1\t%z1,%0";
> +	    }
> +

I think the switch should be guarded by msa_p too and should
have a gcc_unreachable () in the default case.

Same for other switches in this function.

>  	  if (FP_REG_P (REGNO (src)))
> -	    return dbl_p ? "dmfc1\t%0,%1" : "mfc1\t%0,%1";
> +	    {
> +	      gcc_assert (!msa_p);
>  
> +	      return dbl_p ? "dmfc1\t%0,%1" : "mfc1\t%0,%1";
> +	    }

No need for a blank line here.

>        if (dest_code == MEM)
> -	return dbl_p ? "sdc1\t%1,%0" : "swc1\t%1,%0";
> +	{
> +	  switch (mode)
> +	    {
> +	    case TImode: /* Just use st.d/st.w to store.  */
> +	      return TARGET_64BIT ? "st.d\t%w1,%0" : "st.w\t%w1,%0";
> +	    case V16QImode: return "st.b\t%w1,%0";
> +	    case V8HImode: return "st.h\t%w1,%0";
> +	    case V4SImode: return "st.w\t%w1,%0";
> +	    case V2DImode: return "st.d\t%w1,%0";
> +	    case V4SFmode: return "st.w\t%w1,%0";
> +	    case V2DFmode: return "st.d\t%w1,%0";
> +	    default: break;
> +	    }
> +
> +	  return dbl_p ? "sdc1\t%1,%0" : "swc1\t%1,%0";
> +	}

Why the different handling for TARGET_64BIT?  It deserves a comment
at least.  Same for the loads, but...

> +	    case TImode: /* Juld use ld.d/ld.w to ldore.  */

...typo: Just.

> @@ -5002,7 +5453,8 @@
>      case ABI_EABI:
>        /* The EABI conventions have traditionally been defined in terms
>  	 of TYPE_MODE, regardless of the actual type.  */
> -      info->fpr_p = ((GET_MODE_CLASS (mode) == MODE_FLOAT
> +      info->fpr_p = (!info->msa_reg_p
> +		     && (GET_MODE_CLASS (mode) == MODE_FLOAT
>  		      || mode == V2SFmode)
>  		     && GET_MODE_SIZE (mode) <= UNITS_PER_FPVALUE);
>        break;
> @@ -5012,7 +5464,8 @@
>        /* Only leading floating-point scalars are passed in
>  	 floating-point registers.  We also handle vector floats the same
>  	 say, which is OK because they are not covered by the standard ABI.  */
> -      info->fpr_p = (!cum->gp_reg_found
> +      info->fpr_p = (!info->msa_reg_p
> +		     && !cum->gp_reg_found
>  		     && cum->arg_number < 2
>  		     && (type == 0
>  			 || SCALAR_FLOAT_TYPE_P (type)
> @@ -5027,7 +5480,8 @@
>        /* Scalar, complex and vector floating-point types are passed in
>  	 floating-point registers, as long as this is a named rather
>  	 than a variable argument.  */
> -      info->fpr_p = (named
> +      info->fpr_p = (!info->msa_reg_p
> +		     && named
>  		     && (type == 0 || FLOAT_TYPE_P (type))
>  		     && (GET_MODE_CLASS (mode) == MODE_FLOAT
>  			 || GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT

Are the fpr_p and msa_reg_p cases naturally mutually-exclusive?
If not, what cases overlap?  I'd prefer that we simply assert
!info->msa_reg_p || !info->fpr_p at the end if we can.

> @@ -5067,7 +5521,36 @@
>    /* See whether the argument has doubleword alignment.  */
>    doubleword_aligned_p = (mips_function_arg_boundary (mode, type)
>  			  > BITS_PER_WORD);
> +  /* See whether the argument has quadword alignment.  */
> +  quadword_aligned_p = (mips_function_arg_boundary (mode, type)
> +			> 2 * BITS_PER_WORD);

Please fold these two into a single variable.

> @@ -5255,6 +5758,24 @@
>  
>    mips_get_arg_info (&info, cum, mode, type, named);
>  
> +  if (info.msa_reg_p)
> +    {
> +      if (info.reg_words > 0)
> +	{
> +	  gcc_assert (info.stack_words == 0);
> +	  cum->num_msa_regs++;
> +	}
> +
> +      /* Advance the stack word count.  */
> +      if (info.stack_words > 0)
> +	{
> +	  gcc_assert (info.reg_words == 0);
> +	  cum->stack_words = info.stack_offset + info.stack_words;
> +	}
> +
> +      return;	/* Just return.  */
> +    }

Please don't duplicate the stack_words handling like this.
Just reorganise the function so that the msa handling fits
more naturally.

> @@ -6445,6 +6979,233 @@
>      }
>  }
>  
> +/* Write a stub for the current function.  This stub is used
> +   for functions from FR0 to FR1 or from FR1 to FR0.
> +   If FR0_TO_FR1_P is true, we need to set the FR mode to 1,
> +   set up floating-point arguments, align the stack pointer to 16 bytes,
> +   copy the arguments on the stack, call the function, set up the returned
> +   floating-point value, and restore the FR mode to 0.
> +   If FR0_TO_FR1_P is false, the stub will be similar but we won't need to
> +   align the stack pointer to 16 bytes and copy the arguments.  */

Hmm, what's all this about?

> @@ -8216,6 +9006,13 @@
>  	output_operand_lossage ("invalid use of '%%%c'", letter);
>        break;
>  
> +    case 'w':
> +      if (code == REG && MSA_REG_P (REGNO (op)))
> +	fprintf (file, "$w%s", &reg_names[REGNO (op)][2]);
> +      else
> +	output_operand_lossage ("invalid use of '%%%c'", letter);
> +      break;

Perhaps instead we should do this automatically if the register
is wider than 64 bits, and drop all the %w prefixes?

> @@ -11740,13 +12577,27 @@
>    if (hard_reg_set_intersect_p (left, reg_class_contents[(int) ST_REGS]))
>      {
>        if (HARD_REGNO_MODE_OK (ST_REG_FIRST, mode))
> -	size = MIN (size, 4);
> +	{
> +	  /* For MSA supported modes, size is UNITS_PER_MSA_REG.  */
> +	  if (TARGET_MSA && MSA_SUPPORTED_MODE_P (mode))
> +	    size = MIN (size, UNITS_PER_MSA_REG);
> +	  else
> +	    size = MIN (size, UNITS_PER_FPREG);

Comment is superfluous.  Same later.

> @@ -11818,6 +12674,15 @@
>      case V2SFmode:
>        return TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT;
>  
> +    case TImode:
> +    case V2DFmode:
> +    case V4SFmode:
> +    case V2DImode:
> +    case V4SImode:
> +    case V8HImode:
> +    case V16QImode:
> +      return TARGET_MSA;
> +
>      default:
>        return false;
>      }

Seems better to handle this as a MSA_SUPPORTED_MODE_P test in the
default case.

Same for other cases where this occurs.

> @@ -11999,9 +12880,15 @@
>  static int
>  mips_memory_move_cost (enum machine_mode mode, reg_class_t rclass, bool in)
>  {
> -  return (mips_cost->memory_latency
> +  int multiplier = 1;
> +  /* We prefer FP_REGS for MSA supported vector modes, via setting larger
> +     multplier to other register classes.  */
> +  if (TARGET_MSA && MSA_SUPPORTED_MODE_P (mode) && rclass != FP_REGS)
> +    multiplier = (TARGET_64BIT ? 2 : 4);

The comment makes it sound like more of a hack than it is.  Maybe:

  /* Account for the number of loads and stores that are needed to handle
     MSA types in GPRs.  */

And please use:

  multiplier = GET_MODE_SIZE (mode) / UNITS_PER_WORD;

which is more future-proof if a 256-bit extension comes along.

> +  return (mips_cost->memory_latency * multiplier
>  	  + memory_move_secondary_cost (mode, rclass, in));

I think in principle the multiplier applies to both, although in
practice it shouldn't matter, since you disallow the cases where
a secondary reload is needed.

> @@ -12930,11 +13829,43 @@
>  	    }
>  	}
>        else
> +      {
> +	output_asm_insn ("%(bne\t%2,%.,1f", operands);
> +	output_asm_insn (s, operands);
> +	s = "break\t7%)\n1:";
> +      }
> +    }
> +  return s;

Looks like some unwanted reformatting here.

> +	case V16QImode:
> +	  output_asm_insn ("%(bnz.b\t%w2,1f", operands);
> +	  break;
> +	case V8HImode:
> +	  output_asm_insn ("%(bnz.h\t%w2,1f", operands);
> +	  break;
> +	case V4SImode:
> +	  output_asm_insn ("%(bnz.w\t%w2,1f", operands);
> +	  break;
> +	case V2DImode:
> +	  output_asm_insn ("%(bnz.d\t%w2,1f", operands);
> +	  break;

We could handle this more elegantly by using an asm modifier
to print "b", "h", "w" or "d" based on the mode.
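The mapping itself is trivial; something like this inside
mips_print_operand (modifier letter and helper name are just for
illustration):

```c
#include <assert.h>
#include <string.h>

/* Illustrative element-format suffix table for a hypothetical asm
   modifier; the argument stands in for
   GET_MODE_UNIT_SIZE (GET_MODE (op)).  */
static const char *
msa_fmt_suffix (int unit_bytes)
{
  switch (unit_bytes)
    {
    case 1: return "b";
    case 2: return "h";
    case 4: return "w";
    case 8: return "d";
    default: return "?";
    }
}
```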

> @@ -13641,11 +14572,18 @@
>  AVAIL_NON_MIPS16 (mips3d, TARGET_MIPS3D)
>  AVAIL_NON_MIPS16 (dsp, TARGET_DSP)
>  AVAIL_NON_MIPS16 (dspr2, TARGET_DSPR2)
> +#if 0
> +AVAIL_NON_MIPS16 (dspr3, TARGET_DSPR3)
> +#endif
>  AVAIL_NON_MIPS16 (dsp_32, !TARGET_64BIT && TARGET_DSP)
>  AVAIL_NON_MIPS16 (dsp_64, TARGET_64BIT && TARGET_DSP)
>  AVAIL_NON_MIPS16 (dspr2_32, !TARGET_64BIT && TARGET_DSPR2)
> +#if 0
> +AVAIL_NON_MIPS16 (dspr3_32, !TARGET_64BIT && TARGET_DSPR3)
> +#endif
>  AVAIL_NON_MIPS16 (loongson, TARGET_LOONGSON_VECTORS)
>  AVAIL_NON_MIPS16 (cache, TARGET_CACHE_BUILTIN)

As per Joseph's comment, I'm suspicious of these #if 0s.

> @@ -17055,6 +18647,11 @@
>  	   TARGET_MIPS3D ? "-mips3d" : "-mpaired-single",
>  	   TARGET_HARD_FLOAT_ABI ? "-mfp64" : "-mhard-float");
>  
> +  /* Make sure that when TARGET_MSA is true, TARGET_FLOAT64 and
> +     TARGET_HARD_FLOAT_ABI and  both true.  */
> +  if (TARGET_MSA && !(TARGET_FLOAT64 && TARGET_HARD_FLOAT_ABI))
> +    error ("-mmsa must be used with -mfp64 and -mhard-float");
> +

Use %< %> to quote the options.

> +  /* If TARGET_DSPR3, enable MASK_DSPR2 and MASK_DSP.  */
> +  if (TARGET_DSPR3)
> +    {
> +      target_flags |= MASK_DSPR2;
> +      target_flags |= MASK_DSP;
> +    }

Is this really MSA-related?

> @@ -17173,8 +18782,10 @@
>  
>    if (TARGET_SYNCI && !ISA_HAS_SYNCI)
>      {
> +#if 0
>        warning (0, "the %qs architecture does not support the synci "
>  	       "instruction", mips_arch_info->name);
> +#endif
>        target_flags &= ~MASK_SYNCI;
>      }

No.

> @@ -18679,6 +20378,347 @@
>    emit_insn (gen_rtx_SET (VOIDmode, target, x));
>  }
>  
> +static void
> +mips_expand_msa_one_cmpl (rtx dest, rtx src)
> +{
> +  enum machine_mode mode = GET_MODE (dest);
> +  rtx reg = gen_reg_rtx (mode);
> +  switch (mode)
> +    {
> +    case V16QImode:
> +      emit_insn (gen_msa_ldiv16qi (reg, GEN_INT (0)));
> +      emit_insn (gen_msa_nor_v_b (dest, reg, src));
> +      break;
> +    case V8HImode:
> +      emit_insn (gen_msa_ldiv8hi (reg, GEN_INT (0)));
> +      emit_insn (gen_msa_nor_v_h (dest, reg, src));
> +      break;
> +    case V4SImode:
> +      emit_insn (gen_msa_ldiv4si (reg, GEN_INT (0)));
> +      emit_insn (gen_msa_nor_v_w (dest, reg, src));
> +      break;
> +    case V2DImode:
> +      emit_insn (gen_msa_ldiv2di (reg, GEN_INT (0)));
> +      emit_insn (gen_msa_nor_v_d (dest, reg, src));
> +      break;
> +    default:
> +      gcc_unreachable ();
> +    }
> +}
> +
> +static void
> +mips_expand_msa_cmp (rtx dest, enum rtx_code cond, rtx op0, rtx op1)
> +{
> +  enum machine_mode cmp_mode = GET_MODE (op0);
> +
> +  switch (cmp_mode)
> +    {
> +    case V16QImode:
> +      switch (cond)
> +	{
> +	case EQ:
> +	  emit_insn (gen_msa_ceq_b (dest, op0, op1));
> +	  break;
> +	case LT:
> +	  emit_insn (gen_msa_clt_s_b (dest, op0, op1));
> +	  break;
> +	case LE:
> +	  emit_insn (gen_msa_cle_s_b (dest, op0, op1));
> +	  break;
> +	case LTU:
> +	  emit_insn (gen_msa_clt_u_b (dest, op0, op1));
> +	  break;
> +	case LEU:
> +	  emit_insn (gen_msa_cle_u_b (dest, op0, op1));
> +	  break;
> +	case GE: // swap
> +	  emit_insn (gen_msa_clt_s_b (dest, op1, op0));
> +	  break;
> +	case GT: // swap
> +	  emit_insn (gen_msa_cle_s_b (dest, op1, op0));
> +	  break;
> +	case GEU: // swap
> +	  emit_insn (gen_msa_clt_u_b (dest, op1, op0));
> +	  break;
> +	case GTU: // swap
> +	  emit_insn (gen_msa_cle_u_b (dest, op1, op0));
> +	  break;
> +	default:
> +	  gcc_unreachable ();
> +	}
> +      break;
> +
> +    case V8HImode:
> +      switch (cond)
> +	{
> +	case EQ:
> +	  emit_insn (gen_msa_ceq_h (dest, op0, op1));
> +	  break;
> +	case LT:
> +	  emit_insn (gen_msa_clt_s_h (dest, op0, op1));
> +	  break;
> +	case LE:
> +	  emit_insn (gen_msa_cle_s_h (dest, op0, op1));
> +	  break;
> +	case LTU:
> +	  emit_insn (gen_msa_clt_u_h (dest, op0, op1));
> +	  break;
> +	case LEU:
> +	  emit_insn (gen_msa_cle_u_h (dest, op0, op1));
> +	  break;
> +	case GE: // swap
> +	  emit_insn (gen_msa_clt_s_h (dest, op1, op0));
> +	  break;
> +	case GT: // swap
> +	  emit_insn (gen_msa_cle_s_h (dest, op1, op0));
> +	  break;
> +	case GEU: // swap
> +	  emit_insn (gen_msa_clt_u_h (dest, op1, op0));
> +	  break;
> +	case GTU: // swap
> +	  emit_insn (gen_msa_cle_u_h (dest, op1, op0));
> +	  break;
> +	default:
> +	  gcc_unreachable ();
> +	}
> +      break;
> +
> +    case V4SImode:
> +      switch (cond)
> +	{
> +	case EQ:
> +	  emit_insn (gen_msa_ceq_w (dest, op0, op1));
> +	  break;
> +	case LT:
> +	  emit_insn (gen_msa_clt_s_w (dest, op0, op1));
> +	  break;
> +	case LE:
> +	  emit_insn (gen_msa_cle_s_w (dest, op0, op1));
> +	  break;
> +	case LTU:
> +	  emit_insn (gen_msa_clt_u_w (dest, op0, op1));
> +	  break;
> +	case LEU:
> +	  emit_insn (gen_msa_cle_u_w (dest, op0, op1));
> +	  break;
> +	case GE: // swap
> +	  emit_insn (gen_msa_clt_s_w (dest, op1, op0));
> +	  break;
> +	case GT: // swap
> +	  emit_insn (gen_msa_cle_s_w (dest, op1, op0));
> +	  break;
> +	case GEU: // swap
> +	  emit_insn (gen_msa_clt_u_w (dest, op1, op0));
> +	  break;
> +	case GTU: // swap
> +	  emit_insn (gen_msa_cle_u_w (dest, op1, op0));
> +	  break;
> +	default:
> +	  gcc_unreachable ();
> +	}
> +      break;
> +
> +    case V2DImode:
> +      switch (cond)
> +	{
> +	case EQ:
> +	  emit_insn (gen_msa_ceq_d (dest, op0, op1));
> +	  break;
> +	case LT:
> +	  emit_insn (gen_msa_clt_s_d (dest, op0, op1));
> +	  break;
> +	case LE:
> +	  emit_insn (gen_msa_cle_s_d (dest, op0, op1));
> +	  break;
> +	case LTU:
> +	  emit_insn (gen_msa_clt_u_d (dest, op0, op1));
> +	  break;
> +	case LEU:
> +	  emit_insn (gen_msa_cle_u_d (dest, op0, op1));
> +	  break;
> +	case GE: // swap
> +	  emit_insn (gen_msa_clt_s_d (dest, op1, op0));
> +	  break;
> +	case GT: // swap
> +	  emit_insn (gen_msa_cle_s_d (dest, op1, op0));
> +	  break;
> +	case GEU: // swap
> +	  emit_insn (gen_msa_clt_u_d (dest, op1, op0));
> +	  break;
> +	case GTU: // swap
> +	  emit_insn (gen_msa_cle_u_d (dest, op1, op0));
> +	  break;
> +	default:
> +	  gcc_unreachable ();
> +	}
> +      break;
> +
> +    case V4SFmode:
> +      switch (cond)
> +	{
> +	case UNORDERED:
> +	  emit_insn (gen_msa_fcun_w (dest, op0, op1));
> +	  break;
> +	case EQ:
> +	  emit_insn (gen_msa_fceq_w (dest, op0, op1));
> +	  break;
> +	case LTGT:
> +	  emit_insn (gen_msa_fcne_w (dest, op0, op1));
> +	  break;
> +	case GT: // use slt, swap op0 and op1
> +	  emit_insn (gen_msa_fslt_w (dest, op1, op0));
> +	  break;
> +	case GE: // use sle, swap op0 and op1
> +	  emit_insn (gen_msa_fsle_w (dest, op1, op0));
> +	  break;
> +	case LT: // use slt
> +	  emit_insn (gen_msa_fslt_w (dest, op0, op1));
> +	  break;
> +	case LE: // use sle
> +	  emit_insn (gen_msa_fsle_w (dest, op0, op1));
> +	  break;
> +	case UNGE: // use cule, swap op0 and op1
> +	  emit_insn (gen_msa_fcule_w (dest, op1, op0));
> +	  break;
> +	case UNGT: // use cult, swap op0 and op1
> +	  emit_insn (gen_msa_fcult_w (dest, op1, op0));
> +	  break;
> +	case UNLE:
> +	  emit_insn (gen_msa_fcule_w (dest, op0, op1));
> +	  break;
> +	case UNLT:
> +	  emit_insn (gen_msa_fcult_w (dest, op0, op1));
> +	  break;
> +	default:
> +	  gcc_unreachable ();
> +	}
> +      break;
> +
> +    case V2DFmode:
> +      switch (cond)
> +	{
> +	case UNORDERED:
> +	  emit_insn (gen_msa_fcun_d (dest, op0, op1));
> +	  break;
> +	case EQ:
> +	  emit_insn (gen_msa_fceq_d (dest, op0, op1));
> +	  break;
> +	case LTGT:
> +	  emit_insn (gen_msa_fcne_d (dest, op0, op1));
> +	  break;
> +	case GT: // use slt, swap op0 and op1
> +	  emit_insn (gen_msa_fslt_d (dest, op1, op0));
> +	  break;
> +	case GE: // use sle, swap op0 and op1
> +	  emit_insn (gen_msa_fsle_d (dest, op1, op0));
> +	  break;
> +	case LT: // use slt
> +	  emit_insn (gen_msa_fslt_d (dest, op0, op1));
> +	  break;
> +	case LE: // use sle
> +	  emit_insn (gen_msa_fsle_d (dest, op0, op1));
> +	  break;
> +	case UNGE: // use cule, swap op0 and op1
> +	  emit_insn (gen_msa_fcule_d (dest, op1, op0));
> +	  break;
> +	case UNGT: // use cult, swap op0 and op1
> +	  emit_insn (gen_msa_fcult_d (dest, op1, op0));
> +	  break;
> +	case UNLE:
> +	  emit_insn (gen_msa_fcule_d (dest, op0, op1));
> +	  break;
> +	case UNLT:
> +	  emit_insn (gen_msa_fcult_d (dest, op0, op1));
> +	  break;
> +	default:
> +	  gcc_unreachable ();
> +	}
> +      break;
> +
> +    default:
> +      gcc_unreachable ();
> +      break;
> +    }
> +}

Bleah.  I think this is one where it's better to generate the rtx directly.
Especially if we go for the change suggested above, and put the comparison
code inside the unspec.
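All the "// swap" arms above repeat one rewrite: GT a,b is LT b,a, and
so on, which is exactly what swap_condition does.  A standalone sketch
of that canonicalisation (illustrative enum, not GCC's rtx codes) shows
how the nine near-identical switches collapse:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative comparison codes (stand-ins for enum rtx_code).  */
enum cmp { EQ_C, LT_C, LE_C, LTU_C, LEU_C, GT_C, GE_C, GTU_C, GEU_C };

/* Canonicalise *CODE to its less-than-flavoured form; returns true
   if the caller should exchange the two operands.  */
static bool
canonicalize_cmp (enum cmp *code)
{
  switch (*code)
    {
    case GT_C:  *code = LT_C;  return true;
    case GE_C:  *code = LE_C;  return true;
    case GTU_C: *code = LTU_C; return true;
    case GEU_C: *code = LEU_C; return true;
    default:    return false;
    }
}
```

After that, one generic emit per mode (or a direct gen_rtx comparison,
as suggested) replaces the per-code cases.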

> +/* Implement HARD_REGNO_CALLER_SAVE_MODE.  */
> +
> +enum machine_mode
> +mips_hard_regno_caller_save_mode (unsigned int regno,
> +				  unsigned int nregs,
> +				  enum machine_mode mode)
> +{
> +  /* For performance, to avoid saving/restoring upper parts of a register,
> +     we return MODE as save mode when MODE is not VOIDmode.  */
> +  if (mode == VOIDmode)
> +    return choose_hard_reg_mode (regno, nregs, false);
> +  else
> +    return mode;
> +}

I think Chao-ying suggested this on gcc@ a while ago.  Please split it
out as a separate patch with Chao-ying's original non-MSA testcase.
Maybe:

  /* Narrower loads and stores often perform as well as or better than
     wider loads and stores and narrower spills require less stack space.
     Prefer values to be spilt in their natural mode where possible.  */

The review for the split-out patch should be a formality.

> Index: gcc/config/mips/mips.h
> ===================================================================
> --- gcc/config/mips/mips.h	(revision 205118)
> +++ gcc/config/mips/mips.h	(working copy)
> @@ -406,15 +406,27 @@
>        if (TARGET_DSP)							\
>  	{								\
>  	  builtin_define ("__mips_dsp");				\
> -	  if (TARGET_DSPR2)						\
> +	  if (TARGET_DSPR3)						\
>  	    {								\
> +	      builtin_define ("__mips_dspr3");				\
>  	      builtin_define ("__mips_dspr2");				\
> +	      builtin_define ("__mips_dsp_rev=3");			\
> +	    }								\
> +	  else if (TARGET_DSPR2)					\
> +	    {								\
> +	      builtin_define ("__mips_dspr2");				\
>  	      builtin_define ("__mips_dsp_rev=2");			\
>  	    }								\

More dspr3 stuff.

> @@ -1037,6 +1049,12 @@
>  /* Revision 2 of the DSP ASE is available.  */
>  #define ISA_HAS_DSPR2		(TARGET_DSPR2 && !TARGET_MIPS16)
>  
> +/* Revision 3 of the DSP ASE is available.  */
> +#define ISA_HAS_DSPR3		(TARGET_DSPR3 && !TARGET_MIPS16)

Here too.

> @@ -1163,11 +1181,12 @@
>  %{mdmx} %{mno-mdmx:-no-mdmx} \
>  %{mdsp} %{mno-dsp} \
>  %{mdspr2} %{mno-dspr2} \
> +%{mdspr3} %{mno-dspr3} \

And here.

>  %{msmartmips} %{mno-smartmips} \
>  %{mmt} %{mno-mt} \
> -%{mfix-rm7000} %{mno-fix-rm7000} \
>  %{mfix-vr4120} %{mfix-vr4130} \
>  %{mfix-24k} \
>  %{noasmopt:-O0; O0|fno-delayed-branch:-O1; O*:-O2; :-O1} \

No, looks like a merge-o.

> @@ -1410,8 +1436,11 @@
>  /* 8 is observed right on a DECstation and on riscos 4.02.  */
>  #define STRUCTURE_SIZE_BOUNDARY 8
>  
> -/* There is no point aligning anything to a rounder boundary than this.  */
> -#define BIGGEST_ALIGNMENT LONG_DOUBLE_TYPE_SIZE
> +/* There is no point aligning anything to a rounder boundary than
> +   LONG_DOUBLE_TYPE_SIZE, unless under MSA the bigggest alignment is
> +   BITS_PER_MSA_REG.  */
> +#define BIGGEST_ALIGNMENT \
> +  (TARGET_MSA ? BITS_PER_MSA_REG : LONG_DOUBLE_TYPE_SIZE)

I'm not sure about the ABI implications of this -- will have to think
about it.  Please integrate MSA into the comment a bit more rather than
tacking it onto the end.

> @@ -1649,6 +1678,10 @@
>  #define MD_REG_NUM   (MD_REG_LAST - MD_REG_FIRST + 1)
>  #define MD_DBX_FIRST (FP_DBX_FIRST + FP_REG_NUM)
>  
> +#define MSA_REG_FIRST 32
> +#define MSA_REG_LAST  63
> +#define MSA_REG_NUM   (MSA_REG_LAST - MSA_REG_FIRST + 1)

Please define these directly in terms of the FP definitions, since the
MSA registers overlay the FP registers.  Same for the other MSA
register macros.

> @@ -2293,8 +2372,8 @@
>  /* Treat LOC as a byte offset from the stack pointer and round it up
>     to the next fully-aligned offset.  */
>  #define MIPS_STACK_ALIGN(LOC) \
> -  (TARGET_NEWABI ? ((LOC) + 15) & -16 : ((LOC) + 7) & -8)
> -
> +  ((TARGET_NEWABI || TARGET_FLOAT64) ? \
> +   ((LOC) + 15) & -16 : ((LOC) + 7) & -8)

What's this about?  Again, it seems dangerous ABI-wise.

> @@ -848,7 +874,7 @@
>  
>  ;; This attribute gives the upper-case mode name for one unit of a
>  ;; floating-point mode.
> -(define_mode_attr UNITMODE [(SF "SF") (DF "DF") (V2SF "SF")])
> +(define_mode_attr UNITMODE [(SF "SF") (DF "DF") (V2SF "SF") (V4SF "SF") (V2DF "DF")])

Long line.

> Index: gcc/config/mips/mips.opt
> ===================================================================
> --- gcc/config/mips/mips.opt	(revision 205118)
> +++ gcc/config/mips/mips.opt	(working copy)
> @@ -123,6 +123,10 @@
>  Target Report Mask(DSPR2)
>  Use MIPS-DSP REV 2 instructions
>  
> ++mdspr3
> +Target Report Mask(DSPR3)
> +Use MIPS-DSP REV 3 instructions
> +
>  mdebug
>  Target Var(TARGET_DEBUG_MODE) Undocumented
>  

More dspr3 stuff.

> +(define_predicate "const_msa_addr_offset_operand"
> +  (and (match_code "const_int")
> +       (match_test "(INTVAL (op) & 15) == 0")
> +       (match_test "IN_RANGE (INTVAL (op), -256, 255)")))

Same question as above about the actual offset range.  Assuming it
really is a signed 10-bit number, "sq10_operand" would be more
consistent with the existing predicate names.  Please use
mips_signed_immediate_p here too.

So in general:

- Please use mode iterators so that all formats are handled by a single
  pattern.  This includes folding the foo and foo_f distinctions.

- Please double-check the instructions against the named patterns in md.texi
  to see whether they are something that could be exploited by the vectoriser.
  I've tried to catch some of them above, but I probably missed some too.

- Please use normal rtl instead of unspecs where possible, unless the
  result gets too complex to be useful.  Again I've tried to catch some
  cases above, but might have missed some.

Thanks,
Richard

