This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [MIPS] Add sbasic supoert ffor MSA (SIMD)

From: Richard Sandiford <rdsandiford at googlemail dot com>
To: Graham Stott <Graham dot Stott at imgtec dot com>
Cc: "gcc-patches\ at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
Date: Sun, 01 Jun 2014 21:42:20 +0100
Subject: Re: [MIPS] Add sbasic supoert ffor MSA (SIMD)
Authentication-results: sourceware.org; auth=none
References: <83760FF4D445E74A822AAB0E7BFE5F754051CCC4 at KLMAIL01 dot kl dot imgtec dot org>
Hi Graham,

Thanks for the patch.  I agree with what Richard and Joseph said.  Also...

I think it'd be better to keep the p5600 bits separate and wait until
the main p5600 patch has gone in.  It looks like it uses a different
naming scheme for the insn reservations.

Graham Stott <Graham.Stott@imgtec.com> writes:
> +(define_constraint "Un31"
> +  "@internal
> +   A replicated vector const in which the replicated value is negative
> +   integer number in range [-31,0]."
> +  (and (match_code "const_vector")
> +       (match_test "mips_const_vector_same_int_p (op, mode, -31, 0)")))
> +
> +(define_constraint "Up31"
> +  "@internal
> +   A replicated vector const in which the replicated value is positive
> +   integer number in range [0,31]."
> +  (and (match_code "const_vector")
> +       (match_test "mips_const_vector_same_int_p (op, mode, 0, 31)")))

The convention so far (and followed by the other constraints in the patch)
is for the number to be a bit count rather than a limit.  So "Unv5" and
"Uuv5" would probably be better ("u" for "unsigned").

> +;; Same as MODE128.  Used by vcond to iterate two modes.
> +(define_mode_iterator MSA_2     [V2DF V4SF V2DI V4SI V8HI V16QI])

"Same as MSA".

> +;; This attribute is used to form the MODE for reg_or_0_operand
> +;; constraint.
> +(define_mode_attr REGOR0
> +  [(V2DF "DF")
> +   (V4SF "SF")
> +   (V2DI "DI")
> +   (V4SI "SI")
> +   (V8HI "SI")
> +   (V16QI "SI")])

I still don't really like this part.  It's used only for two rvalues
and in both cases I think should be using UNITMODE instead.  If necessary,
we should take a lowpart subreg _before_ calling the gen_* routine.
As well as (IMO) being cleaner, it will tell the rtl optimisers that
bits above the unit mode are "don't care".

> +(define_expand "vec_extract<mode>"
> +  [(match_operand:<UNITMODE> 0 "register_operand")
> +   (match_operand:IMSA 1 "register_operand")
> +   (match_operand 2 "const_int_operand")]
> +  "ISA_HAS_MSA"
> +{
> +  gcc_assert (UINTVAL (operands[2]) < GET_MODE_NUNITS (<MODE>mode));

Why not:

(define_expand "vec_extract<mode>"
  [(match_operand:<UNITMODE> 0 "register_operand")
   (match_operand:IMSA 1 "register_operand")
   (match_operand 2 "const_<indeximm>_operand")]
  "ISA_HAS_MSA"
{
...
}

without the assert?

> +      rtx dest1 = gen_reg_rtx (SImode);
> +      emit_insn (gen_msa_copy_s_<msafmt> (dest1, operands[1], operands[2]));
> +      emit_move_insn (operands[0],
> +		      gen_lowpart (<UNITMODE>mode, dest1));

Why not just make copy_s assign directly in UNITMODE?

> +(define_expand "vec_extract<mode>"
> +  [(match_operand:<UNITMODE> 0 "register_operand")
> +   (match_operand:FMSA 1 "register_operand")
> +   (match_operand 2 "const_int_operand")]
> +  "ISA_HAS_MSA"
> +{
> +  rtx temp;
> +  HOST_WIDE_INT val = UINTVAL (operands[2]);
> +
> +  gcc_assert (val < GET_MODE_NUNITS (<MODE>mode));

Looks like const_<indeximm>_operand would be better than an assert
here too.  Several other cases later; won't list them all.

> +      /* We need to do the SLDI operation in V16QImode and adjust
> +       * operand[2] accordingly.  */

No "*" at the beginning of the second line.

> +      rtx op2b  = GEN_INT (val * GET_MODE_SIZE (<UNITMODE>mode));

Should only be one space before "=".

> +      rtx res = gen_reg_rtx (<MSA_2:VIMODE>mode);
> +      rtx temp1 = gen_reg_rtx (<MSA_2:VIMODE>mode);
> +      rtx temp2 = gen_reg_rtx (<MSA_2:VIMODE>mode);
> +      rtx xres = gen_reg_rtx (<MSA_2:VIMODE>mode);
> +      rtx xop1 = gen_reg_rtx (<MSA_2:VIMODE>mode);
> +      rtx xop2 = gen_reg_rtx (<MSA_2:VIMODE>mode);
> +
> +      mips_expand_msa_vcond (res, true_val, false_val,
> +			     GET_CODE (operands[3]), operands[4], operands[5]);
> +      // Results in -1 or 0 so need to convert this to correct result for the
> +      // correct true/false given by operands[1]/operands[2] repectively.
> +      emit_move_insn (xres, res);
> +      if (operands[1] != true_val)
> +	{
> +	  emit_move_insn (xop1, operands[1]);
> +	  emit_insn (gen_and<MSA_2:mode_i>3 (temp1, xres, xop1));
> +	}
> +      else
> +	emit_move_insn (temp1, xres);

Should only create xop1 and xop2 if they're needed (i.e. in the
"if" arms), otherwise the register number goes to waste.  Why is
the temporary needed though?  The predicates require a register_operand
or true_val, so can't you use operands[1] directly?

> +      emit_move_insn (temp2, CONSTM1_RTX (<MSA_2:VIMODE>mode));
> +      emit_insn (gen_xor<MSA_2:mode_i>3 (temp2, xres, temp2));

Since there's a NOR instruction, I think we should have a (not ...)
pattern and use that instead of this sequence.

> +      if (operands[2] != false_val)
> +	{
> +	  emit_move_insn (xop2, operands[2]);
> +	  emit_insn (gen_and<MSA_2:mode_i>3 (temp2, temp2, xop2));
> +	}
> +      emit_insn (gen_ior<MSA_2:mode_i>3 (xres, temp1, temp2));
> +      emit_move_insn (operands[0], xres);

Same comment about needing xop2 here.

> +(define_expand "vcond<MSA_2:mode><MSA:mode>"
> +  [(set (match_operand:MSA_2 0 "register_operand")
> +	(if_then_else:MSA_2
> +	  (match_operator 3 ""
> +	    [(match_operand:MSA 4 "register_operand")
> +	     (match_operand:MSA 5 "register_operand")])
> +	  (match_operand:MSA_2 1 "reg_or_m1_operand")
> +	  (match_operand:MSA_2 2 "reg_or_0_operand")))]
> +  "ISA_HAS_MSA
> +   && (GET_MODE_NUNITS (<MSA_2:MODE>mode)
> +       == GET_MODE_NUNITS (<MSA:MODE>mode))"

The main C code for this pattern is a cut-&-paste of vcondu.  It'd be
better to factor it into a common routine.

> +(define_insn "msa_insert_<msafmt>"
> +  [(set (match_operand:IMSA 0 "register_operand" "=f")
> +	(unspec:<MODE> [(match_operand:<MODE> 1 "register_operand" "0")
> +			(match_operand 2 "const_<indeximm>_operand" "")
> +			(match_operand:<REGOR0> 3 "reg_or_0_operand" "dJ")]
> +		       UNSPEC_MSA_INSERT))]

<MODE> looks weird here.  Isn't that just IMSA?  I think we should
use IMSA consistently, so that the rhs of the (set ...) obviously
agrees with the lhs and so that operand 1 obviously agrees with
operand 0 (which it matches).

Same for other patterns where :<MODE> is used.

> +; Similar to msa_insert_<msafmt> but with <UNITMODE>mode for operand 3.
> +(define_insn "*msa_insert_<msafmt_f>"
> +  [(set (match_operand:MSA_3 0 "register_operand" "=f")
> +	(unspec:<MODE> [(match_operand:<MODE> 1 "register_operand" "0")
> +			(match_operand 2 "const_<indeximm>_operand" "")
> +			(match_operand:<UNITMODE> 3 "reg_or_0_operand" "dJ")]
> +		       UNSPEC_MSA_INSERT))]

As above, I don't think we want both these patterns.  We should just have
the <UNITMODE> one and if necessary take the <UNITMODE> subreg before
calling the gen_* pattern.

> +;; Note that insert.d and insert.d_f will be split later if !TARGET_64BIT.
> +
> +(define_split
> +  [(set (match_operand:V2DI 0 "register_operand")
> +	(unspec:V2DI [(match_operand:V2DI 1 "register_operand")
> +		      (match_operand 2 "const_0_or_1_operand")
> +		      (match_operand:DI 3 "reg_or_0_operand")]
> +		     UNSPEC_MSA_INSERT))]
> +  "reload_completed && TARGET_MSA && !TARGET_64BIT"
> +  [(const_int 0)]
> +{
> +  mips_split_msa_insert_d (operands[0], operands[1], operands[2], operands[3]);
> +  DONE;
> +})

Comment seems misplaced; probably belongs above the insn definition instead.

> +(define_insn "msa_insve_<msafmt_f>"
> +  [(set (match_operand:MSA 0 "register_operand" "=f")
> +	(unspec:<MODE> [(match_operand:<MODE> 1 "register_operand" "0")
> +			(match_operand 2 "const_<indeximm>_operand" "")
> +			(match_operand:<MODE> 3 "register_operand" "f")]
> +		       UNSPEC_MSA_INSVE))]
> +  "ISA_HAS_MSA"
> +  "insve.<msafmt>\t%w0[%2],%w3[0]"
> +  [(set_attr "type"     "arith")
> +   (set_attr "mode"     "TI")
> +   (set_attr "msa_execunit" "msa_eu_logic_l")])
> +
> +;; operand 3 is a scalar
> +(define_insn "msa_insve_<msafmt>_f_s"
> +  [(set (match_operand:FMSA 0 "register_operand" "=f")
> +	(unspec:<MODE> [(match_operand:<MODE> 1 "register_operand" "0")
> +			(match_operand 2 "const_<indeximm>_operand" "")
> +			(match_operand:<UNITMODE> 3 "register_operand" "f")]
> +		       UNSPEC_MSA_INSVE))]
> +  "ISA_HAS_MSA"
> +  "insve.<msafmt>\t%w0[%2],%w3[0]"
> +  [(set_attr "type"     "arith")
> +   (set_attr "mode"     "TI")
> +   (set_attr "msa_execunit" "msa_eu_logic_l")])

Here too I think we just want the <UNITMODE> version, creating lowparts
where necessary.

> +;; Note that copy_s.d will be split later if !TARGET_64BIT.
> +;; Note that copy_s.d_f will be split later if !TARGET_64BIT.
> +
> +(define_split
> +  [(set (match_operand:DI 0 "register_operand")
> +	(unspec:DI [(match_operand:V2DI 1 "register_operand")
> +		    (match_operand 2 "const_0_or_1_operand")]
> +		   UNSPEC_MSA_COPY_S))]
> +  "reload_completed && TARGET_MSA && !TARGET_64BIT"
> +  [(const_int 0)]
> +{
> +  mips_split_msa_copy_d (operands[0], operands[1], operands[2], gen_msa_copy_s_w);
> +  DONE;
> +})
> +
> +(define_split
> +  [(set (match_operand:DF 0 "register_operand")
> +	(unspec:DF [(match_operand:V2DF 1 "register_operand")
> +		    (match_operand 2 "const_0_or_1_operand")]
> +		   UNSPEC_MSA_COPY_S))]
> +  "reload_completed && TARGET_MSA && !TARGET_64BIT"
> +  [(const_int 0)]
> +{
> +  mips_split_msa_copy_d (operands[0], operands[1], operands[2], gen_msa_copy_s_w);
> +  DONE;
> +})

Same misplaced comment.  Please use mode iterators to combine the splits.

> +(define_expand "vec_perm<mode>"
> +  [(match_operand:MSA 0 "register_operand")
> +   (match_operand:MSA 1 "register_operand")
> +   (match_operand:MSA 2 "register_operand")
> +   (match_operand:<VIMODE> 3 "register_operand")]
> +  "ISA_HAS_MSA"
> +{
> +  /* Note that GCC always uses memory order (as big-endian) in indexing,
> +     and layouts operands[1] frist and then operands[2] next.
> +     However, vshf starts indexes from wt to ws, so so we need to swap
> +     two operands.  MSA loads or stores elements to or from the rightmost
> +     position of vector registers, for both big-endian and little-endian CPUs.
> +     No need to change any index numbers.  */
> +  emit_insn (gen_msa_vshf<mode> (operands[0], operands[3], operands[2],
> +				 operands[1]));
> +  DONE;
> +})

TBH I still prefer the wording I suggested in the previous review
to this version.

> +(define_insn "msa_lsa"
> + [(set (match_operand:SI 0 "register_operand" "=d")
> +       (plus:SI (mult:SI (match_operand:SI 1 "register_operand" "d")
> +			 (match_operand    2 "const_immlsa_operand" ""))
> +		(match_operand:SI 3 "register_operand" "d")))]
> + "ISA_HAS_LSA"
> + "lsa\t%0,%1,%3,%y2"
> + [(set_attr "type"      "arith")
> +  (set_attr "mode"      "SI")])

Seems like this should be a :P pattern, to cope with 64-bit addresses too.
Operand 2 should have a mode.

> +;; 128-bit integer/MSA vector registers moves
> +;; Note that we prefer floating-point loads, stores, and moves by adding * to
> +;; other register preferences.
> +;; Note that we combine f and J, so that move_type for J is fmove and its
> +;; instruction length can be 1.
> +(define_insn "movti_msa"
> +  [(set (match_operand:TI 0 "nonimmediate_operand" "=*d,*d,*d,*R,*d,*f,f,R,f,*m")
> +	(match_operand:TI 1 "move_operand" "*d,*i,*R,*d*J,*f,*d,R,f,fJ,*i*d"))]
> +  "ISA_HAS_MSA
> +   && !TARGET_64BIT
> +   && (register_operand (operands[0], TImode)
> +       || reg_or_0_operand (operands[1], TImode))"
> +  { return mips_output_move (operands[0], operands[1]); }
> +  [(set_attr "move_type"	"move,const,load,store,mfc,mtc,fpload,fpstore,fmove,store")
> +   (set_attr "mode"     "TI")])

As I said in the first review:

  I don't understand the last sentence.  It looks like mips_output_move
  uses LDI.x for the f<-J case, and msa_ldi<mode> has type arith.
  I think these should be kept as separate alternatives.

Please address this.  The type of LDI.x should be consistent whether
it comes from msa_ldi<mode> or the move patterns.

Again from the first review:

  Why do we need to allow TImode in GPRs for 32-bit mode?  I think we should
  try to avoid that if at all possible.

Please address this too.  If we really do need TImode in GPRs for some
reason, please say what it is.

> +;; Note that we prefer floating-point loads, stores, and moves by adding * to
> +;; other register preferences.
> +;; Note that we combine f and J, so that move_type for J is fmove and its
> +;; instruction length can be 1.
> +(define_insn "movti_msa_64bit"
> +  [(set (match_operand:TI 0 "nonimmediate_operand" "=*d,*d,*d,*R,*a,*d,*d,*f,f,R,f,*m")
> +	(match_operand:TI 1 "move_operand" "*d,*i,*R,*d*J,*d*J,*a,*f,*d,R,f,fJ,*i*d"))]
> +  "ISA_HAS_MSA
> +   && TARGET_64BIT
> +   && (register_operand (operands[0], TImode)
> +       || reg_or_0_operand (operands[1], TImode))"
> +  { return mips_output_move (operands[0], operands[1]); }
> +  [(set_attr "move_type" "move,const,load,store,mtlo,mflo,mfc,mtc,fpload,fpstore,fmove,store")
> +   (set_attr "mode" "TI")])

Rather than copying the comment from the previous pattern, maybe use
something like:

;; Similarly for 64-bit hosts.  In this case we also need to provide
;; alternatives for the accumulator registers.

> +(define_expand "mov<mode>"
> +  [(set (match_operand:MODE128 0)
> +	(match_operand:MODE128 1))]
> +  "TARGET_64BIT || TARGET_MSA"
> +{
> +  if (register_operand (operands[0], <MODE>mode)
> +      && mips_const_vector_same_int_p (operands[1], GET_MODE (operands[1]), -512, 511))
> +    {
> +      emit_insn (gen_msa_ldi<mode> (operands[0], CONST_VECTOR_ELT (operands[1], 0)));
> +      DONE;
> +    }
> +  if (mips_legitimize_move (<MODE>mode, operands[0], operands[1]))
> +    DONE;

Again from the first review:

  Why the "TARGET_64BIT || "?  Why do we want to allow these modes without
  MSA in that case?

Two lines longer than 80 chars.

> +;; Note that we prefer floating-point loads, stores, and moves by adding * to
> +;; other register preferences.
> +;; Note that we combine f and YG, so that move_type for YG is fmove and its
> +;; instruction length can be 1.
> +(define_insn "mov<mode>_msa"
> +  [(set (match_operand:MODE128 0 "nonimmediate_operand" "=f,f,R,R,*f,*d,*d,*d,*R")
> +	(match_operand:MODE128 1 "move_operand" "fYG,R,f,YG,*d,*f,*d*YG,*R,*d"))]
> +  "ISA_HAS_MSA
> +   && (register_operand (operands[0], <MODE>mode)
> +       || reg_or_0_operand (operands[1], <MODE>mode))"
> +{ return mips_output_move (operands[0], operands[1]); }
> +  [(set_attr "move_type"	"fmove,fpload,fpstore,store,mtc,mfc,move,load,store")
> +   (set_attr "mode"     "TI")])

Also from the first review:

  Same question I suppose: why do we want to allow these vectors into GPRs?

I.e. why not restrict MSA vector modes to vector registers?  I don't
see any reason off-hand why we'd need to put those in GPRs.

> +(define_split
> +  [(set (match_operand:TI 0 "nonimmediate_operand")
> +	(match_operand:TI 1 "move_operand"))]
> +  "reload_completed && TARGET_MSA
> +   && mips_split_128bit_move_p (operands[0], operands[1])"
> +  [(const_int 0)]
> +{
> +  mips_split_128bit_move (operands[0], operands[1]);
> +  DONE;
> +})
> +
> +(define_split
> +  [(set (match_operand:MSA 0 "nonimmediate_operand")
> +	(match_operand:MSA 1 "move_operand"))]
> +  "reload_completed && TARGET_MSA
> +   && mips_split_128bit_move_p (operands[0], operands[1])"
> +  [(const_int 0)]
> +{
> +  mips_split_128bit_move (operands[0], operands[1]);
> +  DONE;
> +})

Also from the first review:

  Please instead add whatever needs to be done to the existing
  mips_split_move_insn_p/mips_split_move_insn pair.

I'm not sure whether this patch was supposed to address the previous
comments or not, but it probably isn't productive for me to just quote
what I said first time round.  If the patch is still WIP and you're just
asking for comments about particular parts then I'd be happy to review those,
but given the size of this patch, I'd rather not re-review the whole
thing until the points from earlier reviews have been sorted out.

Thanks,
Richard
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]