[PATCH,rs6000] Add builtins for accessing the FPSCR

Fri Aug 17 21:34:00 GMT 2018

Hi Carl,

On Fri, Aug 17, 2018 at 11:46:06AM -0700, Carl Love wrote:
> > In addition to listing
> > the builtin, I added a C style comment to describe the builtin a
> > little.Â Â I don't see any of the other builtins documented like this.Â 
> > But I felt some explanation of the builtins were
> > helpful.Â Â Suggestions
> > on a better way to add the comments on the builtins would be
> > appreciated.

I think this is fine.

> 	* config/rs6000/rs6000-builtin.def: Add definitions for __builtin_mffsl,
> 	__builtin_mtfsb0, __builtin_mtfsb1, __builtin_set_fpscr_rn,
> 	__builtin_set_fpscr_drn.

	* config/rs6000/rs6000-builtin.def (__builtin_mffsl): New.
	(__builtin_mtfsb0): New.
	(__builtin_mtfsb1): New.
	(__builtin_set_fpscr_rn): New.
	(__builtin_set_fpscr_drn): New.

or

	* config/rs6000/rs6000-builtin.def (__builtin_mffsl, __builtin_mtfsb0,
	__builtin_mtfsb1, __builtin_set_fpscr_rn, __builtin_set_fpscr_drn): New.

> 	* config/rs6000.c: Add functions rs6000_expand_mtfsb0_mtfsb1_builtin,
> 	rs6000_expand_set_fpscr_rn_builtin, rs6000_expand_set_fpscr_drn_builtin.

Same here (and further on).

> 	Add case statement entries for the new builtins.

To what function(s)?

> 	* testsuite/gcc.target/powerpc/test_mffsl-p9.c: New file.
> 	* testsuite/gcc.target/powerpc/test_fpscr_builtins.c: New file.
> 	* testsuite/gcc.target/powerpc/test_fpscr_builtins_error.c: New file.

testsuite/ has its own changelog.  Entries in there do not include
"testsuite/".

> --- a/gcc/config/rs6000/rs6000-builtin.def
> +++ b/gcc/config/rs6000/rs6000-builtin.def
> @@ -2486,11 +2486,34 @@ BU_SPECIAL_X (RS6000_BUILTIN_MFTB, "__builtin_ppc_mftb",
>  BU_SPECIAL_X (RS6000_BUILTIN_MFFS, "__builtin_mffs",
>  	      RS6000_BTM_ALWAYS, RS6000_BTC_MISC)
>  
> +BU_SPECIAL_X (RS6000_BUILTIN_MFFSL, "__builtin_mffsl",
> +	      RS6000_BTM_ALWAYS, RS6000_BTC_MISC)

Should this be RS6000_BTM_MISC_P9 (or similar) instead?  Same for the other
ISA 3.0 ops.

> +static rtx
> +rs6000_expand_mtfsb0_mtfsb1_builtin (enum insn_code icode, tree exp)

I'd call this rs6000_expand_mtfsb_builtin, but please use which you think
is clearest.

> +  /* Only allow bit numbers 0 to 31.  */
> +  if (GET_CODE (op0) != CONST_INT || INTVAL (op0) < 0 || INTVAL (op0) > 31)

if (!u5bit_cint_operand (op0, VOIDmode))

should do the trick I think.

> +    {
> +       error ("Argument must be a constant between 0 and 31.");
> +       return const0_rtx;
> +     }
> +
> +  if (! (*insn_data[icode].operand[0].predicate) (op0, mode0))
> +    op0 = copy_to_mode_reg (mode0, op0);

Is this correct?  It must be a constant integer already, and if it fails
copying it into a register is surely not the right thing to do.

> +  /* If the argument is a constant, check the range. Agrument can only be a
> +     2-bit value.  Unfortunately, can't check the range of the value at
> +     compile time if the argument is a variable.
> +  */
> +  if (GET_CODE (op0) == CONST_INT && (INTVAL (op0) < 0 || INTVAL (op0) > 3))

const_0_to_3_operand

> +    /* Builtin not supported on this processor.  */
> +    return 0;
> +
> +  /* If we got invalid arguments bail out before generating bad rtl.  */
> +  if (arg0 == error_mark_node)
> +    return const0_rtx;
> +
> +  /* If the argument is a constant, check the range. Agrument can only be a
> +     3-bit value.  Unfortunately, can't check the range of the value at
> +     compile time if the argument is a variable.
> +  */
> +  if (GET_CODE (op0) == CONST_INT && (INTVAL (op0) < 0 || INTVAL (op0) > 7))

(Typo, "argument").  const_0_to_7_operand or u3bit_cint_operand (both exist,
and they are identical.  Hrm.)

>  
> @@ -16370,6 +16497,30 @@ rs6000_init_builtins (void)
>    ftype = build_function_type_list (double_type_node, NULL_TREE);
>    def_builtin ("__builtin_mffs", ftype, RS6000_BUILTIN_MFFS);
>  
> +  ftype = build_function_type_list (double_type_node, NULL_TREE);
> +  def_builtin ("__builtin_mffsl", ftype, RS6000_BUILTIN_MFFSL);
> +
> +  ftype = build_function_type_list (void_type_node,
> +				    intSI_type_node,
> +				    NULL_TREE);
> +
> +  def_builtin ("__builtin_mtfsb0", ftype, RS6000_BUILTIN_MTFSB0_SI);

No blank line between ftype and def_builtin please?

> +(define_insn "rs6000_mtfsb0_si"

Why the _si?  Won't just rs6000_mtfsb0 do?

> + [(use (match_operand:SI 0 "short_cint_operand" "n"))
> +  (unspec_volatile:SI [(const_int 0)] UNSPECV_MTFSFB0)]

UNSPECV_MTFSB0 please.

operands[0] should be an argument of the unspec... so something like

(define_insn "rs6000_mtfsb0"
  [(unspec_volatile [(match_operand:SI 0 "u5bit_cint_operand" "n")]
		    UNSPECV_MTFSB0)]
  "TARGET_HARD_FLOAT"
  "mtfsb0 %0")

(and you should set the "type" attribute to something useful, ideally).

> +(define_insn "rs6000_mffscrn"
> +  [(set (match_operand:DF 0 "gpc_reg_operand" "=d")
> +   (unspec_volatile:DF [(const_int 0)] UNSPECV_MFFSCRN))
> +   (use (match_operand:DF 1 "gpc_reg_operand" "d"))]
> +   "TARGET_HARD_FLOAT"
> +   "mffscrn %0,%1")

(define_insn "rs6000_mffscrn"
  [(set (match_operand:DF 0 "gpc_reg_operand" "=d")
	(unspec_volatile:DF [(match_operand:DF 1 "gpc_reg_operand" "d")]
			    UNSPECV_MFFSCRN))]
   "TARGET_HARD_FLOAT"
   "mffscrn %0,%1")

(you also need a check for ISA 3.0).

> +(define_expand "rs6000_set_fpscr_rn"
> +  [(match_operand:DI 0  "gpc_reg_operand")]
> +  "TARGET_HARD_FLOAT"
> +{
> +  rtx tmp_df = gen_reg_rtx (DFmode);
> +
> +  /* The floating point rounding control bits are FPSCR[62:63]. Put the
> +     new rounding mode bits from operands[0][62:63] into FPSCR[62:63].  */
> +  if (TARGET_P9_VECTOR)

It does not depend on vector stuff (say, you use -mcpu=power9 -mno-altivec).

> +    {
> +      rtx tmp_rn = gen_reg_rtx (DImode);
> +      rtx tmp_di = gen_reg_rtx (DImode);
> +
> +      /* Extract new RN mode from operand.  */
> +      emit_insn (gen_anddi3_mask (tmp_rn, operands[0], GEN_INT (0x3)));

This doesn't work for -m32 afaics.  Either disallow it, or make it work?

> +      /* Insert new RN mode into FSCPR.  */
> +      emit_insn (gen_rs6000_mffs (tmp_df));
> +      tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
> +      emit_insn (gen_anddi3_mask (tmp_di, tmp_di, GEN_INT (0xFFFFFFFC)));
> +      emit_insn (gen_iordi3 (tmp_di, tmp_di, tmp_rn));
> +
> +      /* Need to write to field k=15.  The fields are [0:15].  Hence with L=0,
> +         W=0, FLM_i must be equal to 8, 16 = i + 8*(1-W).  FLM is an 8 bit
> +         field[0:7]. Need to set the bit that corresponds to the value of i
> +         that you want [0:7].
> +      */

(The */ should not go on a new line).
The derivation isn't super clear to me, but 1 is the correct mask, yes.

> +      tmp_df = simplify_gen_subreg (DFmode, tmp_di, DImode, 0);
> +      emit_insn (gen_rs6000_mtfsf (GEN_INT (0x01), tmp_df));

> +(define_expand "rs6000_set_fpscr_drn"

> +      /* Insert new RN mode into FSCPR.  */
> +      emit_insn (gen_rs6000_mffs (tmp_df));
> +      tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
> +      emit_insn (gen_anddi3_mask (tmp_di, tmp_di, GEN_INT (0xFFF8FFFFFFFF)));
> +      emit_insn (gen_iordi3 (tmp_di, tmp_di, tmp_rn));

Why is this masking off the top 16 bits of the original contents?  That
seems wrong.

> +;; The ISA 3.0 mffsl instruction is a lower latency instruction
> +;; for reading the FPSCR

For reading _a part_ of the FPSCR, the other bits are set to 0 in the result.

This matters, because otherwise we should just use __builtin_mffs always;
but it does not do the same thing, so we cannot.

> +(define_insn "rs6000_mffsl0"
> +  [(set (match_operand:DF 0 "gpc_reg_operand" "=d")
> +        (unspec_volatile:DF [(const_int 0)] UNSPECV_MFFSL))]
> +  "TARGET_HARD_FLOAT && TARGET_P9_MISC"
> +  "mffsl %0")

(Please use a better name than that "0"...  We have used "_hw" before).

> +(define_expand "rs6000_mffsl"
> +  [(set (match_operand:DF 0 "gpc_reg_operand")
> +	(unspec_volatile:DF [(const_int 0)] UNSPECV_MFFSL))]
> +  "TARGET_HARD_FLOAT && TARGET_P9_MISC"

You don't want the latter...

> +{
> +  /* If the low latency mffsl instruction (ISA 3.0) is available use it,
> +     otherwise fall back to the older mffs instruction which does the same
> +     thing but with a little more latency.  */
> +
> +  if (TARGET_P9_VECTOR)

... but you want it here (instead of the _VECTOR).

> +    emit_insn (gen_rs6000_mffsl0 (operands[0]));
> +  else
> +    emit_insn (gen_rs6000_mffs (operands[0]));

> +(define_insn "rs6000_mtfsf_L0W1"
> +  [(unspec_volatile [(match_operand:SI 0 "const_int_operand" "i")
> +		     (match_operand:DF 1 "gpc_reg_operand" "d")]
> +		    UNSPECV_MTFSF_L0W1)]
> +  "TARGET_HARD_FLOAT"
> +  "mtfsf %0,%1,0,1")

Maybe name it rs6000_mtsfs_high?  L0W1 reads like "low" :-)

> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -15745,6 +15745,10 @@ uint64_t __builtin_ppc_get_timebase ();
>  unsigned long __builtin_ppc_mftb ();
>  __ibm128 __builtin_unpack_ibm128 (__ibm128, int);
>  __ibm128 __builtin_pack_ibm128 (double, double);
> +double __builtin_mffs(void);
> +void __builtin_mtfsb0(const int);
> +void __builtin_mtfsb1(const int);
> +void __builtin_set_fpscr_rn(int);
>  @end smallexample

(space before opening paren)

> +++ b/gcc/testsuite/gcc.target/powerpc/test_fpscr_builtins.c
> @@ -0,0 +1,282 @@
> +/* { dg-do run { target { powerpc64*-*-* && lp64 } } } */
> +/* { dg-require-effective-target lp64 } */

You have "lp64" in the selector already, repeating it here doesn't do
anything.

> +/* { dg-options "-pedantic" } */

Why is this?

Segher