[PATCH 1/2] IBM Z: Store long doubles in vector registers when possible

Tue Nov 10 08:33:53 GMT 2020

On 09.11.20 20:54, Ilya Leoshkevich wrote:
> On z14+, there are instructions for working with 128-bit floats (long
> doubles) in vector registers.  It's beneficial to use them instead of
> instructions that operate on floating point register pairs, because
> this allows storing 4 times more data in registers at a time,
> relieving register pressure.  The raw performance of the new
> instructions is almost the same as that of the old ones.
> 
> Implement by storing TFmode values in vector registers on z14+.  Since
> not all operations are available with the new instructions, keep the
> old ones available using the new FPRX2 mode, and convert between it and
> TFmode when necessary (the expanders that do this are called
> "forwarder" expanders below).
> Change the existing TFmode expanders to call either new- or old-style
> ones depending on whether we are on z14+ or older machines
> ("dispatcher" expanders).
> 
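The dispatcher/forwarder split described above can be pictured with a
small C model.  This is only an illustrative sketch, not code from the
patch: the types vr128 and fprx2, the conversion counter, and all
function names are hypothetical, and the expander names returned by the
dispatcher ("addtf3_vr", "addtf3_fpr") are assumed from the _vr/_fpr
suffixes used in the ChangeLog.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model types, not GCC data structures.  */
typedef struct { long double value; } vr128;  /* TFmode value in a vector register */
typedef struct { long double value; } fprx2;  /* 128-bit value in an FPR pair */

static int conversions;  /* counts moves between the two representations */

static fprx2 vr_to_fprx2(vr128 x) { conversions++; return (fprx2){ x.value }; }
static vr128 fprx2_to_vr(fprx2 x) { conversions++; return (vr128){ x.value }; }

/* Operations assumed, in this model, to exist only in old-style
   (FPR pair) form.  */
static fprx2 fprx2_from_int(int i) { return (fprx2){ (long double)i }; }
static int fprx2_to_int(fprx2 x) { return (int)x.value; }

/* Dispatcher: pick the new- or old-style expander by machine level.  */
static const char *dispatch_addtf3(bool z14_or_newer)
{
    return z14_or_newer ? "addtf3_vr" : "addtf3_fpr";
}

/* Forwarder: run the old-style operation, then move the result into
   the vector-register representation.  */
static vr128 float_int_forwarder(int i)
{
    return fprx2_to_vr(fprx2_from_int(i));
}

/* Forwarder in the other direction: move the operand into an FPR pair,
   then run the old-style operation.  */
static int fix_trunc_forwarder(vr128 x)
{
    return fprx2_to_int(vr_to_fprx2(x));
}
```

In this model each forwarded operation costs exactly one conversion
between representations, while dispatched operations cost none.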
> gcc/ChangeLog:
> 
> 2020-11-03  Ilya Leoshkevich  <iii@linux.ibm.com>
> 
> 	* config/s390/s390-modes.def (FPRX2): New mode.
> 	* config/s390/s390-protos.h (s390_fma_allowed_p): New function.
> 	* config/s390/s390.c (s390_fma_allowed_p): Likewise.
> 	(s390_build_signbit_mask): Support 128-bit masks.
> 	(print_operand): Support printing the second word of a TFmode
> 	operand as vector register.
> 	(constant_modes): Add FPRX2mode.
> 	(s390_class_max_nregs): Return 1 for TFmode on z14+.
> 	(s390_is_fpr128): New function.
> 	(s390_is_vr128): Likewise.
> 	(s390_can_change_mode_class): Use s390_is_fpr128 and
> 	s390_is_vr128 in order to determine whether mode refers to a FPR
> 	pair or to a VR.
> 	(s390_emit_compare): Force TFmode operands into registers on
> 	z14+.
> 	* config/s390/s390.h (HAVE_TF): New macro.
> 	(EXPAND_MOVTF): New macro.
> 	(EXPAND_TF): Likewise.
> 	* config/s390/s390.md (PFPO_OP_TYPE_FPRX2): PFPO_OP_TYPE_TF
> 	alias.
> 	(ALL): Add FPRX2.
> 	(FP_ALL): Add FPRX2 for z14+, restrict TFmode to z13-.
> 	(FP): Likewise.
> 	(FP_ANYTF): New mode iterator.
> 	(BFP): Add FPRX2 for z14+, restrict TFmode to z13-.
> 	(TD_TF): Likewise.
> 	(xde): Add FPRX2.
> 	(nBFP): Likewise.
> 	(nDFP): Likewise.
> 	(DSF): Likewise.
> 	(DFDI): Likewise.
> 	(SFSI): Likewise.
> 	(DF): Likewise.
> 	(SF): Likewise.
> 	(fT0): Likewise.
> 	(bt): Likewise.
> 	(_d): Likewise.
> 	(HALF_TMODE): Likewise.
> 	(tf_fpr): New mode_attr.
> 	(type): New mode_attr.
> 	(*cmp<mode>_ccz_0): Use type instead of mode with fsimp.
> 	(*cmp<mode>_ccs_0_fastmath): Likewise.
> 	(*cmptf_ccs): New pattern for wfcxb.
> 	(*cmptf_ccsfps): New pattern for wfkxb.
> 	(mov<mode>): Rename to mov<mode><tf_fpr>.
> 	(signbit<mode>2): Rename to signbit<mode>2<tf_fpr>.
> 	(isinf<mode>2): Rename to isinf<mode>2<tf_fpr>.
> 	(*TDC_insn_<mode>): Use type instead of mode with fsimp.
> 	(fixuns_trunc<FP:mode><GPR:mode>2): Rename to
> 	fixuns_trunc<FP:mode><GPR:mode>2<FP:tf_fpr>.
> 	(fix_trunctf<mode>2): Rename to fix_trunctf<mode>2_fpr.
> 	(floatdi<mode>2): Rename to floatdi<mode>2<tf_fpr>, use type
> 	instead of mode with itof.
> 	(floatsi<mode>2): Rename to floatsi<mode>2<tf_fpr>, use type
> 	instead of mode with itof.
> 	(*floatuns<GPR:mode><FP:mode>2): Use type instead of mode for
> 	itof.
> 	(floatuns<GPR:mode><FP:mode>2): Rename to
> 	floatuns<GPR:mode><FP:mode>2<tf_fpr>.
> 	(trunctf<mode>2): Rename to trunctf<mode>2_fpr, use type instead
> 	of mode with fsimp.
> 	(extend<DSF:mode><BFP:mode>2): Rename to
> 	extend<DSF:mode><BFP:mode>2<BFP:tf_fpr>.
> 	(<FPINT:fpint_name><BFP:mode>2): Rename to
> 	<FPINT:fpint_name><BFP:mode>2<BFP:tf_fpr>, use type instead of
> 	mode with fsimp.
> 	(rint<BFP:mode>2): Rename to rint<BFP:mode>2<BFP:tf_fpr>, use
> 	type instead of mode with fsimp.
> 	(<FPINT:fpint_name><DFP:mode>2): Use type instead of mode for
> 	fsimp.
> 	(rint<DFP:mode>2): Likewise.
> 	(trunc<BFP:mode><DFP_ALL:mode>2): Rename to
> 	trunc<BFP:mode><DFP_ALL:mode>2<BFP:tf_fpr>.
> 	(trunc<DFP_ALL:mode><BFP:mode>2): Rename to
> 	trunc<DFP_ALL:mode><BFP:mode>2<BFP:tf_fpr>.
> 	(extend<BFP:mode><DFP_ALL:mode>2): Rename to
> 	extend<BFP:mode><DFP_ALL:mode>2<BFP:tf_fpr>.
> 	(extend<DFP_ALL:mode><BFP:mode>2): Rename to
> 	extend<DFP_ALL:mode><BFP:mode>2<BFP:tf_fpr>.
> 	(add<mode>3): Rename to add<mode>3<tf_fpr>, use type instead of
> 	mode with fsimp.
> 	(*add<mode>3_cc): Use type instead of mode with fsimp.
> 	(*add<mode>3_cconly): Likewise.
> 	(sub<mode>3): Rename to sub<mode>3<tf_fpr>, use type instead of
> 	mode with fsimp.
> 	(*sub<mode>3_cc): Use type instead of mode with fsimp.
> 	(*sub<mode>3_cconly): Likewise.
> 	(mul<mode>3): Rename to mul<mode>3<tf_fpr>, use type instead of
> 	mode with fsimp.
> 	(fma<mode>4): Restrict using s390_fma_allowed_p.
> 	(fms<mode>4): Restrict using s390_fma_allowed_p.
> 	(div<mode>3): Rename to div<mode>3<tf_fpr>, use type instead of
> 	mode with fdiv.
> 	(neg<mode>2): Rename to neg<mode>2<tf_fpr>.
> 	(*neg<mode>2_cc): Use type instead of mode with fsimp.
> 	(*neg<mode>2_cconly): Likewise.
> 	(*neg<mode>2_nocc): Likewise.
> 	(*neg<mode>2): Likewise.
> 	(abs<mode>2): Rename to abs<mode>2<tf_fpr>, use type instead of
> 	mode with fdiv.
> 	(*abs<mode>2_cc): Use type instead of mode with fsimp.
> 	(*abs<mode>2_cconly): Likewise.
> 	(*abs<mode>2_nocc): Likewise.
> 	(*abs<mode>2): Likewise.
> 	(*negabs<mode>2_cc): Likewise.
> 	(*negabs<mode>2_cconly): Likewise.
> 	(*negabs<mode>2_nocc): Likewise.
> 	(*negabs<mode>2): Likewise.
> 	(sqrt<mode>2): Rename to sqrt<mode>2<tf_fpr>, use type instead
> 	of mode with fsqrt.
> 	(cbranch<mode>4): Use FP_ANYTF instead of FP.
> 	(copysign<mode>3): Rename to copysign<mode>3<tf_fpr>, use type
> 	instead of mode with fsimp.
> 	* config/s390/s390.opt (flag_vx_long_double_fma): New
> 	undocumented option.
> 	* config/s390/vector.md (V_HW): Add TF for z14+.
> 	(V_HW2): Likewise.
> 	(VFT): Likewise.
> 	(VF_HW): Likewise.
> 	(V_128): Likewise.
> 	(tf_vr): New mode_attr.
> 	(tointvec): Add TF.
> 	(mov<mode>): Rename to mov<mode><tf_vr>.
> 	(movtf): New dispatcher.
> 	(*vec_tf_to_v1tf): Rename to *vec_tf_to_v1tf_fpr, restrict to
> 	z13-.
> 	(*vec_tf_to_v1tf_vr): New pattern for z14+.
> 	(*fprx2_to_tf): Likewise.
> 	(*mov_tf_to_fprx2_0): Likewise.
> 	(*mov_tf_to_fprx2_1): Likewise.
> 	(add<mode>3): Rename to add<mode>3<tf_vr>.
> 	(addtf3): New dispatcher.
> 	(sub<mode>3): Rename to sub<mode>3<tf_vr>.
> 	(subtf3): New dispatcher.
> 	(mul<mode>3): Rename to mul<mode>3<tf_vr>.
> 	(multf3): New dispatcher.
> 	(div<mode>3): Rename to div<mode>3<tf_vr>.
> 	(divtf3): New dispatcher.
> 	(sqrt<mode>2): Rename to sqrt<mode>2<tf_vr>.
> 	(sqrttf2): New dispatcher.
> 	(fma<mode>4): Restrict using s390_fma_allowed_p.
> 	(fms<mode>4): Likewise.
> 	(neg_fma<mode>4): Likewise.
> 	(neg_fms<mode>4): Likewise.
> 	(neg<mode>2): Rename to neg<mode>2<tf_vr>.
> 	(negtf2): New dispatcher.
> 	(abs<mode>2): Rename to abs<mode>2<tf_vr>.
> 	(abstf2): New dispatcher.
> 	(float<mode>tf2_vr): New forwarder.
> 	(float<mode>tf2): New dispatcher.
> 	(floatuns<mode>tf2_vr): New forwarder.
> 	(floatuns<mode>tf2): New dispatcher.
> 	(fix_trunctf<mode>2_vr): New forwarder.
> 	(fix_trunctf<mode>2): New dispatcher.
> 	(fixuns_trunctf<mode>2_vr): New forwarder.
> 	(fixuns_trunctf<mode>2): New dispatcher.
> 	(<FPINT:fpint_name><VF_HW:mode>2<VF_HW:tf_vr>): New pattern.
> 	(<FPINT:fpint_name>tf2): New forwarder.
> 	(rint<mode>2<tf_vr>): New pattern.
> 	(rinttf2): New forwarder.
> 	(*trunctfdf2_vr): New pattern.
> 	(trunctfdf2_vr): New forwarder.
> 	(trunctfdf2): New dispatcher.
> 	(trunctfsf2_vr): New forwarder.
> 	(trunctfsf2): New dispatcher.
> 	(extenddftf2_vr): New pattern.
> 	(extenddftf2): New dispatcher.
> 	(extendsftf2_vr): New forwarder.
> 	(extendsftf2): New dispatcher.
> 	(signbittf2_vr): New forwarder.
> 	(signbittf2): New dispatcher.
> 	(isinftf2_vr): New forwarder.
> 	(isinftf2): New dispatcher.
> 	* config/s390/vx-builtins.md (*vftci<mode>_cconly): Use VF_HW
> 	instead of VECF_HW, add missing constraint, add vw support.
> 	(vftci<mode>_intcconly): Use VF_HW instead of VECF_HW.
> 	(*vftci<mode>): Rename to vftci<mode>, use VF_HW instead of
> 	VECF_HW, add vw support.
> 	(vftci<mode>_intcc): Use VF_HW instead of VECF_HW.

Ok. Thanks!

Andreas