[PATCH], PowerPC PR90822 (cleanup lfiwax, lfiwzx generation)

Mon Jun 17 21:24:00 GMT 2019

I wrote the code to generate LFIWAX and LFIWZX originally for the power7 in the
2010 time frame.  At the time, we did not allow SImode to go into floating
point and vector registers.  As part of the power9 work, we now allow SImode to
go into FP/vector registers with for 64-bit code targetting -mcpu=power8 or
higher.  But we never went back and tweaked the LFIWAX/LFIWZX support.

I was writing code for a possible future PowerPC machine, and the new code
added an attribute that caused some of the -mno-vsx tests to fail.  This was
due to the floatsi<mode>2_lfiwax and floatunssi<mode>2_lfiwzx patterns did not
have a non-VSX alternative, and the attribute processing needed to process the
alternatives before the first split pass.

In looking at the code, I decided to also clean up the underlying lfiwax and
lfiwzx patterns.  In this code, on machines that support SImode in floating
point and vector registers, after register allocation we split the conversion
to SFmode and DFmode into a sign/zero extend operation.  On machines that do
not support SImode in floating point and vector registers, we continue to use
the lfiwax and lfiwzx unspec patterns.

I have tested this code by doing bootstraps and make checks on both little
endian and big endian systems, and there are no regressions.

I did build (but not run) the following versions of Spec 2006 and every version
built.

	little endian power8 64-bit
	little endian power9 64-bit
	big endian power7 64-bit
	big endian power8 64-bit
	big endian power9 64-bit
	bit endian power7 32-bit
	bit endian power8 32-bit
	bit endian power9 32-bit

In general, the 32-bit code seems to generate a lot less instructions,
including fewer lfiwax/lfiwzx instructions.  On power8/power9 32-bit code,
there was more mtvsrwz mtvsrwa instructions.

The 64-bit code is more similar, but I notice that we aren't generating as many
mtvsrd instructions, but instead generating mtvsrwa and mtvsrwd instructions.

Can I check this into the trunk?

[gcc]
2019-06-14  Michael Meissner  <meissner@linux.ibm.com>

	PR target/90822
	* config/rs6000/rs6000.md (lfiwax): Update comment.  Add
	vupkhsw/xxspltd or vupklsw/xxspltd split support to sign extend
	SImode on ISA 2.07 systems.
	(floatsi<mode>2_lfiwax): Rewrite.  Do not split the insn until
	after register allocation so that we should always get lfiwax
	generated instead of doing a gpr load and direct move.  On 64-bit
	systems that allow SImode in vector registers, split to do a
	sign_extend instead of lfiwax.
	(floatsi<mode>2_lfiwax_mem): Delete, no longer used.
	(lfiwzx): Move so this is next to lfiwax.
	(floatunssi<mode>2_lfiwax): Rewrite.  Do not split the insn until
	after register allocation so that we should always get lfiwzx
	generated instead of doing a gpr load and direct move.  On 64-bit
	systems that allow SImode in vector registers, split to do a
	zero_extend instead of lfiwzx.
	(floatunssi<mode>2_lfiwzx_mem): Delete, no longer used.

[gcc/testsuite]
2019-06-14  Michael Meissner  <meissner@linux.ibm.com>

	PR target/90822
	* gcc.target/powerpc/pr81348.c: Use -O2 instead of -Og.

Index: gcc/config/rs6000/rs6000.md
===================================================================

--- gcc/config/rs6000/rs6000.md	(revision 272166)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -5231,88 +5231,46 @@ (define_insn "*xxsel<mode>"
 
 ;; Conversions to and from floating-point.
 
-; We don't define lfiwax/lfiwzx with the normal definition, because we
-; don't want to support putting SImode in FPR registers.
-(define_insn "lfiwax"
-  [(set (match_operand:DI 0 "gpc_reg_operand" "=d,wa,wa,v")
-	(unspec:DI [(match_operand:SI 1 "reg_or_indexed_operand" "Z,Z,r,v")]
+; On 32-bit systems, we need to have special versions of LFIWAX and LFIWZX because
+; the sign/zero extend insns are not defined.
+(define_insn_and_split "lfiwax"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=d,wa,wa,v,v")
+	(unspec:DI [(match_operand:SI 1 "reg_or_indexed_operand" "Z,Z,r,v,v")]
 		   UNSPEC_LFIWAX))]
   "TARGET_HARD_FLOAT && TARGET_LFIWAX"
   "@
    lfiwax %0,%y1
    lxsiwax %x0,%y1
    mtvsrwa %x0,%1
-   vextsw2d %0,%1"
-  [(set_attr "type" "fpload,fpload,mffgpr,vecexts")
-   (set_attr "isa" "*,p8v,p8v,p9v")])
-
-; This split must be run before register allocation because it allocates the
-; memory slot that is needed to move values to/from the FPR.  We don't allocate
-; it earlier to allow for the combiner to merge insns together where it might
-; not be needed and also in case the insns are deleted as dead code.
-
-(define_insn_and_split "floatsi<mode>2_lfiwax"
-  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=<Fv>")
-	(float:SFDF (match_operand:SI 1 "nonimmediate_operand" "r")))
-   (clobber (match_scratch:DI 2 "=wa"))]
-  "TARGET_HARD_FLOAT && TARGET_LFIWAX
-   && <SI_CONVERT_FP> && can_create_pseudo_p ()"
-  "#"
-  ""
-  [(pc)]
+   vextsw2d %0,%1
+   #"
+  "&& reload_completed && TARGET_P8_VECTOR && !TARGET_P9_VECTOR
+   && altivec_register_operand (operands[1], SImode)"
+  [(const_int 0)]
 {
   rtx dest = operands[0];
   rtx src = operands[1];
-  rtx tmp;
+  int dest_regno = REGNO (dest);
+  int src_regno = REGNO (src);
+  rtx dest_v2di = gen_rtx_REG (V2DImode, dest_regno);
+  rtx src_v4si = gen_rtx_REG (V4SImode, src_regno);
 
-  if (!MEM_P (src) && TARGET_POWERPC64 && TARGET_DIRECT_MOVE)
-    tmp = convert_to_mode (DImode, src, false);
-  else
+  if (BYTES_BIG_ENDIAN)
     {
-      tmp = operands[2];
-      if (GET_CODE (tmp) == SCRATCH)
-	tmp = gen_reg_rtx (DImode);
-      if (MEM_P (src))
-	{
-	  src = rs6000_force_indexed_or_indirect_mem (src);
-	  emit_insn (gen_lfiwax (tmp, src));
-	}
-      else
-	{
-	  rtx stack = rs6000_allocate_stack_temp (SImode, false, true);
-	  emit_move_insn (stack, src);
-	  emit_insn (gen_lfiwax (tmp, stack));
-	}
+      emit_insn (gen_altivec_vupkhsw (dest_v2di, src_v4si));
+      emit_insn (gen_vsx_xxspltd_v2di (dest_v2di, dest_v2di, const1_rtx));
+      DONE;
     }
-  emit_insn (gen_floatdi<mode>2 (dest, tmp));
-  DONE;
-}
-  [(set_attr "length" "12")
-   (set_attr "type" "fpload")])
-
-(define_insn_and_split "floatsi<mode>2_lfiwax_mem"
-  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=<Fv>")
-	(float:SFDF
-	 (sign_extend:DI
-	  (match_operand:SI 1 "indexed_or_indirect_operand" "Z"))))
-   (clobber (match_scratch:DI 2 "=wa"))]
-  "TARGET_HARD_FLOAT && TARGET_LFIWAX && <SI_CONVERT_FP>"
-  "#"
-  ""
-  [(pc)]
-{
-  operands[1] = rs6000_force_indexed_or_indirect_mem (operands[1]);
-  if (GET_CODE (operands[2]) == SCRATCH)
-    operands[2] = gen_reg_rtx (DImode);
-  if (TARGET_P8_VECTOR)
-    emit_insn (gen_extendsidi2 (operands[2], operands[1]));
   else
-    emit_insn (gen_lfiwax (operands[2], operands[1]));
-  emit_insn (gen_floatdi<mode>2 (operands[0], operands[2]));
-  DONE;
+    {
+      emit_insn (gen_altivec_vupklsw (dest_v2di, src_v4si));
+      emit_insn (gen_vsx_xxspltd_v2di (dest_v2di, dest_v2di, const0_rtx));
+      DONE;
+    }
 }
-  [(set_attr "length" "8")
-   (set_attr "type" "fpload")])
+  [(set_attr "type" "fpload,fpload,mffgpr,vecexts,vecexts")
+   (set_attr "isa" "*,p8v,p8v,p9v,p8v")
+   (set_attr "length" "*,*,*,*,8")])
 
 (define_insn "lfiwzx"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=d,wa,wa,wa")
@@ -5327,67 +5285,61 @@ (define_insn "lfiwzx"
   [(set_attr "type" "fpload,fpload,mftgpr,vecexts")
    (set_attr "isa" "*,p8v,p8v,p9v")])
 
-(define_insn_and_split "floatunssi<mode>2_lfiwzx"
-  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=<Fv>")
-	(unsigned_float:SFDF (match_operand:SI 1 "nonimmediate_operand" "r")))
-   (clobber (match_scratch:DI 2 "=wa"))]
-  "TARGET_HARD_FLOAT && TARGET_LFIWZX && <SI_CONVERT_FP>"
+;; Keep the SImode -> DImode conversion along with DImode -> SF/DFmode through
+;; register allocation so that the register allocator generates a LFIWAX or
+;; LXSIWAX instruction instead of a LWA instruction plus a MTVSRD* instruction
+;; on power8 and LWA + STD + LFD on power7/power6 systems.
+
+;; LFIWAX LFIWAX LXSIWAX MTVSRWA VEXTSW2D VUPKLSW+SPLAT
+;; The first alternative is to support -mno-vsx and -mcpu=power6.
+(define_insn_and_split "floatsi<mode>2_lfiwax"
+  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa,wa,wa,wa,wa")
+	(float:SFDF
+	 (match_operand:SI 1 "nonimmediate_operand" "Z,Z,Z,r,v,v")))
+   (clobber (match_scratch:DI 2 "=d,d,v,wa,v,v"))]
+  "TARGET_HARD_FLOAT && TARGET_LFIWAX && <SI_CONVERT_FP>"
   "#"
-  ""
-  [(pc)]
+  "&& reload_completed"
+  [(match_dup 3)
+   (set (match_dup 0)
+	(float:SFDF (match_dup 2)))]
 {
-  rtx dest = operands[0];
   rtx src = operands[1];
-  rtx tmp;
+  rtx tmp = operands[2];
 
-  if (!MEM_P (src) && TARGET_POWERPC64 && TARGET_DIRECT_MOVE)
-    tmp = convert_to_mode (DImode, src, true);
-  else
-    {
-      tmp = operands[2];
-      if (GET_CODE (tmp) == SCRATCH)
-	tmp = gen_reg_rtx (DImode);
-      if (MEM_P (src))
-	{
-	  src = rs6000_force_indexed_or_indirect_mem (src);
-	  emit_insn (gen_lfiwzx (tmp, src));
-	}
-      else
-	{
-	  rtx stack = rs6000_allocate_stack_temp (SImode, false, true);
-	  emit_move_insn (stack, src);
-	  emit_insn (gen_lfiwzx (tmp, stack));
-	}
-    }
-  emit_insn (gen_floatdi<mode>2 (dest, tmp));
-  DONE;
+  operands[3] = (TARGET_DIRECT_MOVE_64BIT
+		 ? gen_extendsidi2 (tmp, src)
+		 : gen_lfiwax (tmp, src));
 }
-  [(set_attr "length" "12")
-   (set_attr "type" "fpload")])
+  [(set_attr "length" "8,8,8,8,8,12")
+   (set_attr "type" "fpload,fpload,fpload,mffgpr,fp,fp")
+   (set_attr "isa" "*,p7v,p8v,p8v,p9v,p8v")])
 
-(define_insn_and_split "floatunssi<mode>2_lfiwzx_mem"
-  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=<Fv>")
+;; LFIWZX LXSIWZX MTVSRWZ XXEXTRACTUW
+;; The first alternative is to support -mno-vsx.
+(define_insn_and_split "floatunssi<mode>2_lfiwzx"
+  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa,wa,wa,wa")
 	(unsigned_float:SFDF
-	 (zero_extend:DI
-	  (match_operand:SI 1 "indexed_or_indirect_operand" "Z"))))
-   (clobber (match_scratch:DI 2 "=wa"))]
+	 (match_operand:SI 1 "nonimmediate_operand" "Z,Z,Z,r,wa")))
+   (clobber (match_scratch:DI 2 "=d,d,v,wa,wa"))]
   "TARGET_HARD_FLOAT && TARGET_LFIWZX && <SI_CONVERT_FP>"
   "#"
-  ""
-  [(pc)]
+  "&& reload_completed"
+  [(match_dup 3)
+   (set (match_dup 0)
+	(float:SFDF (match_dup 2)))]
 {
-  operands[1] = rs6000_force_indexed_or_indirect_mem (operands[1]);
-  if (GET_CODE (operands[2]) == SCRATCH)
-    operands[2] = gen_reg_rtx (DImode);
-  if (TARGET_P8_VECTOR)
-    emit_insn (gen_zero_extendsidi2 (operands[2], operands[1]));
-  else
-    emit_insn (gen_lfiwzx (operands[2], operands[1]));
-  emit_insn (gen_floatdi<mode>2 (operands[0], operands[2]));
-  DONE;
+  rtx src = operands[1];
+  rtx tmp = operands[2];
+
+  operands[3] = (TARGET_DIRECT_MOVE_64BIT
+		 ? gen_zero_extendsidi2 (tmp, src)
+		 : gen_lfiwzx (tmp, src));
+
 }
   [(set_attr "length" "8")
-   (set_attr "type" "fpload")])
+   (set_attr "type" "fpload,fpload,fpload,mffgpr,vecexts")
+   (set_attr "isa" "*,p7v,p8v,p8v,p9v")])
 
 ; For each of these conversions, there is a define_expand, a define_insn
 ; with a '#' template, and a define_split (with C code).  The idea is
Index: gcc/testsuite/gcc.target/powerpc/pr81348.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr81348.c	(revision 272165)
+++ gcc/testsuite/gcc.target/powerpc/pr81348.c	(working copy)
@@ -1,9 +1,11 @@
 /* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
 /* { dg-require-effective-target powerpc_p9vector_ok } */
-/* { dg-options "-mdejagnu-cpu=power9 -Og" } */
+/* { dg-options "-mdejagnu-cpu=power9 -O2" } */
 
 /* PR target/81348: Compiler died in doing short->float conversion due to using
-   the wrong register in a define_split.  */
+   the wrong register in a define_split.  Originially it failed with -Og.
+   Changes due to PR 90822 meant that -Og does not generate the lxsihzx and
+   vextsh2d instructions, but -O2 does.  */
 
 int a;
 short b;


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797