[PATCH] i386: Omit clobbers from vzeroupper until final [PR92190]

Jakub Jelinek jakub@redhat.com
Tue Feb 4 09:39:00 GMT 2020


Hi!

As mentioned in the PR, the CLOBBERs in vzeroupper are added even for
registers that aren't ever live in the function before it, and they
break the prologue/epilogue expansion with the ms ABI (normal ABIs
are fine, as they consider all [xyz]mm registers call clobbered, but
the ms ABI considers xmm0-15 call used and only the bits above the
low 128 ones call clobbered).
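
For reference, before this patch the vzeroupper pass rewrote the insn
into a PARALLEL mentioning every xmm register, a self-SET for each
live one and a CLOBBER for each dead one, i.e. roughly (a sketch;
hard register numbers omitted, and which registers are live is
illustrative only):

  (parallel [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)
             (set (reg:V2DI xmm0) (reg:V2DI xmm0))
             (clobber (reg:V2DI xmm1))
             ...
             (clobber (reg:V2DI xmm15))])

It is those CLOBBERs of never-live registers that the ms ABI
prologue/epilogue expansion chokes on.
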
The following patch fixes it by not adding the clobbers during the
vzeroupper pass (which runs before pro_and_epilogue), but adding them
for -fipa-ra purposes only during final output.  Perhaps we could add
some CLOBBERs early (say for df_regs_ever_live_p regs that aren't
live in the live_regs bitmap; or, depending on the ABI, either add
all of them immediately, or for the ms ABI add CLOBBERs for xmm0-xmm5
if they don't have a SET) and add the rest later.  And the addition
could perhaps be done at other spots, e.g. in an epilogue_completed
guarded splitter.
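
With the patch, the pass keeps only the SETs for the live registers,
e.g. just

  (parallel [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)
             (set (reg:V2DI xmm0) (reg:V2DI xmm0))])

(or the bare unspec_volatile when no xmm register is live at the
insn), and the *avx_vzeroupper output code re-adds the CLOBBERs for
the remaining registers at final time only, and only for -fipa-ra.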

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Or any of the above mentioned variants?

A separate issue is CALL_USED_REGISTERS for the ms ABI on xmm16-xmm31.

And yet another issue is that the presence of a vzeroupper instruction
does prevent callers from trying to preserve something in the
xmm0-xmm15 registers if the callee doesn't otherwise use them (the
patch doesn't change this).
Would be nice to be able to say that the vzeroupper will preserve the low
128 bits, so it is ok to hold a 128-bit value across the call, but not
256-bit or larger.  Not sure if the RA has a way to express that (and IPA-RA
certainly ATM doesn't).
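
As a concrete (hypothetical, not from the patch or the testsuite)
example of what is being lost, consider -fipa-ra and a caller holding
a 128-bit value across a call to a function that ends with a
vzeroupper:

  typedef float V4SF __attribute__ ((vector_size (16)));

  extern void vz_callee (void);	/* Ends with a vzeroupper.  */

  V4SF
  vz_caller (V4SF x)
  {
    vz_callee ();
    /* The callee's vzeroupper leaves the low 128 bits of every xmm
       register intact, so x could in principle stay in some xmm
       register the callee doesn't otherwise touch; but because the
       vzeroupper pattern clobbers the full registers, IPA-RA must
       assume x is destroyed and the caller has to spill it around
       the call.  */
    return x;
  }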

2020-02-04  Jakub Jelinek  <jakub@redhat.com>

	PR target/92190
	* config/i386/i386-features.c (ix86_add_reg_usage_to_vzeroupper): Only
	include sets and not clobbers in the vzeroupper pattern.
	* config/i386/sse.md (*avx_vzeroupper): Add the clobbers during
	pattern output.

	* gcc.target/i386/pr92190.c: New test.

--- gcc/config/i386/i386-features.c.jj	2020-02-03 13:17:15.919723150 +0100
+++ gcc/config/i386/i386-features.c	2020-02-03 18:30:08.274865983 +0100
@@ -1764,29 +1764,32 @@ convert_scalars_to_vector (bool timode_p
 
      (set (reg:V2DF R) (reg:V2DF R))
 
-   which preserves the low 128 bits but clobbers the upper bits.
-   For a dead register we just use:
-
-     (clobber (reg:V2DF R))
-
-   which invalidates any previous contents of R and stops R from becoming
-   live across the vzeroupper in future.  */
+   which preserves the low 128 bits but clobbers the upper bits.  */
 
 static void
 ix86_add_reg_usage_to_vzeroupper (rtx_insn *insn, bitmap live_regs)
 {
   rtx pattern = PATTERN (insn);
   unsigned int nregs = TARGET_64BIT ? 16 : 8;
-  rtvec vec = rtvec_alloc (nregs + 1);
-  RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
+  unsigned int npats = nregs;
   for (unsigned int i = 0; i < nregs; ++i)
     {
       unsigned int regno = GET_SSE_REGNO (i);
+      if (!bitmap_bit_p (live_regs, regno))
+	npats--;
+    }
+  if (npats == 0)
+    return;
+  rtvec vec = rtvec_alloc (npats + 1);
+  RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
+  for (unsigned int i = 0, j = 0; i < nregs; ++i)
+    {
+      unsigned int regno = GET_SSE_REGNO (i);
+      if (!bitmap_bit_p (live_regs, regno))
+	continue;
       rtx reg = gen_rtx_REG (V2DImode, regno);
-      if (bitmap_bit_p (live_regs, regno))
-	RTVEC_ELT (vec, i + 1) = gen_rtx_SET (reg, reg);
-      else
-	RTVEC_ELT (vec, i + 1) = gen_rtx_CLOBBER (VOIDmode, reg);
+      ++j;
+      RTVEC_ELT (vec, j) = gen_rtx_SET (reg, reg);
     }
   XVEC (pattern, 0) = vec;
   df_insn_rescan (insn);
--- gcc/config/i386/sse.md.jj	2020-02-03 13:17:15.957722589 +0100
+++ gcc/config/i386/sse.md	2020-02-03 18:30:08.280865894 +0100
@@ -19819,7 +19819,34 @@ (define_insn "*avx_vzeroupper"
   [(match_parallel 0 "vzeroupper_pattern"
      [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
   "TARGET_AVX"
-  "vzeroupper"
+{
+  if (flag_ipa_ra && XVECLEN (operands[0], 0) != (TARGET_64BIT ? 16 : 8) + 1)
+    {
+      /* For IPA-RA purposes, make it clear the instruction clobbers
+	 even XMM registers not mentioned explicitly in the pattern.  */
+      unsigned int nregs = TARGET_64BIT ? 16 : 8;
+      unsigned int npats = XVECLEN (operands[0], 0);
+      rtvec vec = rtvec_alloc (nregs + 1);
+      RTVEC_ELT (vec, 0) = XVECEXP (operands[0], 0, 0);
+      for (unsigned int i = 0, j = 1; i < nregs; ++i)
+	{
+	  unsigned int regno = GET_SSE_REGNO (i);
+	  if (j < npats
+	      && REGNO (SET_DEST (XVECEXP (operands[0], 0, j))) == regno)
+	    {
+	      RTVEC_ELT (vec, i + 1) = XVECEXP (operands[0], 0, j);
+	      j++;
+	    }
+	  else
+	    {
+	      rtx reg = gen_rtx_REG (V2DImode, regno);
+	      RTVEC_ELT (vec, i + 1) = gen_rtx_CLOBBER (VOIDmode, reg);
+	    }
+	}
+      XVEC (operands[0], 0) = vec;
+    }
+  return "vzeroupper";
+}
   [(set_attr "type" "sse")
    (set_attr "modrm" "0")
    (set_attr "memory" "none")
--- gcc/testsuite/gcc.target/i386/pr92190.c.jj	2020-02-03 18:29:30.924415644 +0100
+++ gcc/testsuite/gcc.target/i386/pr92190.c	2020-02-03 18:29:15.992635389 +0100
@@ -0,0 +1,19 @@
+/* PR target/92190 */
+/* { dg-do compile { target { *-*-linux* && lp64 } } } */
+/* { dg-options "-mabi=ms -O2 -mavx512f" } */
+
+typedef char VC __attribute__((vector_size (16)));
+typedef int VI __attribute__((vector_size (16 * sizeof 0)));
+VC a;
+VI b;
+void bar (VI);
+void baz (VC);
+
+void
+foo (void)
+{
+  VC k = a;
+  VI n = b;
+  bar (n);
+  baz (k);
+}

	Jakub


