This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

[PATCH, rs6000] Correct handling of multiply high-part for little endian

From: Bill Schmidt <wschmidt at linux dot vnet dot ibm dot com>
To: gcc-patches at gcc dot gnu dot org
Cc: dje dot gcc at gmail dot com
Date: Wed, 30 Oct 2013 17:55:49 -0500
Subject: [PATCH, rs6000] Correct handling of multiply high-part for little endian
Authentication-results: sourceware.org; auth=none

Hi,

When working around the peculiar little-endian semantics of the vperm
instruction, our usual fix is to complement the permute control vector
and swap the order of the two vector input operands, so that we get a
double-wide vector in the proper order.  We don't want to swap the
operands when we are expanding a mult_highpart operation, however, as
the two input operands are not to be interpreted as a double-wide
vector.  Instead they represent odd and even elements, and swapping the
operands gets the odd and even elements reversed in the final result.

The permute for this case is generated by target-neutral code in
optabs.c: expand_mult_highpart ().  We obviously can't change that code
directly.  However, we can redirect the logic from the "case 2" method
to target-specific code by implementing expansions for the
umul<mode>3_highpart and smul<mode>3_highpart operations.  I've done
this, with the expansions acting exactly as expand_mult_highpart does
today, with the exception that it swaps the input operands to the call
to expand_vec_perm when we are generating little-endian code.  We will
later swap them back to their original position in the code in rs6000.c:
altivec_expand_vec_perm_const_le ().

The change has no intended effect when generating big-endian code.

Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no new
regressions.  This fixes the gcc.dg/vect/pr51581-4.c test failure for
little endian.  Ok for trunk?

Thanks,
Bill


2013-10-30  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/rs6000-protos.h (altivec_expand_mul_highpart): New
	prototype.
	* config/rs6000/rs6000.c (altivec_expand_mul_highpart): New.
	* config/rs6000/altivec.md (umul<mode>3_highpart): New.
	(smul_<mode>3_highpart): New.


Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 204192)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -58,6 +58,7 @@ extern void rs6000_expand_vector_extract (rtx, rtx
 extern bool altivec_expand_vec_perm_const (rtx op[4]);
 extern void altivec_expand_vec_perm_le (rtx op[4]);
 extern bool rs6000_expand_vec_perm_const (rtx op[4]);
+extern bool altivec_expand_mul_highpart (rtx op[3], bool);
 extern void rs6000_expand_extract_even (rtx, rtx, rtx);
 extern void rs6000_expand_interleave (rtx, rtx, rtx, bool);
 extern void build_mask64_2_operands (rtx, rtx *);
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 204192)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -29249,6 +29249,58 @@ rs6000_do_expand_vec_perm (rtx target, rtx op0, rt
     emit_move_insn (target, x);
 }
 
+/* Expand an Altivec multiply high-part.  The logic matches the 
+   general logic in optabs.c:expand_mult_highpart, but swaps the
+   inputs for little endian.  Note that we will swap them again
+   during the permute; this is the one case where we don't want
+   the operands swapped, as if we do we get the even and odd values
+   reversed.  */
+
+bool
+altivec_expand_mul_highpart (rtx operands[3], bool uns_p)
+{
+  struct expand_operand eops[3];
+  rtx m1, m2, perm, tmp;
+  int i;
+
+  optab tab1 = uns_p ? vec_widen_umult_even_optab : vec_widen_smult_even_optab;
+  optab tab2 = uns_p ? vec_widen_umult_odd_optab : vec_widen_umult_odd_optab;
+  enum machine_mode mode = GET_MODE (operands[0]);
+  enum insn_code icode = optab_handler (tab1, mode);
+  int nunits = GET_MODE_NUNITS (mode);
+  enum machine_mode wmode = insn_data[icode].operand[0].mode;
+  rtvec v = rtvec_alloc (nunits);
+
+  create_output_operand (&eops[0], gen_reg_rtx (wmode), wmode);
+  create_input_operand (&eops[1], operands[1], mode);
+  create_input_operand (&eops[2], operands[2], mode);
+  expand_insn (icode, 3, eops);
+  m1 = gen_lowpart (mode, eops[0].value);
+
+  create_output_operand (&eops[0], gen_reg_rtx (wmode), wmode);
+  create_input_operand (&eops[1], operands[1], mode);
+  create_input_operand (&eops[2], operands[2], mode);
+  expand_insn (optab_handler (tab2, mode), 3, eops);
+  m2 = gen_lowpart (mode, eops[0].value);
+
+  for (i = 0; i < nunits; ++i)
+    RTVEC_ELT (v, i) = GEN_INT (!BYTES_BIG_ENDIAN + (i & ~1)
+				+ ((i & 1) ? nunits : 0));
+
+  perm = gen_rtx_CONST_VECTOR (mode, v);
+
+  if (!BYTES_BIG_ENDIAN) {
+    tmp = m1;
+    m1 = m2;
+    m2 = tmp;
+  }
+
+  perm = expand_vec_perm (mode, m1, m2, perm, NULL_RTX);
+  emit_move_insn (operands[0], perm);
+
+  return true;
+}
+
 /* Expand an extract even operation.  */
 
 void
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 204192)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -1401,6 +1401,30 @@
     FAIL;
 })
 
+(define_expand "umul<mode>3_highpart"
+  [(match_operand:VIshort 0 "register_operand" "")
+   (match_operand:VIshort 1 "register_operand" "")
+   (match_operand:VIshort 2 "register_operand" "")]
+  "TARGET_ALTIVEC"
+{
+  if (altivec_expand_mul_highpart (operands, /*uns_p*/true))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_expand "smul<mode>3_highpart"
+  [(match_operand:VIshort 0 "register_operand" "")
+   (match_operand:VIshort 1 "register_operand" "")
+   (match_operand:VIshort 2 "register_operand" "")]
+  "TARGET_ALTIVEC"
+{
+  if (altivec_expand_mul_highpart (operands, /*uns_p*/false))
+    DONE;
+  else
+    FAIL;
+})
+
 (define_insn "altivec_vrfip"		; ceil
   [(set (match_operand:V4SF 0 "register_operand" "=v")
         (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "v")]

Follow-Ups:
- Re: [PATCH, rs6000] Correct handling of multiply high-part for little endian
  - From: David Edelsohn

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]