This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



ia64 mulv8qi3 expander


Richard,

may I point out three problems with this recently added expansion:

(1) Trying to get it used triggers an ICE at config/ia64/ia64.c:5211
(in rtx_needs_barrier). Fixing this requires CONST_* to be added to the
condition that bypasses the call to abort at the end of the PARALLEL
case in that function.
(2) The last gen_mix1_r apparently has incorrect first and second
arguments (lz and lm are used where hz and hm are certainly meant).
(3) The final gen_pack2_sss makes the whole thing behave differently
from 8 individual multiplications: the latter would yield results
reduced modulo 2**n (with n being the width of the type, 8 here),
whereas this instruction saturates overflowed values. It would seem
desirable (and in the context of an autovectorizer even necessary,
since it might fold expressions using explicit casts) to match the
commonly expected behavior, especially if this can be done at even
smaller cost. Therefore I have an alternative suggestion for
implementing this:

(define_expand "mulv8qi3"
  [(set (match_operand:V8QI 0 "gr_register_operand" "")
	(mult:V8QI (match_operand:V8QI 1 "gr_register_operand" "r")
		   (match_operand:V8QI 2 "gr_register_operand" "r")))]
  ""
{
  rtx r1, l1, r2, l2, rm, lm;

  r1 = gen_reg_rtx (V4HImode);
  l1 = gen_reg_rtx (V4HImode);
  r2 = gen_reg_rtx (V4HImode);
  l2 = gen_reg_rtx (V4HImode);

  /* Zero-extend the QImode elements into two words of HImode elements
     by interleaving them with zero bytes.  */
  emit_insn (gen_mix1_r (gen_lowpart (V8QImode, r1),
			 operands[1], CONST0_RTX (V8QImode)));
  emit_insn (gen_mix1_r (gen_lowpart (V8QImode, r2),
			 operands[2], CONST0_RTX (V8QImode)));
  emit_insn (gen_mix1_l (gen_lowpart (V8QImode, l1),
			 operands[1], CONST0_RTX (V8QImode)));
  emit_insn (gen_mix1_l (gen_lowpart (V8QImode, l2),
			 operands[2], CONST0_RTX (V8QImode)));

  /* Multiply.  */
  rm = gen_reg_rtx (V4HImode);
  lm = gen_reg_rtx (V4HImode);
  emit_insn (gen_mulv4hi3 (rm, r1, r2));
  emit_insn (gen_mulv4hi3 (lm, l1, l2));

  /* Zap the high order bytes of the HImode elements by overwriting
     those in one part with the low order bytes of the other.  */
  emit_insn (gen_mix1_r (operands[0],
			 gen_lowpart (V8QImode, rm),
			 gen_lowpart (V8QImode, lm)));
  DONE;
})
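
To illustrate the intent, here is a small C model of the sequence above.
It is only a sketch under stated assumptions, not the actual insn
semantics: mix_r and mix_l are hypothetical helpers that assume mix1_r
interleaves the even-numbered bytes of its two sources (first source
first) and mix1_l the odd-numbered ones, and mul_v4hi assumes mulv4hi3
keeps the low 16 bits of each product. Since the low 8 bits of a product
do not depend on signedness, zero-extension serves signed and unsigned
elements alike. sat_s8 is included only to show how a saturating pack
would differ.

```c
#include <stdint.h>

typedef struct { uint8_t b[8]; } v8qi;   /* b[0] is the lowest byte */

/* Assumed model of mix1_r: even-numbered bytes of a and b, interleaved. */
static v8qi mix_r(v8qi a, v8qi b)
{
  v8qi r;
  for (int i = 0; i < 8; i += 2) { r.b[i] = a.b[i]; r.b[i + 1] = b.b[i]; }
  return r;
}

/* Assumed model of mix1_l: odd-numbered bytes of a and b, interleaved. */
static v8qi mix_l(v8qi a, v8qi b)
{
  v8qi r;
  for (int i = 0; i < 8; i += 2) { r.b[i] = a.b[i + 1]; r.b[i + 1] = b.b[i + 1]; }
  return r;
}

/* Assumed model of mulv4hi3: four 16-bit lanes, low 16 bits of each product. */
static v8qi mul_v4hi(v8qi a, v8qi b)
{
  v8qi r;
  for (int i = 0; i < 8; i += 2) {
    unsigned x = a.b[i] | ((unsigned) a.b[i + 1] << 8);
    unsigned y = b.b[i] | ((unsigned) b.b[i + 1] << 8);
    unsigned p = (x * y) & 0xffffu;
    r.b[i] = p & 0xff;
    r.b[i + 1] = p >> 8;
  }
  return r;
}

/* The suggested expansion: widen, multiply, merge the low bytes. */
static v8qi mul_v8qi_sketch(v8qi a, v8qi b)
{
  v8qi zero = {{0}};
  v8qi r1 = mix_r(a, zero), r2 = mix_r(b, zero);  /* even elements */
  v8qi l1 = mix_l(a, zero), l2 = mix_l(b, zero);  /* odd elements */
  v8qi rm = mul_v4hi(r1, r2);
  v8qi lm = mul_v4hi(l1, l2);
  return mix_r(rm, lm);
}

/* Reference: 8 individual multiplications, each reduced modulo 2**8. */
static v8qi mul_v8qi_ref(v8qi a, v8qi b)
{
  v8qi r;
  for (int i = 0; i < 8; i++)
    r.b[i] = (uint8_t) (a.b[i] * b.b[i]);
  return r;
}

/* What a signed saturating pack would do to one element instead. */
static int sat_s8(int v)
{
  return v > 127 ? 127 : v < -128 ? -128 : v;
}

/* Count elements where the sketch and the reference disagree, with every
   pair of byte values passing through every element position's arithmetic. */
static long count_mismatches(void)
{
  long bad = 0;
  for (int x = 0; x < 256; x++)
    for (int y = 0; y < 256; y++) {
      v8qi a, b;
      for (int i = 0; i < 8; i++) {
        a.b[i] = (uint8_t) (x + i);
        b.b[i] = (uint8_t) (y + 3 * i);
      }
      v8qi s = mul_v8qi_sketch(a, b), r = mul_v8qi_ref(a, b);
      for (int i = 0; i < 8; i++)
        bad += s.b[i] != r.b[i];
    }
  return bad;
}
```

Under these assumptions count_mismatches() finds no differences, while
for an element product such as 100 * 2 the modular result is 200 (or -56
when read as signed) and sat_s8 gives 127, which is the discrepancy
described in (3).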

Additionally, the itanium_class attribute of mulv4hi3 is wrong; it
should be mmmul.

Finally, the three (out of four) fpack_* having either or both operands
designated as xf seem wrong to me, as there's not going to be any
rounding nor any FP exceptions (the scalar code always correctly issues
fnorm.s for such truncations). I think these three need to be removed.

Jan

