This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH, rs6000] Add support for usadv16qi and usadv8hi standard patterns
- From: Segher Boessenkool <segher at kernel dot crashing dot org>
- To: Bill Schmidt <wschmidt at linux dot vnet dot ibm dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, David Edelsohn <dje dot gcc at gmail dot com>
- Date: Mon, 6 Nov 2017 04:17:23 -0600
- Subject: Re: [PATCH, rs6000] Add support for usadv16qi and usadv8hi standard patterns
- References: <2a3e7921-c2ee-ff57-677a-f84becc0f002@linux.vnet.ibm.com>
Hi Bill,
On Sun, Nov 05, 2017 at 06:25:11PM -0600, Bill Schmidt wrote:
> This patch adds support for vectorization of unsigned SAD expressions. SAD
> vectorization uses the usad<mode> pattern to represent a widening accumulation
> of SADs performed on a narrower type. The two cases in this patch operate
> on V16QImode and V8HImode, respectively, accumulating into V4SImode. A
> vectorized loop on SAD operations will use these patterns in the main loop
> body and perform a final reduction to sum the 4 accumulated results in the
> V4SImode accumulator during the loop epilogue.
>
> POWER's sum-across ops (vsum4ubs and vsum4shs) unfortunately have saturating
> semantics, so they can only be used for the sum-across; the accumulation
> with previous iteration results requires a separate add.
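
For readers following the description above, here is a rough scalar model in C of what one usadv16qi step computes, as I read the quoted text: absolute differences per byte, a sum-across of each group of four bytes into one 32-bit lane, then a wrapping add of the incoming accumulator. The function names usad_v16qi_model and sat_add_u32 are invented for this sketch and are not part of the patch or of GCC. The second helper only illustrates why a saturating accumulate would give a different answer than the modulo add.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of one usadv16qi step (names are illustrative).
   out[w] = acc[w] + sum over the four bytes of lane w of |a - b|.  */
void usad_v16qi_model(uint32_t out[4], const uint8_t a[16],
                      const uint8_t b[16], const uint32_t acc[4])
{
    for (size_t w = 0; w < 4; w++) {
        /* Sum-across with a zero addend: at most 4 * 255, so a
           saturating sum would never clamp here.  */
        uint32_t psum = 0;
        for (size_t j = 0; j < 4; j++) {
            uint8_t x = a[4 * w + j];
            uint8_t y = b[4 * w + j];
            psum += (uint32_t)(x > y ? x - y : y - x);
        }
        /* Separate modulo add of the previous accumulator (wraps).  */
        out[w] = acc[w] + psum;
    }
}

/* For contrast only: a saturating 32-bit add, which clamps instead of
   wrapping and so would not match the modulo accumulate above.  */
uint32_t sat_add_u32(uint32_t x, uint32_t y)
{
    uint32_t s = x + y;
    return s < x ? UINT32_MAX : s;
}
```

The two helpers agree unless the accumulator is close to overflowing, which is the case the separate add exists to handle.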
> @@ -4184,6 +4184,51 @@
> "vbpermd %0,%1,%2"
> [(set_attr "type" "vecsimple")])
>
> +;; Support for SAD (sum of absolute differences).
> +
> +;; Due to saturating semantics, we can't combine the sum-across
> +;; with the vector accumulate in vsum4ubs. A vadduwm is needed.
> +(define_expand "usadv16qi"
> + [(use (match_operand:V4SI 0 "register_operand"))
> + (use (match_operand:V16QI 1 "register_operand"))
> + (use (match_operand:V16QI 2 "register_operand"))
> + (use (match_operand:V4SI 3 "register_operand"))]
> + "TARGET_P9_VECTOR"
> + "
> +{
> + rtx absd = gen_reg_rtx (V16QImode);
> + rtx zero = gen_reg_rtx (V4SImode);
> + rtx psum = gen_reg_rtx (V4SImode);
> +
> + emit_insn (gen_p9_vaduv16qi3 (absd, operands[1], operands[2]));
> + emit_insn (gen_altivec_vspltisw (zero, const0_rtx));
> + emit_insn (gen_altivec_vsum4ubs (psum, absd, zero));
> + emit_insn (gen_addv4si3 (operands[0], psum, operands[3]));
> + DONE;
> +}")
No quotes around the {} block please (twice).
Other than that, looks fine to me, please commit. Thanks,
Segher