[PATCH][AArch64] Vectorize MULH(R)S patterns with SVE2 instructions
Yuliang Wang
Yuliang.Wang@arm.com
Thu Aug 29 14:17:00 GMT 2019
This patch enables more efficient SVE2 vectorization of Multiply High with Round and Scale (MULHRS) patterns, using widening multiplies followed by narrowing shifts.
The example snippet:
  uint16_t a[N], b[N], c[N];

  void foo_round (void)
  {
    for (int i = 0; i < N; i++)
      a[i] = ((((int32_t)b[i] * (int32_t)c[i]) >> 14) + 1) >> 1;
  }
... previously vectorized to:
  foo_round:
    ...
    ptrue p0.s
    whilelo p1.h, wzr, w2
    ld1h {z2.h}, p1/z, [x4, x0, lsl #1]
    ld1h {z0.h}, p1/z, [x3, x0, lsl #1]
    uunpklo z3.s, z2.h // zero-extend the low half to 32 bits
    uunpklo z1.s, z0.h //
    uunpkhi z2.s, z2.h // zero-extend the high half to 32 bits
    uunpkhi z0.s, z0.h //
    mul z1.s, p0/m, z1.s, z3.s // 32-bit multiply
    mul z0.s, p0/m, z0.s, z2.s //
    asr z1.s, z1.s, #14 // shift right by 14
    asr z0.s, z0.s, #14 //
    add z1.s, z1.s, #1 // add 1 (rounding)
    add z0.s, z0.s, #1 //
    asr z1.s, z1.s, #1 // final shift right by 1
    asr z0.s, z0.s, #1 //
    uzp1 z0.h, z1.h, z0.h // keep the low 16 bits of each result and recombine
    st1h {z0.h}, p1, [x1, x0, lsl #1]
    inch x0
    whilelo p1.h, w0, w2
    b.ne 28
    ret
... and now vectorizes to:
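  foo_round:
    ...
    whilelo p0.h, wzr, w2
    nop
    ld1h {z1.h}, p0/z, [x4, x0, lsl #1]
    ld1h {z2.h}, p0/z, [x3, x0, lsl #1]
    umullb z0.s, z1.h, z2.h // widening multiply of the even (bottom) elements
    umullt z1.s, z1.h, z2.h // widening multiply of the odd (top) elements
    rshrnb z0.h, z0.s, #15 // rounding shift right by 15, narrowed into the even elements
    rshrnt z0.h, z1.s, #15 // rounding shift right by 15, narrowed into the odd elements
    st1h {z0.h}, p0, [x1, x0, lsl #1]
    inch x0
    whilelo p0.h, w0, w2
    b.ne 28
    ret
    nop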
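The single rounding shift by #15 in RSHRNB/RSHRNT above relies on the source expression (shift by 14, add 1, shift by 1) being the same as adding 2^14 and shifting by 15. The following is only a scalar sketch of that equivalence, not code from the patch. The function names are hypothetical, and it uses uint32_t intermediates (an assumption made here) so that every pair of 16-bit inputs is well defined:

```c
#include <stdint.h>

/* Hypothetical helper: the two-step form used in the source loop. */
static uint32_t mulhrs_two_step (uint16_t b, uint16_t c)
{
  uint32_t prod = (uint32_t) b * (uint32_t) c;
  return ((prod >> 14) + 1) >> 1;
}

/* Hypothetical helper: one rounding shift, i.e. add half of 2^15 and
   then shift right by 15.  The sum cannot wrap, because the largest
   product is 65535 * 65535, which is below 2^32 - 2^14.  */
static uint32_t mulhrs_rounding_shift (uint16_t b, uint16_t c)
{
  uint32_t prod = (uint32_t) b * (uint32_t) c;
  return (prod + (1u << 14)) >> 15;
}

/* Count disagreements between the two forms over a sampled grid of
   16-bit inputs.  */
static long count_mismatches (void)
{
  long bad = 0;
  for (uint32_t b = 0; b <= UINT16_MAX; b += 257)
    for (uint32_t c = 0; c <= UINT16_MAX; c += 251)
      if (mulhrs_two_step ((uint16_t) b, (uint16_t) c)
          != mulhrs_rounding_shift ((uint16_t) b, (uint16_t) c))
        bad++;
  return bad;
}
```

Calling count_mismatches () from a small driver is expected to report no disagreements, since the bits discarded by the first shift can never carry into the final result.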
Also supported are:

* Non-rounding cases
  The equivalent example snippet:

    void foo_trunc (void)
    {
      for (int i = 0; i < N; i++)
        a[i] = ((int32_t)b[i] * (int32_t)c[i]) >> 15;
    }

  ... vectorizes with SHRNT/SHRNB
* 32-bit and 8-bit input/output types
* Signed output types
  SMULLT/SMULLB are generated instead
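To illustrate how the bottom (B) and top (T) instruction pairs cover a whole vector in the non-rounding case, here is a rough lane-level model in C. It is a sketch only: the function name is hypothetical, LANES assumes a 128-bit vector of eight 16-bit elements, and predication is ignored.

```c
#include <stdint.h>

#define LANES 8  /* assumed: 128-bit vector, eight 16-bit elements */

/* Hypothetical lane-level model of UMULLB/UMULLT followed by
   SHRNB/SHRNT with a shift of 15.  */
static void mulh_trunc_lanes (const uint16_t *b, const uint16_t *c,
                              uint16_t *a)
{
  uint32_t bot[LANES / 2], top[LANES / 2];

  for (int i = 0; i < LANES / 2; i++)
    {
      /* UMULLB widens and multiplies the even-numbered elements.  */
      bot[i] = (uint32_t) b[2 * i] * c[2 * i];
      /* UMULLT widens and multiplies the odd-numbered elements.  */
      top[i] = (uint32_t) b[2 * i + 1] * c[2 * i + 1];
    }

  for (int i = 0; i < LANES / 2; i++)
    {
      /* SHRNB narrows into the even-numbered result elements.  */
      a[2 * i] = (uint16_t) (bot[i] >> 15);
      /* SHRNT narrows into the odd-numbered result elements.  */
      a[2 * i + 1] = (uint16_t) (top[i] >> 15);
    }
}
```

Each output element ends up in the same position as its inputs, which is why no separate unpack or UZP1 step is needed in this model.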
SQRDMULH was considered as a potential single-instruction optimization, but it saturates the intermediate value instead of truncating it, so its result can differ from that of the patterns above.
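The difference can be seen with a scalar model for signed 16-bit elements. This is a sketch under stated assumptions, not code from the patch: the function names are hypothetical, sqrdmulh_model follows my reading of the instruction (double the product, add the rounding constant, take the high half, saturate), and right shifts of negative values are assumed to be arithmetic, as they are with GCC.

```c
#include <stdint.h>

/* Hypothetical helper: wrap a value to 16 bits, as a truncating store
   would, without relying on an out-of-range signed conversion.  */
static int16_t wrap16 (int64_t v)
{
  uint16_t u = (uint16_t) v;  /* conversion to unsigned is modular */
  return u >= 0x8000u ? (int16_t) ((int32_t) u - 0x10000) : (int16_t) u;
}

/* The MULHRS source pattern with a signed 16-bit result.  */
static int16_t mulhrs_pattern (int16_t a, int16_t b)
{
  int64_t prod = (int64_t) a * b;
  return wrap16 (((prod >> 14) + 1) >> 1);
}

/* Assumed scalar model of SQRDMULH on 16-bit elements.  */
static int16_t sqrdmulh_model (int16_t a, int16_t b)
{
  int64_t v = (2 * (int64_t) a * b + (1 << 15)) >> 16;
  if (v > INT16_MAX)
    v = INT16_MAX;  /* saturate instead of wrapping */
  if (v < INT16_MIN)
    v = INT16_MIN;
  return (int16_t) v;
}
```

In this model the two functions agree except when both inputs are INT16_MIN: the pattern wraps to INT16_MIN, while the saturating form returns INT16_MAX.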
Best Regards,
Yuliang Wang
ChangeLog:
2019-08-22 Yuliang Wang <yuliang.wang@arm.com>
* config/aarch64/aarch64-sve2.md: support for SVE2 instructions [S/U]MULL[T/B] + [R]SHRN[T/B] and MULHRS pattern variants
* config/aarch64/iterators.md: iterators and attributes for above
* internal-fn.def: internal functions for MULH[R]S patterns
* optabs.def: optabs definitions for above and sign variants
* tree-vect-patterns.c (vect_recog_multhi_pattern): pattern recognition function for MULHRS
* gcc.target/aarch64/sve2/mulhrs_1.c: new test for all variants
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rb11655.patch
Type: application/octet-stream
Size: 20235 bytes
Desc: rb11655.patch
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20190829/e634c3a0/attachment.obj>