[PATCH 10/14][AArch64] Add vcvt(_high)?_f32_f16 intrinsics
Alan Lawrence
alan.lawrence@arm.com
Wed Apr 22 17:21:00 GMT 2015
This adds the two remaining widening intrinsics, vcvt_f32_f16 and
vcvt_high_f32_f16, first adding patterns in aarch64-simd.md, then entries in
aarch64-simd-builtins.def, and finally the intrinsics in arm_neon.h.
Note that this changes the vector indices in the RTL for float vec_unpacks on
big-endian to be the same as for integer vec_unpacks. This appears consistent
with the usage of VEC_UNPACK_(FLOAT_)?EXPR in tree-vect-stmts.c, which uses a
different EXPR for the same half of the vector depending on endianness. I was
not able to construct a testcase where the RTL here mattered (i.e. where the RTL
was constant-folded but the tree had not been), but the correctness can be seen
from the following testcase:
double d[4];

void
bar (float *f)
{
  for (int i = 0; i < 4; i++)
    d[i] = f[i];
}
which previously produced the following final RTL (big-endian, -O3):
(insn:TI 8 10 12 (set (reg:V2DF 33 v1 [orig:78 vect__9.19 ] [78])
        (float_extend:V2DF (vec_select:V2SF (reg:V4SF 32 v0 [orig:77 MEM[(float *)f_6(D)] ] [77])
                (parallel [
                        (const_int 2 [0x2])
                        (const_int 3 [0x3])
                    ])))) test.c:40 1274 {vec_unpacks_hi_v4sf}
     (expr_list:REG_EQUIV (mem/c:V2DF (reg/f:DI 0 x0 [79]) [2 MEM[(double *)&d]+0 S16 A64])
        (nil)))
(insn:TI 12 8 11 (set (reg:V2DF 32 v0 [orig:81 vect__9.19 ] [81])
        (float_extend:V2DF (vec_select:V2SF (reg:V4SF 32 v0 [orig:77 MEM[(float *)f_6(D)] ] [77])
                (parallel [
                        (const_int 0 [0])
                        (const_int 1 [0x1])
                    ])))) test.c:40 1272 {vec_unpacks_lo_v4sf}
     (expr_list:REG_EQUIV (mem/c:V2DF (plus:DI (reg/f:DI 0 x0 [79])
                (const_int 16 [0x10])) [2 MEM[(double *)&d + 16B]+0 S16 A64])
        (nil)))
(insn:TI 11 12 15 (set (mem/c:V2DF (reg/f:DI 0 x0 [79]) [2 MEM[(double *)&d]+0 S16 A64])
        (reg:V2DF 33 v1 [orig:78 vect__9.19 ] [78])) test.c:40 808 {*aarch64_simd_movv2df}
     (expr_list:REG_DEAD (reg:V2DF 33 v1 [orig:78 vect__9.19 ] [78])
        (nil)))
(insn:TI 15 11 22 (set (mem/c:V2DF (plus:DI (reg/f:DI 0 x0 [79])
                (const_int 16 [0x10])) [2 MEM[(double *)&d + 16B]+0 S16 A64])
        (reg:V2DF 32 v0 [orig:81 vect__9.19 ] [81])) test.c:40 808 {*aarch64_simd_movv2df}
     (expr_list:REG_DEAD (reg:V2DF 32 v0 [orig:81 vect__9.19 ] [81])
i.e. apparently storing vector elements 2 and 3 to the address of d, and
elements 0 and 1 to address d+16. Of course, this was flipped back at assembly
time so that the generated code was correct, but following this patch the RTL
indices are also correct (elements 0 and 1 are stored to address d, elements 2
and 3 to address d+16).
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_simd_vec_unpacks_lo_<mode>,
aarch64_simd_vec_unpacks_hi_<mode>): New insns.
(vec_unpacks_lo_v4sf, vec_unpacks_hi_v4sf): Delete insns.
(vec_unpacks_lo_<mode>, vec_unpacks_hi_<mode>): New expand.
(aarch64_float_extend_lo_v2df): Rename to...
(aarch64_float_extend_lo_<Vwide>): this, using VDF and so adding V4SF.
* config/aarch64/aarch64-simd-builtins.def (vec_unpacks_hi): Add v8hf.
(float_extend_lo): Add v4sf.
* config/aarch64/arm_neon.h (vcvt_f32_f16, vcvt_high_f32_f16): New.
* config/aarch64/iterators.md (VQ_HSF): New iterator.
(VWIDE, Vwtype, Vhalftype): Add V8HF, V4SF.
(Vwide): New mode_attr.
-------------- next part --------------
Name: 10_aarch64_vcvt_high_f32_f16.patch
Type: text/x-patch
Size: 6873 bytes
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20150422/1cc1b38d/attachment.bin>