[PATCH 10/14][AArch64] Add vcvt(_high)?_f32_f16 intrinsics

Wed Apr 22 17:21:00 GMT 2015

This adds the two remaining widening intrinsics, first adding patterns in 
aarch64-simd.md, then entries in aarch64-simd-builtins.def, and finally 
intrinsics in arm_neon.h .

Note this changes the vector indices present in the RTL on bigendian for float 
vec_unpacks, to be the same as for integer vec_unpacks. This appears consistent 
with the usage of VEC_UNPACK_(FLOAT_)?EXPR in tree-vect-stmts.c, which uses a 
different EXPR for the same half of the vector depending on endianness. I was 
not able to construct a testcase where the RTL here mattered (i.e. where the RTL 
was constant-folded, but the tree had not been), but the correctness can be seen 
from a testcase:

double d[4];
void
bar (float *f)
{
   for (int i = 0; i < 4; i++)
     d[i] = f[i];
}

which used to produced as final RTL (-O3)

(insn:TI 8 10 12 (set (reg:V2DF 33 v1 [orig:78 vect__9.19 ] [78])
         (float_extend:V2DF (vec_select:V2SF (reg:V4SF 32 v0 [orig:77 MEM[(float 
*)f_6(D)] ] [77])
                 (parallel [
                         (const_int 2 [0x2])
                         (const_int 3 [0x3])
                     ])))) test.c:40 1274 {vec_unpacks_hi_v4sf}
      (expr_list:REG_EQUIV (mem/c:V2DF (reg/f:DI 0 x0 [79]) [2 MEM[(double 
*)&d]+0 S16 A64])
         (nil)))
(insn:TI 12 8 11 (set (reg:V2DF 32 v0 [orig:81 vect__9.19 ] [81])
         (float_extend:V2DF (vec_select:V2SF (reg:V4SF 32 v0 [orig:77 MEM[(float 
*)f_6(D)] ] [77])
                 (parallel [
                         (const_int 0 [0])
                         (const_int 1 [0x1])
                     ])))) test.c:40 1272 {vec_unpacks_lo_v4sf}
      (expr_list:REG_EQUIV (mem/c:V2DF (plus:DI (reg/f:DI 0 x0 [79])
                 (const_int 16 [0x10])) [2 MEM[(double *)&d + 16B]+0 S16 A64])
         (nil)))
(insn:TI 11 12 15 (set (mem/c:V2DF (reg/f:DI 0 x0 [79]) [2 MEM[(double *)&d]+0 
S16 A64])
         (reg:V2DF 33 v1 [orig:78 vect__9.19 ] [78])) test.c:40 808 
{*aarch64_simd_movv2df}
      (expr_list:REG_DEAD (reg:V2DF 33 v1 [orig:78 vect__9.19 ] [78])
         (nil)))
(insn:TI 15 11 22 (set (mem/c:V2DF (plus:DI (reg/f:DI 0 x0 [79])
                 (const_int 16 [0x10])) [2 MEM[(double *)&d + 16B]+0 S16 A64])
         (reg:V2DF 32 v0 [orig:81 vect__9.19 ] [81])) test.c:40 808 
{*aarch64_simd_movv2df}
      (expr_list:REG_DEAD (reg:V2DF 32 v0 [orig:81 vect__9.19 ] [81])

i.e. apparently storing vector elements 2 and 3 to the address of d, and elems 
0+1 to address (d+16). Of course this was flipped back again to be correct at 
assembly time, but following this patch the RTL indices are also correct (elems 
0+1 to address d, elems 2+3 to address d+16).

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (aarch64_simd_vec_unpacks_lo_<mode>,
	aarch64_simd_vec_unpacks_hi_<mode>): New insn.
	(vec_unpacks_lo_v4sf, vec_unpacks_hi_v4sf): Delete insn.
	(vec_unpacks_lo_<mode>, vec_unpacks_hi_<mode>): New expand.
	(aarch64_float_extend_lo_v2df): Rename to...
	(aarch64_float_extend_lo_<Vwide>): this, using VDF and so adding V4SF.

	* config/aarch64/aarch64-simd-builtins.def (vec_unpacks_hi): Add v8hf.
	(float_extend_lo): Add v4sf.

	* config/aarch64/arm_neon.h (vcvt_f32_f16, vcvt_high_f32_f16): New.
	* config/aarch64/iterators.md (VQ_HSF): New iterator.
	(VWIDE, Vwtype, Vhalftype): Add V8HF, V4SF.
	(Vwide): New mode_attr.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 10_aarch64_vcvt_high_f32_f16.patch
Type: text/x-patch
Size: 6873 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20150422/1cc1b38d/attachment.bin>