V3 [PATCH] i386: Use scalar operand in floating point vec_dup patterns

Sun Oct 21 23:13:00 GMT 2018

On 10/21/18, H.J. Lu <hjl.tools@gmail.com> wrote:
> On 10/17/18, H.J. Lu <hjl.tools@gmail.com> wrote:
>> Since vector registers are also used for scalar floating point values,
>> we can use a scalar operand in floating point vec_dup patterns, which
>> enables the combiner to generate
>>
>> (set (reg:V8SF 84)
>>      (vec_duplicate:V8SF (mem/c:SF (symbol_ref:DI ("y")))))
>>
>> For AVX512 broadcast instructions from an integer register operand, we
>> only need to broadcast integers to integer vectors.
>>
>> gcc/
>>
>> 	PR target/87537
>> 	* config/i386/i386-builtin-types.def: Replace
>> 	CODE_FOR_avx2_vec_dupv4sf, CODE_FOR_avx2_vec_dupv8sf and
>> 	CODE_FOR_avx2_vec_dupv4df with CODE_FOR_vec_dupv4sf,
>> 	CODE_FOR_vec_dupv8sf and CODE_FOR_vec_dupv4df, respectively.
>> 	* config/i386/i386.c (expand_vec_perm_1): Replace
>> 	gen_avx512f_vec_dupv16sf_1, gen_avx2_vec_dupv8sf_1 and
>> 	gen_avx512f_vec_dupv8df_1 with gen_avx512f_vec_dupv16sf,
>> 	gen_vec_dupv8sf and gen_avx512f_vec_dupv8df, respectively.
>> 	Duplicate them from scalar operand.
>> 	* config/i386/i386.md (SF to DF splitter): Replace
>> 	gen_avx512f_vec_dupv16sf_1 with gen_avx512f_vec_dupv16sf.
>> 	* config/i386/sse.md (VF48_AVX512VL): New.
>> 	(avx2_vec_dup<mode>): Removed.
>> 	(avx2_vec_dupv8sf_1): Likewise.
>> 	(avx512f_vec_dup<mode>_1): Likewise.
>> 	(avx2_vec_dupv4df): Likewise.
>> 	(<avx512>_vec_dup<mode><mask_name>:V48_AVX512VL): Likewise.
>> 	(<avx512>_vec_dup<mode><mask_name>:VF48_AVX512VL): New.
>> 	(<avx512>_vec_dup<mode><mask_name>:VI48_AVX512VL): Likewise.
>> 	(<mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>): Replace
>> 	V48_AVX512VL with VI48_AVX512VL.
>> 	(*avx_vperm_broadcast_<mode>): Replace gen_avx2_vec_dupv8sf with
>> 	gen_vec_dupv8sf.
>>
>> gcc/testsuite/
>>
>> 	PR target/87537
>> 	* gcc.target/i386/avx2-vbroadcastss_ps256-1.c: Updated.
>> 	* gcc.target/i386/avx512vl-vbroadcast-3.c: Likewise.
>
> Here is the updated patch. I added const_vector_duplicate_operand to
> handle constant vector broadcast from memory.  OK for trunk?

Here is the updated patch with a testcase for const_vector_duplicate_operand.
We should split

(set (reg:V16SF 86)
     (const_vector:V16SF
       [(const_double:SF 2.0e+0 [0x0.8p+2]) repeated x16]))

to

(set (reg:V16SF 86)
     (vec_duplicate:V16SF (mem/u/c:SF (symbol_ref/u:DI ("*.LC1")))))

only before register allocation, and we shouldn't split special SSE constants.
OK for trunk?
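
For illustration only, here is a minimal source-level sketch of the kind of
code that would be expected to produce such a const_vector.  It is not the
testcase from the patch: the type name v16sf and the function names are
hypothetical, and whether GCC actually emits a broadcast from the constant
pool depends on the compiler version and on flags such as -O2 -mavx512f.
The code uses the generic GCC vector extension, so it also builds and runs
without AVX512.

```c
#include <stddef.h>

/* Sixteen packed floats, corresponding to the V16SF mode in the RTL.
   The name v16sf is an assumption made for this sketch.  */
typedef float v16sf __attribute__ ((vector_size (64)));

/* Store a vector whose sixteen lanes all hold 2.0f.  An initializer with
   one repeated element is the source-level analogue of
   (const_vector:V16SF [(const_double:SF 2.0e+0) repeated x16]).  */
void
splat_two (v16sf *dst)
{
  *dst = (v16sf) { 2.0f, 2.0f, 2.0f, 2.0f, 2.0f, 2.0f, 2.0f, 2.0f,
		   2.0f, 2.0f, 2.0f, 2.0f, 2.0f, 2.0f, 2.0f, 2.0f };
}

/* Return nonzero if every lane of *v equals EXPECTED.  */
int
all_lanes_equal (const v16sf *v, float expected)
{
  for (size_t i = 0; i < 16; i++)
    if ((*v)[i] != expected)
      return 0;
  return 1;
}
```

The vector is passed by pointer rather than by value so that the sketch does
not depend on the vector calling convention of the target.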

Thanks.

-- 
H.J.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-i386-Use-scalar-operand-in-floating-point-vec_dup-pa.patch
Type: text/x-patch
Size: 23277 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20181021/6e06cb06/attachment.bin>