V2 [PATCH] i386: Add pass_remove_partial_avx_dependency

H.J. Lu hjl.tools@gmail.com
Fri Oct 19 10:31:00 GMT 2018


On 10/18/18, Jan Hubicka <hubicka@ucw.cz> wrote:
>> we need to generate
>>
>>	vxorp[ds]	%xmmN, %xmmN, %xmmN
>>	...
>>	vcvtss2sd	f(%rip), %xmmN, %xmmX
>>	...
>>	vcvtsi2ss	i(%rip), %xmmN, %xmmY
>>
>> to avoid a partial XMM register stall.  This patch adds a pass to generate
>> a single
>>
>> 	vxorps		%xmmN, %xmmN, %xmmN
>>
>> at function entry, which is shared by all SF and DF conversions, instead
>> of generating one
>>
>> 	vxorp[ds]	%xmmN, %xmmN, %xmmN
>>
>> for each SF/DF conversion.
>>
>> Performance impacts on SPEC CPU 2017 rate with 1 copy using
>>
>> -Ofast -march=native -mfpmath=sse -fno-associative-math -funroll-loops
>>
>> are
>>
>> 1. On Broadwell server:
>>
>> 500.perlbench_r (-0.82%)
>> 502.gcc_r (0.73%)
>> 505.mcf_r (-0.24%)
>> 520.omnetpp_r (-2.22%)
>> 523.xalancbmk_r (-1.47%)
>> 525.x264_r (0.31%)
>> 531.deepsjeng_r (0.27%)
>> 541.leela_r (0.85%)
>> 548.exchange2_r (-0.11%)
>> 557.xz_r (-0.34%)
>> Geomean: (-0.23%)
>>
>> 503.bwaves_r (0.00%)
>> 507.cactuBSSN_r (-1.88%)
>> 508.namd_r (0.00%)
>> 510.parest_r (-0.56%)
>> 511.povray_r (0.49%)
>> 519.lbm_r (-1.28%)
>> 521.wrf_r (-0.28%)
>> 526.blender_r (0.55%)
>> 527.cam4_r (-0.20%)
>> 538.imagick_r (2.52%)
>> 544.nab_r (-0.18%)
>> 549.fotonik3d_r (-0.51%)
>> 554.roms_r (-0.22%)
>> Geomean: (0.00%)
>
> I wonder why the patch seems to have more effect on specint, which should not
> care much about float<->double conversions?

These differences are within the noise range.

>> number of vxorp[ds]:
>>
>> before		after		difference
>> 14570		4515		-69%
>>
>> OK for trunk?
>
> This looks very nice though.
>

> +  if (v4sf_const0)
> +    {
> +      /* Generate a single vxorps at function entry and perform df
> +	 rescan. */
> +      bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb;
> +      insn = BB_HEAD (bb);
> +      set = gen_rtx_SET (v4sf_const0, CONST0_RTX (V4SFmode));
> +      set_insn = emit_insn_after (set, insn);
> +      df_insn_rescan (set_insn);
> +      df_process_deferred_rescans ();
> +    }
>
> It seems suboptimal to place the const0 at the entry of the function - if the
> conversion happens in a cold region of the function, this will just increase
> register pressure.  I guess the right answer would be to look for the
> postdominance frontier

Did you mean "the nearest common dominator"?

> of the set of all uses of the zero register?
>

Here is the updated patch, which adds a pass to generate a single

	vxorps		%xmmN, %xmmN, %xmmN

at the entry of the nearest common dominator of the basic blocks with SF/DF
conversions.  OK for trunk?
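
To illustrate the placement idea, here is a minimal sketch in Python, not GCC code and not taken from the attached patch.  It assumes a toy CFG given as a dict from block name to successor list, with every block reachable from the entry; the names `dominators`, `nearest_common_dominator`, and the block labels are hypothetical.  Inside GCC, the dominance machinery (for example `nearest_common_dominator_for_set` with `CDI_DOMINATORS`) would presumably do this work; whether the patch uses exactly that helper is not shown in this message.

```python
def dominators(cfg, entry):
    """Return {block: set of blocks dominating it} by iterative dataflow.

    Assumes every block is a key of cfg and is reachable from entry.
    """
    blocks = set(cfg)
    preds = {b: set() for b in blocks}
    for b, succs in cfg.items():
        for s in succs:
            preds[s].add(b)

    # Start from "everything dominates everything" and shrink to a fixpoint.
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in blocks - {entry}:
            new = set.intersection(*(dom[p] for p in preds[b])) | {b}
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom


def nearest_common_dominator(cfg, entry, use_blocks):
    """Return the deepest block that dominates every block in use_blocks."""
    dom = dominators(cfg, entry)
    common = set.intersection(*(dom[b] for b in use_blocks))
    # Common dominators form a chain; the nearest one has the most dominators.
    return max(common, key=lambda b: len(dom[b]))


# Toy CFG: a diamond (A -> B/C -> D) followed by a second split (D -> E/F).
cfg = {
    "entry": ["A"],
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E", "F"],
    "E": ["exit"],
    "F": ["exit"],
    "exit": [],
}

# Conversions only in E and F: the zeroing insn can sit in D, not at entry.
print(nearest_common_dominator(cfg, "entry", {"E", "F"}))
```

With conversions only in E and F, the shared zero register would be set up in D, so paths that never reach D do not pay for it.  With conversions in B and E, the nearest common dominator moves up to A.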

Thanks.


-- 
H.J.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-i386-Add-pass_remove_partial_avx_dependency.patch
Type: text/x-patch
Size: 11841 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20181019/4c5edcfd/attachment.bin>

