PATCH: Add vzeroupper optimization for AVX

H.J. Lu hjl.tools@gmail.com
Wed Oct 27 11:35:00 GMT 2010


On Tue, Oct 26, 2010 at 12:12 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Oct 26, 2010 at 11:53 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Mon, Oct 25, 2010 at 7:33 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>>>>> At RTL expansion time, the vzeroupper optimization generates a
>>>>> vzeroupper_nop before function call and functin return if 256bit AVX
>>>>> instructions are used. The vzeroupper pass is run before final pass.
>>>>
>>>> Can't you run it at the end of machine_reorg instead?
>>>>
>>>
>>> Here is the updated patch without the new pass.  OK for trunk?
>>>
>>> Thanks.
>>>
>>>
>>> --
>>> H.J.
>>> gcc/
>>>
>>> 2010-10-25  H.J. Lu  <hongjiu.lu@intel.com>
>>>>        * config/i386/i386-protos.h (init_cumulative_args): Add an int.
>>>
>>>        * config/i386/i386.c (block_info): New.
>>>        (BLOCK_INFO): Likewise.
>>>        (RTX_VZEROUPPER_CALLEE_RETURN_AVX256): Likewise.
>>>        (RTX_VZEROUPPER_CALLEE_RETURN_PASS_AVX256): Likewise.
>>>        (RTX_VZEROUPPER_CALLEE_PASS_AVX256): Likewise.
>>>        (RTX_VZEROUPPER_NO_AVX256): Likewise.
>>>        (check_avx256_stores): Likewise.
>>>        (move_or_delete_vzeroupper_2): Likewise.
>>>        (move_or_delete_vzeroupper_1): Likewise.
>>>        (move_or_delete_vzeroupper): Likewise.
>>>        (use_avx256_p): Likewise.
>>>        (function_pass_avx256_p): Likewise.
>>>        (flag_opts): Add -mvzeroupper.
>>>        (ix86_option_override_internal): Turn on MASK_VZEROUPPER by
>>>        default for TARGET_AVX.  Turn off MASK_VZEROUPPER if TARGET_AVX
>>>        is disabled.
>>>        (ix86_function_ok_for_sibcall): Disable sibcall if we need to
>>>        generate vzeroupper.
>>>        (init_cumulative_args): Add an int to indicate caller.  Set
>>>        use_avx256_p, callee_return_avx256_p and caller_use_avx256_p
>>>        based on return type.
>>>        (ix86_function_arg): Set use_avx256_p, callee_pass_avx256_p and
>>>        caller_pass_avx256_p based on argument type.
>>>        (ix86_expand_epilogue): Emit vzeroupper if 256bit AVX register
>>>        is used, but not returned by caller.
>>>        (ix86_expand_call): Emit vzeroupper if 256bit AVX register is
>>>        used.
>>>        (ix86_local_alignment): Set use_avx256_p if 256bit AVX register
>>>        is used.
>>>        (ix86_minimum_alignment): Likewise.
>>>        (ix86_reorg): Run the vzeroupper optimization if needed.
>>>
>>>        * config/i386/i386.h (ix86_args): Add caller.
>>>        (INIT_CUMULATIVE_ARGS): Updated.
>>>        (machine_function): Add use_vzeroupper_p, use_avx256_p,
>>>        caller_pass_avx256_p, caller_return_avx256_p,
>>>        callee_pass_avx256_p and callee_return_avx256_p.
>>>
>>>        * config/i386/i386.md (UNSPECV_VZEROUPPER_NOP): New.
>>>        * config/i386/sse.md (avx_vzeroupper_nop): Likewise.
>>>
>>>        * config/i386/i386.opt (-mvzeroupper): New.
>>>
>>>        * doc/invoke.texi: Document -mvzeroupper.
>>>
>>> gcc/testsuite/
>>>
>>> 2010-10-25  H.J. Lu  <hongjiu.lu@intel.com>
>>>
>>>        * gcc.target/i386/avx-vzeroupper-1.c: Add -mtune=generic.
>>>        * gcc.target/i386/avx-vzeroupper-2.c: Likewise.
>>>
>>>        * gcc.target/i386/avx-vzeroupper-3.c: New.
>>>        * gcc.target/i386/avx-vzeroupper-4.c: Likewise.
>>>        * gcc.target/i386/avx-vzeroupper-5.c: Likewise.
>>>        * gcc.target/i386/avx-vzeroupper-6.c: Likewise.
>>>        * gcc.target/i386/avx-vzeroupper-7.c: Likewise.
>>>        * gcc.target/i386/avx-vzeroupper-8.c: Likewise.
>>>        * gcc.target/i386/avx-vzeroupper-9.c: Likewise.
>>>        * gcc.target/i386/avx-vzeroupper-10.c: Likewise.
>>>        * gcc.target/i386/avx-vzeroupper-11.c: Likewise.
>>>        * gcc.target/i386/avx-vzeroupper-12.c: Likewise.
>>>        * gcc.target/i386/avx-vzeroupper-13.c: Likewise.
>>>        * gcc.target/i386/avx-vzeroupper-14.c: Likewise.
>>>
>>
>>> +/* Callee returns 256bit AVX register.  */
>>> +#define RTX_VZEROUPPER_CALLEE_RETURN_AVX256          const1_rtx
>>> +/* Callee returns and passes 256bit AVX register.  */
>>> +#define RTX_VZEROUPPER_CALLEE_RETURN_PASS_AVX256     constm1_rtx
>>> +/* Callee passes 256bit AVX register.  */
>>> +#define RTX_VZEROUPPER_CALLEE_PASS_AVX256            const0_rtx
>>> +/* Callee doesn't return nor passe 256bit AVX register, or no
>>> +   256bit AVX register in function return.  */
>>> +#define RTX_VZEROUPPER_NO_AVX256                     const2_rtx
>
> Those aren't numbers. They are RTX. I can use enum. Then I need to use
> GEN_INT (xxx). I can do that.

Done.

>>
>> Please convert above defines to an enum.
>>> +static void
>>> +move_or_delete_vzeroupper_2 (basic_block curr_block,
>>
>> Please rename all "basic_block" variables simply to "bb". All other
>> parts of gcc name them that way.
>
> Will do.

Done.

>>> +  for (curr_insn = BB_HEAD (curr_block);
>>> +       curr_insn && curr_insn != NEXT_INSN (BB_END (curr_block));
>>> +       curr_insn = next_insn)
>>> +    {
>> ...
>>> +      next_insn = NEXT_INSN (curr_insn);
>>
>> Ugh. Please use "while" loop here. Something like in i386.c/distance_agu_use.
>
> I will investigate.

I replaced inner for loop with while loop. But I kept the outer loop as for
loop since I may delete/move vzeroupper. A for loop is simpler to
get the correct next insn.


>
>>> +;; Clear the upper 128bits of AVX registers, equivalent to a NOP.
>>> +;; This should be used only when the upper 128bits are unused.
>>> +(define_insn "avx_vzeroupper_nop"
>>> +  [(unspec_volatile [(match_operand 0 "const_int_operand" "")]
>>> +                 UNSPECV_VZEROUPPER_NOP)]
>>> +  "TARGET_AVX"
>>> +  "vzeroupper"
>>> +  [(set_attr "type" "sse")
>>> +   (set_attr "modrm" "0")
>>> +   (set_attr "memory" "none")
>>> +   (set_attr "prefix" "vex")
>>> +   (set_attr "mode" "OI")])
>>
>> IMO, there is no need for a new insn pattern. UNSPEC_VOLATILEs clobber
>> all regs, pseudos and memory (see sched-deps.c around line 2528), so
>> it looks to me that vzeroupper_nop and existing vzeroupper are no
>> different. Just use the new pattern instead of avx_vzeroupper and
>> *avx_vzeroupper, and iff -mvzeroupper is passed to gcc, then
>> vzeroupper insn is moved around.
>

Done.


Here is the updated patch.  OK for trunk?

Thanks.


-- 
H.J.
---
gcc/

2010-10-26  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386-protos.h (init_cumulative_args): Add an int.

	* config/i386/i386.c (block_info): New.
	(BLOCK_INFO): Likewise.
	(call_avx256_state): Likewise.
	(check_avx256_stores): Likewise.
	(move_or_delete_vzeroupper_2): Likewise.
	(move_or_delete_vzeroupper_1): Likewise.
	(move_or_delete_vzeroupper): Likewise.
	(use_avx256_p): Likewise.
	(function_pass_avx256_p): Likewise.
	(flag_opts): Add -mvzeroupper.
	(ix86_option_override_internal): Turn on MASK_VZEROUPPER by
	default for TARGET_AVX.  Turn off MASK_VZEROUPPER if TARGET_AVX
	is disabled.
	(ix86_function_ok_for_sibcall): Disable sibcall if we need to
	generate vzeroupper.
	(init_cumulative_args): Add an int to indicate caller.  Set
	use_avx256_p, callee_return_avx256_p and caller_use_avx256_p
	based on return type.
	(ix86_function_arg): Set use_avx256_p, callee_pass_avx256_p and
	caller_pass_avx256_p based on argument type.
	(ix86_expand_epilogue): Emit vzeroupper if 256bit AVX register
	is used, but not returned by caller.
	(ix86_expand_call): Emit vzeroupper if 256bit AVX register is
	used.
	(ix86_local_alignment): Set use_avx256_p if 256bit AVX register
	is used.
	(ix86_minimum_alignment): Likewise.
	(ix86_expand_special_args_builtin): Set target to
	GEN_INT (vzeroupper_intrinsic) for CODE_FOR_avx_vzeroupper.
	(ix86_reorg): Run the vzeroupper optimization if needed.

	* config/i386/i386.h (ix86_args): Add caller.
	(INIT_CUMULATIVE_ARGS): Updated.
	(machine_function): Add use_vzeroupper_p, use_avx256_p,
	caller_pass_avx256_p, caller_return_avx256_p,
	callee_pass_avx256_p and callee_return_avx256_p.

	* config/i386/sse.md (avx_vzeroupper): Removed.
	(*avx_vzeroupper): Removed.
	(avx_vzeroupper): New.

	* config/i386/i386.opt (-mvzeroupper): New.

	* doc/invoke.texi: Document -mvzeroupper.

gcc/testsuite/

2010-10-25  H.J. Lu  <hongjiu.lu@intel.com>

	* gcc.target/i386/avx-vzeroupper-1.c: Add -mtune=generic.
	* gcc.target/i386/avx-vzeroupper-2.c: Likewise.

	* gcc.target/i386/avx-vzeroupper-3.c: New.
	* gcc.target/i386/avx-vzeroupper-4.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-5.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-6.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-7.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-8.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-9.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-10.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-11.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-12.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-13.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-14.c: Likewise.



More information about the Gcc-patches mailing list