This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: PATCH: Add vzeroupper optimization for AVX


On Tue, Oct 26, 2010 at 12:12 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Oct 26, 2010 at 11:53 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Mon, Oct 25, 2010 at 7:33 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>>>>> At RTL expansion time, the vzeroupper optimization generates a
>>>>> vzeroupper_nop before function call and functin return if 256bit AVX
>>>>> instructions are used. The vzeroupper pass is run before final pass.
>>>>
>>>> Can't you run it at the end of machine_reorg instead?
>>>>
>>>
>>> Here is the updated patch without the new pass. ?OK for trunk?
>>>
>>> Thanks.
>>>
>>>
>>> --
>>> H.J.
>>> gcc/
>>>
>>> 2010-10-25 ?H.J. Lu ?<hongjiu.lu@intel.com>
>>>> ? ? ? ?* config/i386/i386-protos.h (init_cumulative_args): Add an int.
>>>
>>> ? ? ? ?* config/i386/i386.c (block_info): New.
>>> ? ? ? ?(BLOCK_INFO): Likewise.
>>> ? ? ? ?(RTX_VZEROUPPER_CALLEE_RETURN_AVX256): Likewise.
>>> ? ? ? ?(RTX_VZEROUPPER_CALLEE_RETURN_PASS_AVX256): Likewise.
>>> ? ? ? ?(RTX_VZEROUPPER_CALLEE_PASS_AVX256): Likewise.
>>> ? ? ? ?(RTX_VZEROUPPER_NO_AVX256): Likewise.
>>> ? ? ? ?(check_avx256_stores): Likewise.
>>> ? ? ? ?(move_or_delete_vzeroupper_2): Likewise.
>>> ? ? ? ?(move_or_delete_vzeroupper_1): Likewise.
>>> ? ? ? ?(move_or_delete_vzeroupper): Likewise.
>>> ? ? ? ?(use_avx256_p): Likewise.
>>> ? ? ? ?(function_pass_avx256_p): Likewise.
>>> ? ? ? ?(flag_opts): Add -mvzeroupper.
>>> ? ? ? ?(ix86_option_override_internal): Turn on MASK_VZEROUPPER by
>>> ? ? ? ?default for TARGET_AVX. ?Turn off MASK_VZEROUPPER if TARGET_AVX
>>> ? ? ? ?is disabled.
>>> ? ? ? ?(ix86_function_ok_for_sibcall): Disable sibcall if we need to
>>> ? ? ? ?generate vzeroupper.
>>> ? ? ? ?(init_cumulative_args): Add an int to indicate caller. ?Set
>>> ? ? ? ?use_avx256_p, callee_return_avx256_p and caller_use_avx256_p
>>> ? ? ? ?based on return type.
>>> ? ? ? ?(ix86_function_arg): Set use_avx256_p, callee_pass_avx256_p and
>>> ? ? ? ?caller_pass_avx256_p based on argument type.
>>> ? ? ? ?(ix86_expand_epilogue): Emit vzeroupper if 256bit AVX register
>>> ? ? ? ?is used, but not returned by caller.
>>> ? ? ? ?(ix86_expand_call): Emit vzeroupper if 256bit AVX register is
>>> ? ? ? ?used.
>>> ? ? ? ?(ix86_local_alignment): Set use_avx256_p if 256bit AVX register
>>> ? ? ? ?is used.
>>> ? ? ? ?(ix86_minimum_alignment): Likewise.
>>> ? ? ? ?(ix86_reorg): Run the vzeroupper optimization if needed.
>>>
>>> ? ? ? ?* config/i386/i386.h (ix86_args): Add caller.
>>> ? ? ? ?(INIT_CUMULATIVE_ARGS): Updated.
>>> ? ? ? ?(machine_function): Add use_vzeroupper_p, use_avx256_p,
>>> ? ? ? ?caller_pass_avx256_p, caller_return_avx256_p,
>>> ? ? ? ?callee_pass_avx256_p and callee_return_avx256_p.
>>>
>>> ? ? ? ?* config/i386/i386.md (UNSPECV_VZEROUPPER_NOP): New.
>>> ? ? ? ?* config/i386/sse.md (avx_vzeroupper_nop): Likewise.
>>>
>>> ? ? ? ?* config/i386/i386.opt (-mvzeroupper): New.
>>>
>>> ? ? ? ?* doc/invoke.texi: Document -mvzeroupper.
>>>
>>> gcc/testsuite/
>>>
>>> 2010-10-25 ?H.J. Lu ?<hongjiu.lu@intel.com>
>>>
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-1.c: Add -mtune=generic.
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-2.c: Likewise.
>>>
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-3.c: New.
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-4.c: Likewise.
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-5.c: Likewise.
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-6.c: Likewise.
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-7.c: Likewise.
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-8.c: Likewise.
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-9.c: Likewise.
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-10.c: Likewise.
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-11.c: Likewise.
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-12.c: Likewise.
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-13.c: Likewise.
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-14.c: Likewise.
>>>
>>
>>> +/* Callee returns 256bit AVX register. ?*/
>>> +#define RTX_VZEROUPPER_CALLEE_RETURN_AVX256 ? ? ? ? ?const1_rtx
>>> +/* Callee returns and passes 256bit AVX register. ?*/
>>> +#define RTX_VZEROUPPER_CALLEE_RETURN_PASS_AVX256 ? ? constm1_rtx
>>> +/* Callee passes 256bit AVX register. ?*/
>>> +#define RTX_VZEROUPPER_CALLEE_PASS_AVX256 ? ? ? ? ? ?const0_rtx
>>> +/* Callee doesn't return nor passe 256bit AVX register, or no
>>> + ? 256bit AVX register in function return. ?*/
>>> +#define RTX_VZEROUPPER_NO_AVX256 ? ? ? ? ? ? ? ? ? ? const2_rtx
>
> Those aren't numbers. They are RTX. I can use enum. Then I need to use
> GEN_INT (xxx). I can do that.

Done.

>>
>> Please convert above defines to an enum.
>>> +static void
>>> +move_or_delete_vzeroupper_2 (basic_block curr_block,
>>
>> Please rename all "basic_block" variables simply to "bb". All other
>> parts of gcc name them that way.
>
> Will do.

Done.

>>> + ?for (curr_insn = BB_HEAD (curr_block);
>>> + ? ? ? curr_insn && curr_insn != NEXT_INSN (BB_END (curr_block));
>>> + ? ? ? curr_insn = next_insn)
>>> + ? ?{
>> ...
>>> + ? ? ?next_insn = NEXT_INSN (curr_insn);
>>
>> Ugh. Please use "while" loop here. Something like in i386.c/distance_agu_use.
>
> I will investigate.

I replaced inner for loop with while loop. But I kept the outer loop as for
loop since I may delete/move vzeroupper. A for loop is simpler to
get the correct next insn.


>
>>> +;; Clear the upper 128bits of AVX registers, equivalent to a NOP.
>>> +;; This should be used only when the upper 128bits are unused.
>>> +(define_insn "avx_vzeroupper_nop"
>>> + ?[(unspec_volatile [(match_operand 0 "const_int_operand" "")]
>>> + ? ? ? ? ? ? ? ? UNSPECV_VZEROUPPER_NOP)]
>>> + ?"TARGET_AVX"
>>> + ?"vzeroupper"
>>> + ?[(set_attr "type" "sse")
>>> + ? (set_attr "modrm" "0")
>>> + ? (set_attr "memory" "none")
>>> + ? (set_attr "prefix" "vex")
>>> + ? (set_attr "mode" "OI")])
>>
>> IMO, there is no need for a new insn pattern. UNSPEC_VOLATILEs clobber
>> all regs, pseudos and memory (see sched-deps.c around line 2528), so
>> it looks to me that vzeroupper_nop and existing vzeroupper are no
>> different. Just use the new pattern instead of avx_vzeroupper and
>> *avx_vzeroupper, and iff -mvzeroupper is passed to gcc, then
>> vzeroupper insn is moved around.
>

Done.


Here is the updated patch.  OK for trunk?

Thanks.


-- 
H.J.
---
gcc/

2010-10-26  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386-protos.h (init_cumulative_args): Add an int.

	* config/i386/i386.c (block_info): New.
	(BLOCK_INFO): Likewise.
	(call_avx256_state): Likewise.
	(check_avx256_stores): Likewise.
	(move_or_delete_vzeroupper_2): Likewise.
	(move_or_delete_vzeroupper_1): Likewise.
	(move_or_delete_vzeroupper): Likewise.
	(use_avx256_p): Likewise.
	(function_pass_avx256_p): Likewise.
	(flag_opts): Add -mvzeroupper.
	(ix86_option_override_internal): Turn on MASK_VZEROUPPER by
	default for TARGET_AVX.  Turn off MASK_VZEROUPPER if TARGET_AVX
	is disabled.
	(ix86_function_ok_for_sibcall): Disable sibcall if we need to
	generate vzeroupper.
	(init_cumulative_args): Add an int to indicate caller.  Set
	use_avx256_p, callee_return_avx256_p and caller_use_avx256_p
	based on return type.
	(ix86_function_arg): Set use_avx256_p, callee_pass_avx256_p and
	caller_pass_avx256_p based on argument type.
	(ix86_expand_epilogue): Emit vzeroupper if 256bit AVX register
	is used, but not returned by caller.
	(ix86_expand_call): Emit vzeroupper if 256bit AVX register is
	used.
	(ix86_local_alignment): Set use_avx256_p if 256bit AVX register
	is used.
	(ix86_minimum_alignment): Likewise.
	(ix86_expand_special_args_builtin): Set target to
	GEN_INT (vzeroupper_intrinsic) for CODE_FOR_avx_vzeroupper.
	(ix86_reorg): Run the vzeroupper optimization if needed.

	* config/i386/i386.h (ix86_args): Add caller.
	(INIT_CUMULATIVE_ARGS): Updated.
	(machine_function): Add use_vzeroupper_p, use_avx256_p,
	caller_pass_avx256_p, caller_return_avx256_p,
	callee_pass_avx256_p and callee_return_avx256_p.

	* config/i386/sse.md (avx_vzeroupper): Removed.
	(*avx_vzeroupper): Removed.
	(avx_vzeroupper): New.

	* config/i386/i386.opt (-mvzeroupper): New.

	* doc/invoke.texi: Document -mvzeroupper.

gcc/testsuite/

2010-10-25  H.J. Lu  <hongjiu.lu@intel.com>

	* gcc.target/i386/avx-vzeroupper-1.c: Add -mtune=generic.
	* gcc.target/i386/avx-vzeroupper-2.c: Likewise.

	* gcc.target/i386/avx-vzeroupper-3.c: New.
	* gcc.target/i386/avx-vzeroupper-4.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-5.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-6.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-7.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-8.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-9.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-10.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-11.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-12.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-13.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-14.c: Likewise.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]