This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: PATCH: Add vzeroupper optimization for AVX
On Tue, Oct 26, 2010 at 12:12 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Oct 26, 2010 at 11:53 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Mon, Oct 25, 2010 at 7:33 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>>>>> At RTL expansion time, the vzeroupper optimization generates a
>>>>> vzeroupper_nop before function call and functin return if 256bit AVX
>>>>> instructions are used. The vzeroupper pass is run before final pass.
>>>>
>>>> Can't you run it at the end of machine_reorg instead?
>>>>
>>>
>>> Here is the updated patch without the new pass. ?OK for trunk?
>>>
>>> Thanks.
>>>
>>>
>>> --
>>> H.J.
>>> gcc/
>>>
>>> 2010-10-25 ?H.J. Lu ?<hongjiu.lu@intel.com>
>>>> ? ? ? ?* config/i386/i386-protos.h (init_cumulative_args): Add an int.
>>>
>>> ? ? ? ?* config/i386/i386.c (block_info): New.
>>> ? ? ? ?(BLOCK_INFO): Likewise.
>>> ? ? ? ?(RTX_VZEROUPPER_CALLEE_RETURN_AVX256): Likewise.
>>> ? ? ? ?(RTX_VZEROUPPER_CALLEE_RETURN_PASS_AVX256): Likewise.
>>> ? ? ? ?(RTX_VZEROUPPER_CALLEE_PASS_AVX256): Likewise.
>>> ? ? ? ?(RTX_VZEROUPPER_NO_AVX256): Likewise.
>>> ? ? ? ?(check_avx256_stores): Likewise.
>>> ? ? ? ?(move_or_delete_vzeroupper_2): Likewise.
>>> ? ? ? ?(move_or_delete_vzeroupper_1): Likewise.
>>> ? ? ? ?(move_or_delete_vzeroupper): Likewise.
>>> ? ? ? ?(use_avx256_p): Likewise.
>>> ? ? ? ?(function_pass_avx256_p): Likewise.
>>> ? ? ? ?(flag_opts): Add -mvzeroupper.
>>> ? ? ? ?(ix86_option_override_internal): Turn on MASK_VZEROUPPER by
>>> ? ? ? ?default for TARGET_AVX. ?Turn off MASK_VZEROUPPER if TARGET_AVX
>>> ? ? ? ?is disabled.
>>> ? ? ? ?(ix86_function_ok_for_sibcall): Disable sibcall if we need to
>>> ? ? ? ?generate vzeroupper.
>>> ? ? ? ?(init_cumulative_args): Add an int to indicate caller. ?Set
>>> ? ? ? ?use_avx256_p, callee_return_avx256_p and caller_use_avx256_p
>>> ? ? ? ?based on return type.
>>> ? ? ? ?(ix86_function_arg): Set use_avx256_p, callee_pass_avx256_p and
>>> ? ? ? ?caller_pass_avx256_p based on argument type.
>>> ? ? ? ?(ix86_expand_epilogue): Emit vzeroupper if 256bit AVX register
>>> ? ? ? ?is used, but not returned by caller.
>>> ? ? ? ?(ix86_expand_call): Emit vzeroupper if 256bit AVX register is
>>> ? ? ? ?used.
>>> ? ? ? ?(ix86_local_alignment): Set use_avx256_p if 256bit AVX register
>>> ? ? ? ?is used.
>>> ? ? ? ?(ix86_minimum_alignment): Likewise.
>>> ? ? ? ?(ix86_reorg): Run the vzeroupper optimization if needed.
>>>
>>> ? ? ? ?* config/i386/i386.h (ix86_args): Add caller.
>>> ? ? ? ?(INIT_CUMULATIVE_ARGS): Updated.
>>> ? ? ? ?(machine_function): Add use_vzeroupper_p, use_avx256_p,
>>> ? ? ? ?caller_pass_avx256_p, caller_return_avx256_p,
>>> ? ? ? ?callee_pass_avx256_p and callee_return_avx256_p.
>>>
>>> ? ? ? ?* config/i386/i386.md (UNSPECV_VZEROUPPER_NOP): New.
>>> ? ? ? ?* config/i386/sse.md (avx_vzeroupper_nop): Likewise.
>>>
>>> ? ? ? ?* config/i386/i386.opt (-mvzeroupper): New.
>>>
>>> ? ? ? ?* doc/invoke.texi: Document -mvzeroupper.
>>>
>>> gcc/testsuite/
>>>
>>> 2010-10-25 ?H.J. Lu ?<hongjiu.lu@intel.com>
>>>
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-1.c: Add -mtune=generic.
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-2.c: Likewise.
>>>
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-3.c: New.
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-4.c: Likewise.
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-5.c: Likewise.
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-6.c: Likewise.
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-7.c: Likewise.
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-8.c: Likewise.
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-9.c: Likewise.
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-10.c: Likewise.
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-11.c: Likewise.
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-12.c: Likewise.
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-13.c: Likewise.
>>> ? ? ? ?* gcc.target/i386/avx-vzeroupper-14.c: Likewise.
>>>
>>
>>> +/* Callee returns 256bit AVX register. ?*/
>>> +#define RTX_VZEROUPPER_CALLEE_RETURN_AVX256 ? ? ? ? ?const1_rtx
>>> +/* Callee returns and passes 256bit AVX register. ?*/
>>> +#define RTX_VZEROUPPER_CALLEE_RETURN_PASS_AVX256 ? ? constm1_rtx
>>> +/* Callee passes 256bit AVX register. ?*/
>>> +#define RTX_VZEROUPPER_CALLEE_PASS_AVX256 ? ? ? ? ? ?const0_rtx
>>> +/* Callee doesn't return nor passe 256bit AVX register, or no
>>> + ? 256bit AVX register in function return. ?*/
>>> +#define RTX_VZEROUPPER_NO_AVX256 ? ? ? ? ? ? ? ? ? ? const2_rtx
>
> Those aren't numbers. They are RTX. I can use enum. Then I need to use
> GEN_INT (xxx). I can do that.
Done.
>>
>> Please convert above defines to an enum.
>>> +static void
>>> +move_or_delete_vzeroupper_2 (basic_block curr_block,
>>
>> Please rename all "basic_block" variables simply to "bb". All other
>> parts of gcc name them that way.
>
> Will do.
Done.
>>> + ?for (curr_insn = BB_HEAD (curr_block);
>>> + ? ? ? curr_insn && curr_insn != NEXT_INSN (BB_END (curr_block));
>>> + ? ? ? curr_insn = next_insn)
>>> + ? ?{
>> ...
>>> + ? ? ?next_insn = NEXT_INSN (curr_insn);
>>
>> Ugh. Please use "while" loop here. Something like in i386.c/distance_agu_use.
>
> I will investigate.
I replaced inner for loop with while loop. But I kept the outer loop as for
loop since I may delete/move vzeroupper. A for loop is simpler to
get the correct next insn.
>
>>> +;; Clear the upper 128bits of AVX registers, equivalent to a NOP.
>>> +;; This should be used only when the upper 128bits are unused.
>>> +(define_insn "avx_vzeroupper_nop"
>>> + ?[(unspec_volatile [(match_operand 0 "const_int_operand" "")]
>>> + ? ? ? ? ? ? ? ? UNSPECV_VZEROUPPER_NOP)]
>>> + ?"TARGET_AVX"
>>> + ?"vzeroupper"
>>> + ?[(set_attr "type" "sse")
>>> + ? (set_attr "modrm" "0")
>>> + ? (set_attr "memory" "none")
>>> + ? (set_attr "prefix" "vex")
>>> + ? (set_attr "mode" "OI")])
>>
>> IMO, there is no need for a new insn pattern. UNSPEC_VOLATILEs clobber
>> all regs, pseudos and memory (see sched-deps.c around line 2528), so
>> it looks to me that vzeroupper_nop and existing vzeroupper are no
>> different. Just use the new pattern instead of avx_vzeroupper and
>> *avx_vzeroupper, and iff -mvzeroupper is passed to gcc, then
>> vzeroupper insn is moved around.
>
Done.
Here is the updated patch. OK for trunk?
Thanks.
--
H.J.
---
gcc/
2010-10-26 H.J. Lu <hongjiu.lu@intel.com>
* config/i386/i386-protos.h (init_cumulative_args): Add an int.
* config/i386/i386.c (block_info): New.
(BLOCK_INFO): Likewise.
(call_avx256_state): Likewise.
(check_avx256_stores): Likewise.
(move_or_delete_vzeroupper_2): Likewise.
(move_or_delete_vzeroupper_1): Likewise.
(move_or_delete_vzeroupper): Likewise.
(use_avx256_p): Likewise.
(function_pass_avx256_p): Likewise.
(flag_opts): Add -mvzeroupper.
(ix86_option_override_internal): Turn on MASK_VZEROUPPER by
default for TARGET_AVX. Turn off MASK_VZEROUPPER if TARGET_AVX
is disabled.
(ix86_function_ok_for_sibcall): Disable sibcall if we need to
generate vzeroupper.
(init_cumulative_args): Add an int to indicate caller. Set
use_avx256_p, callee_return_avx256_p and caller_use_avx256_p
based on return type.
(ix86_function_arg): Set use_avx256_p, callee_pass_avx256_p and
caller_pass_avx256_p based on argument type.
(ix86_expand_epilogue): Emit vzeroupper if 256bit AVX register
is used, but not returned by caller.
(ix86_expand_call): Emit vzeroupper if 256bit AVX register is
used.
(ix86_local_alignment): Set use_avx256_p if 256bit AVX register
is used.
(ix86_minimum_alignment): Likewise.
(ix86_expand_special_args_builtin): Set target to
GEN_INT (vzeroupper_intrinsic) for CODE_FOR_avx_vzeroupper.
(ix86_reorg): Run the vzeroupper optimization if needed.
* config/i386/i386.h (ix86_args): Add caller.
(INIT_CUMULATIVE_ARGS): Updated.
(machine_function): Add use_vzeroupper_p, use_avx256_p,
caller_pass_avx256_p, caller_return_avx256_p,
callee_pass_avx256_p and callee_return_avx256_p.
* config/i386/sse.md (avx_vzeroupper): Removed.
(*avx_vzeroupper): Removed.
(avx_vzeroupper): New.
* config/i386/i386.opt (-mvzeroupper): New.
* doc/invoke.texi: Document -mvzeroupper.
gcc/testsuite/
2010-10-25 H.J. Lu <hongjiu.lu@intel.com>
* gcc.target/i386/avx-vzeroupper-1.c: Add -mtune=generic.
* gcc.target/i386/avx-vzeroupper-2.c: Likewise.
* gcc.target/i386/avx-vzeroupper-3.c: New.
* gcc.target/i386/avx-vzeroupper-4.c: Likewise.
* gcc.target/i386/avx-vzeroupper-5.c: Likewise.
* gcc.target/i386/avx-vzeroupper-6.c: Likewise.
* gcc.target/i386/avx-vzeroupper-7.c: Likewise.
* gcc.target/i386/avx-vzeroupper-8.c: Likewise.
* gcc.target/i386/avx-vzeroupper-9.c: Likewise.
* gcc.target/i386/avx-vzeroupper-10.c: Likewise.
* gcc.target/i386/avx-vzeroupper-11.c: Likewise.
* gcc.target/i386/avx-vzeroupper-12.c: Likewise.
* gcc.target/i386/avx-vzeroupper-13.c: Likewise.
* gcc.target/i386/avx-vzeroupper-14.c: Likewise.