This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: PATCH: PR target/46519: Missing vzeroupper

From: "H.J. Lu" <hjl dot tools at gmail dot com>
To: Richard Guenther <richard dot guenther at gmail dot com>
Cc: Uros Bizjak <ubizjak at gmail dot com>, gcc-patches at gcc dot gnu dot org
Date: Sat, 20 Nov 2010 06:11:07 -0800
Subject: Re: PATCH: PR target/46519: Missing vzeroupper
References: <AANLkTimcGSYi1X95FHgr6m6k35CHkso8jPNcuM-=_Oia@mail.gmail.com> <AANLkTinxVSz7_vz3X19kOpr4As5AZPH0_g2wfD5+5Prt@mail.gmail.com> <AANLkTi=wuUYc0rkWpJ5hq_+NKz5cCMzyicwQ3omh2X=H@mail.gmail.com> <AANLkTi=GBP7tLOpXB01K_HO6uQnby+YFo5TCp8zBnKGy@mail.gmail.com>

On Sat, Nov 20, 2010 at 2:53 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Sat, Nov 20, 2010 at 12:31 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Fri, Nov 19, 2010 at 2:48 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Fri, Nov 19, 2010 at 10:30 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Thu, Nov 18, 2010 at 1:11 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>>> On Thu, Nov 18, 2010 at 12:36 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>
>>>>>> Here is the patch for
>>>>>>
>>>>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46519
>>>>>>
>>>>>> We have 2 blocks pointing to each other. This patch first scans
>>>>>> all blocks without moving vzeroupper so that we can have accurate
>>>>>> information about the upper 128 bits at block entry.
>>>>>
>>>>> This introduces another insn scanning pass, almost the same as
>>>>> existing vzeroupper pass (modulo CALL_INSN/JUMP_INSN handling).
>>>>>
>>>>> So, if I understand correctly:
>>>>> - The patch removes the detection of whether the function ever touches AVX registers.
>>>>> - Due to this, all call_insn RTXes have to be decorated with
>>>>> CALL_NEEDS_VZEROUPPER.
>>>>> - A new pre-pass is required that scans all functions in order to
>>>>> detect functions with live AVX registers at exit, and at the same time
>>>>> marks the functions that *do not* use AVX registers.
>>>>> - Existing pass then re-scans everything to again detect functions
>>>>> with live AVX registers at exit and handles vzeroupper emission.
>>>>>
>>>>> I don't think this approach is acceptable. Maybe the LCM infrastructure
>>>>> can be used to handle this case?
>>>>>
>>>>
>>>> Here is the rewrite of the vzeroupper optimization pass.
>>>> To avoid circular dependency, it has 2 passes.  It
>>>> delays the circular dependency to the second pass
>>>> and avoids rescanning as much as possible.
>>>>
>>>> I compared the bootstrap times with/without this patch
>>>> on 64bit Sandy Bridge with multilib and --with-fpmath=avx.
>>>> I enabled c,c++,fortran,java,lto,objc.
>>>>
>>>> Without patch:
>>>>
>>>> 12378.70user 573.02system 41:54.21elapsed 515%CPU
>>>>
>>>> With patch:
>>>>
>>>> 12580.56user 578.07system 42:25.41elapsed 516%CPU
>>>>
>>>> The overhead is about 1.6%.
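
As a rough cross-check of the quoted figures, the relative increase can be computed directly. The sketch below is illustrative only; the helper names `overhead_percent` and `to_seconds` are made up for this note. Fed the numbers from the two timing lines above, it gives about 1.6% for user time and about 1.2% for elapsed time, so the 1.6% figure corresponds to user time.

```c
#include <assert.h>

/* Relative increase of `with_patch` over `without_patch`, in percent. */
static double overhead_percent(double without_patch, double with_patch)
{
    assert(without_patch > 0.0);
    return (with_patch - without_patch) / without_patch * 100.0;
}

/* Convert an elapsed time given as minutes:seconds into seconds. */
static double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}
```

For example, `overhead_percent(12378.70, 12580.56)` covers the user-time column, and `overhead_percent(to_seconds(41, 54.21), to_seconds(42, 25.41))` covers the elapsed column.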
>>>
>>> That's quite a big overhead for something that doesn't use FP
>>> math (and thus no AVX).
>>
>> AVX256 vector insns are independent of FP math.  They can be
>> generated by the vectorizer as well as by loop unrolling.  We can limit
>> it to -O2 or -O3 if overhead is a big concern.
>
> Limiting it to -fexpensive-optimizations would be a good start.  Btw,
> how is code-size affected?  Does it make sense to disable it when
> optimizing a function for size?  As it affects performance of callees
> whether the caller is optimized for size or speed probably isn't the
> best thing to check.
>

We pay the penalty at the SSE<->AVX transition, not exactly in the
callee/caller.  We can just check optimize_size.

Here is the updated patch to enable the vzeroupper optimization only
with -fexpensive-optimizations and when not optimizing for size.  OK for trunk?
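
The gating condition described here can be pictured as a simple predicate. This is a minimal sketch, not the code from the patch: `struct opt_state` and `vzeroupper_pass_enabled` are hypothetical names, and the two fields only stand in for whatever option state ix86_option_override_internal actually consults.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the compiler's option state.  The field
   names are illustrative; they are not GCC's internal variables. */
struct opt_state {
    bool expensive_optimizations;  /* -fexpensive-optimizations in effect */
    bool optimize_for_size;        /* optimizing for size, e.g. -Os */
};

/* Run the vzeroupper optimization only when expensive optimizations
   are enabled and the code is not being optimized for size. */
static bool vzeroupper_pass_enabled(const struct opt_state *opts)
{
    return opts->expensive_optimizations && !opts->optimize_for_size;
}
```

Under this reading, a size-optimized build skips the pass even if expensive optimizations are otherwise on.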

Thanks.


-- 
H.J.
---
gcc/

2010-11-20  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* config/i386/i386.c (upper_128bits_state): New.
	(block_info_def): Remove upper_128bits_set and done.  Add state,
	referenced, count, processed and rescanned.
	(check_avx256_stores): Updated.
	(move_or_delete_vzeroupper_2): Updated. Handle deleted BB_END.
	Call note_stores only if needed.  Set referenced and count.
	(move_or_delete_vzeroupper_1): Updated.  Set rescan_vzeroupper_p.
	(rescan_move_or_delete_vzeroupper): New.
	(move_or_delete_vzeroupper): Process and rescan all basic
	blocks instead of predecessor blocks of all exit points.
	(ix86_option_override_internal): Enable vzeroupper optimization
	only for -fexpensive-optimizations and not optimizing for size.
	(use_avx256_p): Removed.
	(init_cumulative_args): Don't set use_avx256_p.
	(ix86_function_arg): Likewise.
	(ix86_expand_move): Likewise.
	(ix86_expand_vector_move_misalign): Likewise.
	(ix86_local_alignment): Likewise.
	(ix86_minimum_alignment): Likewise.
	(ix86_expand_epilogue): Don't check use_avx256_p when generating
	vzeroupper.
	(ix86_expand_call): Likewise.

	* config/i386/i386.h (machine_function): Remove use_vzeroupper_p
	and use_avx256_p.  Add rescan_vzeroupper_p.

gcc/testsuite/

2010-11-20  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* gcc.target/i386/avx-vzeroupper-10.c: Expect no avx_vzeroupper.
	* gcc.target/i386/avx-vzeroupper-11.c: Likewise.

	* gcc.target/i386/avx-vzeroupper-14.c: Replace -O0 with -O2.
	* gcc.target/i386/avx-vzeroupper-15.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-16.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-17.c: Likewise.

	* gcc.target/i386/avx-vzeroupper-20.c: New.
	* gcc.target/i386/avx-vzeroupper-21.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-22.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-23.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-24.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-25.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-26.c: Likewise.
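
To illustrate why blocks that point to each other need a rescan, here is a toy model of a two-pass scan over basic blocks. It is only a sketch under stated assumptions and is not the code in the attached patch: the enums, `struct block`, and every function name are hypothetical, each block is reduced to one summary effect, and the upper 128 bits are assumed clean at function entry.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PREDS 4

/* Hypothetical state of the upper 128 bits of the AVX registers. */
enum upper_state { STATE_UNKNOWN, STATE_UNUSED, STATE_USED };

/* Hypothetical one-word summary of what a block does to that state. */
enum block_effect { EFFECT_NONE, EFFECT_SETS, EFFECT_CLEARS };

struct block {
    enum block_effect effect;
    int preds[MAX_PREDS];     /* indices of predecessor blocks */
    int n_preds;              /* 0 marks the function entry block */
    enum upper_state entry;   /* state at block entry */
    enum upper_state exit;    /* state at block exit */
};

/* Entry state: USED if any predecessor exits USED, UNKNOWN if some
   predecessor is still unresolved, UNUSED otherwise. */
static enum upper_state join_preds(const struct block *blocks,
                                   const struct block *b)
{
    bool any_unknown = false;
    if (b->n_preds == 0)
        return STATE_UNUSED;  /* assumption: clean at function entry */
    for (int i = 0; i < b->n_preds; i++) {
        enum upper_state s = blocks[b->preds[i]].exit;
        if (s == STATE_USED)
            return STATE_USED;
        if (s == STATE_UNKNOWN)
            any_unknown = true;
    }
    return any_unknown ? STATE_UNKNOWN : STATE_UNUSED;
}

/* Recompute one block; return true if its entry or exit changed. */
static bool visit(struct block *blocks, int i)
{
    struct block *b = &blocks[i];
    enum upper_state in = join_preds(blocks, b);
    enum upper_state out = in;
    if (b->effect == EFFECT_SETS)
        out = STATE_USED;
    else if (b->effect == EFFECT_CLEARS)
        out = STATE_UNUSED;
    bool changed = (in != b->entry) || (out != b->exit);
    b->entry = in;
    b->exit = out;
    return changed;
}

static void solve(struct block *blocks, int n)
{
    bool rescan = false;
    assert(n > 0);
    for (int i = 0; i < n; i++)
        blocks[i].entry = blocks[i].exit = STATE_UNKNOWN;

    /* Pass 1: scan every block once; back edges leave UNKNOWN entries. */
    for (int i = 0; i < n; i++) {
        visit(blocks, i);
        if (blocks[i].entry == STATE_UNKNOWN)
            rescan = true;
    }

    /* Pass 2: rescan until nothing changes. */
    while (rescan) {
        rescan = false;
        for (int i = 0; i < n; i++)
            if (visit(blocks, i))
                rescan = true;
    }

    /* Whatever is still UNKNOWN sits on a cycle that no USED state
       reaches, so it can be treated as UNUSED. */
    for (int i = 0; i < n; i++) {
        if (blocks[i].entry == STATE_UNKNOWN)
            blocks[i].entry = STATE_UNUSED;
        if (blocks[i].exit == STATE_UNKNOWN)
            blocks[i].exit = STATE_UNUSED;
    }
}
```

With a loop header (block 1) and a loop body (block 2) that are each other's predecessors, the first scan cannot decide the header's entry state because the body has not been visited yet; the rescan then propagates the body's exit state back to the header and on to the block after the loop.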

Attachment: gcc-pr46519-4.patch
Description: Text document