This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: PATCH: PR target/46519: Missing vzeroupper


On Wed, Dec 29, 2010 at 1:10 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sat, Dec 18, 2010 at 7:10 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Sat, Dec 18, 2010 at 9:48 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> On Fri, Dec 17, 2010 at 8:03 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>>>
>>>> This patch fixes another missing vzeroupper. ?OK for trunk?
>
>> I'd like to apply this patch instead. It removes escan_move_or_delete_vzeroupper
>> and rewrites move_or_delete_vzeroupper_1 to avoid recursive call. It first scans
>> all basic blocks repeatedly until no basic block changes the upper
>> 128bits of AVX
>> to used at exit. ?Then it rescans all basic blocks with unknown upper
>> 128bit state.
>> OK for trunk?
>
> H.J. explained me in a private mail about the importance of this
> patch. I think that the quote below explains it:
>
> <quote>
>> I'm not sure that the algorithm is correct (and I don't have enough
>> experience in this area), so I'd rather leave the review to someone
>> else. AFAICS, there can be 20 passes, and from comments, it is
>> questionable if this is enough.
>
> I tried several benchmarks which failed before my patch. ?The most pass
> I saw is 2. I can change it to 2 and re-run SPEC CPU 2K/2006 to find
> out what the smallest pass should be.
>
>> I propose that you commit your previous (simple) patch, since IMO this
>
> My simple patch doesn't work on SPEC CPU 2K/2006. It isn't very
> useful for 4.6.
>
>> one is too invasive for this development stage. However, I still think
>
> The old algorithm is obviously incorrect. The new algorithm removes the
> recursive calls and is simpler/faster than the old one. ?vzeroupper optimization
> is a very important new feature for AVX. The current implementation is
> incorrect. ?I'd like to fix it before 4.6 is released.
>
>> that LCM infrastructure (see lcm.c) should be used to place
>> vzerouppers at optimum points.
>
> We will investigate LCM for 4.7.
> </qoute>
>
> I think that due to these reasons, the patch should be committed to
> SVN even in this development stage. Even if the algorithm is not
> optimal, the patch demonstrably produces substantially better code.
> This feature has no impact on generic code without -mvzeroupper /
> -mavx switch, and since there are currently very few AVX users,
> negligible overall impact.
>
>> gcc/
>>
>> 2010-12-18 ?H.J. Lu ?<hongjiu.lu@intel.com>
>>
>> ? ? ? ?PR target/46519
>> ? ? ? ?* config/i386/i386.c (block_info_def): Remove referenced, count
>> ? ? ? ?and rescanned.
>> ? ? ? ?(move_or_delete_vzeroupper_2): Updated.
>> ? ? ? ?(move_or_delete_vzeroupper_1): Rewritten to avoid recursive call.
>> ? ? ? ?(rescan_move_or_delete_vzeroupper): Removed.
>> ? ? ? ?(move_or_delete_vzeroupper): Repeat processing all basic blocks
>> ? ? ? ?until no basic block state is changed to used at exit.
>>
>> gcc/testsuite/
>>
>> 2010-12-18 ?H.J. Lu ?<hongjiu.lu@intel.com>
>>
>> ? ? ? ?PR target/46519
>> ? ? ? ?* gfortran.dg/pr46519-2.f90: New.
>>
>
> The patch is OK, but please allow a day or two for RMs (CC'd) to
> eventually comment.

We will investigate LCM for 4.7.  In the meantime, here is  a small patch
on top of the current one. If the upper 128bits are never changed in a basic
block, we can skip it in the later passes.  OK for trunk together with the
current patch?

Thanks.


-- 
H.J.
---
2010-12-29  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386.c (upper_128bits_state): Update comments.
	(block_info_def): Add unchanged.
	(move_or_delete_vzeroupper_2): Short circuit if upper 128bits
	are unchanged in the block.

Attachment: gcc-pr46519-9.patch
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]