PATCH: PR target/46519: Missing vzeroupper

Uros Bizjak ubizjak@gmail.com
Wed Dec 29 11:03:00 GMT 2010


On Sat, Dec 18, 2010 at 7:10 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Sat, Dec 18, 2010 at 9:48 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Fri, Dec 17, 2010 at 8:03 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>>
>>> This patch fixes another missing vzeroupper.  OK for trunk?

> I'd like to apply this patch instead. It removes escan_move_or_delete_vzeroupper
> and rewrites move_or_delete_vzeroupper_1 to avoid recursive call. It first scans
> all basic blocks repeatedly until no basic block changes the upper
> 128bits of AVX
> to used at exit.  Then it rescans all basic blocks with unknown upper
> 128bit state.
> OK for trunk?

H.J. explained me in a private mail about the importance of this
patch. I think that the quote below explains it:

<quote>
> I'm not sure that the algorithm is correct (and I don't have enough
> experience in this area), so I'd rather leave the review to someone
> else. AFAICS, there can be 20 passes, and from comments, it is
> questionable if this is enough.

I tried several benchmarks which failed before my patch.  The most pass
I saw is 2. I can change it to 2 and re-run SPEC CPU 2K/2006 to find
out what the smallest pass should be.

> I propose that you commit your previous (simple) patch, since IMO this

My simple patch doesn't work on SPEC CPU 2K/2006. It isn't very
useful for 4.6.

> one is too invasive for this development stage. However, I still think

The old algorithm is obviously incorrect. The new algorithm removes the
recursive calls and is simpler/faster than the old one.  vzeroupper optimization
is a very important new feature for AVX. The current implementation is
incorrect.  I'd like to fix it before 4.6 is released.

> that LCM infrastructure (see lcm.c) should be used to place
> vzerouppers at optimum points.

We will investigate LCM for 4.7.
</qoute>

I think that due to these reasons, the patch should be committed to
SVN even in this development stage. Even if the algorithm is not
optimal, the patch demonstrably produces substantially better code.
This feature has no impact on generic code without -mvzeroupper /
-mavx switch, and since there are currently very few AVX users,
negligible overall impact.

> gcc/
>
> 2010-12-18  H.J. Lu  <hongjiu.lu@intel.com>
>
>        PR target/46519
>        * config/i386/i386.c (block_info_def): Remove referenced, count
>        and rescanned.
>        (move_or_delete_vzeroupper_2): Updated.
>        (move_or_delete_vzeroupper_1): Rewritten to avoid recursive call.
>        (rescan_move_or_delete_vzeroupper): Removed.
>        (move_or_delete_vzeroupper): Repeat processing all basic blocks
>        until no basic block state is changed to used at exit.
>
> gcc/testsuite/
>
> 2010-12-18  H.J. Lu  <hongjiu.lu@intel.com>
>
>        PR target/46519
>        * gfortran.dg/pr46519-2.f90: New.
>

The patch is OK, but please allow a day or two for RMs (CC'd) to
eventually comment.

Thanks,
Uros.



More information about the Gcc-patches mailing list