This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
On Wed, Dec 29, 2010 at 1:10 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sat, Dec 18, 2010 at 7:10 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Sat, Dec 18, 2010 at 9:48 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> On Fri, Dec 17, 2010 at 8:03 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>>>
>>>> This patch fixes another missing vzeroupper.  OK for trunk?
>
>> I'd like to apply this patch instead. It removes
>> rescan_move_or_delete_vzeroupper and rewrites
>> move_or_delete_vzeroupper_1 to avoid the recursive call. It first scans
>> all basic blocks repeatedly until no basic block changes the upper
>> 128bits of AVX to used at exit.  Then it rescans all basic blocks with
>> unknown upper 128bit state.
>> OK for trunk?
>
> H.J. explained to me in a private mail the importance of this
> patch. I think that the quote below explains it:
>
> <quote>
>> I'm not sure that the algorithm is correct (and I don't have enough
>> experience in this area), so I'd rather leave the review to someone
>> else. AFAICS, there can be 20 passes, and from comments, it is
>> questionable if this is enough.
>
> I tried several benchmarks which failed before my patch.  The most passes
> I saw was 2. I can change it to 2 and re-run SPEC CPU 2K/2006 to find
> out what the smallest pass count should be.
>
>> I propose that you commit your previous (simple) patch, since IMO this
>
> My simple patch doesn't work on SPEC CPU 2K/2006. It isn't very
> useful for 4.6.
>
>> one is too invasive for this development stage. However, I still think
>
> The old algorithm is obviously incorrect. The new algorithm removes the
> recursive calls and is simpler/faster than the old one.  vzeroupper
> optimization is a very important new feature for AVX. The current
> implementation is incorrect.  I'd like to fix it before 4.6 is released.
>
>> that LCM infrastructure (see lcm.c) should be used to place
>> vzerouppers at optimum points.
>
> We will investigate LCM for 4.7.
> </quote>
>
> I think that due to these reasons, the patch should be committed to
> SVN even in this development stage. Even if the algorithm is not
> optimal, the patch demonstrably produces substantially better code.
> This feature has no impact on generic code without -mvzeroupper /
> -mavx switch, and since there are currently very few AVX users,
> negligible overall impact.
>
>> gcc/
>>
>> 2010-12-18  H.J. Lu  <hongjiu.lu@intel.com>
>>
>>        PR target/46519
>>        * config/i386/i386.c (block_info_def): Remove referenced, count
>>        and rescanned.
>>        (move_or_delete_vzeroupper_2): Updated.
>>        (move_or_delete_vzeroupper_1): Rewritten to avoid recursive call.
>>        (rescan_move_or_delete_vzeroupper): Removed.
>>        (move_or_delete_vzeroupper): Repeat processing all basic blocks
>>        until no basic block state is changed to used at exit.
>>
>> gcc/testsuite/
>>
>> 2010-12-18  H.J. Lu  <hongjiu.lu@intel.com>
>>
>>        PR target/46519
>>        * gfortran.dg/pr46519-2.f90: New.
>>
>
> The patch is OK, but please allow a day or two for RMs (CC'd) to
> eventually comment.

We will investigate LCM for 4.7. In the meantime, here is a small patch
on top of the current one. If the upper 128bits are never changed in a
basic block, we can skip it in the later passes. OK for trunk together
with the current patch?

Thanks.

--
H.J.
---
2010-12-29  H.J. Lu  <hongjiu.lu@intel.com>

        * config/i386/i386.c (upper_128bits_state): Update comments.
        (block_info_def): Add unchanged.
        (move_or_delete_vzeroupper_2): Short circuit if upper 128bits
        are unchanged in the block.
Attachment:
gcc-pr46519-9.patch
Description: Text document
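
For readers without the attachment, the scheme described in the message (scan all basic blocks repeatedly until no block's exit state changes to "used", then rescan the blocks whose state is still unknown) can be sketched roughly as below. This is a toy model, not the code in config/i386/i386.c: the types `upper_state`, `effect` and `struct block`, and the functions `merge_preds`, `process_block` and `solve`, are all hypothetical names made up for illustration. The per-block `effect` field stands in for scanning the block's instructions, so an `EF_NONE` block corresponds to the case the follow-up patch wants to skip in later passes. Treating a block with no predecessors as "unused" on entry is also an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical three-valued state of the upper 128 bits at block exit. */
enum upper_state { ST_UNKNOWN, ST_UNUSED, ST_USED };

/* Hypothetical summary of what a block does to the upper 128 bits.
   EF_NONE means the block never changes them (exit state == entry state). */
enum effect { EF_NONE, EF_SETS_USED, EF_CLEARS };

#define MAX_PREDS 4

struct block {
  enum effect effect;
  int preds[MAX_PREDS];        /* indices of predecessor blocks */
  int n_preds;
  enum upper_state exit_state;
};

/* Merge predecessor exit states into an entry state.  Any "used"
   predecessor wins; otherwise an unknown predecessor makes the entry
   unknown, unless unknown_is_unused asks us to ignore it.  */
static enum upper_state
merge_preds(const struct block *blocks, const struct block *b,
            bool unknown_is_unused)
{
  bool seen_unknown = false;
  assert(b->n_preds <= MAX_PREDS);
  for (int i = 0; i < b->n_preds; i++) {
    switch (blocks[b->preds[i]].exit_state) {
    case ST_USED:
      return ST_USED;
    case ST_UNKNOWN:
      if (!unknown_is_unused)
        seen_unknown = true;
      break;
    case ST_UNUSED:
      break;
    }
  }
  /* A block with no predecessors is assumed to start as unused. */
  return seen_unknown ? ST_UNKNOWN : ST_UNUSED;
}

/* Recompute one block's exit state.  Returns true only when the exit
   state newly becomes "used", which is what keeps the outer loop going. */
static bool
process_block(struct block *blocks, int i, bool unknown_is_unused)
{
  struct block *b = &blocks[i];
  enum upper_state old = b->exit_state;
  enum upper_state entry = merge_preds(blocks, b, unknown_is_unused);

  switch (b->effect) {
  case EF_SETS_USED: b->exit_state = ST_USED;   break;
  case EF_CLEARS:    b->exit_state = ST_UNUSED; break;
  case EF_NONE:      b->exit_state = entry;     break;
  }
  return b->exit_state == ST_USED && old != ST_USED;
}

/* Iterate to a fixed point without recursion, then resolve the blocks
   that are still unknown.  Returns the number of full passes made. */
static int
solve(struct block *blocks, int n)
{
  int passes = 0;
  bool changed;

  for (int i = 0; i < n; i++)
    blocks[i].exit_state = ST_UNKNOWN;

  do {
    changed = false;
    for (int i = 0; i < n; i++)
      if (process_block(blocks, i, false))
        changed = true;
    passes++;
  } while (changed);

  /* Final rescan: remaining unknown predecessors count as unused. */
  for (int i = 0; i < n; i++)
    if (blocks[i].exit_state == ST_UNKNOWN)
      process_block(blocks, i, true);

  return passes;
}
```

In this model the loop terminates because a block's exit state only ever moves towards "used" during the repeated scans, so the number of passes is bounded by the number of blocks plus one. How many passes a real function needs is a separate, empirical question, which is what the benchmark numbers in the thread address.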