This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
On Wed, Nov 17, 2010 at 8:11 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Wed, Nov 17, 2010 at 3:36 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >> On Wed, Nov 17, 2010 at 11:23 AM, Uros Bizjak <ubizjak@gmail.com> wrote: >>> On Wed, Nov 17, 2010 at 2:33 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >>>> On Tue, Nov 16, 2010 at 11:51 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >>>>> On Wed, Nov 17, 2010 at 7:44 AM, H.J. Lu <hongjiu.lu@intel.com> wrote: >>>>> >>>>>> insn != BB_END (bb) && NEXT_INSN (insn) == NEXT_INSN (BB_END (bb)) >>>>>> >>>>>> We should check NEXT_INSN (insn) != NEXT_INSN (BB_END (bb)) in >>>>>> move_or_delete_vzeroupper_2. ?This patch does it. >>>>> >>>>> Huh? The loop does simple linear scan of all insns in the bb, so it >>>>> can't miss BB_END. IIUC, in your case the bb does not have BB_END >>>>> (bb), but it has NEXT_INSN (BB_END (bb))? >>>> >>>> It has BB_END, but it won't be visited by NEXT_INSN starting from >>>> BB_HEAD. insn != NEXT_INSN (BB_END (bb)) is used to check the >>>> end of the BB everywhere in gcc. >>>> >>>>> Can you please provide a test case that illustrates this? >>>>> >>>> >>>> I am enclosing a work in progress. ?We noticed that we are >>>> missing a few vzerouppers at -O3 on SPEC CPU 2K/2006. >>>> One isssue is we may have >>>> >>>> foo: >>>> >>>> ? ? ? call bar <<<<< Missing vzeroupper >>>> >>>> ? ? ? 256bit vectorized insn >>>> ? ? ? goto foo >>>> >>>> We miss vzeroupper before call bar. ?We don't have a small testcase. >>>> But this patch fixes this case by inspection. We are checking other >>>> cases. >>> >>> @@ -118,9 +118,12 @@ move_or_delete_vzeroupper_2 (basic_block bb, bool >>> upper_128bits_set) >>> ? ? ? ? ? ? bb->index, upper_128bits_set); >>> >>> ? insn = BB_HEAD (bb); >>> + ?last = NEXT_INSN (BB_END (bb)); >>> ? while (insn != BB_END (bb)) >>> ? ? { >>> ? ? ? insn = NEXT_INSN (insn); >>> + ? ? ?if (insn == last) >>> + ? ? ? break; >>> >>> ? ? ? if (!NONDEBUG_INSN_P (insn)) >>> ? ? ? ?continue; >>> >>> The change above is not needed. The new check is never triggered - the >>> loop terminates when "insn == BB_END (bb)" at "while", so I fail to >>> see why additional termination for "NEXT_INSN (insn) == NEXT_INSN >>> (BB_END (bb))" is needed. >> >> Here is the patch for >> >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46519 >> >> We have 2 blocks pointing to each others. This patch first scans >> all blocks without moving vzeroupper so that we can have accurate >> information about upper 128bits at block entry. >> >>> (The BB_HEAD (bb) is either a NOTE or CODE_LABEL so it can be skipped >>> with NEXT_INSN.) >> >> Please try gcc.target/i386/avx-vzeroupper-20.c. ?It will >> trigger this condition. >> >>> @@ -10970,7 +10973,7 @@ ix86_expand_epilogue (int style) >>> >>> ? /* Emit vzeroupper if needed. ?*/ >>> ? if (TARGET_VZEROUPPER >>> - ? ? ?&& cfun->machine->use_avx256_p >>> + ? ? ?&& (cfun->machine->use_avx256_p || flag_tree_vectorize) >>> ? ? ? && !cfun->machine->caller_return_avx256_p) >>> ? ? { >>> ? ? ? cfun->machine->use_vzeroupper_p = 1; >>> @@ -21661,7 +21664,8 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1, >>> ? ? } >>> >>> ? /* Add UNSPEC_CALL_NEEDS_VZEROUPPER decoration. ?*/ >>> - ?if (TARGET_VZEROUPPER && cfun->machine->use_avx256_p) >>> + ?if (TARGET_VZEROUPPER >>> + ? ? ?&& (cfun->machine->use_avx256_p || flag_tree_vectorize)) >>> >>> Decorate *ALL* calls with CALL_NEEDS_VZEROUPPER with >>> -ftree-vectorize?! It looks that parts (or state machine) that set >>> ...->use_avx256_p flag should be fixed. >> >> There are: >> >> foo: >> >> ? ? ?call bar <<<<< Missing vzeroupper >> >> ? ? ?256bit vectorized insn >> ? ? ?goto foo >> >> I couldn't find a hook to set use_avx256_p before RTL expansion >> starts. >> >>> ? ? { >>> ? ? ? rtx unspec; >>> ? ? ? int avx256; >>> diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c >>> b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c >>> new file mode 100644 >>> index 0000000..3301083 >>> --- /dev/null >>> +++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c >>> @@ -0,0 +1,13 @@ >>> +/* { dg-do compile } */ >>> +/* { dg-options "-O3 -mavx -mtune=generic -dp" } */ >>> + >>> +extern void free (void *); >>> +void >>> +bar (void *ncstrp) >>> +{ >>> + ?if(ncstrp==((void *)0)) >>> + ? ?return; >>> + ?free(ncstrp); >>> +} >>> + >>> +/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */ >>> >>> Hm, this testcase doesn't go together with the above change. There is >>> no vectorization involved, and the scan checks that vzeroupper is NOT >>> emitted. >>> >> >> This testcase is for >> >> insn != BB_END (bb) && NEXT_INSN (insn) == NEXT_INSN (BB_END (bb)) >> I sent the patch without comments too soon. As discussed in PR, setting and checking use_avx256_p isn't reliable. This patch removes use_avx256_p. Any comments? Thanks. -- H.J. --- gcc/ 2010-11-17 H.J. Lu <hongjiu.lu@intel.com> PR target/46519 * config/i386/i386.c (block_info_def): Add scanned and no_avx256. (move_or_delete_vzeroupper_2): Properly check the end of basic block. Call note_stores only if no_avx256 is false. (scan_live_upper_128bits_2): New. (scan_live_upper_128bits_1): Likewise. (move_or_delete_vzeroupper): Call scan_live_upper_128bits_1 to scan predecessor blocks of all exit points. (use_avx256_p): Removed. (init_cumulative_args): Don't set use_avx256_p. (ix86_function_arg): Likewise. (ix86_expand_move): Likewise. (ix86_expand_vector_move_misalign): Likewise. (ix86_local_alignment): Likewise. (ix86_minimum_alignment): Likewise. (ix86_expand_epilogue): Don't check use_avx256_p when generating vzeroupper. (ix86_expand_call): Likewise. * config/i386/i386.h (machine_function): Remove use_avx256_p. gcc/testsuite/ 2010-11-17 H.J. Lu <hongjiu.lu@intel.com> PR target/46519 * gcc.target/i386/avx-vzeroupper-10.c: Expect no avx_vzeroupper. * gcc.target/i386/avx-vzeroupper-11.c: Likewise. * gcc.target/i386/avx-vzeroupper-20.c: New.
Attachment:
gcc-pr46519-2.patch
Description: Text document
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |