Created attachment 22430 [details] A testcase /export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc -B/export/build/gnu/gcc/build-x86_64-linux/gcc/ -O3 -funroll-loops -ffast-math -mavx -Wno-multichar -S bad.c .L531: vmovapd %ymm0, (%rsp) vzeroupper call Get_Token movl Token_Id(%rip), %eax vmovapd (%rsp), %ymm0 cmpl $1, %eax je .L139 cmpl $2, %eax je .L541 L171: vmovsd (%rax), %xmm9 vsubsd 32(%rsp,%rcx,8), %xmm9, %xmm8 leal 1(%rdx), %ecx cmpl %ecx, %esi vmovsd %xmm8, (%rax) jg .L543 jmp .L531 .p2align 4,,10 .p2align 3 .L541: leaq 80(%rsp), %rsi movq %r12, %rdi call Parse_Num_Term <<<< missing vzeroupper movl 0(%r13), %esi movl 80(%rsp), %eax vmovapd (%rsp), %ymm0 cmpl %eax, %esi
Does the patch at [1] also fix this test? [1] http://gcc.gnu.org/ml/gcc-patches/2010-11/msg01802.html
(In reply to comment #1) > Does the patch at [1] also fix this test? > > [1] http://gcc.gnu.org/ml/gcc-patches/2010-11/msg01802.html No, that is for a different case.
Created attachment 22437 [details] A testcase With -O3 -funroll-loops -ffast-math -mavx: movl Token_Id(%rip), %eax vmovapd 32(%rsp), %ymm0 cmpl $1, %eax je .L4 cmpl $2, %eax je .L5 testl %eax, %eax jne .L464 leaq 124(%rsp), %rsi leaq 64(%rsp), %rdi call _Z16Parse_Rel_FactorPdPi <<<< Missing vzeroupper movl 124(%rsp), %esi .L856: vmovapd %ymm0, 32(%rsp) call _Z9Get_Tokenv <<<< Missing vzeroupper movl Token_Id(%rip), %eax vmovapd 32(%rsp), %ymm0 ... .L491: leaq 252(%rsp), %rsi leaq 64(%rsp), %rdi vmovapd %ymm0, 32(%rsp) call _ZL14Parse_Rel_TermPdPi <<<< Missing vzeroupper movl 252(%rsp), %esi movl 224(%rsp), %edx vmovapd 32(%rsp), %ymm0 cmpl %edx, %esi
(In reply to comment #3) > Created attachment 22437 [details] > A testcase > > With -O3 -funroll-loops -ffast-math -mavx: > > movl Token_Id(%rip), %eax > vmovapd 32(%rsp), %ymm0 > cmpl $1, %eax > je .L4 > cmpl $2, %eax > je .L5 > testl %eax, %eax > jne .L464 > leaq 124(%rsp), %rsi > leaq 64(%rsp), %rdi > call _Z16Parse_Rel_FactorPdPi <<<< Missing vzeroupper > movl 124(%rsp), %esi > > .L856: > vmovapd %ymm0, 32(%rsp) > call _Z9Get_Tokenv <<<< Missing vzeroupper > movl Token_Id(%rip), %eax > vmovapd 32(%rsp), %ymm0 > > ... > .L491: > leaq 252(%rsp), %rsi > leaq 64(%rsp), %rdi > vmovapd %ymm0, 32(%rsp) > call _ZL14Parse_Rel_TermPdPi <<<< Missing vzeroupper > movl 252(%rsp), %esi > movl 224(%rsp), %edx > vmovapd 32(%rsp), %ymm0 > cmpl %edx, %esi Some of 256bit vector insns are introduced by loop unroll. Maybe we should drop the use_avx256_p check since it isn't reliable.
Author: hjl Date: Wed Nov 24 18:24:39 2010 New Revision: 167124 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=167124 Log: Improve vzeroupper optimization. gcc/ 2010-11-24 H.J. Lu <hongjiu.lu@intel.com> PR target/46519 * config/i386/i386.c (upper_128bits_state): New. (block_info_def): Remove upper_128bits_set and done. Add state, referenced, count, processed and rescanned. (check_avx256_stores): Updated. (move_or_delete_vzeroupper_2): Updated. Handle deleted BB_END. Call note_stores only if needed. Set referenced and count. (move_or_delete_vzeroupper_1): Updated. Set rescan_vzeroupper_p. (rescan_move_or_delete_vzeroupper): New. (move_or_delete_vzeroupper): Process and rescan all all basic blocks instead of predecessor blocks of all exit points. (ix86_option_override_internal): Enable vzeroupper optimization only for -fexpensive-optimizations and not optimizing for size. (use_avx256_p): Removed. (init_cumulative_args): Don't set use_avx256_p. (ix86_function_arg): Likewise. (ix86_expand_move): Likewise. (ix86_expand_vector_move_misalign): Likewise. (ix86_local_alignment): Likewise. (ix86_minimum_alignment): Likewise. (ix86_expand_epilogue): Don't check use_avx256_p when generating vzeroupper. (ix86_expand_call): Likewise. * config/i386/i386.h (machine_function): Remove use_vzeroupper_p and use_avx256_p. Add rescan_vzeroupper_p. gcc/testsuite/ 2010-11-24 H.J. Lu <hongjiu.lu@intel.com> PR target/46519 * gcc.target/i386/avx-vzeroupper-10.c: Expect no avx_vzeroupper. * gcc.target/i386/avx-vzeroupper-11.c: Likewise. * gcc.target/i386/avx-vzeroupper-14.c: Replace -O0 with -O2. * gcc.target/i386/avx-vzeroupper-15.c: Likewise. * gcc.target/i386/avx-vzeroupper-16.c: Likewise. * gcc.target/i386/avx-vzeroupper-17.c: Likewise. * gcc.target/i386/avx-vzeroupper-20.c: New. * gcc.target/i386/avx-vzeroupper-21.c: Likewise. * gcc.target/i386/avx-vzeroupper-22.c: Likewise. * gcc.target/i386/avx-vzeroupper-23.c: Likewise. * gcc.target/i386/avx-vzeroupper-24.c: Likewise. * gcc.target/i386/avx-vzeroupper-25.c: Likewise. * gcc.target/i386/avx-vzeroupper-26.c: Likewise. Added: trunk/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c trunk/gcc/testsuite/gcc.target/i386/avx-vzeroupper-21.c trunk/gcc/testsuite/gcc.target/i386/avx-vzeroupper-22.c trunk/gcc/testsuite/gcc.target/i386/avx-vzeroupper-23.c trunk/gcc/testsuite/gcc.target/i386/avx-vzeroupper-24.c trunk/gcc/testsuite/gcc.target/i386/avx-vzeroupper-25.c trunk/gcc/testsuite/gcc.target/i386/avx-vzeroupper-26.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/config/i386/i386.h trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/i386/avx-vzeroupper-10.c trunk/gcc/testsuite/gcc.target/i386/avx-vzeroupper-11.c trunk/gcc/testsuite/gcc.target/i386/avx-vzeroupper-14.c trunk/gcc/testsuite/gcc.target/i386/avx-vzeroupper-15.c trunk/gcc/testsuite/gcc.target/i386/avx-vzeroupper-16.c trunk/gcc/testsuite/gcc.target/i386/avx-vzeroupper-17.c
Author: hjl Date: Wed Nov 24 19:16:40 2010 New Revision: 167126 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=167126 Log: Don't check TREE_THIS_VOLATILE in ix86_expand_call. gcc/ 2010-11-24 H.J. Lu <hongjiu.lu@intel.com> PR target/46519 * config/i386/i386.c (ix86_expand_call): Don't check TREE_THIS_VOLATILE. gcc/testsuite/ 2010-11-24 H.J. Lu <hongjiu.lu@intel.com> PR target/46519 * gfortran.dg/pr46519-1.f: New. Added: trunk/gcc/testsuite/gfortran.dg/pr46519-1.f Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/testsuite/ChangeLog
Author: hjl Date: Thu Dec 30 13:12:02 2010 New Revision: 168342 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=168342 Log: Repeat processing all basic blocks for vzeroupper optimization. gcc/ 2010-12-30 H.J. Lu <hongjiu.lu@intel.com> PR target/46519 * config/i386/i386.c (block_info_def): Remove referenced, count and rescanned. (move_or_delete_vzeroupper_2): Updated. (move_or_delete_vzeroupper_1): Rewritten to avoid recursive call. (rescan_move_or_delete_vzeroupper): Removed. (move_or_delete_vzeroupper): Repeat processing all basic blocks until no basic block state is changed to used at exit. gcc/testsuite/ 2010-12-30 H.J. Lu <hongjiu.lu@intel.com> PR target/46519 * gfortran.dg/pr46519-2.f90: New. Added: trunk/gcc/testsuite/gfortran.dg/pr46519-2.f90 Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/testsuite/ChangeLog
Author: hjl Date: Mon Jan 24 17:29:58 2011 New Revision: 169173 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169173 Log: Visit basic blocks using the work-list based algorithm. 2011-01-24 H.J. Lu <hongjiu.lu@intel.com> PR target/46519 * config/i386/i386.c: Include sbitmap.h and fibheap.h. (block_info): Add scanned and prev. (move_or_delete_vzeroupper_2): Return if the basic block has been scanned and the upper 128bit state is unchanged from the last scan. (move_or_delete_vzeroupper_1): Return true if the exit state is changed. (move_or_delete_vzeroupper): Visit basic blocks using the work-list based algorithm based on vt_find_locations in var-tracking.c. * config/i386/t-i386: Also depend on sbitmap.h and $(FIBHEAP_H). Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/config/i386/t-i386
Fixed.
Author: dnovillo Date: Wed Feb 2 17:40:40 2011 New Revision: 169536 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169536 Log: Visit basic blocks using the work-list based algorithm. 2011-01-24 H.J. Lu <hongjiu.lu@intel.com> PR target/46519 * config/i386/i386.c: Include sbitmap.h and fibheap.h. (block_info): Add scanned and prev. (move_or_delete_vzeroupper_2): Return if the basic block has been scanned and the upper 128bit state is unchanged from the last scan. (move_or_delete_vzeroupper_1): Return true if the exit state is changed. (move_or_delete_vzeroupper): Visit basic blocks using the work-list based algorithm based on vt_find_locations in var-tracking.c. * config/i386/t-i386: Also depend on sbitmap.h and $(FIBHEAP_H). Modified: branches/google/integration/gcc/ChangeLog branches/google/integration/gcc/config/i386/i386.c branches/google/integration/gcc/config/i386/t-i386