Bug 46519 - Missing vzeroupper
Summary: Missing vzeroupper
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.6.0
: P3 normal
Target Milestone: 4.6.0
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-11-17 15:04 UTC by H.J. Lu
Modified: 2011-02-02 17:40 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
A testcase (448 bytes, text/plain)
2010-11-17 15:04 UTC, H.J. Lu
Details
A testcase (775 bytes, text/plain)
2010-11-18 00:26 UTC, H.J. Lu
Details

Note You need to log in before you can comment on or make changes to this bug.
Description H.J. Lu 2010-11-17 15:04:22 UTC
Created attachment 22430 [details]
A testcase

/export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc -B/export/build/gnu/gcc/build-x86_64-linux/gcc/ -O3 -funroll-loops -ffast-math -mavx -Wno-multichar -S bad.c

.L531:
        vmovapd %ymm0, (%rsp)
        vzeroupper
        call    Get_Token
        movl    Token_Id(%rip), %eax
        vmovapd (%rsp), %ymm0
        cmpl    $1, %eax
        je      .L139
        cmpl    $2, %eax
        je      .L541

L171:
        vmovsd  (%rax), %xmm9
        vsubsd  32(%rsp,%rcx,8), %xmm9, %xmm8
        leal    1(%rdx), %ecx
        cmpl    %ecx, %esi
        vmovsd  %xmm8, (%rax)
        jg      .L543
        jmp     .L531
        .p2align 4,,10
        .p2align 3
.L541:
        leaq    80(%rsp), %rsi 
        movq    %r12, %rdi 
        call    Parse_Num_Term <<<< missing vzeroupper
        movl    0(%r13), %esi 
        movl    80(%rsp), %eax 
        vmovapd (%rsp), %ymm0
        cmpl    %eax, %esi
Comment 1 Uroš Bizjak 2010-11-17 18:42:29 UTC
Does the patch at [1] also fix this test?

[1] http://gcc.gnu.org/ml/gcc-patches/2010-11/msg01802.html
Comment 2 H.J. Lu 2010-11-17 18:49:14 UTC
(In reply to comment #1)
> Does the patch at [1] also fix this test?
> 
> [1] http://gcc.gnu.org/ml/gcc-patches/2010-11/msg01802.html

No, that is for a different case.
Comment 3 H.J. Lu 2010-11-18 00:26:18 UTC
Created attachment 22437 [details]
A testcase

With -O3 -funroll-loops -ffast-math -mavx:

        movl    Token_Id(%rip), %eax
        vmovapd 32(%rsp), %ymm0
        cmpl    $1, %eax
        je      .L4
        cmpl    $2, %eax
        je      .L5
        testl   %eax, %eax
        jne     .L464
        leaq    124(%rsp), %rsi
        leaq    64(%rsp), %rdi
        call    _Z16Parse_Rel_FactorPdPi   <<<< Missing vzeroupper
        movl    124(%rsp), %esi

.L856:
        vmovapd %ymm0, 32(%rsp)
        call    _Z9Get_Tokenv  <<<< Missing vzeroupper
        movl    Token_Id(%rip), %eax 
        vmovapd 32(%rsp), %ymm0

...
.L491:
        leaq    252(%rsp), %rsi
        leaq    64(%rsp), %rdi 
        vmovapd %ymm0, 32(%rsp)    
        call    _ZL14Parse_Rel_TermPdPi   <<<< Missing vzeroupper
        movl    252(%rsp), %esi
        movl    224(%rsp), %edx
        vmovapd 32(%rsp), %ymm0
        cmpl    %edx, %esi
Comment 4 H.J. Lu 2010-11-18 01:48:13 UTC
(In reply to comment #3)
> Created attachment 22437 [details]
> A testcase
> 
> With -O3 -funroll-loops -ffast-math -mavx:
> 
>         movl    Token_Id(%rip), %eax
>         vmovapd 32(%rsp), %ymm0
>         cmpl    $1, %eax
>         je      .L4
>         cmpl    $2, %eax
>         je      .L5
>         testl   %eax, %eax
>         jne     .L464
>         leaq    124(%rsp), %rsi
>         leaq    64(%rsp), %rdi
>         call    _Z16Parse_Rel_FactorPdPi   <<<< Missing vzeroupper
>         movl    124(%rsp), %esi
> 
> .L856:
>         vmovapd %ymm0, 32(%rsp)
>         call    _Z9Get_Tokenv  <<<< Missing vzeroupper
>         movl    Token_Id(%rip), %eax 
>         vmovapd 32(%rsp), %ymm0
> 
> ...
> .L491:
>         leaq    252(%rsp), %rsi
>         leaq    64(%rsp), %rdi 
>         vmovapd %ymm0, 32(%rsp)    
>         call    _ZL14Parse_Rel_TermPdPi   <<<< Missing vzeroupper
>         movl    252(%rsp), %esi
>         movl    224(%rsp), %edx
>         vmovapd 32(%rsp), %ymm0
>         cmpl    %edx, %esi

Some of 256bit vector insns are introduced by loop unroll. Maybe
we should drop the use_avx256_p check since it isn't reliable.
Comment 5 hjl@gcc.gnu.org 2010-11-24 18:24:46 UTC
Author: hjl
Date: Wed Nov 24 18:24:39 2010
New Revision: 167124

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=167124
Log:
Improve vzeroupper optimization.

gcc/

2010-11-24  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* config/i386/i386.c (upper_128bits_state): New.
	(block_info_def): Remove upper_128bits_set and done.  Add state,
	referenced, count, processed and rescanned. 
	(check_avx256_stores): Updated.
	(move_or_delete_vzeroupper_2): Updated. Handle deleted BB_END.
	Call note_stores only if needed.  Set referenced and count.
	(move_or_delete_vzeroupper_1): Updated.  Set rescan_vzeroupper_p.
	(rescan_move_or_delete_vzeroupper): New.
	(move_or_delete_vzeroupper):  Process and rescan all all basic
	blocks instead of predecessor blocks of all exit points.
	(ix86_option_override_internal): Enable vzeroupper optimization
	only for -fexpensive-optimizations and not optimizing for size.
	(use_avx256_p): Removed.
	(init_cumulative_args): Don't set use_avx256_p.
	(ix86_function_arg): Likewise.
	(ix86_expand_move): Likewise.
	(ix86_expand_vector_move_misalign): Likewise.
	(ix86_local_alignment): Likewise.
	(ix86_minimum_alignment): Likewise.
	(ix86_expand_epilogue): Don't check use_avx256_p when generating
	vzeroupper.
	(ix86_expand_call): Likewise.

	* config/i386/i386.h (machine_function): Remove use_vzeroupper_p
	and use_avx256_p.  Add rescan_vzeroupper_p.

gcc/testsuite/

2010-11-24  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* gcc.target/i386/avx-vzeroupper-10.c: Expect no avx_vzeroupper.
	* gcc.target/i386/avx-vzeroupper-11.c: Likewise.

	* gcc.target/i386/avx-vzeroupper-14.c: Replace -O0 with -O2.
	* gcc.target/i386/avx-vzeroupper-15.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-16.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-17.c: Likewise.

	* gcc.target/i386/avx-vzeroupper-20.c: New.
	* gcc.target/i386/avx-vzeroupper-21.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-22.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-23.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-24.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-25.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-26.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
    trunk/gcc/testsuite/gcc.target/i386/avx-vzeroupper-21.c
    trunk/gcc/testsuite/gcc.target/i386/avx-vzeroupper-22.c
    trunk/gcc/testsuite/gcc.target/i386/avx-vzeroupper-23.c
    trunk/gcc/testsuite/gcc.target/i386/avx-vzeroupper-24.c
    trunk/gcc/testsuite/gcc.target/i386/avx-vzeroupper-25.c
    trunk/gcc/testsuite/gcc.target/i386/avx-vzeroupper-26.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/i386.h
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/i386/avx-vzeroupper-10.c
    trunk/gcc/testsuite/gcc.target/i386/avx-vzeroupper-11.c
    trunk/gcc/testsuite/gcc.target/i386/avx-vzeroupper-14.c
    trunk/gcc/testsuite/gcc.target/i386/avx-vzeroupper-15.c
    trunk/gcc/testsuite/gcc.target/i386/avx-vzeroupper-16.c
    trunk/gcc/testsuite/gcc.target/i386/avx-vzeroupper-17.c
Comment 6 hjl@gcc.gnu.org 2010-11-24 19:16:48 UTC
Author: hjl
Date: Wed Nov 24 19:16:40 2010
New Revision: 167126

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=167126
Log:
Don't check TREE_THIS_VOLATILE in ix86_expand_call.

gcc/

2010-11-24  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* config/i386/i386.c (ix86_expand_call): Don't check
	TREE_THIS_VOLATILE.

gcc/testsuite/

2010-11-24  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* gfortran.dg/pr46519-1.f: New.

Added:
    trunk/gcc/testsuite/gfortran.dg/pr46519-1.f
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/testsuite/ChangeLog
Comment 7 hjl@gcc.gnu.org 2010-12-30 13:12:05 UTC
Author: hjl
Date: Thu Dec 30 13:12:02 2010
New Revision: 168342

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=168342
Log:
Repeat processing all basic blocks for vzeroupper optimization.

gcc/

2010-12-30  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* config/i386/i386.c (block_info_def): Remove referenced, count
	and rescanned.
	(move_or_delete_vzeroupper_2): Updated.
	(move_or_delete_vzeroupper_1): Rewritten to avoid recursive call.
	(rescan_move_or_delete_vzeroupper): Removed.
	(move_or_delete_vzeroupper): Repeat processing all basic blocks
	until no basic block state is changed to used at exit.

gcc/testsuite/

2010-12-30  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* gfortran.dg/pr46519-2.f90: New.

Added:
    trunk/gcc/testsuite/gfortran.dg/pr46519-2.f90
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/testsuite/ChangeLog
Comment 8 hjl@gcc.gnu.org 2011-01-24 17:30:00 UTC
Author: hjl
Date: Mon Jan 24 17:29:58 2011
New Revision: 169173

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169173
Log:
Visit basic blocks using the work-list based algorithm.

2011-01-24  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* config/i386/i386.c: Include sbitmap.h and fibheap.h.
	(block_info): Add scanned and prev.
	(move_or_delete_vzeroupper_2): Return if the basic block
	has been scanned and the upper 128bit state is unchanged
	from the last scan.
	(move_or_delete_vzeroupper_1): Return true if the exit
	state is changed.
	(move_or_delete_vzeroupper): Visit basic blocks using the
	work-list based algorithm based on vt_find_locations in
	var-tracking.c.

	* config/i386/t-i386: Also depend on sbitmap.h and $(FIBHEAP_H).

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/t-i386
Comment 9 H.J. Lu 2011-01-24 17:30:59 UTC
Fixed.
Comment 10 Diego Novillo 2011-02-02 17:40:46 UTC
Author: dnovillo
Date: Wed Feb  2 17:40:40 2011
New Revision: 169536

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169536
Log:
Visit basic blocks using the work-list based algorithm.

2011-01-24  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* config/i386/i386.c: Include sbitmap.h and fibheap.h.
	(block_info): Add scanned and prev.
	(move_or_delete_vzeroupper_2): Return if the basic block
	has been scanned and the upper 128bit state is unchanged
	from the last scan.
	(move_or_delete_vzeroupper_1): Return true if the exit
	state is changed.
	(move_or_delete_vzeroupper): Visit basic blocks using the
	work-list based algorithm based on vt_find_locations in
	var-tracking.c.

	* config/i386/t-i386: Also depend on sbitmap.h and $(FIBHEAP_H).

Modified:
    branches/google/integration/gcc/ChangeLog
    branches/google/integration/gcc/config/i386/i386.c
    branches/google/integration/gcc/config/i386/t-i386