Bug 21239 - [4.0 Regression] Illegal elimination of SSE2 load/store using xmm intrinsics
Status: RESOLVED FIXED
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 4.0.0
Importance: P2 normal
Target Milestone: 4.0.1
Assigned To: Jakub Jelinek
Keywords: ssemmx, wrong-code
Depends on:
Blocks:
Reported: 2005-04-26 23:45 UTC by kurt
Modified: 2005-05-04 08:15 UTC

See Also:
Host: x86_64-suse-linux
Target: x86_64-suse-linux
Build: x86_64-suse-linux
Known to work: 3.4.3 4.1.0
Known to fail: 4.0.0
Last reconfirmed: 2005-05-03 14:42:12


Description kurt 2005-04-26 23:45:53 UTC
/** intrin.c   
 *   
 * gcc-4.0 misoptimizes the _mm_load_sd() away with   
 * -O1 (x86-64), with or without -m32 -msse2.   
 *   
 * (c) Kurt Garloff <garloff@suse.de>, Artistic v2   
 */   
   
#include <stdlib.h>                                                                              
#include <emmintrin.h>                                                                           
   
#ifdef WORKAROUND                                                                                
# define ACCESS(X) asm("": : "x"(X))                                                             
#else                                                                                            
# define ACCESS(X)                                                                               
#endif                                                                                           
   
void do_copy(const unsigned int ln, double* const dst,      
                const double* const src)   
{   
        int i = ln;   
        const register double *s = src;   
        register double *d = dst;   
        __m128d TMP;   
        while (i) {   
                TMP = _mm_load_sd(s);   
                ACCESS(TMP);   
                _mm_store_sd(d, TMP);   
                --i; ++s; ++d;   
        }                                                                                        
}                                                                                                
   
int main()   
{   
        unsigned int i;   
        double *a, *b ,*c;   
        a = (double*) malloc(19*sizeof(double));   
        b = (double*) malloc(19*sizeof(double));   
        for (i = 0; i < 19; ++i) {   
                a[i] = 1; b[i] = 2;   
        }                                                                                        
        do_copy(19, a, b);   
        return (a[18] != 2);   
}                                                                                                
   
The test program should return 0, which it does when built with gcc-3.3/3.4 or
when compiled with -DWORKAROUND. gcc-4.0, 4_0-branch, HEAD, and
tree-profiling-branch all fail: the _mm_load_sd() is optimized away.
I guess the compiler does not consider the _mm_store_sd() a consumer of
the vector register, which is why adding the fake consumer
asm("": : "x"(XMMREG)); helps.
Compiling with -m32 -msse2 exposes the same problem, so I strongly suspect
that a native x86 compiler would be affected as well.
  
   
Here's the wrong assembly produced by gcc-4.0 (on x86-64, using -O2):  
do_copy:  
.LFB495:  
        testl   %edi, %edi  
        jne     .L8  
        rep ; ret  
        .p2align 4,,7  
.L8:  
        xorl    %eax, %eax  
        .p2align 4,,7  
.L4:  
        incl    %eax  
        movq    $0, (%rsi)  
        addq    $8, %rsi  
        cmpl    %eax, %edi  
        jne     .L4  
        rep ; ret  
  
... and here the correct assembly with -DWORKAROUND added:  
do_copy:  
.LFB495:  
        testl   %edi, %edi  
        jne     .L8  
        rep ; ret  
        .p2align 4,,7  
.L8:  
        xorl    %eax, %eax  
        .p2align 4,,7  
.L4:  
        movsd   (%rdx), %xmm0  
        incl    %eax  
        movlpd  %xmm0, (%rsi)  
        addq    $8, %rdx  
        addq    $8, %rsi  
        cmpl    %eax, %edi  
        jne     .L4  
        rep ; ret
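
A stricter variant of the check in main() may help when bisecting, since it compares every copied element rather than only a[18]. The following is only a sketch: copy_doubles, count_mismatches, run_copy_check and the NOINLINE macro are hypothetical names that do not appear in the report, and the scalar fallback is an assumption added so the file still builds where SSE2 is unavailable.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
# include <emmintrin.h>
#endif

/* Assumption: keep the copy loop out of line so the caller cannot
   constant-fold the whole check away. */
#if defined(__GNUC__)
# define NOINLINE __attribute__((noinline))
#else
# define NOINLINE
#endif

/* Copies n doubles one element at a time, using the same
   _mm_load_sd/_mm_store_sd pair as do_copy() when SSE2 is enabled. */
static NOINLINE void copy_doubles(size_t n, double *dst, const double *src)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; ++i) {
#if defined(__SSE2__)
        __m128d tmp = _mm_load_sd(&src[i]);
        _mm_store_sd(&dst[i], tmp);
#else
        dst[i] = src[i];   /* scalar fallback, not part of the report */
#endif
    }
}

/* Counts positions where the two arrays differ. */
static size_t count_mismatches(size_t n, const double *a, const double *b)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i)
        if (a[i] != b[i])
            ++bad;
    return bad;
}

/* Returns 0 when every element was copied correctly. */
static size_t run_copy_check(void)
{
    double a[19], b[19];
    for (size_t i = 0; i < 19; ++i) {
        a[i] = 1.0;
        b[i] = 2.0 + (double)i;   /* distinct, nonzero source values */
    }
    copy_doubles(19, a, b);
    return count_mismatches(19, a, b);
}
```

With a miscompiled loop that stores 0.0 on every iteration, as in the first listing above, run_copy_check() would be expected to report all 19 elements as mismatches.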
Comment 1 Andrew Pinski 2005-04-27 00:00:38 UTC
Confirmed.
Combine is combining the following RTL:
(insn 30 27 31 2 (set (reg:DF 70)
        (mem:DF (reg/v/f:SI 64 [ s ]) [0 S8 A64])) -1 (nil)
    (nil))

(insn 31 30 32 2 (set (reg:V2DF 69)
        (vec_concat:V2DF (reg:DF 70)
            (const_double:DF 0 [0x0] 0.0 [0x0.0p+0]))) -1 (insn_list:REG_DEP_TRUE 30 (nil))
    (expr_list:REG_DEAD (reg:DF 70)
        (nil)))

(insn 32 31 34 2 (set (reg:DF 71)
        (vec_select:DF (reg:V2DF 69)
            (parallel [
                    (const_int 0 [0x0])
                ]))) -1 (insn_list:REG_DEP_TRUE 31 (nil))
    (expr_list:REG_DEAD (reg:V2DF 69)
        (nil)))

(insn 34 32 36 2 (set (mem:DF (reg/v/f:SI 63 [ d ]) [0 S8 A64])
        (reg:DF 71)) -1 (insn_list:REG_DEP_TRUE 32 (nil))
    (expr_list:REG_DEAD (reg:DF 71)
        (nil)))
into:
(insn 34 32 36 2 (set (mem:DF (reg/v/f:SI 63 [ d ]) [0 S8 A64])
        (const_double:DF 0 [0x0] 0.0 [0x0.0p+0])) 65 {*movdf_nointeger} (nil)
    (nil))

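Which is just wrong: selecting element 0 of the vec_concat should yield
(reg:DF 70), the loaded value, not the constant 0.0.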
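
To make the expected result concrete, the four insns can be modelled in portable C. This is only an illustrative sketch of the RTL semantics, not GCC code: v2df_model, model_vec_concat, model_vec_select and model_copy_one are hypothetical names introduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a V2DF value: two double lanes. */
typedef struct { double lane[2]; } v2df_model;

/* Models (vec_concat:V2DF a b): lane 0 holds a, lane 1 holds b. */
static v2df_model model_vec_concat(double a, double b)
{
    v2df_model v = { { a, b } };
    return v;
}

/* Models (vec_select:DF v (parallel [(const_int idx)])). */
static double model_vec_select(v2df_model v, size_t idx)
{
    assert(idx < 2);
    return v.lane[idx];
}

/* Models insns 30 to 34: load, concat with 0.0, select lane 0, store. */
static void model_copy_one(double *d, const double *s)
{
    double loaded = *s;                            /* insn 30 */
    v2df_model v = model_vec_concat(loaded, 0.0);  /* insn 31 */
    double selected = model_vec_select(v, 0);      /* insn 32 */
    *d = selected;                                 /* insn 34 */
}
```

Under this model the store in insn 34 receives the loaded value, so a valid simplification of the sequence is a plain DF move from the source memory to the destination memory. The constant 0.0 would only be correct if lane 1 had been selected.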
Comment 2 Jakub Jelinek 2005-05-03 14:42:11 UTC
Yeah, a bug in combine_simplify_rtx.  I have a patch that fixes this, but
while working on a testcase I encountered another bug as well, so I am looking
into that too.
Comment 3 CVS Commits 2005-05-03 22:16:15 UTC
Subject: Bug 21239

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	jakub@gcc.gnu.org	2005-05-03 22:16:02

Modified files:
	gcc            : ChangeLog combine.c 
	gcc/testsuite  : ChangeLog 
	gcc/config/i386: i386.c 
Added files:
	gcc/testsuite/gcc.dg: i386-sse-11.c 

Log message:
	* config/i386/i386.c (ix86_expand_vector_set): Fix setting 3rd and 4th
	item in V4SF mode.
	
	PR rtl-optimization/21239
	* combine.c (combine_simplify_rtx) <case VEC_SELECT>: Fix a typo.
	
	* gcc.dg/i386-sse-11.c: New test.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.8587&r2=2.8588
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/combine.c.diff?cvsroot=gcc&r1=1.488&r2=1.489
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/ChangeLog.diff?cvsroot=gcc&r1=1.5436&r2=1.5437
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.c.diff?cvsroot=gcc&r1=1.817&r2=1.818
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/gcc.dg/i386-sse-11.c.diff?cvsroot=gcc&r1=NONE&r2=1.1

Comment 4 CVS Commits 2005-05-04 07:30:19 UTC
Subject: Bug 21239

CVSROOT:	/cvs/gcc
Module name:	gcc
Branch: 	gcc-4_0-branch
Changes by:	jakub@gcc.gnu.org	2005-05-04 07:29:29

Modified files:
	gcc            : ChangeLog combine.c 
	gcc/testsuite  : ChangeLog 
	gcc/config/i386: i386.c 
Added files:
	gcc/testsuite/gcc.dg: i386-sse-11.c 

Log message:
	* config/i386/i386.c (ix86_expand_vector_set): Fix setting 3rd and 4th
	item in V4SF mode.
	
	PR rtl-optimization/21239
	* combine.c (combine_simplify_rtx) <case VEC_SELECT>: Fix a typo.
	
	* gcc.dg/i386-sse-11.c: New test.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-4_0-branch&r1=2.7592.2.215&r2=2.7592.2.216
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/combine.c.diff?cvsroot=gcc&only_with_tag=gcc-4_0-branch&r1=1.475.4.4&r2=1.475.4.5
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-4_0-branch&r1=1.5084.2.158&r2=1.5084.2.159
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.c.diff?cvsroot=gcc&only_with_tag=gcc-4_0-branch&r1=1.795.6.7&r2=1.795.6.8
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/gcc.dg/i386-sse-11.c.diff?cvsroot=gcc&only_with_tag=gcc-4_0-branch&r1=NONE&r2=1.1.2.1

Comment 5 Giovanni Bajo 2005-05-04 08:15:46 UTC
Fixed, thanks Kurt for the report and Jakub for fixing it!