This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug c++/12902] Invalid assembly generated when using SSE / xmmintrin.h


PLEASE REPLY TO gcc-bugzilla@gcc.gnu.org ONLY, *NOT* gcc-bugs@gcc.gnu.org.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12902



------- Additional Comments From kbowers at lanl dot gov  2003-11-05 05:51 -------
Subject: Re:  Invalid assembly generated when using SSE / xmmintrin.h

pinskia at gcc dot gnu dot org wrote:
> PLEASE REPLY TO gcc-bugzilla@gcc.gnu.org ONLY, *NOT* gcc-bugs@gcc.gnu.org.
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12902
> 
> 
> pinskia at gcc dot gnu dot org changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|UNCONFIRMED                 |WAITING
> 
> 
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2003-11-05 04:08 -------
> The "a" in the asm corespondes to the variable "a" in foo, not the a in inlined function, swizzle.
> So the movaps is coming from:
>   c.v = _mm_loadl_pi(c.v,((__m64 *)a0)+1); <--- 8(%eax)
 >
> Is that really your problem, or is because main's stack is unaligned and you inlined functions into 
> main that use sse?

I don't think this is a stack alignment problem. The memory pointed to 
by "swizzle:a0"/"foo:a" is not on the stack and is 16-byte aligned (a0 
is the pointer "a" in the the foo). Also, _mm_loadl_pi is intended for 
8-byte aligned as (((__m64 *)a0)+1 is) memory 8-byte loads .

I think gcc has a bug in the implementation of the 
__builtin_ia32_loadlps and __builtin_ia32_loadhps intrinsics when 
dealing with xmm registers stored in a stack temporary.

Here is foo again with the b1 swizzle / math removed.

void foo( const a_t *a, const b_t *b, int n ) {
   vector4 ai, a0, a1, a2, b0, v0, v1, v2;
   __m128 *p0, *p1, *p2, *p3;

   for(;n;n--,a+=4) {
     swizzle(a,a+1,a+2,a+3,ai,a0,a1,a2);
     p0 = (__m128 *)(b + ai.i[0]);
     p1 = (__m128 *)(b + ai.i[1]);
     p2 = (__m128 *)(b + ai.i[2]);
     p3 = (__m128 *)(b + ai.i[3]);
     swizzle(p0++,p1++,p2++,p3++,b0,v0,v1,v2);
     b0.v = _mm_add_ps(_mm_add_ps(b0.v,_mm_mul_ps(a1.v,v0.v)),
            _mm_mul_ps(a2.v,_mm_add_ps(v1.v,_mm_mul_ps(a1.v,v2.v))));
   }
}

Here is the relevant assembly snippet:

% gcc-3.3.2 -S -fverbose-asm -O -msse bug_12902_followup.cpp
% cat bug_12902_followup.s

... snip to around line 45 ...

movaps	-40(%ebp), %xmm3	#  <variable>.v,  <anonymous>
movlps	(%esi), %xmm3	# * a,  <anonymous>
movlps	8(%esi), %xmm6	#  <anonymous>
movhps	16(%esi), %xmm3	#  <anonymous>
movhps	24(%esi), %xmm6	#  <anonymous>
movaps	%xmm6, %xmm4	#  <anonymous>,  <anonymous>
movaps	%xmm3, %xmm1	#  <anonymous>,  <anonymous>
movlps	32(%esi), %xmm1	#  <anonymous>
movaps	%xmm6, %xmm2	#  <anonymous>,  <anonymous>
movlps	40(%esi), %xmm2	#  <anonymous>
movhps	48(%esi), %xmm1	#  <anonymous>
movhps	56(%esi), %xmm2	#  <anonymous>
movaps	%xmm3, %xmm0	#  <anonymous>

... snip ...

This is the correct assembly and the function does not crash. I also 
have much more complex versions of the loop which do not crash (in those 
cases, I guess I got lucky with the register spills).

The third instruction above corresponds to line you cited:

c.v = _mm_loadl_pi(c.v,((__m64 *)a0)+1);

I've tried to figure out what gcc is doing in the original foo but I'm 
not sure I follow:

I think I've figured out the problem. Here is a translation of the first 
swizzle assembly from the original foo():

#  a.v = _mm_loadl_pi(a.v, (__m64 *)a0);
*  a.v is in xmm3
	movlps	(%eax), %xmm3

#  c.v = _mm_loadl_pi(c.v,((__m64 *)a0)+1); ... FAULTS
*  c.v is in -200(%ebp)
	movaps	8(%eax), %xmm0
	movlps	%xmm0, -200(%ebp)

#  a.v = _mm_loadh_pi(a.v, (__m64 *)a1);
*  a.v is in xmm3
	movhps	16(%eax), %xmm3

#  c.v = _mm_loadh_pi(c.v,((__m64 *)a1)+1); ... WOULD FAULT
*  c.v is in -200(%ebp)
	movaps	24(%eax), %xmm2
	movhps	%xmm2, -200(%ebp)

#  b.v = a.v;
*  b.v is in xmm3 (a.v and b.v are copies)

#  d.v = c.v;
*  d.v is in xmm4 (d.v and c.v are copies)
	movaps	-200(%ebp), %xmm4

# t   = _mm_loadl_pi(b.v, (__m64 *)a2);
* t is in xmm1
	movaps	%xmm3, %xmm1
	movlps	32(%eax), %xmm1

# u   = _mm_loadl_pi(d.v,((__m64 *)a2)+1);
* u is in xmm2
	movaps	%xmm4, %xmm2
	movlps	40(%eax), %xmm2

# t   = _mm_loadh_pi(t, (__m64 *)a3);
* t is in xmm1
	movhps	48(%eax), %xmm1

# u   = _mm_loadh_pi(u,((__m64 *)a3)+1);
* u is in xmm2
	movhps	56(%eax), %xmm2

# a.v = _mm_shuffle_ps(a.v,t,0x88);
* a.v is stored is "foo:ai"
* b.v no longer shares xmm3 with a.v
	movaps	%xmm3, %xmm0
	shufps	$136, %xmm1, %xmm0
	movaps	%xmm0, -40(%ebp)

# b.v = _mm_shuffle_ps(b.v,t,0xdd);
* b.v is stored in "foo:a0"
	shufps	$221, %xmm1, %xmm3
	movaps	%xmm3, -216(%ebp)

# c.v = _mm_shuffle_ps(c.v,u,0x88);
* c.v is stored in "foo:a1"
* d.v no longer shares xmm4 with a.v
	movaps	%xmm4, %xmm0
	shufps	$136, %xmm2, %xmm0
	movaps	%xmm0, -200(%ebp)
# d.v = _mm_shuffle_ps(d.v,u,0xdd);
* d.v is stored in "foo:a2"
	shufps	$221, %xmm2, %xmm4
	movaps	%xmm4, -232(%ebp)

The two faulting instructions appear to occur when an _mm_loadl_pi 
(__builtin_ia32_loadlps), _mm_loadh_pi (__builtin_ia32_loadhps) seem to 
work fine when doing a m64 -> xmm register transaction. However, when 
doing a m64 -> xmm stack temporary, they emit invalid assembly. The 
above assembly sequency makes sense if the following changes are made:

#  c.v = _mm_loadl_pi(c.v,((__m64 *)a0)+1); ... FIXED
*  c.v is in -200(%ebp)
	movaps  -200(%ebp), %xmm0 # Retrieve c.v from stack
	movlps	8(%eax), %xmm0    # A valid 64-bit load
	movaps	%xmm0, -200(%ebp) # Store modified c.v onto stack

and

#  c.v = _mm_loadh_pi(c.v,((__m64 *)a1)+1); ... FIXED
*  c.v is in -200(%ebp)
	movaps  -200(%ebp), %xmm2 # Retrive c.v from stack
	movhps	24(%eax), %xmm2   # A valid 64-bit load
	movaps	%xmm2, -200(%ebp) # Store modified c.v onto stack

Thanks for your prompt reply.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]