This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



From: Uros Bizjak <uros at kss-loka dot si>
To: gcc-patches at gcc dot gnu dot org
Date: Wed, 05 Jan 2005 10:23:22 +0100
Subject: PR 12902: Still problems with unaligned SSE access (V4SF mode)

Hello!

There are still problems with unaligned SSE access. It looks like the "sse_movhps" and "sse_movlps" patterns should be split into separate load and store patterns, as is now the case with sse2_loadhpd/sse2_storehpd and sse2_loadlpd/sse2_storelpd (I think the double-precision patterns were fixed by rth's patch from 2004-12-22).

Here is a reduced testcase that shows the problem:
--cut here--
#include <xmmintrin.h>

typedef struct
{
 int i;
 float f[3];
}
a_t;

typedef union
{
 int i[4];
 float f[4];
 __m128 v;
}
vector4_t;

void swizzle (const void *a, vector4_t *b, vector4_t *c) {
 b->v = _mm_loadl_pi (b->v, (__m64 *) a);
 c->v = _mm_loadl_pi (c->v, ((__m64 *) a) + 1);
}

int main () {
 a_t a[2];
 vector4_t b, c, x;

 swizzle (a, &b, &c);
 x.v = _mm_add_ps (b.v, c.v);

 return (x.i[0] + x.i[1] + x.i[2] + x.i[3]);
}
--cut here--

The problem is in the second line of swizzle(), "c->v = _mm_loadl_pi (c->v, ((__m64 *) a) + 1);", which gets combined to:

(insn 25 23 26 0 (set (mem/s:V4SF (reg/v/f:SI 63 [ c ]) [0 <variable>.v+0 S16 A128]) (vec_merge:V4SF (mem/s:V4SF (reg/v/f:SI 63 [ c ]) [0 <variable>.v+0 S16 A128]) (mem:V4SF (plus:SI (reg/v/f:SI 61 [ a ]) (const_int 8 [0x8])) [0 S16 A8]) (const_int 3 [0x3]))) 534 {sse_movlps} (insn_list:REG_DEP_TRUE 8 (nil)) (expr_list:REG_DEAD (reg/v/f:SI 61 [ a ]) (expr_list:REG_DEAD (reg/v/f:SI 63 [ c ]) (nil))))

As this insn does not satisfy the register constraints of the "sse_movlps" pattern, reload generates:

(insn 38 23 25 0 (set (reg:V4SF 21 xmm0) (mem:V4SF (plus:SI (reg/v/f:SI 0 ax [orig:61 a ] [61]) (const_int 8 [0x8])) [0 S16 A8])) 502 {*movv4sf_internal} (nil) (nil))

(insn:HI 25 38 26 0 (set (mem/s:V4SF (reg/v/f:SI 2 cx [orig:63 c ] [63]) [0 <variable>.v+0 S16 A128]) (vec_merge:V4SF (mem/s:V4SF (reg/v/f:SI 2 cx [orig:63 c ] [63]) [0 <variable>.v+0 S16 A128]) (reg:V4SF 21 xmm0) (const_int 3 [0x3]))) 534 {sse_movlps} (insn_list:REG_DEP_TRUE 8 (nil)) (nil))


This results in a movaps insn with an unaligned load, and the application crashes:
       ...
       movaps  8(%eax), %xmm0  #,                        <<< here
       movlps  %xmm0, (%ecx)   #, <variable>.v
       ...
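
To illustrate why the movaps above can fault, here is a minimal sketch of the alignment check involved. movaps requires a 16-byte-aligned memory operand, while the RTL dump marks the source as A8 (byte alignment only), so 8(%eax) carries no 16-byte guarantee. The helper names is_aligned and movaps_operand_ok are hypothetical and exist only for this illustration; they are not part of GCC or the testcase.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if p is aligned to `align` bytes.
   `align` must be a power of two. */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper: models the requirement that a movaps memory
   operand be 16-byte aligned. */
static int movaps_operand_ok(const void *p)
{
    return is_aligned(p, 16);
}
```

With a base pointer that happens to be 16-byte aligned, base + 8 is 8-byte aligned but fails movaps_operand_ok, which corresponds to the 8(%eax) operand in the sequence above.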

Instead, I think code like this should be emitted:
       ...
       movaps  (%ecx), %xmm0  #, <variable>.v
       movlps  8(%eax), %xmm0  #,
       movaps  %xmm0, (%ecx)  #, <variable>.v
       ...
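
The intended effect of that three-instruction sequence can be modelled in portable C. This is only a sketch of the data movement, not of what GCC emits: vec4_model is a hypothetical stand-in for __m128, loadl_merge is a hypothetical name, and memcpy stands in for the 8-byte movlps load, which has no alignment requirement.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit vector of four floats. */
typedef struct { float f[4]; } vec4_model;

/* Models: movaps (dst), %xmm0 ; movlps (src), %xmm0 ; movaps %xmm0, (dst).
   The low two floats of *dst are replaced by the 8 bytes at src;
   the high two floats are preserved. src may have any alignment. */
static void loadl_merge(vec4_model *dst, const void *src)
{
    assert(dst != NULL && src != NULL);
    vec4_model tmp = *dst;        /* full-width load of the destination */
    memcpy(&tmp.f[0], src, 8);    /* 8-byte load into the low half */
    *dst = tmp;                   /* full-width store back */
}
```

The point of the model is that only the full-width accesses touch the aligned vector4_t object, while the access to the pointer a is a plain 8-byte load.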

Uros.

Follow-Ups:
- Re: PR 12902: Still problems with unaligned SSE access (V4SF mode)
  - From: Richard Henderson
