This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



PR 12902: Still problems with unaligned SSE access (V4SF mode)


Hello!

There are still problems with unaligned SSE access. It looks like the "sse_movhps" and "sse_movlps" patterns should be split into load and store parts, as is now the case with sse2_loadhpd/sse2_storehpd and sse2_loadlpd/sse2_storelpd (I think the doubles were fixed by rth's patch from 2004-12-22).

Here is a reduced testcase that shows the problem:
--cut here--
#include <xmmintrin.h>

typedef struct
{
 int i;
 float f[3];
}
a_t;

typedef union
{
 int i[4];
 float f[4];
 __m128 v;
}
vector4_t;

void swizzle (const void *a, vector4_t *b, vector4_t *c) {
 b->v = _mm_loadl_pi (b->v, (__m64 *) a);
 c->v = _mm_loadl_pi (c->v, ((__m64 *) a) + 1);
}

int main () {
 a_t a[2];
 vector4_t b, c, x;

 swizzle (a, &b, &c);
 x.v = _mm_add_ps (b.v, c.v);

 return (x.i[0] + x.i[1] + x.i[2] + x.i[3]);
}
--cut here--

The problem is in the 2nd line of swizzle(), "c->v = _mm_loadl_pi (c->v, ((__m64 *) a) + 1);", which gets combined to:

(insn 25 23 26 0 (set (mem/s:V4SF (reg/v/f:SI 63 [ c ]) [0 <variable>.v+0 S16 A128])
(vec_merge:V4SF (mem/s:V4SF (reg/v/f:SI 63 [ c ]) [0 <variable>.v+0 S16 A128])
(mem:V4SF (plus:SI (reg/v/f:SI 61 [ a ])
(const_int 8 [0x8])) [0 S16 A8])
(const_int 3 [0x3]))) 534 {sse_movlps} (insn_list:REG_DEP_TRUE 8 (nil))
(expr_list:REG_DEAD (reg/v/f:SI 61 [ a ])
(expr_list:REG_DEAD (reg/v/f:SI 63 [ c ])
(nil))))


As this insn doesn't satisfy the register constraints of the "sse_movlps" pattern, reload generates:

(insn 38 23 25 0 (set (reg:V4SF 21 xmm0)
(mem:V4SF (plus:SI (reg/v/f:SI 0 ax [orig:61 a ] [61])
(const_int 8 [0x8])) [0 S16 A8])) 502 {*movv4sf_internal} (nil)
(nil))


(insn:HI 25 38 26 0 (set (mem/s:V4SF (reg/v/f:SI 2 cx [orig:63 c ] [63]) [0 <variable>.v+0 S16 A128])
(vec_merge:V4SF (mem/s:V4SF (reg/v/f:SI 2 cx [orig:63 c ] [63]) [0 <variable>.v+0 S16 A128])
(reg:V4SF 21 xmm0)
(const_int 3 [0x3]))) 534 {sse_movlps} (insn_list:REG_DEP_TRUE 8 (nil))
(nil))



This results in a movaps insn with an unaligned load, and the application crashes:
       ...
       movaps  8(%eax), %xmm0  #,    <<< here
       movlps  %xmm0, (%ecx)   #, <variable>.v
       ...
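
To make the alignment issue concrete, here is a minimal sketch in plain C. The helpers is_aligned() and movaps_ok_at_offset8() are hypothetical names introduced only for this illustration. The sketch assumes the best case, where the array itself sits on a 16-byte boundary; even then the address a + 8 bytes is only 8-byte aligned, so a 16-byte movaps load from it faults.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as a_t in the testcase: 16 bytes in size,
   but the type itself only requires 4-byte alignment. */
typedef struct
{
 int i;
 float f[3];
}
a_t;

/* Hypothetical helper: nonzero if p is a multiple of align.
   align is assumed to be a power of two. */
static int is_aligned (const void *p, size_t align)
{
 assert (align != 0 && (align & (align - 1)) == 0);
 return ((uintptr_t) p & (align - 1)) == 0;
}

/* Hypothetical helper: nonzero if a 16-byte aligned access
   (such as movaps) at base + 8 bytes would be legal. */
static int movaps_ok_at_offset8 (const void *base)
{
 return is_aligned ((const char *) base + 8, 16);
}
```

For any 16-byte-aligned base, movaps_ok_at_offset8 (base) returns 0, while the same address still satisfies is_aligned (..., 8), which is all that an 8-byte movlps load needs.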

Instead, I think code like this should be emitted:
       ...
       movaps  (%ecx), %xmm0  #, <variable>.v
       movlps  8(%eax), %xmm0  #,
       movaps  %xmm0, (%ecx)  #, <variable>.v
       ...
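
The intended sequence can also be written out at the source level with intrinsics. This is only a sketch of the desired semantics, not the proposed fix to the machine description: load_low_halves() is a hypothetical helper name, it assumes an x86 target with SSE enabled, and the compiler is not guaranteed to emit exactly the three instructions above.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: replace the low two floats of the 16-byte-aligned
   vector at dst with the 8 bytes at src, keeping the high two floats.
   src needs only 8-byte alignment here. */
static void load_low_halves (float *dst, const void *src)
{
 __m128 v;

 /* The aligned load and store below require a 16-byte-aligned dst. */
 assert (((uintptr_t) dst & 15) == 0);

 v = _mm_load_ps (dst);                     /* like: movaps (dst), %xmm0 */
 v = _mm_loadl_pi (v, (const __m64 *) src); /* like: movlps (src), %xmm0 */
 _mm_store_ps (dst, v);                     /* like: movaps %xmm0, (dst) */
}
```

Called with src pointing 8 bytes into a 16-byte-aligned buffer, the function never performs a 16-byte access on the misaligned address; only the destination is accessed with aligned 16-byte moves.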

Uros.
