This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
PR 12902: Still problems with unaligned SSE access (V4SF mode)
- From: Uros Bizjak <uros at kss-loka dot si>
- To: gcc-patches at gcc dot gnu dot org
- Date: Wed, 05 Jan 2005 10:23:22 +0100
Hello!
There are still problems with unaligned SSE access. It looks like the
"sse_movhps" and "sse_movlps" patterns should be split into separate load
and store patterns, as is now the case with sse2_loadhpd/sse2_storehpd and
sse2_loadlpd/sse2_storelpd (I think the doubles were fixed by rth's
patch from 2004-12-22).
Here is a reduced testcase that shows the problem:
--cut here--
#include <xmmintrin.h>
typedef struct
{
  int i;
  float f[3];
}
a_t;
typedef union
{
  int i[4];
  float f[4];
  __m128 v;
}
vector4_t;
void swizzle (const void *a, vector4_t *b, vector4_t *c) {
  b->v = _mm_loadl_pi (b->v, (__m64 *) a);
  c->v = _mm_loadl_pi (c->v, ((__m64 *) a) + 1);
}
int main () {
  a_t a[2];
  vector4_t b, c, x;
  swizzle (a, &b, &c);
  x.v = _mm_add_ps (b.v, c.v);
  return (x.i[0] + x.i[1] + x.i[2] + x.i[3]);
}
--cut here--
The problem is in the 2nd line of swizzle(), "c->v = _mm_loadl_pi (c->v,
((__m64 *) a) + 1);", which gets combined to:
(insn 25 23 26 0 (set (mem/s:V4SF (reg/v/f:SI 63 [ c ]) [0 <variable>.v+0 S16 A128])
        (vec_merge:V4SF (mem/s:V4SF (reg/v/f:SI 63 [ c ]) [0 <variable>.v+0 S16 A128])
            (mem:V4SF (plus:SI (reg/v/f:SI 61 [ a ])
                    (const_int 8 [0x8])) [0 S16 A8])
            (const_int 3 [0x3]))) 534 {sse_movlps}
    (insn_list:REG_DEP_TRUE 8 (nil))
    (expr_list:REG_DEAD (reg/v/f:SI 61 [ a ])
        (expr_list:REG_DEAD (reg/v/f:SI 63 [ c ])
            (nil))))
As this insn doesn't satisfy the register constraints of the
"sse_movlps" pattern, reload generates:
(insn 38 23 25 0 (set (reg:V4SF 21 xmm0)
        (mem:V4SF (plus:SI (reg/v/f:SI 0 ax [orig:61 a ] [61])
                (const_int 8 [0x8])) [0 S16 A8])) 502 {*movv4sf_internal} (nil)
    (nil))
(insn:HI 25 38 26 0 (set (mem/s:V4SF (reg/v/f:SI 2 cx [orig:63 c ] [63]) [0 <variable>.v+0 S16 A128])
        (vec_merge:V4SF (mem/s:V4SF (reg/v/f:SI 2 cx [orig:63 c ] [63]) [0 <variable>.v+0 S16 A128])
            (reg:V4SF 21 xmm0)
            (const_int 3 [0x3]))) 534 {sse_movlps}
    (insn_list:REG_DEP_TRUE 8 (nil))
    (nil))
This results in a movaps insn performing an unaligned load, and the
application crashes:
...
movaps 8(%eax), %xmm0 #, <<< here
movlps %xmm0, (%ecx) #, <variable>.v
...
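To illustrate why the movaps above faults, here is a minimal sketch of the
address arithmetic. The helper names is_aligned() and second_half() are
hypothetical and not part of the testcase, and the pointer-to-integer cast
assumes a flat address space, as on common x86 targets. The point is that
even a 16-byte aligned base, advanced by the 8 bytes that
"((__m64 *) a) + 1" adds, is only 8-byte aligned, which is enough for movlps
but not for movaps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nonzero when p is a multiple of align bytes.
   align must be a power of two.  */
static int is_aligned (const void *p, uintptr_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return ((uintptr_t) p & (align - 1)) == 0;
}

/* Mirrors the address arithmetic of ((__m64 *) a) + 1: base plus 8 bytes.  */
static const void *second_half (const void *base)
{
  return (const unsigned char *) base + 8;
}
```

In the testcase, "a" is an array of a_t, which is not guaranteed to be
16-byte aligned to begin with, so the offset address has even weaker
guarantees than in this sketch.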
Instead, I think code like this should be emitted:
...
movaps (%ecx), %xmm0 #, <variable>.v
movlps 8(%eax), %xmm0 #,
movaps %xmm0, (%ecx) #, <variable>.v
...
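The intent of the three-instruction sequence above can be sketched as a
scalar model in plain C. This is only an illustration, not what GCC emits:
vec4_model, model_movlps_load() and model_loadl_in_place() are hypothetical
names, and the model assumes 4-byte floats. It follows the documented
behaviour of _mm_loadl_pi: the low two floats come from 8 bytes of memory
with no alignment requirement, and the high two floats of the destination
are passed through.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical scalar stand-in for a 4 x float SSE register.  */
typedef struct { float f[4]; } vec4_model;

/* Models movlps mem -> reg: overwrite the low two floats with 8 bytes
   read from src (any alignment), keep the high two floats.  */
static vec4_model model_movlps_load (vec4_model reg, const void *src)
{
  memcpy (reg.f, src, 2 * sizeof (float));
  return reg;
}

/* Models the full sequence: aligned load of *dst, merge of the low half
   from src, aligned store back to *dst.  */
static void model_loadl_in_place (vec4_model *dst, const void *src)
{
  assert (dst != NULL && src != NULL);
  vec4_model reg = *dst;                 /* movaps (%ecx), %xmm0   */
  reg = model_movlps_load (reg, src);    /* movlps 8(%eax), %xmm0  */
  *dst = reg;                            /* movaps %xmm0, (%ecx)   */
}
```

Only the destination, which is known to be 16-byte aligned, is accessed as a
whole vector; the possibly unaligned source is touched only through the
8-byte access.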
Uros.