PR 12902: Still problems with unaligned SSE access (V4SF mode)

Uros Bizjak uros@kss-loka.si
Wed Jan 5 09:22:00 GMT 2005


Hello!

There are still problems with unaligned SSE access. It looks like the 
"sse_movhps" and "sse_movlps" patterns should be split into separate 
load and store parts, as is now the case with sse2_loadhpd/sse2_storehpd 
and sse2_loadlpd/sse2_storelpd (I think the double-precision cases were 
fixed by rth's patch from 2004-12-22).
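
For reference, here is an intrinsics-level sketch (my own illustration, 
not the i386.md patterns themselves; the helper names are made up) of the 
two distinct operations the single "sse_movlps" pattern currently has to 
cover, and which a split would give separate load and store patterns for:

#include <xmmintrin.h>

/* Load form: merge 8 bytes from memory into the low half of a
   register (movlps mem -> xmm).  */
static __m128
movlps_load (__m128 x, const __m64 *p)
{
  return _mm_loadl_pi (x, p);
}

/* Store form: write the low half of a register to memory
   (movlps xmm -> mem).  */
static void
movlps_store (__m64 *p, __m128 x)
{
  _mm_storel_pi (p, x);
}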

Here is a reduced testcase that shows the problem:
--cut here--
#include <xmmintrin.h>

typedef struct
{
  int i;
  float f[3];
}
a_t;

typedef union
{
  int i[4];
  float f[4];
  __m128 v;
}
vector4_t;

void swizzle (const void *a, vector4_t *b, vector4_t *c) {
  b->v = _mm_loadl_pi (b->v, (__m64 *) a);
  c->v = _mm_loadl_pi (c->v, ((__m64 *) a) + 1);
}

int main () {
  a_t a[2];
  vector4_t b, c, x;

  swizzle (a, &b, &c);
  x.v = _mm_add_ps (b.v, c.v);

  return (x.i[0] + x.i[1] + x.i[2] + x.i[3]);
}
--cut here--
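
Just to spell out the alignment situation (my own sketch, not part of the 
testcase): sizeof (a_t) is 16 on ia32, so ((__m64 *) a) + 1 points 8 bytes 
into the array. Even when a[] itself is 16-byte aligned, that address is 
only 8-byte aligned, which is fine for movlps but not for movaps:

#include <stdio.h>
#include <xmmintrin.h>

typedef struct
{
  int i;
  float f[3];
}
a_t;

int main (void)
{
  a_t a[2];
  __m64 *p = ((__m64 *) a) + 1;   /* source of the second _mm_loadl_pi */

  printf ("sizeof (a_t) = %u\n", (unsigned) sizeof (a_t));              /* 16 */
  printf ("offset of p  = %u\n", (unsigned) ((char *) p - (char *) a)); /* 8 */
  return 0;
}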

The problem is in the second statement of swizzle(), 
"c->v = _mm_loadl_pi (c->v, ((__m64 *) a) + 1);", which gets combined to:

(insn 25 23 26 0 (set (mem/s:V4SF (reg/v/f:SI 63 [ c ]) [0 <variable>.v+0 S16 A128])
        (vec_merge:V4SF (mem/s:V4SF (reg/v/f:SI 63 [ c ]) [0 <variable>.v+0 S16 A128])
            (mem:V4SF (plus:SI (reg/v/f:SI 61 [ a ])
                    (const_int 8 [0x8])) [0 S16 A8])
            (const_int 3 [0x3]))) 534 {sse_movlps} (insn_list:REG_DEP_TRUE 8 (nil))
    (expr_list:REG_DEAD (reg/v/f:SI 61 [ a ])
        (expr_list:REG_DEAD (reg/v/f:SI 63 [ c ])
            (nil))))
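
In source terms, insn 25 is the second _mm_loadl_pi folded into the store 
to *c. A plain C sketch of its intended effect (my own illustration, 
ignoring how the vec_merge encodes it):

#include <string.h>

/* Intended effect of insn 25: replace the low two floats of the
   destination with the 8 bytes at byte offset 8 of 'a', keep the high
   two floats.  The RTL nevertheless describes the source as a full
   16-byte V4SF read with only byte alignment known (the "S16 A8" in
   the dump above).  */
static void
insn_25_effect (const void *a, float c[4])
{
  memcpy (c, (const char *) a + 8, 2 * sizeof (float));
  /* c[2] and c[3] are unchanged.  */
}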

As this insn doesn't satisfy the operand constraints of the "sse_movlps" 
pattern, reload generates:

(insn 38 23 25 0 (set (reg:V4SF 21 xmm0)
        (mem:V4SF (plus:SI (reg/v/f:SI 0 ax [orig:61 a ] [61])
                (const_int 8 [0x8])) [0 S16 A8])) 502 {*movv4sf_internal} (nil)
    (nil))

(insn:HI 25 38 26 0 (set (mem/s:V4SF (reg/v/f:SI 2 cx [orig:63 c ] [63]) [0 <variable>.v+0 S16 A128])
        (vec_merge:V4SF (mem/s:V4SF (reg/v/f:SI 2 cx [orig:63 c ] [63]) [0 <variable>.v+0 S16 A128])
            (reg:V4SF 21 xmm0)
            (const_int 3 [0x3]))) 534 {sse_movlps} (insn_list:REG_DEP_TRUE 8 (nil))
    (nil))


This results in a movaps insn with an unaligned load (movaps requires a 
16-byte aligned address), and the application crashes:
        ...
        movaps  8(%eax), %xmm0  #,                        <<< here
        movlps  %xmm0, (%ecx)   #, <variable>.v
        ...

Instead, I think code like this should be emitted:
        ...
        movaps  (%ecx), %xmm0  #, <variable>.v
        movlps  8(%eax), %xmm0  #,
        movaps  %xmm0, (%ecx)  #, <variable>.v
        ...
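
At the source level that sequence is just the merge done in a register, 
with the whole vector written back aligned; roughly (again my own sketch, 
reusing the types from the testcase, with a made-up helper name):

#include <xmmintrin.h>

typedef union
{
  int i[4];
  float f[4];
  __m128 v;
}
vector4_t;

static void
swizzle_low_half (const void *a, vector4_t *c)
{
  __m128 tmp = c->v;                            /* movaps (%ecx), %xmm0  */
  tmp = _mm_loadl_pi (tmp, ((__m64 *) a) + 1);  /* movlps 8(%eax), %xmm0 */
  c->v = tmp;                                   /* movaps %xmm0, (%ecx)  */
}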

Uros.


