This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



patch [RFC] fix for an -msse2 failure


Hi,

The following test case ICEs on x86 with -Os -msse2:

typedef float __m128 __attribute__ ((vector_size (16)));

static __inline __m128
_mm_mul_ps (__m128 __A, __m128 __B)
{
  return __builtin_ia32_mulps (__A, __B);
}

static __inline __m128
_mm_sub_ps (__m128 __A, __m128 __B)
{
  return __builtin_ia32_subps (__A, __B);
}

__m128 POW_FUNC (__m128 x, __m128 y)
{
    __m128 xmm0 = x, xmm1 = y, xmm2;

    xmm2 = __builtin_ia32_xorps (xmm2, xmm2);

    xmm0 = _mm_mul_ps (xmm0, xmm1);

    xmm0 = _mm_sub_ps (xmm0, xmm2);

    xmm0 = _mm_mul_ps (xmm0, xmm1);

    return xmm0;
}

% mygccim5 -c -Os -msse2 bad.c
bad.c: In function 'POW_FUNC':
bad.c:28: internal compiler error: in trunc_int_for_mode, at explow.c:53
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
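
For reference, here is a rough sketch of what POW_FUNC is meant to compute, written with the generic GCC vector extension instead of the ia32 builtins. The names v4sf, pow_func_ref and pow_func_ref_check are made up for this illustration, and the xor of xmm2 with itself is replaced by an explicit all-zero vector, which is an assumption about the intent of the test case:

```c
#include <assert.h>

/* Generic GCC/Clang vector type; hypothetical name, no SSE builtins.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Hypothetical reference version of POW_FUNC.  The xorps of a register
   with itself is modelled as an explicit all-zero vector.  */
v4sf
pow_func_ref (v4sf x, v4sf y)
{
  v4sf zero = { 0.0f, 0.0f, 0.0f, 0.0f };
  v4sf r = x * y;   /* _mm_mul_ps (xmm0, xmm1) */

  r = r - zero;     /* _mm_sub_ps (xmm0, xmm2) */
  r = r * y;        /* _mm_mul_ps (xmm0, xmm1) */
  return r;
}

/* Small self-check: every lane should equal x * y * y.  */
void
pow_func_ref_check (void)
{
  v4sf x = { 1.0f, 2.0f, 3.0f, 4.0f };
  v4sf y = { 2.0f, 2.0f, 2.0f, 2.0f };
  v4sf expected = { 4.0f, 8.0f, 12.0f, 16.0f };
  v4sf r = pow_func_ref (x, y);
  int i;

  for (i = 0; i < 4; i++)
    assert (r[i] == expected[i]);
}
```

Calling pow_func_ref_check () from a small main is enough to exercise it. It does not go through the xorps builtin and is not claimed to reproduce the ICE.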

The ICE occurs because, for a vector xor of a register with itself
(__builtin_ia32_xorps (xmm2, xmm2)), we attach a scalar
REG_EQUAL (const_int 0 [0x0]) note to the result, as in:


(insn 10 5 13 0 (set (reg/v:V4SF 58 [ xmm2.2 ])
        (xor:V4SF (reg/v:V4SF 60 [ xmm2 ])
            (reg/v:V4SF 60 [ xmm2 ]))) 562 {*sse_xorv4sf3} (nil)
    (expr_list:REG_EQUAL (const_int 0 [0x0])
        (nil)))

This causes the combine pass to ICE. I don't see how the result of a vector xor can be a scalar const_int 0.
My first attempt was to generate the 'right' REG_EQUAL note, namely a const_vector whose elements are all zero, as in:


(insn 12 7 17 0 (set (reg:V4SF 61)
        (xor:V4SF (reg/v:V4SF 60 [ x ])
            (reg/v:V4SF 60 [ x ]))) 543 {*xorv4sf3} (nil)
    (expr_list:REG_EQUAL (const_vector:V4SI [
                (const_int 0 [0x0])
                (const_int 0 [0x0])
                (const_int 0 [0x0])
                (const_int 0 [0x0])
            ])
        (nil)))

But this also ICEs shortly afterwards. So this pattern, which seems semantically correct, is
not accepted by the GCC back end. What fixed it for me was to not fold the above vector xor
operation at all, as in this patch:


Index: simplify-rtx.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/simplify-rtx.c,v
retrieving revision 1.228.2.3
diff -c -p -r1.228.2.3 simplify-rtx.c
*** simplify-rtx.c      9 Apr 2005 06:01:55 -0000       1.228.2.3
--- simplify-rtx.c      13 Apr 2005 00:20:11 -0000
*************** simplify_binary_operation (enum rtx_code
*** 1950,1956 ****
            return simplify_gen_unary (NOT, mode, op0, mode);
          if (trueop0 == trueop1
              && ! side_effects_p (op0)
!             && GET_MODE_CLASS (mode) != MODE_CC)
            return const0_rtx;

          /* Canonicalize XOR of the most significant bit to PLUS.  */
--- 1950,1957 ----
            return simplify_gen_unary (NOT, mode, op0, mode);
          if (trueop0 == trueop1
              && ! side_effects_p (op0)
!             && GET_MODE_CLASS (mode) != MODE_CC
!             && ! VECTOR_MODE_P (mode))
            return const0_rtx;

          /* Canonicalize XOR of the most significant bit to PLUS.  */
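
To make the intent of the extra guard concrete, here is a minimal stand-alone sketch of it. Everything prefixed toy_ or TOY_ is hypothetical and only models the idea; it is not GCC's rtx machinery, and the MODE_CC check is left out for brevity:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for machine modes; not GCC's real enum machine_mode.  */
enum toy_mode { TOY_SCALAR_INT, TOY_VECTOR_V4SF };

/* Toy stand-in for an rtx: just a mode and a register number.  */
struct toy_rtx
{
  enum toy_mode mode;
  int regno;
};

/* Plays the role of const0_rtx: a single shared scalar zero.  */
static const struct toy_rtx toy_const0 = { TOY_SCALAR_INT, 0 };

/* Plays the role of VECTOR_MODE_P.  */
static bool
toy_vector_mode_p (enum toy_mode mode)
{
  return mode == TOY_VECTOR_V4SF;
}

/* Unpatched rule: x ^ x folds to the scalar zero in every mode.
   Returns NULL when no simplification applies.  */
const struct toy_rtx *
toy_fold_xor_old (enum toy_mode mode, const struct toy_rtx *op0,
                  const struct toy_rtx *op1)
{
  assert (op0 != NULL && op1 != NULL);
  (void) mode;
  return op0 == op1 ? &toy_const0 : NULL;
}

/* Patched rule: same fold, but declined for vector modes.  */
const struct toy_rtx *
toy_fold_xor_new (enum toy_mode mode, const struct toy_rtx *op0,
                  const struct toy_rtx *op1)
{
  assert (op0 != NULL && op1 != NULL);
  if (op0 == op1 && !toy_vector_mode_p (mode))
    return &toy_const0;
  return NULL;
}
```

With a V4SF operand, toy_fold_xor_old hands back the scalar zero, whose mode does not match the operation, while toy_fold_xor_new returns NULL and leaves the xor alone. Scalar operands fold to zero in both versions.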

Is this a patch that I can pursue?

- Thanks, fariborz (fjahanian@apple.com)

