This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
patch [RFC] fix for an -msse2 failure
- From: Fariborz Jahanian <fjahanian at apple dot com>
- To: "gcc-patches at gcc dot gnu dot org Patches" <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 12 Apr 2005 17:40:09 -0700
- Subject: patch [RFC] fix for an -msse2 failure
Hi,
The following test case ICEs on x86 (-Os -msse2):
typedef float __m128 __attribute__ ((vector_size (16)));
static __inline __m128
_mm_mul_ps (__m128 __A, __m128 __B)
{
  return __builtin_ia32_mulps (__A, __B);
}
static __inline __m128
_mm_sub_ps (__m128 __A, __m128 __B)
{
  return __builtin_ia32_subps (__A, __B);
}
__m128 POW_FUNC (__m128 x, __m128 y)
{
  __m128 xmm0 = x, xmm1 = y, xmm2;
  xmm2 = __builtin_ia32_xorps (xmm2, xmm2);
  xmm0 = _mm_mul_ps (xmm0, xmm1);
  xmm0 = _mm_sub_ps (xmm0, xmm2);
  xmm0 = _mm_mul_ps (xmm0, xmm1);
  return xmm0;
}
% mygccim5 -c -Os -msse2 bad.c
bad.c: In function 'POW_FUNC':
bad.c:28: internal compiler error: in trunc_int_for_mode, at explow.c:53
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
It ICEs because, for a vector xor of a register with itself
(__builtin_ia32_xorps (xmm2, xmm2)), we evaluate the result to a
REG_EQUAL (const_int 0 [0x0]) note, as in:
(insn 10 5 13 0 (set (reg/v:V4SF 58 [ xmm2.2 ])
(xor:V4SF (reg/v:V4SF 60 [ xmm2 ])
(reg/v:V4SF 60 [ xmm2 ]))) 562 {*sse_xorv4sf3} (nil)
(expr_list:REG_EQUAL (const_int 0 [0x0])
(nil)))
This causes the combine phase to ICE. I don't see how the result of a
vector xor can be a scalar const_int 0.
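
As a small illustration at the source level (not GCC internals), the
sketch below uses the GCC generic vector extension to show that xor of
a vector with itself is a four-element vector of zeros rather than a
single scalar. The names v4si, xor_self, all_lanes_zero and
check_xor_self are made up for this example, and an integer vector is
used only because the ^ operator is defined for integer element types.

```c
#include <assert.h>

/* Integer vector type, declared the same way as __m128 in the test case.
   v4si is a name chosen for this sketch.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* XOR a vector with itself using the generic vector ^ operator.  */
v4si
xor_self (v4si x)
{
  return x ^ x;
}

/* Return 1 if each of the four elements is zero, 0 otherwise.  */
int
all_lanes_zero (v4si v)
{
  for (int i = 0; i < 4; i++)
    if (v[i] != 0)
      return 0;
  return 1;
}

/* The result still has four elements, and every one of them is zero.  */
void
check_xor_self (void)
{
  v4si x = { 1, -2, 3, 0x7fffffff };
  v4si r = xor_self (x);
  assert (sizeof (r) == 4 * sizeof (int));
  assert (all_lanes_zero (r));
}
```

Calling check_xor_self () should pass silently with a compiler that
supports the vector_size attribute and vector subscripting.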
My first attempt was to generate the 'right' REG_EQUAL; namely, a
const_vector whose elements are all zero, as in:
(insn 12 7 17 0 (set (reg:V4SF 61)
(xor:V4SF (reg/v:V4SF 60 [ x ])
(reg/v:V4SF 60 [ x ]))) 543 {*xorv4sf3} (nil)
(expr_list:REG_EQUAL (const_vector:V4SI [
(const_int 0 [0x0])
(const_int 0 [0x0])
(const_int 0 [0x0])
(const_int 0 [0x0])
])
(nil)))
But this also ICEs immediately down the road. So this pattern, which
seems semantically correct, is not liked by the gcc back-end. What fixed
it for me was to not fold the above vector xor operation, as in this patch:
Index: simplify-rtx.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/simplify-rtx.c,v
retrieving revision 1.228.2.3
diff -c -p -r1.228.2.3 simplify-rtx.c
*** simplify-rtx.c 9 Apr 2005 06:01:55 -0000 1.228.2.3
--- simplify-rtx.c 13 Apr 2005 00:20:11 -0000
*************** simplify_binary_operation (enum rtx_code
*** 1950,1956 ****
return simplify_gen_unary (NOT, mode, op0, mode);
if (trueop0 == trueop1
&& ! side_effects_p (op0)
! && GET_MODE_CLASS (mode) != MODE_CC)
return const0_rtx;
/* Canonicalize XOR of the most significant bit to PLUS. */
--- 1950,1957 ----
return simplify_gen_unary (NOT, mode, op0, mode);
if (trueop0 == trueop1
&& ! side_effects_p (op0)
! && GET_MODE_CLASS (mode) != MODE_CC
! && ! VECTOR_MODE_P (mode))
return const0_rtx;
/* Canonicalize XOR of the most significant bit to PLUS. */
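
The condition changed by the patch can be modeled outside GCC roughly
as follows. This is only a toy sketch: the enum toy_mode_class and the
functions toy_vector_mode_p, toy_can_fold_xor_self and toy_self_check
are hypothetical stand-ins, not GCC's real GET_MODE_CLASS,
VECTOR_MODE_P or simplify_binary_operation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's mode classes; not GCC's real enum.  */
enum toy_mode_class { TOY_MODE_INT, TOY_MODE_CC, TOY_MODE_VECTOR_FLOAT };

/* Hypothetical counterpart of VECTOR_MODE_P for the toy enum.  */
bool
toy_vector_mode_p (enum toy_mode_class c)
{
  return c == TOY_MODE_VECTOR_FLOAT;
}

/* Decide whether (xor x x) may be folded to the scalar constant zero.
   The last test is the one the patch adds: never fold in a vector mode.  */
bool
toy_can_fold_xor_self (bool same_operand, bool has_side_effects,
                       enum toy_mode_class c)
{
  return same_operand
         && !has_side_effects
         && c != TOY_MODE_CC
         && !toy_vector_mode_p (c);
}

/* Scalar integer xor still folds; the vector case no longer does.  */
void
toy_self_check (void)
{
  assert (toy_can_fold_xor_self (true, false, TOY_MODE_INT));
  assert (!toy_can_fold_xor_self (true, false, TOY_MODE_VECTOR_FLOAT));
}
```

With this guard the scalar fold is unchanged, and a vector xor of a
register with itself is simply left as it is.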
Is this a patch that I can pursue?
- Thanks, fariborz (fjahanian@apple.com)