Bug 13366 - ICE using MMX/SSE builtins with -O
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 3.4.0
Importance: P2 normal
Target Milestone: 4.0.0
Assignee: Richard Henderson
URL:
Keywords: ice-on-valid-code, ssemmx
Depends on:
Blocks:
 
Reported: 2003-12-09 13:52 UTC by Jack Lloyd
Modified: 2005-01-11 21:53 UTC

See Also:
Host: i686-pc-linux-gnu
Target:
Build:
Known to work:
Known to fail: 4.0.0
Last reconfirmed: 2004-11-24 06:35:07


Description Jack Lloyd 2003-12-09 13:52:41 UTC
I'm seeing an ICE with GCC 3.2.2 (RH9 build) and GCC 3.4 (20031126 snapshot)
for the following code:

-- CUT
typedef int v4hi __attribute__ ((mode(V4HI)));

int f(unsigned short n)
   {
   v4hi vec = { 0, 0, 1, n };
   v4hi hw = __builtin_ia32_pmulhw(vec, vec);
   return (__builtin_ia32_pextrw(hw,0));
   }
-- CUT

I'm seeing this:

$ gcc-3.4 -v
Reading specs from /usr/local/gcc-3.4-20031126/lib/gcc/i686-pc-linux-gnu/3.4/specs
Configured with: ../gcc-3.4-20031126/configure --prefix=/usr/local/gcc-3.4-20031126
Thread model: posix
gcc version 3.4 20031126 (experimental)

$ gcc-3.4 -O -msse -c bug.c 
bug.c: In function `f':

bug.c:8: error: unable to find a register to spill in class `GENERAL_REGS'
bug.c:8: error: this is the insn:
(insn 13 11 15 0 (parallel [
            (set (subreg:SI (reg/v:V4HI %mm0 [orig:61 vec ] [61]) 0)
                (and:SI (subreg:SI (reg/v:V4HI %mm0 [orig:61 vec ] [61]) 0)
                    (const_int -65536 [0xffff0000])))
            (clobber (reg:CC %eflags))
        ]) 197 {*andsi_1} (insn_list 11 (nil))
    (expr_list:REG_UNUSED (reg:CC %eflags)
        (nil)))
bug.c:8: internal compiler error: in spill_failure, at reload1.c:1854
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.

The ICE with 3.2.2 appears to be basically the same (can't find a register to
spill), but has less detail, and anyway I doubt anyone wants to fix bugs in
3.2.2. This looks an awful lot like bug 9401, but that has been closed
as fixed since last May.

The ICE is only with -O; -O2/-O3 seem to be fine.

BTW, change the '1' in the initializer of 'vec' to produce another ICE (in
emit-rtl.c). This one does not show up with 3.2.2, but my 3.4 doesn't like that 
either.
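
For reference, the lane arithmetic that the test case exercises can be modelled in plain C. This is only a sketch: mulhi16 and f_model are hypothetical names, not GCC functions. It assumes the documented behaviour of the two instructions, namely that PMULHW keeps the high 16 bits of each signed 16x16 product and that PEXTRW zero-extends the selected word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: one PMULHW lane, i.e. the high 16 bits of the
   signed 16x16 -> 32 bit product. The final narrowing conversion relies
   on two's-complement wraparound, which GCC documents. */
static int16_t mulhi16(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return (int16_t)((uint32_t)product >> 16);
}

/* Hypothetical scalar model of f() from the report. */
static int f_model(unsigned short n, int lane)
{
    int16_t vec[4] = { 0, 0, 1, (int16_t)n };
    int16_t hw[4];
    int i;

    assert(lane >= 0 && lane < 4);
    for (i = 0; i < 4; i++)
        hw[i] = mulhi16(vec[i], vec[i]);

    /* PEXTRW zero-extends the selected 16-bit lane. */
    return (uint16_t)hw[lane];
}
```

Since lane 0 of vec is the constant 0, f_model(n, 0) is 0 for every n, so a correct compile of the test case should return 0 as well.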
Comment 1 Andrew Pinski 2003-12-09 16:07:59 UTC
Confirmed but not a regression (this was rejected before the ICE showed up).
Comment 2 Jan Hubicka 2003-12-29 12:06:13 UTC
The generic vector extensions do not work at all for i386, unfortunately.
Honza
Comment 3 Andrew Pinski 2004-08-19 00:31:09 UTC
I get a different ICE on the mainline:
t.c:1: warning: specifying vector types with __attribute__ ((mode)) is deprecated
t.c:1: warning: use __attribute__ ((vector_size)) instead
 f

t.c: In function `f':
t.c:8: error: unable to find a register to spill in class `GENERAL_REGS'
t.c:8: error: this is the insn:
(insn 14 13 16 0 (set (strict_low_part (subreg:HI (reg/v:V4HI 29 mm0 [orig:59 vec ] [59]) 0))
        (const_int 0 [0x0])) 43 {*movstricthi_1} (insn_list 13 (nil))
    (nil))
t.c:8: internal compiler error: in spill_failure, at reload1.c:1884
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
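
The deprecation warning above asks for vector_size in place of mode. A minimal sketch of that spelling follows, using only the generic vector extension and no ia32 builtins. Note the assumptions: the element type has to become short to get four 16-bit lanes in 8 bytes, lane_of is a hypothetical helper, and subscripting a vector needs a much newer compiler than the ones discussed in this report.

```c
/* vector_size form of the V4HI typedef: four 16-bit lanes, 8 bytes total. */
typedef short v4hi __attribute__ ((vector_size (8)));

/* Hypothetical helper: build the vector from the report and read one lane
   back as an unsigned 16-bit value. */
static int lane_of(unsigned short n, int lane)
{
    v4hi vec = { 0, 0, 1, (short)n };
    return (unsigned short)vec[lane];
}
```

This only changes how the type is declared; it is not claimed to avoid the ICE.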
Comment 4 Uroš Bizjak 2004-12-20 14:55:26 UTC
The testcase from the description fails in the same way for current mainline:

gcc -O -msse pr13366.c 
pr13366.c:1: warning: specifying vector types with __attribute__ ((mode)) is deprecated
pr13366.c:1: warning: use __attribute__ ((vector_size)) instead
pr13366.c: In function 'f':
pr13366.c:9: error: unable to find a register to spill in class 'GENERAL_REGS'
pr13366.c:9: error: this is the insn:
(insn 15 13 17 0 (parallel [
            (set (subreg:SI (reg/v:V4HI 29 mm0 [orig:59 vec ] [59]) 0)
                (and:SI (subreg:SI (reg/v:V4HI 29 mm0 [orig:59 vec ] [59]) 0)
                    (const_int -65536 [0xffff0000])))
            (clobber (reg:CC 17 flags))
        ]) 206 {*andsi_1} (insn_list:REG_DEP_TRUE 13 (nil))
    (expr_list:REG_UNUSED (reg:CC 17 flags)
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
pr13366.c:9: internal compiler error: in spill_failure, at reload1.c:1873
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.

BTW: changing the 1 to 5, as suggested in the report, does not change the ICE.

Comment 5 Uroš Bizjak 2004-12-20 15:05:51 UTC
Could the information at http://gcc.gnu.org/ml/gcc-patches/2004-09/msg02453.html
help fix this bug?
Comment 6 lloyd 2004-12-20 15:11:25 UTC
Oops, my report on that was not clear. Change the 1 to a 0 to get a different
ICE, at least in whatever random 3.4.0 snapshot I have installed (20040107).
It's apparently a 0/!0 thing. I have not checked this on a 3.4 release or
mainline, though. Here is what I see after changing the 1 to a 0:

ice.c: In function `f':
ice.c:8: internal compiler error: in subreg_hard_regno, at emit-rtl.c:1026
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
Comment 7 Uroš Bizjak 2004-12-20 15:39:07 UTC
The equivalent SSE2 version works OK:
typedef int v8hi __attribute__ ((mode (V8HI)));

int
f (unsigned short n)
{
  v8hi vec = { 0, 0, 0, 0, 0, 0, 1, n };
  v8hi hw = __builtin_ia32_pmulhw128 (vec, vec);
  return (__builtin_ia32_pextrw128 (hw, 0));
}

The SSE2 example produces the following RTL:
(insn 13 11 15 1 (set (reg/v:V8HI 59 [ vec ])
        (const_vector:V8HI [
                (const_int 0 [0x0])
                (const_int 0 [0x0])
                (const_int 0 [0x0])
                (const_int 0 [0x0])
                (const_int 0 [0x0])
                (const_int 0 [0x0])
                (const_int 0 [0x0])
                (const_int 0 [0x0])
            ])) -1 (nil)
    (nil))

(insn 15 13 16 1 (parallel [
            (set (subreg:SI (reg/v:V8HI 59 [ vec ]) 12)
                (and:SI (subreg:SI (reg/v:V8HI 59 [ vec ]) 12)
                    (const_int -65536 [0xffff0000])))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (nil))
...

and the MMX version produces:

(insn 13 11 15 1 (clobber (reg/v:V4HI 59 [ vec ])) -1 (nil)
    (nil))

(insn 15 13 17 1 (parallel [
            (set (subreg:SI (reg/v:V4HI 59 [ vec ]) 0)
                (and:SI (subreg:SI (reg/v:V4HI 59 [ vec ]) 0)
                    (const_int -65536 [0xffff0000])))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (nil))
...

The trouble is in (insn 13): the MMX version only clobbers reg 59 and never
sets it to zero, whereas the SSE2 version initializes it with a zero
const_vector.

Also, mainline does not ICE for
v4hi vec = { 0, 0, 0, n };
(the change suggested in comment #6) or for its SSE2 equivalent.
Comment 8 CVS Commits 2005-01-11 21:33:29 UTC
Subject: Bug 13366

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	rth@gcc.gnu.org	2005-01-11 21:33:15

Modified files:
	gcc            : ChangeLog 
	gcc/config/i386: emmintrin.h i386-protos.h i386.c i386.h 
	                 mmintrin.h mmx.md pmmintrin.h predicates.md 
	                 sse.md xmmintrin.h 
Added files:
	gcc/testsuite/gcc.target/i386: pr13366.c 

Log message:
	PR target/13366
	* config/i386/i386.h (enum ix86_builtins): Move ...
	* config/i386/i386.c: ... here.
	(IX86_BUILTIN_MOVDDUP, IX86_BUILTIN_MMX_ZERO, IX86_BUILTIN_PEXTRW,
	IX86_BUILTIN_PINSRW, IX86_BUILTIN_LOADAPS, IX86_BUILTIN_LOADSS,
	IX86_BUILTIN_STORESS, IX86_BUILTIN_SSE_ZERO, IX86_BUILTIN_PEXTRW128,
	IX86_BUILTIN_PINSRW128, IX86_BUILTIN_LOADAPD, IX86_BUILTIN_LOADSD,
	IX86_BUILTIN_STOREAPD, IX86_BUILTIN_STORESD,  IX86_BUILTIN_STOREHPD,
	IX86_BUILTIN_STORELPD, IX86_BUILTIN_SETPD1, IX86_BUILTIN_SETPD,
	IX86_BUILTIN_CLRPD, IX86_BUILTIN_LOADPD1, IX86_BUILTIN_LOADRPD,
	IX86_BUILTIN_STOREPD1, IX86_BUILTIN_STORERPD, IX86_BUILTIN_LOADDQA,
	IX86_BUILTIN_STOREDQA, IX86_BUILTIN_CLRTI,
	IX86_BUILTIN_LOADDDUP): Remove.
	(IX86_BUILTIN_VEC_INIT_V2SI, IX86_BUILTIN_VEC_INIT_V4HI,
	IX86_BUILTIN_VEC_INIT_V8QI, IX86_BUILTIN_VEC_EXT_V2DF,
	IX86_BUILTIN_VEC_EXT_V2DI, IX86_BUILTIN_VEC_EXT_V4SF,
	IX86_BUILTIN_VEC_EXT_V8HI, IX86_BUILTIN_VEC_EXT_V4HI,
	IX86_BUILTIN_VEC_SET_V8HI, IX86_BUILTIN_VEC_SET_V4HI): New.
	(ix86_init_builtins): Make static.
	(ix86_init_mmx_sse_builtins): Update for changed builtins.
	(ix86_expand_binop_builtin): Only use ix86_fixup_binary_operands
	if all the modes match.  Otherwise, fake it.
	(get_element_number, ix86_expand_vec_init_builtin,
	ix86_expand_vec_ext_builtin, ix86_expand_vec_set_builtin): New.
	(ix86_expand_builtin): Make static.  Update for changed builtins.
	(ix86_expand_vector_move_misalign): Use sse2_loadlpd with zero
	operand instead of sse2_loadsd.  Cast sse1 fallback to V4SFmode.
	(ix86_expand_vector_init_duplicate): New.
	(ix86_expand_vector_init_low_nonzero): New.
	(ix86_expand_vector_init_one_var, ix86_expand_vector_init_general):
	Split out from ix86_expand_vector_init; handle integer modes.
	(ix86_expand_vector_init): Use them.
	(ix86_expand_vector_set, ix86_expand_vector_extract): New.
	* config/i386/i386-protos.h: Update.
	* config/i386/predicates.md (reg_or_0_operand): New.
	* config/i386/mmx.md (mov<MMXMODEI>_internal): Add 'r' variants.
	(movv2sf_internal): Likewise.  And a splitter to match them all.
	(vec_dupv2sf, mmx_concatv2sf, vec_setv2sf, vec_extractv2sf,
	vec_initv2sf, vec_dupv4hi, vec_dupv2si, mmx_concatv2si, vec_setv2si,
	vec_extractv2si, vec_initv2si, vec_setv4hi, vec_extractv4hi,
	vec_initv4hi, vec_setv8qi, vec_extractv8qi, vec_initv8qi): New.
	(mmx_pinsrw): Fix operand ordering.
	* config/i386/sse.md (movv4sf splitter): Use direct pattern,
	rather than sse_loadss expander.
	(movv2df splitter): Similarly.
	(sse_loadss, sse_loadlss): Remove.
	(vec_dupv4sf, sse_concatv2sf, sse_concatv4sf, vec_extractv4sf_0): New.
	(vec_setv4sf, vec_setv2df): Use ix86_expand_vector_set.
	(vec_extractv4sf, vec_extractv2df): Use ix86_expand_vector_extract.
	(sse3_movddup): Rename with '*'.
	(sse3_movddup splitter): Use gen_rtx_REG instead of gen_lowpart.
	(sse2_loadsd): Remove.
	(vec_dupv2df_sse3): Rename from sse3_loadddup.
	(vec_dupv2df, vec_concatv2df_sse3, vec_concatv2df): New.
	(sse2_pinsrw): Fix argument ordering.
	(sse2_loadld, sse2_loadq): Add sse1 alternatives.
	(sse2_stored): Remove 'r' destination.
	(vec_dupv4si, vec_dupv2di, sse2_concatv2si, sse1_concatv2si,
	vec_concatv4si_1, vec_concatv2di, vec_setv2di, vec_extractv2di,
	vec_initv2di, vec_setv4si, vec_extractv4si, vec_initv4si,
	vec_setv8hi, vec_extractv8hi, vec_initv8hi, vec_setv16qi,
	vec_extractv16qi, vec_initv16qi): New.
	
	* config/i386/emmintrin.h (__m128i, __m128d): Use typedef, not define.
	(_mm_set_sd, _mm_set1_pd, _mm_setzero_pd, _mm_set_epi64x,
	_mm_set_epi32, _mm_set_epi16, _mm_set_epi8, _mm_setzero_si128): Use
	constructor form.
	(_mm_load_pd, _mm_store_pd): Use plain dereference.
	(_mm_load_si128, _mm_store_si128): Likewise.
	(_mm_load1_pd): Use _mm_set1_pd.
	(_mm_load_sd): Use _mm_set_sd.
	(_mm_store_sd, _mm_storeh_pd): Use __builtin_ia32_vec_ext_v2df.
	(_mm_store1_pd, _mm_storer_pd): Use _mm_store_pd.
	(_mm_set_epi64): Use _mm_set_epi64x.
	(_mm_set1_epi64x, _mm_set1_epi64, _mm_set1_epi32, _mm_set_epi16,
	_mm_set1_epi8, _mm_setr_epi64, _mm_setr_epi32, _mm_setr_epi16,
	_mm_setr_epi8): Use _mm_set_foo form.
	(_mm_loadl_epi64, _mm_movpi64_epi64, _mm_move_epi64): Use _mm_set_epi64.
	(_mm_storel_epi64, _mm_movepi64_pi64): Use __builtin_ia32_vec_ext_v2di.
	(_mm_extract_epi16): Use __builtin_ia32_vec_ext_v8hi.
	(_mm_insert_epi16): Use __builtin_ia32_vec_set_v8hi.
	* config/i386/mmintrin.h (_mm_setzero_si64): Use plain cast.
	(_mm_set_pi32): Use __builtin_ia32_vec_init_v2si.
	(_mm_set_pi16): Use __builtin_ia32_vec_init_v4hi.
	(_mm_set_pi8): Use __builtin_ia32_vec_init_v8qi.
	(_mm_set1_pi16, _mm_set1_pi8): Use _mm_set_piN variant.
	* config/i386/pmmintrin.h (_mm_loaddup_pd): Use _mm_load1_pd.
	(_mm_movedup_pd): Use _mm_shuffle_pd.
	* config/i386/xmmintrin.h (_mm_setzero_ps, _mm_set_ss,
	_mm_set1_ps, _mm_set_ps, _mm_setr_ps): Use constructor form.
	(_mm_cvtpi16_ps, _mm_cvtpu16_ps, _mm_cvtpi8_ps, _mm_cvtpu8_ps,
	_mm_cvtps_pi8, _mm_cvtpi32x2_ps): Avoid __builtin_ia32_mmx_zero;
	Use _mm_setzero_ps.
	(_mm_load_ss, _mm_load1_ps): Use _mm_set* form.
	(_mm_load_ps, _mm_loadr_ps): Use raw dereference.
	(_mm_store_ss): Use __builtin_ia32_vec_ext_v4sf.
	(_mm_store_ps): Use raw dereference.
	(_mm_store1_ps): Use _mm_storeu_ps.
	(_mm_storer_ps): Use _mm_store_ps.
	(_mm_extract_pi16): Use __builtin_ia32_vec_ext_v4hi.
	(_mm_insert_pi16): Use __builtin_ia32_vec_set_v4hi.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7095&r2=2.7096
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/emmintrin.h.diff?cvsroot=gcc&r1=1.9&r2=1.10
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386-protos.h.diff?cvsroot=gcc&r1=1.124&r2=1.125
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.c.diff?cvsroot=gcc&r1=1.773&r2=1.774
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.h.diff?cvsroot=gcc&r1=1.416&r2=1.417
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/mmintrin.h.diff?cvsroot=gcc&r1=1.14&r2=1.15
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/mmx.md.diff?cvsroot=gcc&r1=1.2&r2=1.3
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/pmmintrin.h.diff?cvsroot=gcc&r1=1.4&r2=1.5
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/predicates.md.diff?cvsroot=gcc&r1=1.12&r2=1.13
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/sse.md.diff?cvsroot=gcc&r1=1.2&r2=1.3
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/xmmintrin.h.diff?cvsroot=gcc&r1=1.31&r2=1.32
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/gcc.target/i386/pr13366.c.diff?cvsroot=gcc&r1=NONE&r2=1.1

Comment 9 Richard Henderson 2005-01-11 21:53:56 UTC
Fixed.  No chance of a backport to 3.4.  As a workaround, use _mm_set_pi16
instead of the explicit constructor.
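
A sketch of that workaround applied to the test case from the description, using the intrinsics from <mmintrin.h> and <xmmintrin.h> in place of the raw builtins. The preprocessor guard and the non-x86 fallback branch are additions for illustration only, so that the snippet still compiles where MMX and SSE are unavailable; they are not part of the suggested workaround.

```c
#if defined(__MMX__) && defined(__SSE__)
#include <xmmintrin.h>  /* _mm_extract_pi16; also pulls in <mmintrin.h> */

int f(unsigned short n)
{
    /* _mm_set_pi16 takes its lanes from highest to lowest, so this
       corresponds to the initializer { 0, 0, 1, n } in the report. */
    __m64 vec = _mm_set_pi16((short)n, 1, 0, 0);
    __m64 hw = _mm_mulhi_pi16(vec, vec);
    int result = _mm_extract_pi16(hw, 0);

    _mm_empty();  /* clear MMX state before any later x87 code runs */
    return result;
}
#else
/* Illustration-only fallback for builds without MMX/SSE: lane 0 of the
   product is always 0, so return that directly. */
int f(unsigned short n)
{
    (void)n;
    return 0;
}
#endif
```

On the compilers discussed here this would be built with -msse, as in the original command line.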