Bug 65427 - [4.9 Regression] ICE in emit_move_insn with wide vector types
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.9.2
Importance: P3 normal
Target Milestone: 5.0
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-03-14 19:31 UTC by Alexander Peslyak
Modified: 2016-08-03 10:52 UTC

See Also:
Host: x86_64-unknown-linux-gnu
Target: x86_64-unknown-linux-gnu
Build: x86_64-unknown-linux-gnu
Known to work: 4.0.0, 4.1.0, 4.2.0, 4.3.0, 4.4.0, 4.5.0, 4.6.0, 4.6.2, 5.0
Known to fail: 4.7.0, 4.7.4, 4.8.0, 4.8.4, 4.9.0, 4.9.2
Last reconfirmed: 2015-03-16 00:00:00


Attachments
testcase (3.02 KB, text/x-csrc)
2015-03-14 19:31 UTC, Alexander Peslyak

Description Alexander Peslyak 2015-03-14 19:31:38 UTC
Created attachment 35037
testcase

GCC 4.7.0 through at least 4.9.2, as well as the 5.0 20150215 snapshot (I haven't tested newer ones), fail with an ICE when compiling the attached md5slice.c testcase on and for Linux x86_64:

$ gcc md5slice.c -o md5slice -O2 -DVECTOR -Wno-attributes -ftree-loop-vectorize
md5slice.c: In function 'GG':
md5slice.c:302:27: internal compiler error: in emit_move_insn, at expr.c:3609
 static MAYBE_INLINE3 void GG(a, b, c, d, x, s, ac)
                           ^
0x6974d2 emit_move_insn(rtx_def*, rtx_def*)
        ../../gcc/expr.c:3608
0x5e5294 expand_gimple_stmt_1
        ../../gcc/cfgexpand.c:3288
0x5e5294 expand_gimple_stmt
        ../../gcc/cfgexpand.c:3322
0x5e589b expand_gimple_basic_block
        ../../gcc/cfgexpand.c:5162
0x5e7b56 gimple_expand_cfg
        ../../gcc/cfgexpand.c:5741
0x5e7b56 execute
        ../../gcc/cfgexpand.c:5961

Without -ftree-loop-vectorize, compilation succeeds.  With -O3, it fails slightly differently:

$ gcc md5slice.c -o md5slice -O3 -DVECTOR -Wno-attributes 
md5slice.c: In function 'II.constprop':
md5slice.c:328:27: internal compiler error: in emit_move_insn, at expr.c:3609
 static MAYBE_INLINE3 void II(a, b, c, d, x, s, ac)
                           ^
0x6974d2 emit_move_insn(rtx_def*, rtx_def*)
        ../../gcc/expr.c:3608
0x5e5294 expand_gimple_stmt_1
        ../../gcc/cfgexpand.c:3288
0x5e5294 expand_gimple_stmt
        ../../gcc/cfgexpand.c:3322
0x5e589b expand_gimple_basic_block
        ../../gcc/cfgexpand.c:5162
0x5e7b56 gimple_expand_cfg
        ../../gcc/cfgexpand.c:5741
0x5e7b56 execute
        ../../gcc/cfgexpand.c:5961

With -mavx or -mavx2, it succeeds (despite -O3).

GCC 4.7.0 does not have the -ftree-loop-vectorize option, but a similar problem is seen with -O3:

$ gcc md5slice.c -o md5slice -O3 -DVECTOR -Wno-attributes
md5slice.c: In function 'GG':
md5slice.c:302:27: internal compiler error: in emit_move_insn, at expr.c:3435

So far, all of this is with:

typedef element vector __attribute__ ((vector_size (32)));

on line 41.  Reducing the vector width to 16 makes the plain SSE2 compilation succeed with any optimizations.  Conversely, increasing the vector width to 64 makes compilation fail even with AVX/AVX2 enabled.

Ideally, when the vector type width is in excess of the current target architecture's native SIMD vector width, GCC should transparently split it into multiple sub-vectors of the natively supported width.  This is useful not only for being able to build/use wider-vector source code for/on older CPUs, but also to hide instruction latencies by having the compiler interleave operations on the sub-vectors due to the extra parallelism the excessive vector width provides.  For example, once this is supported, a width of 32 could actually work faster than 16 on SSE2, and 64 faster than 32 on AVX2, for some applications (as long as the register pressure does not become too high).

Failing that, at least the compiler should report that this is unsupported, rather than fail with an ICE.

With GCC 4.6.2 and older, the ICE does not occur, for the rather unfortunate reason that (at least for me) these versions generate scalar code (so ~10x slower) when the type's vector width exceeds what's supported natively.
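
For illustration only (not part of the original report): the splitting described above can be sketched by hand at the source level. The type names v8si and v4si and both function names are hypothetical, and the memcpy-based split is just one way to express the idea; it says nothing about how GCC performs the split internally.

```c
#include <string.h>

/* GCC/Clang vector extensions: a 32-byte vector of eight ints and a
   16-byte vector of four ints.  Both type names are illustrative.  */
typedef int v8si __attribute__ ((vector_size (32)));
typedef int v4si __attribute__ ((vector_size (16)));

/* XOR two 32-byte vectors as two independent 16-byte halves.
   Pointers are used so that no wide vector crosses a call boundary.  */
static void
xor_by_halves (v8si *r, const v8si *x, const v8si *y)
{
  v4si xh[2], yh[2], rh[2];

  memcpy (xh, x, sizeof (v8si));
  memcpy (yh, y, sizeof (v8si));
  rh[0] = xh[0] ^ yh[0];   /* low half */
  rh[1] = xh[1] ^ yh[1];   /* high half, independent of the low half */
  memcpy (r, rh, sizeof (v8si));
}

/* Returns 1 if the hand-split result equals the whole-vector XOR.  */
int
split_xor_matches (void)
{
  v8si x = { 1, 2, 3, 4, 5, 6, 7, 8 };
  v8si y = { 8, 7, 6, 5, 4, 3, 2, 1 };
  v8si whole = x ^ y;
  v8si split;

  xor_by_halves (&split, &x, &y);
  return memcmp (&whole, &split, sizeof (v8si)) == 0;
}
```

The two half-width XORs have no data dependency on each other, which is the extra parallelism the report refers to.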
Comment 1 Jakub Jelinek 2015-03-16 11:13:57 UTC
r178392 works, r178445 already ICEs, so r178408 looks like the most probable candidate.
Reduced testcase for -O2 -ftree-vectorize:
typedef int V __attribute__ ((vector_size (32)));
V a, d, e, f;

void
foo (int b, int c)
{
  do
    {
      if (b)
	f = a ^ d;
      else
	f = e = a ^ d;
    }
  while (c);
}
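
For illustration only: a small checker around the reduced testcase above, so that it can be run as well as compiled. The function check_foo and the input values are hypothetical and this is not the test that was later committed. Note that foo never returns when c is nonzero, so the checker always passes c == 0.

```c
#include <string.h>

typedef int V __attribute__ ((vector_size (32)));
V a, d, e, f;

/* Same body as the reduced testcase; noinline keeps it a separate function.  */
__attribute__ ((noinline)) void
foo (int b, int c)
{
  do
    {
      if (b)
        f = a ^ d;
      else
        f = e = a ^ d;
    }
  while (c);
}

/* Hypothetical checker: returns 1 if both branches store a ^ d as expected.  */
int
check_foo (void)
{
  V expect, zero;
  int i, ok = 1;

  for (i = 0; i < 8; i++)
    {
      a[i] = i + 1;
      d[i] = 0x10 * i;
    }
  expect = a ^ d;
  memset (&zero, 0, sizeof zero);
  memset (&e, 0, sizeof e);
  memset (&f, 0, sizeof f);

  foo (1, 0);   /* b != 0: only f is written */
  ok &= memcmp (&f, &expect, sizeof f) == 0;
  ok &= memcmp (&e, &zero, sizeof e) == 0;

  memset (&f, 0, sizeof f);
  foo (0, 0);   /* b == 0: both e and f are written */
  ok &= memcmp (&f, &expect, sizeof f) == 0;
  ok &= memcmp (&e, &expect, sizeof e) == 0;
  return ok;
}
```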
Comment 2 Jakub Jelinek 2015-03-16 11:39:19 UTC
I think the bug is that tree-vect-generic.c doesn't lower COND_EXPRs, only VEC_COND_EXPRs.
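
For illustration only: a rough source-level analogy of what lowering a scalar-condition select on a too-wide vector could look like. This is an assumption about the general idea, not a description of GCC internals or of any patch; the type H and all three function names are hypothetical.

```c
#include <string.h>

typedef int V __attribute__ ((vector_size (32)));  /* wider than SSE2's 16 bytes */
typedef int H __attribute__ ((vector_size (16)));  /* hypothetical native-width piece */

/* Whole-vector form: one scalar condition selects between two wide vectors.  */
static void
select_whole (V *out, int cond, const V *p, const V *q)
{
  *out = cond ? *p : *q;
}

/* Piecewise form: the same scalar condition applied to each 16-byte piece.  */
static void
select_piecewise (V *out, int cond, const V *p, const V *q)
{
  H ph[2], qh[2], rh[2];
  int i;

  memcpy (ph, p, sizeof (V));
  memcpy (qh, q, sizeof (V));
  for (i = 0; i < 2; i++)
    rh[i] = cond ? ph[i] : qh[i];
  memcpy (out, rh, sizeof (V));
}

/* Returns 1 if both forms agree for cond == 0 and cond == 1.  */
int
piecewise_select_matches (void)
{
  V p = { 1, 2, 3, 4, 5, 6, 7, 8 };
  V q = { -1, -2, -3, -4, -5, -6, -7, -8 };
  V w, s;
  int cond, ok = 1;

  for (cond = 0; cond < 2; cond++)
    {
      select_whole (&w, cond, &p, &q);
      select_piecewise (&s, cond, &p, &q);
      ok &= memcmp (&w, &s, sizeof (V)) == 0;
    }
  return ok;
}
```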
Comment 3 Jakub Jelinek 2015-03-16 18:51:14 UTC
Author: jakub
Date: Mon Mar 16 18:50:43 2015
New Revision: 221464

URL: https://gcc.gnu.org/viewcvs?rev=221464&root=gcc&view=rev
Log:
	PR tree-optimization/65427
	* tree-vect-generic.c (do_cond, expand_vector_scalar_condition): New
	functions.
	(expand_vector_operations_1): Handle BLKmode vector COND_EXPR.

	* gcc.c-torture/execute/pr65427.c: New test.

Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/pr65427.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-generic.c
Comment 4 Jakub Jelinek 2015-03-16 18:52:00 UTC
Fixed on the trunk so far.
Comment 5 James Greenhalgh 2015-03-18 15:01:36 UTC
The new test is causing an ICE when compiled for size on arm-none-linux-gnueabihf, though this might just be exposing something latent in the ARM back-end. Of the configure options, the only meaningful ones to trigger the ICE are those which turn on NEON support (-mfpu=neon, -mfloat-abi=hard).

./gcc-install/bin/gcc -v -Os gcc-src/gcc/testsuite/gcc.c-torture/execute/pr65427.c -mfloat-abi=hard -mfpu=neon
Using built-in specs.
COLLECT_GCC=./gcc-install/bin/gcc
COLLECT_LTO_WRAPPER=/work/jamgre01/gcc-install/bin/../libexec/gcc/armv7l-unknown-linux-gnueabihf/5.0.0/lto-wrapper
Target: armv7l-unknown-linux-gnueabihf
Configured with: /work/jamgre01//gcc-src/configure --with-cpu=cortex-a9 --with-fpu=neon-fp16 --with-mode=thumb --with-float=hard --enable-languages=c,c++,fortran --prefix=/work/jamgre01//gcc-install --with-build-config=bootstrap-time
Thread model: posix
gcc version 5.0.0 20150317 (experimental) (GCC) 
<snip>
gcc-src/gcc/testsuite/gcc.c-torture/execute/pr65427.c: In function ‘foo’:
gcc-src/gcc/testsuite/gcc.c-torture/execute/pr65427.c:17:1: internal compiler error: in process_insert_insn, at gcse.c:2174
 }
 ^
0x337447 process_insert_insn
        /work/jamgre01//gcc-src/gcc/gcse.c:2174
0x33829f insert_insn_end_basic_block
        /work/jamgre01//gcc-src/gcc/gcse.c:2196
0x33a073 hoist_code
        /work/jamgre01//gcc-src/gcc/gcse.c:3492
0x33a073 one_code_hoisting_pass
        /work/jamgre01//gcc-src/gcc/gcse.c:3722
0x33a073 execute_rtl_hoist
        /work/jamgre01//gcc-src/gcc/gcse.c:4212
0x33a073 execute
        /work/jamgre01//gcc-src/gcc/gcse.c:4293
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
Comment 6 ktkachov 2015-03-18 17:34:28 UTC
(In reply to James Greenhalgh from comment #5)
> The new test is causing an ICE when compiled for size on
> arm-none-linux-gnueabihf, though this might just be exposing something
> latent in the ARM back-end. Of the configure options, the only meaningful
> ones to trigger the ICE are those which turn on NEON support (-mfpu=neon,
> -mfloat-abi=hard).

As mentioned on gcc-patches, that's a separate issue. I'm testing a patch.
Comment 7 ktkachov 2015-03-18 17:35:00 UTC
Keeping it open for potential backports...
Comment 8 Jakub Jelinek 2015-06-03 15:27:28 UTC
Author: jakub
Date: Wed Jun  3 15:26:56 2015
New Revision: 224086

URL: https://gcc.gnu.org/viewcvs?rev=224086&root=gcc&view=rev
Log:
	Backported from mainline
	2015-03-16  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/65427
	* tree-vect-generic.c (do_cond, expand_vector_scalar_condition): New
	functions.
	(expand_vector_operations_1): Handle BLKmode vector COND_EXPR.

	* gcc.c-torture/execute/pr65427.c: New test.

Added:
    branches/gcc-4_9-branch/gcc/testsuite/gcc.c-torture/execute/pr65427.c
Modified:
    branches/gcc-4_9-branch/gcc/ChangeLog
    branches/gcc-4_9-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_9-branch/gcc/tree-vect-generic.c
Comment 9 Jakub Jelinek 2015-06-03 21:42:52 UTC
Fixed for 4.9.3 now; backporting to 4.8 is much harder, as the code changed significantly.
Comment 10 Richard Biener 2015-06-23 08:17:07 UTC
The gcc-4_8-branch is being closed, re-targeting regressions to 4.9.3.
Comment 11 Jakub Jelinek 2015-06-26 19:58:13 UTC
GCC 4.9.3 has been released.
Comment 12 Richard Biener 2016-08-03 10:52:24 UTC
Fixed in GCC 5+