Bug 24160 - [4.1 Regression] ICE with -O1 -ftree-vectorize -msse
Summary: [4.1 Regression] ICE with -O1 -ftree-vectorize -msse
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 4.1.0
: P1 normal
Target Milestone: 4.1.0
Assignee: Richard Henderson
URL: http://gcc.gnu.org/ml/gcc-patches/200...
Keywords: ice-on-valid-code, patch, ra, ssemmx
Depends on:
Blocks:
 
Reported: 2005-10-01 12:56 UTC by Martin Drab
Modified: 2005-11-16 17:27 UTC (History)
4 users (show)

See Also:
Host:
Target: i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2005-11-02 02:46:37


Attachments
This triggers the bug (14.81 KB, text/plain)
2005-10-01 12:57 UTC, Martin Drab
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Drab 2005-10-01 12:56:23 UTC
The example (attached below), when compiled by following gcc

-------------------
$ gcc -v
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: ../../../gcc-CVS-20050930/gcc-CVS-20050930/configure
--host=i686-pc-linux-gnu --prefix=/usr/local/opt/gcc-4.1
--exec-prefix=/usr/local/opt/gcc-4.1 --sysconfdir=/etc
--libdir=/usr/local/opt/gcc-4.1/lib --libexecdir=/usr/local/opt/gcc-4.1/libexec
--sharedstatedir=/var --localstatedir=/var --program-suffix=-4.1
--with-x-includes=/usr/X11R6/include --with-x-libraries=/usr/X11R6/lib
--enable-shared --enable-static --with-gnu-as --with-gnu-ld --with-stabs
--enable-threads=posix --enable-version-specific-runtime-libs --disable-coverage
--disable-libgcj --disable-checking --enable-multilib --with-x --enable-cmath
--enable-libstdcxx-debug --enable-fast-character --enable-hash-synchronization
--with-system-zlib --with-libbanshee --with-demangler-in-ld
--with-arch=athlon-xp --disable-libada --enable-languages=c,c++,f95,objc
Thread model: posix
gcc version 4.1.0 20050930 (experimental)
-------------------

with

-------------------
gcc -c -O3 -finline-limit=1024 -ftree-vectorize -ftracer -o minilzo.o minilzo.c
-------------------

results in this:

-------------------
native/minilzo.c: In function ‘__lzo_init2’:
native/minilzo.c:1278: error: unable to generate reloads for:
(insn:HI 394 392 396 29 (set (reg:V4SI 21 xmm0 [orig:93 vect_cst_.1512 ] [93])
        (vec_duplicate:V4SI (plus:SI (reg/f:SI 6 bp)
                (const_int -104 [0xffffff98])))) 757 {*vec_dupv4si} (nil)
    (nil))
native/minilzo.c:1278: internal compiler error: in find_reloads, at reload.c:3730
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
-------------------

This only fails on x86, on x86-64 it works. Also I did some checkings with some
older CVS's that I still have allready compiled and it seems that the regression
was introduced somewhere between CVS-20050723 (which works) and CVS-20050812
(which doesn't).
Comment 1 Martin Drab 2005-10-01 12:57:21 UTC
Created attachment 9855 [details]
This triggers the bug
Comment 2 Martin Drab 2005-10-01 13:05:16 UTC
(In reply to comment #0)
...
> -------------------
> gcc -c -O3 -finline-limit=1024 -ftree-vectorize -ftracer -o minilzo.o minilzo.c
> -------------------

It works when you either remove any of the options above, or lower the -O number
or lower the -finline-limit under 590 (but I suspect that would be specific to
the  test code).
Comment 3 Serge Belyshev 2005-10-01 14:36:48 UTC
// Confirmed, reduced testcase (compile with -O1 -ftree-vectorize -msse):

void foo (char **p)
{
  int i;
  char q [1024], *r;
  r = q + ((((unsigned long) q + 3) / 4) * 4) - (unsigned long) q;
  for (i = 0; i < 10; i++)
    p [i] = r;
}
Comment 4 Uroš Bizjak 2005-10-03 14:46:11 UTC
This one looks like real RA problem to me. For some reason, global register allocator is not allocating an xmm register to pseudo 76. This further leads to malformed RTL pattern as shown in the original report.

There is something wrong either in the calculation of reg_renumber[] array or the problem is in alter_reg() function. In alter_reg()  pseudo 76 is claimed to be a constant, so no stack slot is produced.
Comment 5 janis187 2005-10-03 22:26:38 UTC
A regression hunt using an i686-linux cross compiler with the testcase from
comment #3 identifies this patch from phython@gcc.gnu.org:

  http://gcc.gnu.org/ml/gcc-cvs/2005-03/msg00534.html

That doesn't fit with the submitter's information that it worked on 20050723
and the patch doesn't seem particularly relevant to this bug, so I tried
several builds around the date of this patch and afterwards; there are rare
intermittent passes after this patch, but no failures (that I saw) before it.
Comment 6 Martin Drab 2005-10-04 09:06:53 UTC
(In reply to comment #5)

OK, then there are two questions:

1) Is the testcase from Comment #3 really hitting the same bug? since when I compile my original test case with -O1 -ftree-vectorize -msse it works, but then again the reduction might cause the differences.

2) I remember I found this (my original test case) as a bug long before the CVS-20050812 (which I reported as first tested non-working) (unfortunatelly I don't remember which CVS it was, but may be somewhere around what you found (?)), but then all of a sudden it began to work again a bit later. So I didn't report it back then. But then a little later (first noted at the reported CVS-20050812) it broke again and is broken ever since. So something must have done it back then between those CVS-20050723 and CVS-20050812 again.
Comment 7 Uroš Bizjak 2005-10-04 11:38:06 UTC
(In reply to comment #6)

> 
> 1) Is the testcase from Comment #3 really hitting the same bug? since when I
> compile my original test case with -O1 -ftree-vectorize -msse it works, but
> then again the reduction might cause the differences.

  Yes, it is the same bug. You have to trick gcc into this situation, where initialization is in different BB that consumer (from .c.36.lreg):

(insn:HI 289 288 294 15 (set (reg/v/f:SI 108 [ wrkmem ])
        (plus:SI (reg/f:SI 20 frame)
            (const_int -92 [0xffffffa4]))) 144 {*lea_1} (nil)
    (expr_list:REG_EQUIV (plus:SI (reg/f:SI 20 frame)
            (const_int -92 [0xffffffa4]))
        (nil)))


And in another BB:

;; Start of basic block 24, registers live: 6 [bp] 7 [sp] 16 [argp] 20 [frame] 59 94 95 96 97 98 108
(code_label:HI 396 395 397 24 439 "" [1 uses])

(note:HI 397 396 399 24 [bb 24] NOTE_INSN_BASIC_BLOCK)

(insn:HI 399 397 401 24 (set (reg:V4SI 93 [ vect_cst_.1512 ])
        (vec_duplicate:V4SI (reg/v/f:SI 108 [ wrkmem ]))) 757 {*vec_dupv4si} (nil)
    (nil))

Reg 108 is now considered global, so it isn't assigned during lreg pass.
However, reg 108 initalization is fould by greg pass (just before it breaks):

.c.37.greg:

init_insns for 108: (insn_list:REG_DEP_TRUE 289 (nil))

The combination of insn 399 and insn 289 looks suspiciously close to malformed RTL pattern. In reload.c/reload1.c, there is plenty of code special-cased to
(frame_ptr + constant) that could produce malformed RTL that is shown in bugreport.

As further evidence, using -fomit-frame-pointer, I was not able to produce an ICE for any testcase in this bugreport.
Comment 8 Uroš Bizjak 2005-10-04 11:44:47 UTC
(In reply to comment #5)
> A regression hunt using an i686-linux cross compiler with the testcase from
> comment #3 identifies this patch from phython@gcc.gnu.org:
> 
>   http://gcc.gnu.org/ml/gcc-cvs/2005-03/msg00534.html

No, this patch is not to blame. I have disabled the transformation (by adding && 0 to the if condition), and the testcase in comment #3 still ICEs.
Comment 9 Uroš Bizjak 2005-10-04 12:45:35 UTC
The problem here is that we are out of GENERAL_REGS at the point of. This can be seen in code, produced with -fomit-frame-pointer:

...
.L4:
        movl    1052(%esp), %edx
        movl    %ebp, (%edx,%ecx,4)         <<<<< load to %ebp
        incl    %ecx
        cmpl    %ecx, %eax
        ja      .L4
        movl    $10, %edx
        movl    %edx, %esi
        subl    %ecx, %esi
        cmpl    $10, %eax
        je      .L13
        subl    %eax, %edx
        movl    %edx, 4(%esp)
        movl    %edx, %ebx
        shrl    $2, %ebx
        movl    %ebx, %edi
        sall    $2, %edi
        je      .L8
.L9:
        movl    %ebp, (%esp)          <<< here %ebp is spilled to stack
        movss   (%esp), %xmm1
        shufps  $0, %xmm1, %xmm1
        movaps  %xmm1, %xmm0
        movl    1052(%esp), %edx
...

It looks that register allocator is running out of registers. It allocates the last available reg, %ebp, that is usually used as a frame pointer. x86_64 target doesn't have this problem, as it has more registers (and is -fomit-frame-pointer by default).

This bug is unfortunatelly not fixable in the i386 backend without fixing current register allocator.

BTW: Peerhaps a tree expert could look into an optimized tree dump, maybe something can be done there.
Comment 10 Uroš Bizjak 2005-10-04 12:57:30 UTC
(In reply to comment #9)

Sorry for typing too fast... Of course, load of %ebp is here:

foo:
	pushl	%ebp
	pushl	%edi
	pushl	%esi
	pushl	%ebx
	subl	$1032, %esp
	leal	8(%esp), %ebp            <<<<<<< load of %ebp
	movl	1052(%esp), %eax
	andl	$15, %eax
	shrl	$2, %eax
	negl	%eax
	andl	$3, %eax
	je	.L2
	movl	$0, %ecx
.L4:
	movl	1052(%esp), %edx
	movl	%ebp, (%edx,%ecx,4)

And just for fun, without vectorization, following code is produced:

foo:
	subl	$1024, %esp
	movl	1028(%esp), %ecx
	movl	%esp, %edx
	movl	$1, %eax
.L2:
	movl	%edx, -4(%ecx,%eax,4)
	incl	%eax
	cmpl	$11, %eax
	jne	.L2
	addl	$1024, %esp
	ret


Comment 11 Andrew Pinski 2005-10-04 13:10:43 UTC
> BTW: Peerhaps a tree expert could look into an optimized tree dump, maybe
> something can be done there.

I doubt it.  Unless you find that:
  q.0 = (int) &q;
  r = &q + (char *) (((long unsigned int) q.0 + 3) / 4 * 4) - &q;

Is a constant, for the reduced testcase that is).  This is a reload/ra bug and really should be fixed there and not worked around anywhere else.
Well &q - &q == 0, so for the reduced testcase, there is some improvements with a new re-association but it would only help the issue a little bit.
Comment 12 Martin Drab 2005-10-04 14:38:15 UTC
(In reply to comment #7)
> As further evidence, using -fomit-frame-pointer, I was not able to produce an
> ICE for any testcase in this bugreport.

I was under the impression that -fomit-frame-pointer is enabled by -O2 or higher   for x86 as well. Is that not the case? 

Comment 13 Mark Mitchell 2005-10-31 06:00:41 UTC
This is a showstopper.
Comment 14 Richard Henderson 2005-11-16 17:23:27 UTC
Subject: Bug 24160

Author: rth
Date: Wed Nov 16 17:23:23 2005
New Revision: 107093

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=107093
Log:
        PR rtl-opt/24160
        PR target/24621
        * reload1.c (reg_equiv_invariant): New.
        (reload): Allocate, initialize, and free it.
        (calculate_needs_all_insns): Check it when skipping equivalence
        setting insns.
        (alter_reg): Likewise.
        (eliminate_regs_1): Rename from eliminate_regs.  Add new
        may_use_invariant argument; only use reg_equiv_invariant when true.
        (eliminate_regs): New.
        (eliminate_regs_in_insn): Use eliminate_regs_1; track when we're in
        a context for which may_use_invariant may be true.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/reload1.c

Comment 15 Richard Henderson 2005-11-16 17:27:33 UTC
Fixed.