Bug 27616 - [4.1/4.2 Regression] Infinite loop at -O1 and above in RTL CSE
Summary: [4.1/4.2 Regression] Infinite loop at -O1 and above in RTL CSE
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 4.1.0
: P1 normal
Target Milestone: 4.1.2
Assignee: Eric Botcazou
URL: http://gcc.gnu.org/ml/gcc-patches/200...
Keywords: ice-on-valid-code, patch
: 28010 28569 (view as bug list)
Depends on:
Blocks: 28010 28569 28622
  Show dependency treegraph
 
Reported: 2006-05-15 15:45 UTC by Lee Ji Hwan
Modified: 2006-09-04 19:38 UTC (History)
9 users (show)

See Also:
Host:
Target: i686-pc-linux-gnu
Build:
Known to work: 4.0.3
Known to fail: 4.1.0 4.2.0
Last reconfirmed: 2006-08-06 20:15:17


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Lee Ji Hwan 2006-05-15 15:45:54 UTC
===== gcc41bug.c =====
#include <stddef.h>

typedef struct chunk_s chunk_t;
struct chunk_s {
    unsigned int size;
    int offset_next;
};

#define GET_ADDRESS(baseptr, offset) \
    ((chunk_t *)((char *)(baseptr) + (offset)))

void
gcc_bug_test(chunk_t *first)
{
    chunk_t *cur;

    while (1) {
        cur = GET_ADDRESS(first, first->offset_next);

        if (GET_ADDRESS(first, cur->offset_next) == first)
            first->offset_next = 0;

        if (cur->size == 0)
            break;
    }
}
===== end of gcc41bug.c =====

command line:

$ i686-pc-linux-gnu-gcc-4.1.0 -c -O1 -o gcc41bug.o gcc41bug.c
i686-pc-linux-gnu-gcc-4.1.0: Internal error: Segmentation fault (program cc1)

1. Works fine without -O1.
2. Code might look meaningless, 
   but it's extracted part from a real application code

$ gcc -v
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: /var/tmp/portage/gcc-4.1.0/work/gcc-4.1.0/configure --prefix=/usr --bindir=/usr/i686-pc-linux-gnu/gcc-bin/4.1.0 --includedir=/usr/lib/gcc/i686-pc-linux-gnu/4.1.0/include --datadir=/usr/share/gcc-data/i686-pc-linux-gnu/4.1.0 --mandir=/usr/share/gcc-data/i686-pc-linux-gnu/4.1.0/man --infodir=/usr/share/gcc-data/i686-pc-linux-gnu/4.1.0/info --with-gxx-include-dir=/usr/lib/gcc/i686-pc-linux-gnu/4.1.0/include/g++-v4 --host=i686-pc-linux-gnu --build=i686-pc-linux-gnu --disable-altivec --enable-nls --without-included-gettext --with-system-zlib --disable-checking --disable-werror --disable-libunwind-exceptions --disable-multilib --disable-libmudflap --disable-libssp --disable-libgcj --enable-languages=c,c++,fortran --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
Thread model: posix
gcc version 4.1.0 (Gentoo 4.1.0)
Comment 1 Andrew Pinski 2006-05-15 15:51:34 UTC
#2  0x081a4404 in lookup_as_function (x=<value optimized out>, code=PLUS)
    at /home/peshtigo/pinskia/src/gnu/gcc/src/gcc/cse.c:1453
#3  0x081a58d0 in fold_rtx (x=<value optimized out>, insn=0x0)
    at /home/peshtigo/pinskia/src/gnu/gcc/src/gcc/cse.c:4213
#4  0x081a5a9e in fold_rtx (x=<value optimized out>, insn=0x0)
    at /home/peshtigo/pinskia/src/gnu/gcc/src/gcc/cse.c:3452
#5  0x081a58e4 in fold_rtx (x=<value optimized out>, insn=0x0)
    at /home/peshtigo/pinskia/src/gnu/gcc/src/gcc/cse.c:4218
#6  0x081a5a9e in fold_rtx (x=<value optimized out>, insn=0x0)
    at /home/peshtigo/pinskia/src/gnu/gcc/src/gcc/cse.c:3452
#7  0x081a58e4 in fold_rtx (x=<value optimized out>, insn=0x0)
    at /home/peshtigo/pinskia/src/gnu/gcc/src/gcc/cse.c:4218
#
Comment 2 Andrew Pinski 2006-05-15 15:53:09 UTC
Confirmed.
Comment 3 Andrew Pinski 2006-05-15 15:59:33 UTC
Reduced testcase (without any preprocessed macros or includes):
struct chunk_s {
    unsigned int size;
    int offset_next;
};
typedef struct chunk_s chunk_t;

void
gcc_bug_test(chunk_t *first)
{
  chunk_t * cur;
  char * first0;

do {
  first0 = (char *) first;
  cur = (chunk_t *) (first->offset_next + first0);
  if ((chunk_t *) (first0 + cur->offset_next) != first)
     return ;

  first->offset_next = 0;

  } while (cur->size != 0);
}
Comment 4 Kazu Hirata 2006-05-19 19:36:52 UTC
Infinite recursion is going on in CSE.  Here is what happens.

At cse.c:4278 in fold_rtx, we have

(gdb) call debug_rtx(x)
(plus:SI (reg/v/f:SI 60 [ first ])
         (const_int 4 [0x4]))

(gdb) call debug_rtx(y)
(plus:SI (reg/v/f:SI 60 [ first ])
         (mem/s/j:SI (plus:SI (reg/v/f:SI 59 [ cur ])
                              (const_int 4 [0x4])) [0 <variable>.offset_next+0 S4 A32]))
(gdb)

Here we call fold_rtx (XEXP (y, 1), 0), where XEXP (y, 1) is

  (mem/s/j:SI (plus:SI (reg/v/f:SI 59 [ cur ])
                       (const_int 4 [0x4])) [0 <variable>.offset_next+0 S4 A32])

fold_rtx delegates all processing of MEMs to fold_rtx_mem, so
fold_rtx_mem gets

  (mem/s/j:SI (plus:SI (reg/v/f:SI 59 [ cur ])
                       (const_int 4 [0x4])) [0 <variable>.offset_next+0 S4 A32])

fold_rtx_mem in turn calls fold_rtx with the address inside the MEM,
which is:

  (plus:SI (reg/v/f:SI 59 [ cur ])
           (const_int 4 [0x4])) [0 <variable>.offset_next+0 S4 A32])

fold_rtx later calls lookup_as_function on (reg/v/f:SI 59 [ cur ]),
which returns

  (plus:SI (reg/v/f:SI 60 [ first ])
           (mem/s/j:SI (plus:SI (reg/v/f:SI 60 [ first ])
                       (const_int 4 [0x4])) [0 <variable>.offset_next+0 S4 A32]))

which is the same as what we started with.
Comment 5 Mark Mitchell 2006-05-25 02:34:54 UTC
Will not be fixed in 4.1.1; adjust target milestone to 4.1.2.
Comment 6 Steven Bosscher 2006-06-04 09:34:36 UTC
Would be fixed with fwprop due to not recursively calling fold_rtx.
Comment 7 Andrew Pinski 2006-07-07 13:36:29 UTC
Janis,
 Could you do a regression hunt on this bug for me?
Thanks,
Andrew Pinski
Comment 8 Janis Johnson 2006-07-14 00:20:46 UTC
A regression hunt on powerpc-linux using an i686-linux cross compiler identified this patch:

    http://gcc.gnu.org/viewcvs?view=rev&rev=102570

    r102570 | hubicka | 2005-07-29 22:22:41 +0000 (Fri, 29 Jul 2005)

The compiler didn't segfault on the test system, but it did run for a very long time so the test script used by the regression hunt timed out after 5 seconds.
Comment 9 Steven Bosscher 2006-07-14 17:02:28 UTC
The patch identified in comment #8 can't have caused the CSE problem, but it probably exposed a latent bug.
Comment 10 Eric Botcazou 2006-08-06 20:15:17 UTC
Investigating.
Comment 11 Eric Botcazou 2006-08-08 11:12:16 UTC
Ugh.  This is an oscillation with period 6 (not 3 as indicated in comment #3)
between fold_rtx and fold_rtx_mem:

#0  fold_rtx (x=0x55759dd4, insn=0x0) at /home/eric/svn/gcc/gcc/cse.c:3621
#1  0x081a640a in fold_rtx_mem (x=0x55759de0, insn=0x0)
    at /home/eric/svn/gcc/gcc/cse.c:3453
#2  0x081a8203 in fold_rtx (x=0x55759de0, insn=0x0)
    at /home/eric/svn/gcc/gcc/cse.c:3678
#3  0x081ad4d0 in fold_rtx (x=0x55759d8c, insn=0x0)
    at /home/eric/svn/gcc/gcc/cse.c:4283
#4  0x081a640a in fold_rtx_mem (x=0x55759d98, insn=0x0)
    at /home/eric/svn/gcc/gcc/cse.c:3453
#5  0x081a8203 in fold_rtx (x=0x55759d98, insn=0x0)
    at /home/eric/svn/gcc/gcc/cse.c:3678
#6  0x081ad4d0 in fold_rtx (x=0x55759dd4, insn=0x0)
    at /home/eric/svn/gcc/gcc/cse.c:4283

The fundamental problem is that fold_rtx operates recursively on its operands
after equivalencing them, unlike the tree folder.  More specifically:

  /* If we have (<op> <reg> <const_int>) for an associative OP and REG
     is known to be of similar form, we may be able to replace the
     operation with a combined operation.  This may eliminate the
     intermediate operation if every use is simplified in this way.
     Note that the similar optimization done by combine.c only works
     if the intermediate operation's result has only one reference.  */

This is a recipe for infinite loops and the code already contains a couple of
kludges against that:

	      /* If we have compiled a statement like
		 "if (x == (x & mask1))", and now are looking at
		 "x & mask2", we will have a case where the first operand
		 of Y is the same as our first operand.  Unless we detect
		 this case, an infinite loop will result.  */
	      if (XEXP (y, 0) == folded_arg0)
		break;
[...]
	      /* If Y contains our first operand (the most common way this
		 can happen is if Y is a MEM), we would do into an infinite
		 loop if we tried to fold it.  So don't in that case.  */

	      if (! reg_mentioned_p (folded_arg0, y))
		y = fold_rtx (y, insn);

I have another one for the -O1 case but it falls apart at -O2 because PRE
hoists an equivalencing insn out of the extended BB under consideration.

The problematic pattern is

  set (reg1)
      (plus (reg)
            (mem (plus (reg2) (const_int))))

  set (reg2)
      (plus (reg)
            (mem (plus (reg1) (const_int))))

so in particular it defeats all the first order kludges like the above two.
Comment 12 Andrew Pinski 2006-08-08 18:33:06 UTC
*** Bug 28010 has been marked as a duplicate of this bug. ***
Comment 13 Eric Botcazou 2006-08-08 18:52:42 UTC
*** Bug 28569 has been marked as a duplicate of this bug. ***
Comment 14 Eric Botcazou 2006-09-04 19:33:31 UTC
Subject: Bug 27616

Author: ebotcazou
Date: Mon Sep  4 19:33:24 2006
New Revision: 116683

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116683
Log:
	PR rtl-optimization/27616
	* cse.c (table_size): New static variable.
	(new_basic_block): Initialize it to 0.
	(remove_from_table): Decrement it.
	(insert): Increment it.
	(fold_rtx_mem_1): New function, renamed from fold_rtx_mem.
	(fold_rtx_mem): Enforce a cap on the recursion depth.  Call
	fold_rtx_mem_1 if under the cap.
	(fold_rtx) <RTX_COMM_ARITH>: In the associative case, delay a little
	the lookup of the equivalent expression and test for equality of the
	first operand of the equivalent expression before in turn looking up
	an equivalent constant for the second operand.


Added:
    trunk/gcc/testsuite/gcc.c-torture/compile/20060904-1.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cse.c
    trunk/gcc/testsuite/ChangeLog

Comment 15 Eric Botcazou 2006-09-04 19:35:16 UTC
Subject: Bug 27616

Author: ebotcazou
Date: Mon Sep  4 19:35:09 2006
New Revision: 116684

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116684
Log:
	PR rtl-optimization/27616
	* cse.c (table_size): New static variable.
	(new_basic_block): Initialize it to 0.
	(remove_from_table): Decrement it.
	(insert): Increment it.
	(fold_rtx_mem_1): New function, renamed from fold_rtx_mem.
	(fold_rtx_mem): Enforce a cap on the recursion depth.  Call
	fold_rtx_mem_1 if under the cap.
	(fold_rtx) <RTX_COMM_ARITH>: In the associative case, delay a little
	the lookup of the equivalent expression and test for equality of the
	first operand of the equivalent expression before in turn looking up
	an equivalent constant for the second operand.


Added:
    branches/gcc-4_1-branch/gcc/testsuite/gcc.c-torture/compile/20060904-1.c
Modified:
    branches/gcc-4_1-branch/gcc/ChangeLog
    branches/gcc-4_1-branch/gcc/cse.c
    branches/gcc-4_1-branch/gcc/testsuite/ChangeLog

Comment 16 Eric Botcazou 2006-09-04 19:38:16 UTC
Fixed everywhere.