21734 – [4.1 regression] ICE: -ftree-vectorize, segfault

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization
Version:	4.1.0

Importance:	P2 normal
Target Milestone:	4.1.0
Assignee:	Dorit Naishlos

URL:
Keywords:	ice-on-valid-code, monitored

Duplicates (1):	21851
Depends on:
Blocks:

Reported:	2005-05-24 08:00 UTC by Stefaan De Roeck
Modified:	2005-06-02 18:59 UTC
CC List:	4 users

See Also:
Host:
Target:
Build:
Known to work:	4.0.0
Known to fail:	4.1.0
Last reconfirmed:	2005-05-24 14:07:09

Description Stefaan De Roeck 2005-05-24 08:00:16 UTC

Segfault on compilation of following source:

struct _matrix {
  double lineardata[4 * 4];
  double & operator()(int row, int col = 0) {
    return lineardata[col * 4 + row];
  }
};

struct matrix : public _matrix {
  typedef _matrix parent;
  double & operator()(int row, int col = 0)
    { return parent::operator()(row,col); }
};

void add(matrix & __restrict in1, matrix & __restrict in2, matrix & __restrict result) {
  for (int col=0; col<4; ++col)
    for (int row=0; row<4; ++row)
      result(row, col) = in1(row, col) + in2(row, col);
}

--- end of source ---


Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: /esat/alexandria1/sderoeck/src/gcc/main/configure
--prefix=/esat/olympia/install --program-suffix=-cvs --enable-languages=c,c++
Thread model: posix
gcc version 4.1.0 20050523 (experimental)
 /esat/olympia/install/libexec/gcc/i686-pc-linux-gnu/4.1.0/cc1plus -quiet -v
-I/users/visics/sderoeck/projects/clean/CaveIn/Whistler -D_GNU_SOURCE test13.cpp
-quiet -dumpbasetest13.cpp -march=pentium4 -auxbase-strip test13.S -O9 -version
-fverbose-asm -fdump-tree-vect-stats -fdump-tree-vect-details -funroll-all-loops
-ftree-vectorize -o test13.S
ignoring nonexistent directory "/usr/local/include"
ignoring nonexistent directory
"/esat/olympia/install/lib/gcc/i686-pc-linux-gnu/4.1.0/../../../../i686-pc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /users/visics/sderoeck/projects/clean/CaveIn/Whistler
 /esat/olympia/install/lib/gcc/i686-pc-linux-gnu/4.1.0/../../../../include/c++/4.1.0
 /esat/olympia/install/lib/gcc/i686-pc-linux-gnu/4.1.0/../../../../include/c++/4.1.0/i686-pc-linux-gnu
 /esat/olympia/install/lib/gcc/i686-pc-linux-gnu/4.1.0/../../../../include/c++/4.1.0/backward
 /esat/olympia/install/include
 /esat/olympia/install/lib/gcc/i686-pc-linux-gnu/4.1.0/include
 /usr/include
End of search list.
GNU C++ version 4.1.0 20050523 (experimental) (i686-pc-linux-gnu)
        compiled by GNU C version 4.1.0 20050523 (experimental).
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: 99f1919b6616caa4cacd62f3245b1e14
test13.cpp: In function 'void add(matrix&, matrix&, matrix&)':
test13.cpp:14: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.

Comment 1 Volker Reichelt 2005-05-24 14:07:08 UTC

Confirmed.

Reduced testcase (-O2 -ftree-vectorize -march=pentium4):

=========================================================
struct A
{
  int a[4];
  int& operator[](int i) { return a[i]; }
};

struct B : public A
{
  int& operator[](int i) { return A::operator[](i); }
};

void foo(B &b)
{
  for (int i=0; i<4; ++i)
    b[i] = 0;
}
=========================================================

Comment 2 Andrew Pinski 2005-05-24 14:21:24 UTC

Here is a C testcase:
struct a
{
  int aa[4];
};
struct b
{
  struct a aa;
};

void foo(struct b *bb)
{
  int i;
  for (i=0; i<4; ++i)
    {
      struct a *aa = &bb->aa;
      struct a *aa1 = aa;
      struct a *aa2 = aa1;
      int *d = &aa2->aa[i];
      *d = 0;
    }
}

I think this is caused by the forwprop changes.  (Notice the extra variables to reproduce this bug).

Comment 3 Jeffrey A. Law 2005-05-25 05:55:23 UTC

Subject: Re:  [4.1 regression] ICE:
	-ftree-vectorize, segfault

On Tue, 2005-05-24 at 14:21 +0000, pinskia at gcc dot gnu dot org wrote:
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2005-05-24 14:21 -------
> Here is a C testcase:
> struct a
> {
>   int aa[4];
> };
> struct b
> {
>   struct a aa;
> };
> 
> void foo(struct b *bb)
> {
>   int i;
>   for (i=0; i<4; ++i)
>     {
>       struct a *aa = &bb->aa;
>       struct a *aa1 = aa;
>       struct a *aa2 = aa1;
>       int *d = &aa2->aa[i];
>       *d = 0;
>     }
> }
> 
> I think this is caused by the forwprop changes.  (Notice the extra variables to reproduce this bug).
Actually, this is a bug in the vectorizer code to update PHIs in duplicate
loops.  forwprop AFAICT is just exposing the latent vectorizer bug.

From what I've been able to determine so far, the vectorizer code to
update PHIs doesn't properly handle the case where a PHI is dead.  It
is likely someone with more experience in the vectorizer code is going
to need to fix this.


Jeff

Comment 4 Dorit Naishlos 2005-05-25 07:28:33 UTC

> Actually, this is a bug in the vectorizer code to update PHIs in duplicate
> loops.  forwprop AFAICT is just exposing the latent vectorizer bug.
> From what I've been able to determine so far, the vectorizer code to
> update PHIs doesn't properly handle the case where a PHI is dead.  It
> is likely someone with more experience in the vectorizer code is going
> to need to fix this.

I'll take a look

Comment 5 Giovanni Bajo 2005-05-25 18:43:39 UTC

Assigning to Dorit.

Comment 6 Jeffrey A. Law 2005-05-25 21:43:37 UTC

Subject: Re:  [4.1 regression] ICE:
	-ftree-vectorize, segfault

On Wed, 2005-05-25 at 07:28 +0000, dorit at il dot ibm dot com wrote:
> ------- Additional Comments From dorit at il dot ibm dot com  2005-05-25 07:28 -------
> > Actually, this is a bug in the vectorizer code to update PHIs in duplicate
> > loops.  forwprop AFAICT is just exposing the latent vectorizer bug.
> > From what I've been able to determine so far, the vectorizer code to
> > update PHIs doesn't properly handle the case where a PHI is dead.  It
> > is likely someone with more experience in the vectorizer code is going
> > to need to fix this.
> 
> I'll take a look
Thanks.  I suspect we have multiple problems with the PHI updates.

I think I know how to fix the code for updating PHIs in the duplicate
loop, but I haven't really slogged through the guard code to figure
out how PHIs in the guards should be updated.

In the case of the duplicate loop we have the following code:

  /* Scan the phis in the headers of the old and new loops
     (they are organized in exactly the same order).  */

  for (phi_new = phi_nodes (new_loop->header),
       phi_orig = phi_nodes (orig_loop->header);
       phi_new && phi_orig;
       phi_new = PHI_CHAIN (phi_new), phi_orig = PHI_CHAIN (phi_orig))
    {
      /* step 1.  */
      def = PHI_ARG_DEF_FROM_EDGE (phi_orig, entry_arg_e);
      add_phi_arg (phi_new, def, new_loop_entry_e);

      /* step 2.  */
      def = PHI_ARG_DEF_FROM_EDGE (phi_orig, orig_loop_latch);
      if (TREE_CODE (def) != SSA_NAME)
        continue;

      new_ssa_name = get_current_def (def);
      if (!new_ssa_name)
        /* Something defined outside of the loop.  */
        continue;

      /* An ordinary ssa name defined in the loop.  */
      add_phi_arg (phi_new, new_ssa_name, loop_latch_edge (new_loop));

      /* step 3 (case 1).  */
      if (!after)
        {
          gcc_assert (new_loop_exit_e == orig_entry_e);
          SET_PHI_ARG_DEF (phi_orig,
                           new_loop_exit_e->dest_idx,
                           new_ssa_name);
        }

Note that if !new_ssa_name, we continue the loop without ever
adding the PHI argument.  The net result is that we have a
PHI where PHI_ARG_DEF for one of the PHI's incoming edges is null.

I'm pretty sure that this can only happen if the result of the
PHI is not set anywhere in the loop.  In that case the PHI
argument in question should be the same SSA_NAME as the PHI_RESULT,
i.e., we ultimately end up generating a degenerate phi of the form

  x_3 = PHI (x_3 (latch edge), x_2 (initial value from entry edge))

What I don't know yet is if the problem is really that we haven't
set up the current def properly (thus causing get_current_def to
return NULL) or if we just need code to compensate for this
situation in slpeel_update_phis_for_duplicate_loop.

Thoughts?

Jeff
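
The failure mode described in Comment 6 can be modeled outside GCC. The following is only a minimal sketch: `SsaName`, `Phi`, `current_def`, and `update_latch_arg` are hypothetical toy stand-ins, not GCC's data structures or API. It assumes that a name with no definition inside the loop simply has no recorded current def, and shows both outcomes: leaving the latch argument empty, and falling back to the PHI result to form the degenerate phi.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical toy types; they only model the latch-argument update.
using SsaName = std::string;

struct Phi {
    SsaName result;
    SsaName entry_arg;                 // value arriving on the entry edge
    std::optional<SsaName> latch_arg;  // value arriving on the latch edge
};

// Toy counterpart of a current-def lookup: the copy of `def` in the
// duplicated loop, if one was recorded. Names that are never defined
// inside the loop have no entry.
std::optional<SsaName> current_def(const std::map<SsaName, SsaName>& defs,
                                   const SsaName& def) {
    auto it = defs.find(def);
    if (it == defs.end()) return std::nullopt;
    return it->second;
}

// Fill the latch argument of the duplicated loop's phi.
// fallback_to_result == false models the early `continue` (argument stays
// empty); true models using the phi's own result instead.
void update_latch_arg(Phi& phi_new, const SsaName& orig_latch_def,
                      const std::map<SsaName, SsaName>& defs,
                      bool fallback_to_result) {
    assert(!phi_new.latch_arg && "latch argument already set");
    std::optional<SsaName> new_name = current_def(defs, orig_latch_def);
    if (!new_name) {
        if (!fallback_to_result) return;  // argument is never added
        new_name = phi_new.result;        // degenerate phi: x = PHI (x, init)
    }
    phi_new.latch_arg = new_name;
}
```

With an empty `defs` map, calling `update_latch_arg` with `fallback_to_result` set to false leaves `latch_arg` empty, which corresponds to the null PHI argument; with true, a phi whose result is `x_3` ends up with `x_3` on its latch edge.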

Comment 7 Dorit Naishlos 2005-05-30 14:35:06 UTC

I can't reproduce this ICE with today's mainline snapshot. (I was able to 
reproduce it a few days ago, but not anymore.)

Comment 8 Stefaan De Roeck 2005-05-30 19:21:44 UTC

Confirmed; I cannot reproduce it with the given testcase either.  But my original
source code still triggers a bug (possibly the same one).  I've extracted a new
testcase:

struct M {
  double data[16];
  double* operator[](int row){ return &data[row*4]; };
  void set() {
    for (int i=0;i<16;++i)
      data[i]=0.0;
  }
};

struct A {
  M m1;
  void test();
};

void A::test() {
  M m2;
  m2[2][2]=0.;
  m1.set();
}

Comment 9 Stefaan De Roeck 2005-05-31 10:13:45 UTC

For the sake of completeness, here is the error produced with the new testcase:

Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: /esat/alexandria1/sderoeck/src/gcc/main/configure
--prefix=/esat/olympia/install --program-suffix=-cvs --enable-languages=c,c++
Thread model: posix
gcc version 4.1.0 20050530 (experimental)
 /esat/olympia/install/libexec/gcc/i686-pc-linux-gnu/4.1.0/cc1plus -quiet -v
-D_GNU_SOURCE sweepobject.cpp -quiet -dumpbase sweepobject.cpp -march=pentium4
-auxbase-strip sweepx.o -O2 -version -ftree-vectorize -o /tmp/ccKIKNVA.s
ignoring nonexistent directory "/usr/local/include"
ignoring nonexistent directory
"/esat/olympia/install/lib/gcc/i686-pc-linux-gnu/4.1.0/../../../../i686-pc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /esat/olympia/install/lib/gcc/i686-pc-linux-gnu/4.1.0/../../../../include/c++/4.1.0
 /esat/olympia/install/lib/gcc/i686-pc-linux-gnu/4.1.0/../../../../include/c++/4.1.0/i686-pc-linux-gnu
 /esat/olympia/install/lib/gcc/i686-pc-linux-gnu/4.1.0/../../../../include/c++/4.1.0/backward
 /esat/olympia/install/include
 /esat/olympia/install/lib/gcc/i686-pc-linux-gnu/4.1.0/include
 /usr/include
End of search list.
GNU C++ version 4.1.0 20050530 (experimental) (i686-pc-linux-gnu)
        compiled by GNU C version 4.1.0 20050530 (experimental).
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: a2126b7dadcd5b4079972b9e8125263a
sweepobject.cpp: In member function 'void A::test()':
sweepobject.cpp:15: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.

Comment 10 Andrew Pinski 2005-05-31 23:28:53 UTC

*** Bug 21851 has been marked as a duplicate of this bug. ***

Comment 11 Volker Reichelt 2005-06-01 08:41:39 UTC

Even shorter testcase:

============================================
struct A
{
  int a[4];
  int* operator[](int i) { return &a[i]; }
};

void foo(A a1, A &a2)
{
  a1[1][1]=0;
  for (int i=0; i<4; ++i)
    a2.a[i]=0;
}
============================================

Comment 12 Dorit Naishlos 2005-06-01 13:20:36 UTC

Thanks, reproduced.

Comment 13 Dorit Naishlos 2005-06-01 13:22:33 UTC

> Note that if !new_ssa_name, we continue the loop without ever
> adding the PHI argument.  The net result being that we have a
> PHI where PHI_ARG_DEF for one of the PHI's incoming edges is null.
> 
> I'm pretty sure that this can only happen if the result of the
> PHI is not set anywhere in the loop.  

You're right, this is exactly what we have here.

Before loop duplication we have:
   m2 = phi <init: m15, latch: m15>

The phi is dead and has no defs in the loop. As you identified, this means 
that no current_def is set for it, so in the duplicated 
loop we get:
   m24 = phi <init: m15, latch: NULL>

> In that case the PHI
> argument in question should be the same SSA_NAME as the PHI_RESULT,
> i.e., we ultimately end up generating a degenerate phi of the form
> 
> 
>   x_3 = PHI (x_3 (latch edge), x_2 (initial value from entry edge))
> 
> 

Indeed, applying the following patch, which does exactly that, solves the 
problem:

Index: tree-vectorizer.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-vectorizer.c,v
retrieving revision 2.91
diff -u -3 -p -r2.91 tree-vectorizer.c
--- tree-vectorizer.c   26 May 2005 18:14:48 -0000      2.91
+++ tree-vectorizer.c   1 Jun 2005 13:11:01 -0000
@@ -1,3 +1,4 @@
+
 /* Loop Vectorization
    Copyright (C) 2003, 2004, 2005 Free Software Foundation, Inc.
    Contributed by Dorit Naishlos <dorit@il.ibm.com>
@@ -321,8 +322,11 @@ slpeel_update_phis_for_duplicate_loop (s

       new_ssa_name = get_current_def (def);
       if (!new_ssa_name)
-        /* Something defined outside of the loop.  */
-        continue;
+       {
+          /* This only happens if there are no definitions
+            inside the loop. use the phi_result in this case.  */
+         new_ssa_name = PHI_RESULT (phi_new);
+       }

       /* An ordinary ssa name defined in the loop.  */
       add_phi_arg (phi_new, new_ssa_name, loop_latch_edge (new_loop));
@@ -566,7 +570,12 @@ slpeel_update_phi_nodes_for_guard1 (edge
       else
         {
           current_new_name = get_current_def (loop_arg);
-          gcc_assert (current_new_name);
+         /* current_def is not available only if the variable does not
+            change inside the loop, in which case we also don't care
+            about recording a current_def for it because we won't be
+            trying to create loop-exit-phis for it.  */
+         if (!current_new_name)
+           continue;
         }



> 
> What I don't know yet is if the problem is really that we haven't
> set up the current def properly (thus causing get_current_def to
> return NULL) or if we just need code to compensate for this
> situation in slpeel_update_phis_for_duplicate_loop.
> 
> Thoughts?
>

I don't know which current_def would make sense to set for this phi, if at all.

It originally was:
   m2 = phi <init: m15, latch: m16>
   m16 = <v_may_def m2>

after t41.alias4 it became:
   m2 = phi <init: m15, latch: m2>

and after t44.store_ccp it got to its current form:
   m2 = phi <init: m15, latch: m15>

The best thing would be to detect such redundant phis and clean them up, and in 
the vectorizer work under the assumption that they don't exist. The code to do 
peeling would be cleaner (not having to consider these special cases), and we 
would generate much less code (see below how many phis we end up generating 
when peeling before and after this loop). By the way, all these garbage phis do 
get eliminated later on by dce (at t68.cd_dce). Calling dce just before the loop 
optimizations or just before the vectorizer also solved the problem. We could 
also detect invariant/dead phis at the beginning of the vectorizer; that would 
be pretty much free, because we examine all phis and their uses anyhow, so we 
might as well get rid of them. In the meantime, I'll test the patch above.

FYI, when applying the patch above, the resulting code that we generate is as 
shown below:

==========================================
>>>before:

orig_loop:
  m2 = phi<init: m15, latch: m15>


>>>after:

   if C1 goto new_prolog_loop
   else  goto bb1

new_prolog_loop (dup):
   m24 = phi<init: m15, latch: m24>
loop_exit:
   m34 = phi <m24>
   if C2 goto bb1
   else  goto bb3

bb1:
   m33 = phi <m15, m34>
   if C3 goto orig_loop
   else  goto bb2

orig_loop:
  m2 = phi<init: m33, latch: m15>
loop_exit:
  m54 = phi<m15>
  if C4 goto bb2 
  else  goto bb4

bb2:
  m53 = phi<m33, m54>

new_epilog_loop (dup):
  m44 = phi <init: m53, latch: m44>
loop_exit:

bb4:

bb3:
==========================================
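
The cleanup suggested in Comment 13 (detecting redundant phis before peeling) could look roughly like the sketch below. It is an illustration only, not GCC code: `ToyPhi` and `redundant_value` are hypothetical names, and the check assumes a phi is redundant when all of its arguments, other than references to its own result, are the same name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy phi: a result name plus one argument per incoming edge.
struct ToyPhi {
    std::string result;
    std::vector<std::string> args;
};

// Ignoring arguments that are the phi's own result, check whether every
// remaining argument is the same name. Returns that name, or an empty
// string when the phi merges distinct values (or has no usable argument).
std::string redundant_value(const ToyPhi& phi) {
    assert(!phi.result.empty());
    std::string value;
    for (const std::string& arg : phi.args) {
        if (arg == phi.result) continue;   // self-reference via the latch
        if (value.empty()) value = arg;    // first real input seen
        else if (arg != value) return "";  // two distinct inputs: real merge
    }
    return value;
}
```

Applied to the three forms traced in Comment 13, `phi <m15, m16>` is reported as a real merge, while `phi <m15, m2>` and `phi <m15, m15>` both reduce to `m15`, so every use of `m2` could be replaced by `m15` and the phi dropped.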

Comment 14 Giovanni Bajo 2005-06-01 18:08:32 UTC

Please, remember to add both the new and the old testcase to the testsuite.

Comment 15 Dorit Naishlos 2005-06-01 19:14:55 UTC

> Please, remember to add both the new and the old testcase to the testsuite.

patch: http://gcc.gnu.org/ml/gcc-patches/2005-06/msg00110.html

Comment 16 Jeffrey A. Law 2005-06-02 05:22:44 UTC

Subject: Re:  [4.1 regression] ICE:
	-ftree-vectorize, segfault

On Wed, 2005-06-01 at 13:22 +0000, dorit at il dot ibm dot com wrote:

> The best thing would be to detect such redundant phis and clean them up, and in 
> the vectorizer work under the assumption that they don't exist. The code to do 
> peeling would be cleaner (not having to consider these special cases), and we 
> would generate much less code (see below how many phis we end up generating 
> when peeling before and after this loop).
I'd tend to agree that it would be better if these dead PHIs were
cleaned up before the loop optimizer is run. However, I would
strongly recommend that the loop optimizer handle this case.
The basic idea is that feeding an optimizer, any optimizer,
sub-optimal code (i.e., dead phis, unpropagated constants
and copies, etc.) should not cause it to segfault,
abort, or generate incorrect code.



Jeff

Comment 17 GCC Commits 2005-06-02 14:52:26 UTC

Subject: Bug 21734

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	dorit@gcc.gnu.org	2005-06-02 14:52:18

Modified files:
	gcc            : ChangeLog tree-vectorizer.c 
	gcc/testsuite  : ChangeLog 
Added files:
	gcc/testsuite/g++.dg/vect: pr21734_1.cc pr21734_2.cc 

Log message:
	PR tree-optimization/21734
	* tree-vectorizer.c (slpeel_update_phis_for_duplicate_loop): Use the
	phi_result when current_def is not available.
	(slpeel_update_phi_nodes_for_guard1): Don't fail if current_def is not
	available.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.8997&r2=2.8998
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-vectorizer.c.diff?cvsroot=gcc&r1=2.92&r2=2.93
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/ChangeLog.diff?cvsroot=gcc&r1=1.5577&r2=1.5578
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/g++.dg/vect/pr21734_1.cc.diff?cvsroot=gcc&r1=NONE&r2=1.1
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/g++.dg/vect/pr21734_2.cc.diff?cvsroot=gcc&r1=NONE&r2=1.1

Comment 18 Andrew Pinski 2005-06-02 18:59:04 UTC

Fixed.