Bug 32182 - [4.2 Regression] -fstrict-aliasing optimizations cause constructor not to run for object causing segfault
Summary: [4.2 Regression] -fstrict-aliasing optimizations cause constructor not to run...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: c++ (show other bugs)
Version: 4.2.1
: P1 normal
Target Milestone: 4.3.0
Assignee: Not yet assigned to anyone
URL:
Keywords: alias, wrong-code
Depends on:
Blocks:
 
Reported: 2007-06-01 20:40 UTC by Tom Epperly
Modified: 2009-03-30 21:50 UTC (History)
3 users (show)

See Also:
Host: i486-pc-linux-gnu
Target: i486-pc-linux-gnu
Build: i486-pc-linux-gnu
Known to work: 4.3.0
Known to fail: 4.2.5
Last reconfirmed: 2007-06-02 11:35:26


Attachments
tar file containing complete source to reproduce problem (2.05 KB, application/octet-stream)
2007-06-01 20:44 UTC, Tom Epperly
Details
An example involving less casting than the previous one. (1.90 KB, application/octet-stream)
2007-06-01 22:04 UTC, Tom Epperly
Details
A further simplified example showing the problem without any C++ casting (1.85 KB, application/octet-stream)
2007-06-01 22:12 UTC, Tom Epperly
Details
Complete source (even simplier than previous attachments) to reproduce the problem (1.75 KB, application/octet-stream)
2007-06-01 23:24 UTC, Tom Epperly
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Tom Epperly 2007-06-01 20:40:49 UTC
I may have found a situation where GCC's optimizations causes a constructor to be skipped that leads to a crash.  This problem first manifested itself in a program involving well over 100000 lines of code (not including the extra lines from #include'd files).  The initial problem is in code generated by Babel, http://www.llnl.gov/CASC/compnents/, in the runC2Cxx program part of the objarg regression test. After many hours of work, I've reproduced the bug with a program involving only 324 lines.

% wc *.c *.h *.hxx *.cxx
  45  108  861 main.c
  55  136 1174 RefCount.c
  35   59  503 RefCount.h
  78  170 1426 Wrapper.hxx
 111  239 2052 Wrapper.cxx
 324  712 6016 total

I compile these files with the following script:
#!/bin/sh
\rm -f *.o test_aliasing test_noaliasing
gcc-4.2 -g -O2  -c RefCount.c main.c Wrapper.cxx
g++-4.2 -g  -O2   -o test_aliasing RefCount.o main.o Wrapper.o
 
gcc-4.2 -g -O2 -fno-strict-aliasing -c RefCount.c main.c Wrapper.cxx
g++-4.2 -g  -O2 -fno-strict-aliasing  -o test_noaliasing RefCount.o main.o Wrapper.o

./test_noaliasing runs without crashing, and ./test_aliasing crashes in this operator= method:
TestClass &
TestClass::operator =(const TestClass &rhs)
{
  if (d_self != rhs.d_self) {
    if (d_self) {
      /* segfault at next line because d_self wasn't initialized to 0 */
      deleteRef(reinterpret_cast< struct RefCount_t * >(d_self));
    }
    d_self = rhs.d_self;
    if (d_self) {
      addRef(reinterpret_cast< struct RefCount_t * >(d_self));
    }
  }
  return *this;
}
when called from this extern "C" function:
struct Test *
getItem(struct C_Container *cont,
	int ind)
{
  struct Test *result = 0;
  TestClass _local_result;
  try {
    _local_result = cont->d_cont->at(ind); /* crash here */
  }
  catch(...) {
    return result;
  }
  result = _local_result.getIOR();
  if (result) {
    addRef(reinterpret_cast<struct RefCount_t *>(result));
  }
  return  result;
}

In getItem, it appears to have skipped executing empty constructor for _local_result that initializes d_self to 0.

Here is the declaration for TestClass and its super classes.
class BaseClass {
protected:
  void *d_self;
public:
  BaseClass() : d_self(0) {}

  BaseClass(void *ior) : d_self(ior) {}

  ~BaseClass() {
    if (d_self) {
      struct RefCount_t *ref = 
	reinterpret_cast<struct RefCount_t *>(d_self);
      deleteRef(ref);
      d_self = 0;
    }
  }
};


class NextClass : public virtual BaseClass {
public:
  typedef struct Next ior_t;
  NextClass() {}
  NextClass(ior_t *ior);
};


class TestClass : public virtual NextClass {
public:
  typedef struct Test ior_t;
  TestClass() {}
  TestClass(ior_t *ior);
  virtual ~TestClass() { }
  TestClass(const TestClass &src);
  TestClass& operator= (const TestClass &rhs);
  ior_t *getIOR() const { return reinterpret_cast < ior_t *>(d_self); }
  long getNum() const { return reinterpret_cast< Test *>(d_self)->num; }
};

My understanding is the _local_result should be initialized by running with the TestClass::TestClass() constructor which fires after the NextClass::NextClass() constructor which fires after the BaseClass::BaseClass() constructor where d_self is initialized to 0. If I add a printf("Hello\n"); call inside the BaseClass() constructor, it runs and the program doesn't segfault.

The output from running valgrind on the executable supports the idea that d_self is not being initialized.
% valgrind ./test_aliasing
==30651== Memcheck, a memory error detector.
==30651== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==30651== Using LibVEX rev 1732, a library for dynamic binary translation.
==30651== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==30651== Using valgrind-3.2.3-Debian, a dynamic binary instrumentation framework.
==30651== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==30651== For more details, rerun with: -v
==30651==
==30651== Conditional jump or move depends on uninitialised value(s)
==30651==    at 0x80489E5: TestClass::operator=(TestClass const&) (Wrapper.cxx:30)
==30651==    by 0x8048BE2: getItem (Wrapper.cxx:101)
==30651==    by 0x804887B: main (main.c:35)
==30651==
==30651== Conditional jump or move depends on uninitialised value(s)
==30651==    at 0x80489E9: TestClass::operator=(TestClass const&) (Wrapper.cxx:31)
==30651==    by 0x8048BE2: getItem (Wrapper.cxx:101)
==30651==    by 0x804887B: main (main.c:35)
==30651==
==30651== Use of uninitialised value of size 4
==30651==    at 0x8048716: deleteRef (RefCount.c:52)
==30651==    by 0x80489F2: TestClass::operator=(TestClass const&) (Wrapper.cxx:33)
==30651==    by 0x8048BE2: getItem (Wrapper.cxx:101)
==30651==    by 0x804887B: main (main.c:35)
==30651==
==30651== Process terminating with default action of signal 11 (SIGSEGV)
==30651==  Bad permissions for mapped region at address 0x8048EB4
==30651==    at 0x804871D: deleteRef (RefCount.c:52)
==30651==    by 0x80489F2: TestClass::operator=(TestClass const&) (Wrapper.cxx:33)
==30651==    by 0x8048BE2: getItem (Wrapper.cxx:101)
==30651==    by 0x804887B: main (main.c:35)
==30651==
==30651== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 19 from 1)
==30651== malloc/free: in use at exit: 1,008 bytes in 12 blocks.
==30651== malloc/free: 12 allocs, 0 frees, 1,008 bytes allocated.
==30651== For counts of detected errors, rerun with: -v
==30651== searching for pointers to 12 not-freed blocks.
==30651== checked 100,788 bytes.
==30651==
==30651== LEAK SUMMARY:
==30651==    definitely lost: 0 bytes in 0 blocks.
==30651==      possibly lost: 0 bytes in 0 blocks.
==30651==    still reachable: 1,008 bytes in 12 blocks.
==30651==         suppressed: 0 bytes in 0 blocks.
==30651== Rerun with --leak-check=full to see details of leaked memory.
Segmentation fault

The program doesn't crash when compiled with Intel's 9.0.21 C++ compiler. It doesn't crash when compiled with pre-4.2 GCC versions either.

Based on this evidence, it seems possible that this illustrates a case of over zealous optimization.

Release:	gcc-4.2 (GCC) 4.2.1 20070525 (prerelease) (Debian 4.2-20070525-1)
System: Linux driftcreek 2.6.18-4-686 #1 SMP Wed May 9 23:03:12 UTC 2007 i686 GNU/Linux
Architecture: i686
configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2 --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --enable-targets=all --disable-werror --enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu
Comment 1 Tom Epperly 2007-06-01 20:44:35 UTC
Created attachment 13646 [details]
tar file containing complete source to reproduce problem

% sha1sum bug32182.tar.bz2
ce7372671f73d316ad946aede1aad3d4176908bb  bug32182.tar.bz2
Comment 2 Tom Epperly 2007-06-01 20:53:12 UTC
To avoid depending on system #include files, the example has
typedef unsigned int size_t;
hardwired in the code.  This may be an incorrect definition for some platforms.

Oddly enough, if I delete NextClass and make TestClass inherit directly from BaseClass, the program no longer segfaults.

I am not sure if all the features of this example are strictly necessary. I basically created a new program from scratch that has the same characteristics as the original 100+K program, and I added things until it reproduced the same behavior.
Comment 3 Tom Epperly 2007-06-01 21:07:38 UTC
The Babel bug tracking entry corresponding to this GCC issue report is here:
https://www.cca-forum.org/bugs/babel/issue480
Comment 4 Andrew Pinski 2007-06-01 21:16:03 UTC
I am thinking you are volating C++ aliasing rules (though if you convert the static casts over to placement news it will not work either but that is PR 29286).
Comment 5 Tom Epperly 2007-06-01 21:24:37 UTC
In response to comment #4, I may be violating C++ aliasing rules, but I don't see  how that explains the behavior I am seeing and where I am seeing it. How could aliasing analysis give the compiler permission to skip _local_result's constructor? If it assumed that the operator= method didn't read from the left hand side, it might make such a mistake.
Comment 6 Tom Epperly 2007-06-01 22:04:12 UTC
Created attachment 13647 [details]
An example involving less casting than the previous one.
Comment 7 Tom Epperly 2007-06-01 22:12:45 UTC
Created attachment 13648 [details]
A further simplified example showing the problem without any C++ casting
Comment 8 Tom Epperly 2007-06-01 22:14:40 UTC
I've simplified the sample case that demonstrates the problem, and it has *no* casting in C++. In C, it casts the result of malloc to the appropriate pointer type.
Comment 9 Tom Epperly 2007-06-01 23:24:54 UTC
Created attachment 13650 [details]
Complete source (even simplier than previous attachments) to reproduce the problem

This is a smaller program that removes a couple function pointer casts. At this point, I don't think there are any aliasing problems.
Comment 10 Andrew Pinski 2007-06-01 23:57:46 UTC
test_3 works for me on the trunk on i686-linux-gnu.
Comment 11 Tom Epperly 2007-06-02 02:52:38 UTC
I tried test_4.tar.bz2 on my home PC whose details are below. I had to change the definition of size_t to "typedef unsigned long size_t;" in RefCount.c. It failed just like the other system I tested it with.

> ./compile
> ./test_aliasing
Segmentation fault
> ./test_noaliasing
Max value: 1804289383

System: Linux faerun 2.6.21-1-amd64 #1 SMP Sat May 26 17:22:54 CEST 2007 x86_64
GNU/Linux
Architecture: x86_64
Release:       gcc-4.2 (GCC) 4.2.1 20070528 (prerelease) (Debian 4.2-20070528-1
)
host: x86_64-pc-linux-gnu
build: x86_64-pc-linux-gnu
target: x86_64-pc-linux-gnu
configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c
++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/l
ib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-inc
lude-dir=/usr/include/c++/4.2 --program-suffix=-4.2 --enable-clocale=gnu --enabl
e-libstdcxx-debug --enable-mpfr --disable-werror --enable-checking=release --bui
ld=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu

From comment #10, I see the issue is already addressed in the trunk. Will changes to the trunk make it into gcc-4.2.x?
Comment 12 Richard Biener 2007-06-02 11:35:26 UTC
Confirmed.  Actually compiling Wrapper.cxx with -fstrict-aliasing is enough to
trigger the failure.

In getItem() the difference is

 <bb 2>:
-  D.3769 = &_local_result + 4B;
-  this = (struct BaseClass *) D.3769;
-  this->d_self = 0B;
+  D.3821 = &_local_result + 4B;
+  this = (struct BaseClass *) D.3821;
   this = (struct NextClass *) &_local_result;
   iftmp.0 = (int (*__vtbl_ptr_type) (void) *) _ZTT9TestClass[2];
-  this->_vptr.NextClass = iftmp.0;
   _local_result.D.2186._vptr.NextClass = &_ZTV9TestClass[4];
   this->_vptr.NextClass = &_ZTV9TestClass[4];
   D.2853 = at (cont->d_cont, ind) [return slot optimization];

in addItem()

@@ -547,14 +541,12 @@
   tmp.D.2186._vptr.NextClass = &_ZTV9TestClass[4];
   this.9 = (struct NextClass *) &tmp;
   this.9->_vptr.NextClass = (int (*__vtbl_ptr_type) (void) *) _ZTT9TestClass[2]
;
-  D.4100 = &tmp + 4B;
-  this = (struct BaseClass *) D.4100;
-  D.4135 = this->d_self;
-  if (D.4135 != 0B) goto <L21>; else goto <L12>;
+  D.4160 = &tmp + 4B;
+  D.4195 = ((struct BaseClass *) D.4160)->d_self;
+  if (D.4195 != 0B) goto <L21>; else goto <L12>;
 
 <L21>:;
-  deleteRef (D.4135);
-  this->d_self = 0B;
+  deleteRef (D.4195);
 
 <L12>:;
   <<<exception object>>> = save_eptr.48;
@@ -565,14 +557,12 @@
   tmp.D.2186._vptr.NextClass = &_ZTV9TestClass[4];
   this.9 = (struct NextClass *) &tmp;
   this.9->_vptr.NextClass = (int (*__vtbl_ptr_type) (void) *) _ZTT9TestClass[2]
;
-  D.4155 = &tmp + 4B;
-  this = (struct BaseClass *) D.4155;
-  D.4190 = this->d_self;
-  if (D.4190 != 0B) goto <L39>; else goto <L4>;
+  D.4215 = &tmp + 4B;
+  D.4250 = ((struct BaseClass *) D.4215)->d_self;
+  if (D.4250 != 0B) goto <L39>; else goto <L4>;
 
 <L39>:;
-  deleteRef (D.4190);
-  this->d_self = 0B;
+  deleteRef (D.4250);
 

Note that making the inheritance non-virtual and fixing up
TestClass::TestClass(TestClass::ior_t*) to initialize NextClass instead
of BaseClass fixes the problem as well.

So this may be a C++ frontend problem with virtual inheritance or an
invalid testcase as well.

(What happens if you initialize BaseClass from TestClass, but not NextClass
-- if NextClass is default constructed then it will default construct
BaseClass as well?  C++ language lawyer question.)
Comment 13 Richard Biener 2007-06-02 11:45:05 UTC
It looks like 12.6.2/5-6 specify it enough to make the testcase valid.  The
BaseClass is only once initialized by the most derived object initializer
specification.
Comment 14 Jason Merrill 2007-10-04 01:29:54 UTC
Both bug32182 and test_4 work for me with pre-4.3.0 on i686-pc-linux-gnu, so I'm going to set known to work for 4.3.
Comment 15 Mark Mitchell 2007-10-09 19:20:26 UTC
Change target milestone to 4.2.3, as 4.2.2 has been released.
Comment 16 Joseph S. Myers 2008-02-01 16:54:06 UTC
4.2.3 is being released now, changing milestones of open bugs to 4.2.4.
Comment 17 Joseph S. Myers 2008-05-19 20:23:07 UTC
4.2.4 is being released, changing milestones to 4.2.5.
Comment 18 Joseph S. Myers 2009-03-30 21:50:18 UTC
Closing 4.2 branch, fixed in 4.3.