User account creation filtered due to spam.

Bug 11376 - [3.3/3.4 regression] mozilla-1.4 miscompiled
Summary: [3.3/3.4 regression] mozilla-1.4 miscompiled
Status: RESOLVED DUPLICATE of bug 21920
Alias: None
Product: gcc
Classification: Unclassified
Component: c++ (show other bugs)
Version: 3.3
: P2 critical
Target Milestone: 3.3.1
Assignee: Not yet assigned to anyone
URL:
Keywords:
: 11846 21450 (view as bug list)
Depends on:
Blocks:
 
Reported: 2003-06-30 08:16 UTC by Franz Sirl
Modified: 2005-07-23 22:49 UTC (History)
7 users (show)

See Also:
Host: powerpc-unknown-linux-gnu
Target: powerpc-unknown-linux-gnu
Build: powerpc-unknown-linux-gnu
Known to work:
Known to fail:
Last reconfirmed:


Attachments
Full version of testcase (95.38 KB, text/plain)
2003-06-30 08:19 UTC, Franz Sirl
Details
Simplified testcase (1.44 KB, text/plain)
2003-06-30 08:20 UTC, Franz Sirl
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Franz Sirl 2003-06-30 08:16:20 UTC
As described in <http://gcc.gnu.org/ml/gcc-bugs/2003-06/threads.html#02040>,
gcc-3.3/3.4 miscompiles mozilla-1.4 leading to a segfault at startup.
Due to the kind help of Kevin Hendricks there is now a simplified testcase that
shows the bug if compiled with -O2 -fno-exceptions.
It is not clear yet if the code itself is aliasing safe.
Comment 1 Franz Sirl 2003-06-30 08:19:11 UTC
Created attachment 4304 [details]
Full version of testcase

The miscompiled routine is inmContainerGIF::DecodingComplete.
This version needs to be linked against the mozilla libs -lnspr -lplc4 to be
executable.
Comment 2 Franz Sirl 2003-06-30 08:20:47 UTC
Created attachment 4305 [details]
Simplified testcase
Comment 3 Giovanni Bajo 2003-06-30 13:20:21 UTC
The bug cannot be reproduced on i686-pc-cygwin. Anyway, I have some comments on 
the code:

- Are the specializations of nsCOMPtr<> and nsGetterAddRefs<> really necessary 
to trigger the bug? They look to me just manual replacements of the template 
argument. If the are needed, did you try removing the definition of the general 
templates?

- Are all the member functions of the templates really used by the program? 

- This:

      nsDerivedSafe<T>*
      get() const
        {
          return reinterpret_cast< nsDerivedSafe<T>* >(mRawPtr);
        }

is not typesafe and generates undefined behaviour if you do anything with the 
returned pointer BUT casting it back to its original type (nsISupports*).
Comment 4 kevin.hendricks 2003-06-30 13:40:31 UTC
Subject: Re:  [3.3/3.4 regression] mozilla-1.4 miscompiled

Hi,

> - Are the specializations of nsCOMPtr<> and nsGetterAddRefs<> really
> necessary to trigger the bug? They look to me just manual replacements of
> the template argument. If the are needed, did you try removing the
> definition of the general templates?

I believe so yes.  I did try removing the general templates but ran into 
problems trying to compile this.  All of my attempts to recreate this problem 
with a general type did not succeed.

The nsCOMPtr template is some sort of smart pointer used in libxpcom that I am 
not sure I understand.  The specialization seems to be on handling the 
nsISupports xpcom interface types.  The real test problem is huge and 
involved numerous libraries.  I just tried to help out by narrowing down the 
problem to a standalone testcase and I had to bascially cut and paste the  
nsCOMPtr templates in order to see the problem at all.

I was afraid if I did too much hacking away of that template class and the 
getter routines (since I did not udnerstand them well)  I would introduce a 
new problem.


> - Are all the member functions of the templates really used by the program?

I am simply not sure.   I will try to hack more away when I have more time 
later today but I was afraid to do so until I understood what was actaully 
going on and how the variaous templates interact.  This is pretty much out of 
my league since I just dabble in C++.  My single goal was to recreate the 
problem in a standalone test case  and try not to introduce any new problems 
when doing so.

> - This:
>
>       nsDerivedSafe<T>*
>       get() const
>         {
>           return reinterpret_cast< nsDerivedSafe<T>* >(mRawPtr);
>         }
> is not typesafe and generates undefined behaviour if you do anything with
> the returned pointer BUT casting it back to its original type
> (nsISupports*).

Kevin

Comment 5 kevin.hendricks 2003-06-30 13:53:59 UTC
Subject: Re:  [3.3/3.4 regression] mozilla-1.4 miscompiled

Hi,

> > - Are the specializations of nsCOMPtr<> and nsGetterAddRefs<> really
> > necessary to trigger the bug?

I just rechecked this and they are not.  I was able to remove both of those 
specializations and I still see the same problem.

Kevin

Comment 6 kevin.hendricks 2003-06-30 14:04:47 UTC
Subject: Re:  [3.3/3.4 regression] mozilla-1.4 miscompiled

Hi,

Here is an even simpler test case with the specialized templates removed.
cat tcase2a.cxx

#include <cstdlib>
#include <cstdio>

typedef unsigned int nsrefcnt;
typedef unsigned int nsresult;

class nsISupports {
public:
  virtual nsrefcnt AddRef(void) = 0;
  virtual nsrefcnt Release(void) = 0;
};


class mc0 : public nsISupports {
public:
  int hello;
  int setHello(int j) { hello = j; }
  int SetMutable(int k) { setHello(k); }
  nsrefcnt AddRef(void) { hello++; }
  nsrefcnt Release(void) { hello--; }
};



template <class T>
class nsDerivedSafe : public T
  {
    private:
      nsrefcnt AddRef(void);
      nsrefcnt Release(void);
    protected:
      nsDerivedSafe();
  };


template <class T>
nsrefcnt
nsDerivedSafe<T>::AddRef()
  {
    return 0;
  }


template <class T>
nsrefcnt
nsDerivedSafe<T>::Release()
  {
    return 0;
  }



class nsCOMPtr_base
  {
    public:

      nsCOMPtr_base( nsISupports* rawPtr = 0 )
          : mRawPtr(rawPtr)
        {
        }


      ~nsCOMPtr_base();

      void assign_with_AddRef( nsISupports* );
      void** begin_assignment();

    protected:
      nsISupports* mRawPtr;

      void
      assign_assuming_AddRef( nsISupports* newPtr )
        {
          nsISupports* oldPtr = mRawPtr;
          mRawPtr = newPtr;
          if ( oldPtr )
            (oldPtr)->Release();
        }
  };


nsCOMPtr_base::~nsCOMPtr_base()
{
    if ( mRawPtr )
        (mRawPtr)->Release();
}


void
nsCOMPtr_base::assign_with_AddRef( nsISupports* rawPtr )
{
    if ( rawPtr )
        (rawPtr)->AddRef();
    assign_assuming_AddRef(rawPtr);
}


void**
nsCOMPtr_base::begin_assignment()
{
    assign_assuming_AddRef(0);
    return reinterpret_cast< void** >(&mRawPtr);
}



template <class T>
class nsCOMPtr : private nsCOMPtr_base
  {
    public:
      typedef T element_type;
      nsCOMPtr()
            : nsCOMPtr_base(0)
        {
        }

      nsCOMPtr( const nsCOMPtr<T>& aSmartPtr )
            : nsCOMPtr_base(aSmartPtr.mRawPtr)

        {
          if ( mRawPtr )
            (mRawPtr)->AddRef();
        }

      nsCOMPtr( T* aRawPtr )
            : nsCOMPtr_base(aRawPtr)

        {
          if ( mRawPtr )
            (mRawPtr)->AddRef();
        }


      nsCOMPtr<T>&
      operator=( const nsCOMPtr<T>& rhs )

        {
          assign_with_AddRef(rhs.mRawPtr);
          return *this;
        }

      nsCOMPtr<T>&
      operator=( T* rhs )

        {
          assign_with_AddRef(rhs);
          return *this;
        }


      nsDerivedSafe<T>*
      get() const
        {
          return reinterpret_cast< nsDerivedSafe<T>* >(mRawPtr);
        }

      operator nsDerivedSafe<T>*() const
        {
          return get();
        }

      nsDerivedSafe<T>*
      operator->() const
        {
          return get();
        }

      nsCOMPtr<T>*
      get_address()
        {
          return this;
        }

      const nsCOMPtr<T>*
      get_address() const
        {
          return this;
        }


    public:
      nsDerivedSafe<T>&
      operator*() const
        {
          return *get();
        }


      T**
      StartAssignment()
        {
          return reinterpret_cast< T** >(begin_assignment());
        }
  };



template <class T>
inline
nsCOMPtr<T>*
address_of( nsCOMPtr<T>& aPtr )
  {
    return aPtr.get_address();
  }



template <class T>
inline
const nsCOMPtr<T>*
address_of( const nsCOMPtr<T>& aPtr )
  {
    return aPtr.get_address();
  }



template <class T>
class nsGetterAddRefs
  {
    public:
      explicit
      nsGetterAddRefs( nsCOMPtr<T>& aSmartPtr )
          : mTargetSmartPtr(aSmartPtr)
        {
        }

      operator void**()
        {
          return reinterpret_cast< void** 
>(mTargetSmartPtr.StartAssignment());
        }

      operator nsISupports**()
        {
          return reinterpret_cast< nsISupports** 
>(mTargetSmartPtr.StartAssignment());
        }

      operator T**()
        {
          return mTargetSmartPtr.StartAssignment();
        }

      T*&
      operator*()
        {
          return *(mTargetSmartPtr.StartAssignment());
        }

    private:
      nsCOMPtr<T>& mTargetSmartPtr;
  };


template <class T>
inline
nsGetterAddRefs<T>
getter_AddRefs( nsCOMPtr<T>& aSmartPtr )
  {
    return nsGetterAddRefs<T>(aSmartPtr);
  }


int
main(int argc, char** argv)
{

    nsISupports* aFrames[1];
    nsCOMPtr<mc0> myframe;
    myframe = new mc0();
    aFrames[0]=static_cast<nsISupports*> (myframe);

    nsCOMPtr<mc0> currentFrame;
    mc0** retval = getter_AddRefs(currentFrame);

    nsISupports* _elem = aFrames[0];
    *retval = static_cast< mc0* >(_elem);

    currentFrame->SetMutable(0);

    fprintf(stderr,"done\n"); fflush(stderr);
    return 0;
}


Hope this helps.

Kevin

Comment 7 kevin.hendricks 2003-06-30 14:35:25 UTC
Subject: Re:  [3.3/3.4 regression] mozilla-1.4 miscompiled

Hi,

Okay I took a shot at removing all of the functions/methods that I felt were 
not involved and have a simpler test case.

cat tcase2b.cxx

#include <cstdlib>
#include <cstdio>

typedef unsigned int nsrefcnt;
typedef unsigned int nsresult;

class nsISupports {
public:
  virtual nsrefcnt AddRef(void) = 0;
  virtual nsrefcnt Release(void) = 0;
};


class mc0 : public nsISupports {
public:
  int hello;
  int setHello(int j) { hello = j; }
  int SetMutable(int k) { setHello(k); }
  nsrefcnt AddRef(void) { hello++; }
  nsrefcnt Release(void) { hello--; }
};



template <class T>
class nsDerivedSafe : public T
  {
    private:
      nsrefcnt AddRef(void);
      nsrefcnt Release(void);
    protected:
      nsDerivedSafe();
  };


template <class T>
nsrefcnt
nsDerivedSafe<T>::AddRef()
  {
     return 0;
  }


template <class T>
nsrefcnt
nsDerivedSafe<T>::Release()
  {
    return 0;
  }



class nsCOMPtr_base
  {
    public:

      nsCOMPtr_base( nsISupports* rawPtr = 0 )
          : mRawPtr(rawPtr)
        {
        }

      ~nsCOMPtr_base();

      void assign_with_AddRef( nsISupports* );
      void** begin_assignment();

    protected:
      nsISupports* mRawPtr;

      void
      assign_assuming_AddRef( nsISupports* newPtr )
        {
          nsISupports* oldPtr = mRawPtr;
          mRawPtr = newPtr;
          if ( oldPtr )
            (oldPtr)->Release();
        }
  };


nsCOMPtr_base::~nsCOMPtr_base()
{
    if ( mRawPtr )
        (mRawPtr)->Release();
}


void
nsCOMPtr_base::assign_with_AddRef( nsISupports* rawPtr )
{
    if ( rawPtr )
        (rawPtr)->AddRef();
    assign_assuming_AddRef(rawPtr);
}


void**
nsCOMPtr_base::begin_assignment()
{
    assign_assuming_AddRef(0);
    return reinterpret_cast< void** >(&mRawPtr);
}



template <class T>
class nsCOMPtr : private nsCOMPtr_base
  {
    public:
      typedef T element_type;
      nsCOMPtr()
            : nsCOMPtr_base(0)
        {
        }

      nsCOMPtr( T* aRawPtr )
           : nsCOMPtr_base(aRawPtr)
        {
          if ( mRawPtr )
            (mRawPtr)->AddRef();
        }

      nsCOMPtr<T>&
      operator=( const nsCOMPtr<T>& rhs )
        {
          assign_with_AddRef(rhs.mRawPtr);
          return *this;
        }

      nsDerivedSafe<T>*
      get() const
        {
          return reinterpret_cast< nsDerivedSafe<T>* >(mRawPtr);
        }

      operator nsDerivedSafe<T>*() const
        {
          return get();
        }

      nsDerivedSafe<T>*
      operator->() const
        {
         return get();
        }

    public:

      T**
      StartAssignment()
        {
          return reinterpret_cast< T** >(begin_assignment());
        }
  };


template <class T>
class nsGetterAddRefs
  {
    public:
      explicit
      nsGetterAddRefs( nsCOMPtr<T>& aSmartPtr )
          : mTargetSmartPtr(aSmartPtr)
        {
        }

          operator T**()
        {
          return mTargetSmartPtr.StartAssignment();
        }

    private:
      nsCOMPtr<T>& mTargetSmartPtr;
  };


template <class T>
inline
nsGetterAddRefs<T>
getter_AddRefs( nsCOMPtr<T>& aSmartPtr )
  {
    return nsGetterAddRefs<T>(aSmartPtr);
  }



int
main(int argc, char** argv)
{

    nsISupports* aFrames[1];
    nsCOMPtr<mc0> myframe;
    myframe = new mc0();
    aFrames[0]=static_cast<nsISupports*> (myframe);


    nsCOMPtr<mc0> currentFrame;
    mc0** retval = getter_AddRefs(currentFrame);

    nsISupports* _elem = aFrames[0];
    *retval = static_cast< mc0* >(_elem);

    currentFrame->SetMutable(0);

    fprintf(stderr,"done\n"); fflush(stderr);
    return 0;
}


[kbhend@base1 huh]$ gcc -dumpversion
3.3

[kbhend@base1 huh]$ g++ -O2 -fno-exceptions -o tcase2b tcase2b.cxx
[kbhend@base1 huh]$ ./tcase2b
Segmentation fault

[kbhend@base1 huh]$ g++ -O2 -o tcase2b tcase2b.cxx
[kbhend@base1 huh]$ ./tcase2b
Segmentation fault

[kbhend@base1 huh]$ g++ -O2 -fno-exceptions -fno-strict-aliasing -o tcase2b 
tcase2b.cxx
[kbhend@base1 huh]$ ./tcase2b
done

Hope this helps,

Kevin

Comment 8 Mark Mitchell 2003-07-02 02:08:32 UTC
I cannot reproduce this bug with the 20030701 versions of 3.3.1 or 3.4 using the
last test case in the audit trail on a Red Hat 8 system.  Can anyone else?
Comment 9 franz.sirl-kernel 2003-07-02 05:41:47 UTC
Subject: Re:  [3.3/3.4 regression] mozilla-1.4
  miscompiled

At 04:08 02.07.2003, mmitchel at gcc dot gnu dot org wrote:
>PLEASE REPLY TO gcc-bugzilla@gcc.gnu.org ONLY, *NOT* gcc-bugs@gcc.gnu.org.
>
>http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11376
>
>
>
>------- Additional Comments From mmitchel at gcc dot gnu dot 
>org  2003-07-02 02:08 -------
>I cannot reproduce this bug with the 20030701 versions of 3.3.1 or 3.4 
>using the
>last test case in the audit trail on a Red Hat 8 system.  Can anyone else?

Mark, so far it has only be reproducible on powerpc-linux, can you try it 
on entropy? It's up again and there I can reproduce it with both the 
installed gcc-3_3-rhl-branch compiler and the current gcc-3_3-branch:

[fsirl@entropy:~]$ g++ -O2 -fno-exceptions ~fsirl/bug-mozilla-5.cxx
[fsirl@entropy:~]$ ./a.out
Segmentation fault
[fsirl@entropy:~]$ ~fsirl/obj/gcc33/gcc/g++ -B ~fsirl/obj/gcc33/gcc/
-isystem /usr/include/c++/3.3/ -isystem /usr/include/c++/3.3/ppc-linux/ -L
~fsirl/obj/gcc33/ppc-linux/libstdc++-v3/src/.libs/ -O2 -fno-exceptions
~fsirl/bug-mozilla-5.cxx
[fsirl@entropy:~]$ ./a.out
Segmentation fault

Franz.

Comment 10 Giovanni Bajo 2003-07-07 15:46:34 UTC
I believe the code is not legal. In main(), getter_addRefs calls 
nsGetterAddRefs<mc0> constructor, which calls nsComPtr<mc0>::StartAssignment. 

This function calls begin_assignment which casts &mRawPtr (whose type is 
nsiSupports**) to void** through reinterpret_cast. Then, this void** is casted 
to mc0** through reinterpret_cast. Now, mc0 is derived from nsiSupports. In 
short, this looks like:

struct A {};
struct B : A {};
A* a;

void** pv = reinterpret_cast<void**>(&a);
B** pb = reinterpret_cast<B**>(pv);
*pb = new B;
assert (typeid(*a) == typeid(B));

I believe this violates ISO C++ aliasing rules, but I don't have a standard 
handy to double-check now. 

Kevin, can you please confirm that my snippet follows the code correctly?
Nathan, do you think the code is legal?
Comment 11 kevin.hendricks 2003-07-07 16:11:19 UTC
Subject: Re:  [3.3/3.4 regression] mozilla-1.4 miscompiled

Hi,

This is basically what my first shot at a simpler test program looked like but 
I could never recreate the problem with code like this and ended up having to 
pull in the nsCOMPtr template and related support code to recreate the 
problem.

I will try building this case up to see if it can recreate with this approach.

Kevin

 > This function calls begin_assignment which casts &mRawPtr (whose type is
> nsiSupports**) to void** through reinterpret_cast. Then, this void** is
> casted to mc0** through reinterpret_cast. Now, mc0 is derived from
> nsiSupports. In short, this looks like:
>
> struct A {};
> struct B : A {};
> A* a;
>
> void** pv = reinterpret_cast<void**>(&a);
> B** pb = reinterpret_cast<B**>(pv);
> *pb = new B;
> assert (typeid(*a) == typeid(B));
>
> I believe this violates ISO C++ aliasing rules, but I don't have a standard
> handy to double-check now.
>
> Kevin, can you please confirm that my snippet follows the code correctly?
> Nathan, do you think the code is legal?
>
>
>
>
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug, or are watching someone who is.

Comment 12 kevin.hendricks 2003-07-08 00:13:59 UTC
Subject: Re:  [3.3/3.4 regression] mozilla-1.4 miscompiled

Hi,

I could not recreate the bug with anything like the code you showed.

So I went back to the last testcase I posted and removed all of 
the templates to make it easier to see what is going on and to further
simplify things.

I found something very strange (to me).

[kbhend@base1 huh]$ g++ -O2 -o tcase3 tcase3.cxx
[kbhend@base1 huh]$ ./tcase3
Segmentation fault


[kbhend@base1 huh]$ g++ -O2 -DALL_INTERNAL -o tcase3 tcase3.cxx
[kbhend@base1 huh]$ ./tcase3
done



All ALL_INTERNAL does is control whether one method is defined 
internal to the class defintion or external to it.

This makes no sense to me but I hope it might for someone 
who knows C++ well.

Here is the tcase3.cxx

#include <cstdlib>
#include <cstdio>

class Iface {
public:
  virtual int Add(void) = 0;
};


class M : public Iface {
public:
  int i;
  int Add(void) { i++; return i; }
  int SetMutable(int k) { i = k; }
};


class Safe : public M {
private:
  int Add(void) { return 0; };
protected:
  Safe ();
};


class Bbase {
public:
  Bbase( Iface * rawPtr = 0 ) { a = rawPtr; }
  void assign( Iface * q ) { a = q; }
#ifdef ALL_INTERNAL
  void ** begin_assign() { assign(0); return reinterpret_cast< void **> (&a); }
#else
  void ** begin_assign();
#endif
protected:
  Iface * a;
};

#ifndef ALL_INTERNAL
void ** Bbase::begin_assign()
{
   assign(0);
   return reinterpret_cast< void **> (&a);
}
#endif


class B : private Bbase {
public:
  B(): Bbase(0) {}
  B( M* ptr ) : Bbase(ptr) {}
  M** StartAssign() { return reinterpret_cast< M**> (begin_assign()); }
  Safe* get () const { return reinterpret_cast< Safe* > (a); }
  operator Safe*() const { return get(); }
  Safe* operator->() const { return get(); }
};


class C {
public:
  C( B& bp) : mTarget(bp) {};
  operator M** () { return mTarget.StartAssign(); }
private:
  B& mTarget;
};


inline  C
getter( B & bp) { return C(bp); }


int
main(int argc, char** argv)
{

  Iface * p;
    B b1;
    b1 = new M();
    p=static_cast< Iface* >(b1);
    B b2;
    M** retval = getter(b2);
    *retval = static_cast< M* >(p);
    b2->SetMutable(0);
    fprintf(stderr,"done\n"); fflush(stderr);
    return 0;
}



Hope this tells someone something.

Kevin


Comment 13 Andrew Pinski 2003-07-08 00:22:19 UTC
Subject: Re:  [3.3/3.4 regression] mozilla-1.4 miscompiled


On Monday, Jul 7, 2003, at 20:14 US/Eastern, kevin dot hendricks at 
sympatico dot ca wrote:

> PLEASE REPLY TO gcc-bugzilla@gcc.gnu.org ONLY, *NOT* 
> gcc-bugs@gcc.gnu.org.
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11376
>
>
>
> ------- Additional Comments From kevin dot hendricks at sympatico dot 
> ca  2003-07-08 00:13 -------
> Subject: Re:  [3.3/3.4 regression] mozilla-1.4 miscompiled
>
> Hi,
>
> I could not recreate the bug with anything like the code you showed.
>
> So I went back to the last testcase I posted and removed all of
> the templates to make it easier to see what is going on and to further
> simplify things.
>
> I found something very strange (to me).
>
> [kbhend@base1 huh]$ g++ -O2 -o tcase3 tcase3.cxx
> [kbhend@base1 huh]$ ./tcase3
> Segmentation fault
>
>
> [kbhend@base1 huh]$ g++ -O2 -DALL_INTERNAL -o tcase3 tcase3.cxx
> [kbhend@base1 huh]$ ./tcase3
> done
>
>
>
> All ALL_INTERNAL does is control whether one method is defined
> internal to the class defintion or external to it.
>
> This makes no sense to me but I hope it might for someone
> who knows C++ well.

It makes sense to me, gcc is inlining begin_assign into StartAssign
and StartAssign into operator M** and operator M** into main under
ALL_INTERNAL while it under notdef ALL_INTERNAL, it does not inline 
begin_assign.
This still looks like an aliasing problem.

Thanks,
Andrew Pinski

>
> Here is the tcase3.cxx
>
> #include <cstdlib>
> #include <cstdio>
>
> class Iface {
> public:
>   virtual int Add(void) = 0;
> };
>
>
> class M : public Iface {
> public:
>   int i;
>   int Add(void) { i++; return i; }
>   int SetMutable(int k) { i = k; }
> };
>
>
> class Safe : public M {
> private:
>   int Add(void) { return 0; };
> protected:
>   Safe ();
> };
>
>
> class Bbase {
> public:
>   Bbase( Iface * rawPtr = 0 ) { a = rawPtr; }
>   void assign( Iface * q ) { a = q; }
> #ifdef ALL_INTERNAL
>   void ** begin_assign() { assign(0); return reinterpret_cast< void 
> **> (&a); }
> #else
>   void ** begin_assign();
> #endif
> protected:
>   Iface * a;
> };
>
> #ifndef ALL_INTERNAL
> void ** Bbase::begin_assign()
> {
>    assign(0);
>    return reinterpret_cast< void **> (&a);
> }
> #endif
>
>
> class B : private Bbase {
> public:
>   B(): Bbase(0) {}
>   B( M* ptr ) : Bbase(ptr) {}
>   M** StartAssign() { return reinterpret_cast< M**> (begin_assign()); }
>   Safe* get () const { return reinterpret_cast< Safe* > (a); }
>   operator Safe*() const { return get(); }
>   Safe* operator->() const { return get(); }
> };
>
>
> class C {
> public:
>   C( B& bp) : mTarget(bp) {};
>   operator M** () { return mTarget.StartAssign(); }
> private:
>   B& mTarget;
> };
>
>
> inline  C
> getter( B & bp) { return C(bp); }
>
>
> int
> main(int argc, char** argv)
> {
>
>   Iface * p;
>     B b1;
>     b1 = new M();
>     p=static_cast< Iface* >(b1);
>     B b2;
>     M** retval = getter(b2);
>     *retval = static_cast< M* >(p);
>     b2->SetMutable(0);
>     fprintf(stderr,"done\n"); fflush(stderr);
>     return 0;
> }
>
>
>
> Hope this tells someone something.
>
> Kevin
>
>

Comment 14 kevin.hendricks 2003-07-08 00:59:48 UTC
Subject: Re:  [3.3/3.4 regression] mozilla-1.4 miscompiled

Hi,

Thanks for explaining that.  Now that I am aware of which parts
of the program the code generation is sensitive to, I have been able to take 
the test case and hack away the C and Safe classes and remove the getter() 
function and still see the problem.

Here is yet again a smaller testcase and my last so I hope this helps.

I am still not sure if this is an aliasing problem or not.

Perhaps the developer who wrote the nsCOMPtr template class for 
Netscape / Mozilla can provide some insight into this problem?

cat tcase4.cxx

#include <cstdlib>
#include <cstdio>

class Iface {
public:
  virtual int Add(void) = 0;
};


class M : public Iface {
public:
  int i;
  int Add(void) { i++; return i; }
  int SetMutable(int k) { i = k; }
};


class Bbase {
public:
  Bbase( Iface * rawPtr = 0) { a = rawPtr; }
  void assign( Iface * q ) { a = q; }
#ifdef ALL_INTERNAL
  void ** begin_assign()
  {
     assign(0);
     return reinterpret_cast< void **> (&a);
  }
#else
  void ** begin_assign();
#endif
protected:
  Iface * a;
};

#ifndef ALL_INTERNAL
void ** Bbase::begin_assign()
{
   assign(0);
   return reinterpret_cast< void **> (&a);
}
#endif


class B : private Bbase {
public:
  B(): Bbase(0) {}
  B( M* ptr ) : Bbase(ptr) {}
  M** StartAssign() { return reinterpret_cast< M**> (begin_assign()); }
  M* get () const { return reinterpret_cast< M* > (a); }
  operator M*() const { return get(); }
  M* operator->() const { return get(); }
};


int
main(int argc, char** argv)
{
    B b1 = new M();
    Iface * p=static_cast< Iface* >(b1);
    B b2;
    M** retval = b2.StartAssign();
    *retval = static_cast< M* >(p);
    b2->SetMutable(0);
    fprintf(stderr,"done\n"); fflush(stderr);
    return 0;
}

Comment 15 Wolfgang Bangerth 2003-07-08 01:21:02 UTC
As Andrew said, inlining might be different (functions defined in-class
have an implicit "inline" attribute, which is not the case if a function
is defined out-of-class). You may see whether you can trigger the abort
in the out-of-class case when you pass -finline-functions to gcc, which
makes all functions eligible for inlining. Alternatively, use -O3, which
implies -finline-functions.

As a hint for further testcase reduction: make all classes structs, and
remove the public/private/protected markers. Then try to do some of
the inlining by hand, by replacing the use of accessor functions by
direct accesses of the respective values, and subsequent removal of
the accessor functions. By looking at the code, it's obvious that you are
very close to the hailed one-page-limit, at which everything suddenly
becomes clear. (I'd like to help here, but can't for lack of a respective
platform...)

W.
Comment 16 kevin.hendricks 2003-07-08 02:05:00 UTC
Subject: Re:  [3.3/3.4 regression] mozilla-1.4 miscompiled

Hi,

> As a hint for further testcase reduction: make all classes structs, and
> remove the public/private/protected markers. Then try to do some of
> the inlining by hand, by replacing the use of accessor functions by
> direct accesses of the respective values, and subsequent removal of
> the accessor functions.

Okay- following along those lines we now get tcase5.cxx

#include <cstdio>
struct A {
  virtual int Setit(int k) = 0;
};

struct B : A {
    int i;
    int Setit(int k) { i = k; }
};

struct SP {
    A * a;
    void ** begin_assign();
};
void ** SP::begin_assign() { return reinterpret_cast< void **> (&a); }

int
main(int argc, char** argv)
{
    SP p;
    B** retval = reinterpret_cast<B**> (p.begin_assign());
    *retval = new B();
    (p.a)->Setit(0);
    fprintf(stderr,"done\n"); fflush(stderr);
    return 0;
}

[kbhend@base1 huh]$ g++ -O2 -o tcase5 tcase5.cxx
[kbhend@base1 huh]$ ./tcase5
Segmentation fault


[kbhend@base1 huh]$ g++ -O2 -fno-strict-aliasing -o tcase5 tcase5.cxx
[kbhend@base1 huh]$ ./tcase5
done

[kbhend@base1 huh]$ g++ -O2 -finline-functions -o tcase5 tcase5.cxx
[kbhend@base1 huh]$ ./tcase5
done

Kevin

Comment 17 kevin.hendricks 2003-07-08 02:33:07 UTC
Subject: Re:  [3.3/3.4 regression] mozilla-1.4 miscompiled

Hi,

Just to add insult to injury here ...  


[kbhend@base1 huh]$ cat tcase6.cxx
struct A {
  virtual int Setit(int k) = 0;
};

struct B : A {
    int Setit(int k) { int i; i = k; }
};

A* a;
void ** begin_assign() { return reinterpret_cast< void **> (&a); }

int
main(int argc, char** argv)
{
    B** ppB = reinterpret_cast<B**> (begin_assign());
    *ppB = new B();
    a->Setit(0);
    return 0;
}

[kbhend@base1 huh]$ g++ -O2 -o tcase6 tcase6.cxx
[kbhend@base1 huh]$ ./tcase6
Segmentation fault

[kbhend@base1 huh]$ g++ -O2 -finline-functions -o tcase6 tcase6.cxx
[kbhend@base1 huh]$ ./tcase6
[kbhend@base1 huh]$ 

[kbhend@base1 huh]$ g++ -O2 -fno-strict-aliasing -o tcase6 tcase6.cxx
[kbhend@base1 huh]$ ./tcase6
[kbhend@base1 huh]$

Comment 18 Mark Mitchell 2003-07-08 05:36:34 UTC
Subject: Re:  [3.3/3.4 regression] mozilla-1.4 miscompiled

On Mon, 2003-07-07 at 19:33, kevin dot hendricks at sympatico dot ca
wrote:
> PLEASE REPLY TO gcc-bugzilla@gcc.gnu.org ONLY, *NOT* gcc-bugs@gcc.gnu.org.
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11376
> 
> 
> 
> ------- Additional Comments From kevin dot hendricks at sympatico dot ca  2003-07-08 02:33 -------
> Subject: Re:  [3.3/3.4 regression] mozilla-1.4 miscompiled
> 
> Hi,
> 
> Just to add insult to injury here ...  

Thanks for cutting this down; now I can see easily that this test case
is invalid C++.

> [kbhend@base1 huh]$ cat tcase6.cxx
> struct A {
>   virtual int Setit(int k) = 0;
> };
> 
> struct B : A {
>     int Setit(int k) { int i; i = k; }
> };
> 
> A* a;
> void ** begin_assign() { return reinterpret_cast< void **> (&a); }
> 
> int
> main(int argc, char** argv)
> {
>     B** ppB = reinterpret_cast<B**> (begin_assign());
>     *ppB = new B();
>     a->Setit(0);
>     return 0;
> }

The problem is that "*ppB" modifies a "B*".  Unfortunately, the location
pointed to by "ppB" is actually an "A*".  That's invalid.  (It's true
that in C++ you can assign a "B*" to an "A*", but that's not the same as
using a "B**" to point at at an "A*" and then writing into the "B**".)

Simplifying your code even further gives:

  A* a;
  B** ppB = (B**) &a;
  *ppB = new B;

which makes it clearer that the code is invalid.

This code is very much analogous to:

  long l;
  int *ip = (int *) &l;
  *ip = 3;

on a machine where "int" and "long" are the same width.  It seems like
it should work, but it's not valid.

Comment 19 janis187 2003-07-08 16:32:08 UTC
The regression in PR 11376 was introduced or exposed by this patch:

--- gcc/gcc/cp/ChangeLog ---

2002-03-11  Dan Nicolaescu  <dann@ics.uci.edu>
            Daniel Berlin  <dan@dberlin.org>

        * cp-lang.c (ok_to_generate_alias_set_for_type): New function.
        (cxx_get_alias_set): Use it.

--- gcc/gcc/ChangeLog ---

2002-03-11  Dan Nicolaescu  <dann@ics.uci.edu>
            Daniel Berlin  <dan@dberlin.org>

        C++ alias analysis improvement.
        * alias.c (record_component_aliases): Record aliases for base
        classes too.

The regression hunt used the test case from Additional Comment #7 in           
the PR, compiled with -O2 on powerpc-unknown-linux-gnu.  A lot has
happened with this PR since I started the hunt yesterday, but perhaps
this information is still useful.
Comment 20 Daniel Berlin 2003-07-08 17:18:27 UTC
This is kind of humorous.
Mark first points out it's an aliasing violation, *then* we discover it was the 
patch that added TBAA for simple C++ classes that broke the code.
Usually it's the other way around.
However, to add something useful, the fact that it was exposed by this patch 
lends credence to Mark's point that it's an aliasing violation.

Unless someone doesn't think Mark is correct, this bug should be closed, as the 
code is invalid C++.
Comment 21 kevin.hendricks 2003-07-08 17:34:54 UTC
Subject: Re:  [3.3/3.4 regression] mozilla-1.4 miscompiled

Hi,

> Unless someone doesn't think Mark is correct, this bug should be 
> closed, as the
> code is invalid C++.

If it is an aliasing issue, changing the test case in this way

struct A {
   virtual int Setit(int k) = 0;
};

struct B : A {
     int Setit(int k) { int i; i = k; }
};

A* a;
void ** begin_assign() { return reinterpret_cast< void **> (&a); }

int
main(int argc, char** argv)
{

#if 0
     B** ppB = reinterpret_cast<B**> (begin_assign());
     *ppB = new B();
#else
     A** ppB = reinterpret_cast<A**> (begin_assign());
     *ppB = new B();
#endif

     a->Setit(0);
     return 0;
}

Since *ppB will be a A* which can hold a B* legally, it should prevent
any aliasing problem.

I tested this case and sure enough it compiles and runs at
all optimization levels just fine.

So it sure looks like an aliasing issue to me.

I agree  this should be closed and a bug against Mozilla nsCOMPtr<> 
code should be opened.

Franz?  What do you think?


Kevin


Comment 22 Franz Sirl 2003-07-08 17:50:11 UTC
Well, since the generated assembly looks now quite different than for the full 
testcase, I would just like to hear another opinion whether the full testcase 
was correctly minimized. If yes, I agree that this bug shoud be closed and one 
against mozilla should be opened. 
Hmm, Kevin, could you create a "fixed" (like you just did for the simplified 
one) full testcase (diff?)? If that fixes the full testcase I'll be happy :-). 
 
 
Comment 23 kevin.hendricks 2003-07-08 18:29:47 UTC
Subject: Re:  [3.3/3.4 regression] mozilla-1.4 miscompiled

Hi Franz,

Please try changing inlinedGetFrameAt() in your imgContainerGIF.ii

To look like the following (a rough hack job)

   inline nsresult inlinedGetFrameAt(PRUint32 index, gfxIImageFrame 
**_retval) {
     nsISupports ** rp = (nsISupports**) _retval;
     nsISupports *_elem = mFrames.ElementAt(index);
     if (!_elem) return ((nsresult) 0x80004005L);
     *rp = _elem;
     return 0;
}

This casts back the gfxIImageFrame** _retval to what is really is which 
is a nsISupports**

I tried this with your test case under gcc 3.3 and get the following:

[kbhend@base1 huh]$ g++ -O2 -fno-exceptions -o junk imgContainerGIF.ii 
-L/src1/mozilla-1.3/obj-ppc/dist/bin -lxpcom -lgkgfx
[kbhend@base1 huh]$ ./junk
rv 0
done
[kbhend@base1 huh]$ gcc -dumpversion
3.3

So it fixes the problem (no segfault!)

This is not the right fix.  The nsCOMPtr<> template type and the
many of the methods should be passing around nsISupports** instead
of gfxIImageFrame** in many places.

But this supports Mark's claim that the code is not aliasing safe.

Kevin


On Tuesday, July 8, 2003, at 01:50 PM, sirl at gcc dot gnu dot org 
wrote:

> PLEASE REPLY TO gcc-bugzilla@gcc.gnu.org ONLY, *NOT* 
> gcc-bugs@gcc.gnu.org.
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11376
>
>
>
> ------- Additional Comments From sirl at gcc dot gnu dot org  
> 2003-07-08 17:50 -------
> Well, since the generated assembly looks now quite different than for 
> the full
> testcase, I would just like to hear another opinion whether the full 
> testcase
> was correctly minimized. If yes, I agree that this bug shoud be closed 
> and one
> against mozilla should be opened.
> Hmm, Kevin, could you create a "fixed" (like you just did for the 
> simplified
> one) full testcase (diff?)? If that fixes the full testcase I'll be 
> happy :-).
>
>
>
>
>
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug, or are watching someone who is.
>
>
----
Kevin B. Hendricks
Associate Professor of Operations and Information Technology
Richard Ivey School of Business, University of Western Ontario
London Ontario, CANADA  N6A 3K7
khendricks@ivey.uwo.ca

Comment 24 kevin.hendricks 2003-07-08 19:07:41 UTC
Subject: Re:  [3.3/3.4 regression] mozilla-1.4 miscompiled

Hi,

One question ... if the original mozilla code is not strict-aliasing 
safe
as seems to be the case, why does this problem only impact ppc linux 
gcc 3.3
and not other ppc platforms nor it seems any other platform?

In particular x86 seems to be immune to most strict-aliasing violations
(the JDK,  OpenOffice.,org, and now Mozilla  are not aliasing safe) but 
the
bug always seems to miss x86 but ends up biting ppc.

Does that mean that fewer instructions are actually reschedule-able
based on this optimization under x86?  If so, what benefit actually
occurs from assuming strict-aliasing on most platforms?

Is there any measurable improvement on any platform?

Finding these types of violations is extremely time consuming and
it is hard to convince a non-impacted developer that his/her code is
incorrect (they keep saying it runs fine on x86 so it must be a ppc
specific problem).

It is simply hard at first glance to see that:

  if B's parent is A then
     A* can hold a B* and be completely type legal
     B* can hold an A* (via a downcast)
     Both A* and B* can hold a B*  (and be completely type)

But
*(B**) is not the same as a *(A**) from the point of view of aliasing.

Unfortunately, without some form of warning about when the
optimization is used it is next to impossible to find and fix all
of the occurrences.

Thanks,

Kevin

Comment 25 Franz Sirl 2003-07-08 19:13:32 UTC
Confirmed by inspecting the generated assembly. 
 
Comment 26 Mark Mitchell 2003-07-08 19:19:07 UTC
Subject: Re:  [3.3/3.4 regression] mozilla-1.4 miscompiled


> Unfortunately, without some form of warning about when the
> optimization is used it is next to impossible to find and fix all
> of the occurrences.

There's no easy way to do that.  The C/C++ standards say that casting
"X**" to "Y**" is OK -- the problem is *using* that pointer.  And by
that point we've lost track of the fact that it was cast.

I used to be an engineer at CenterLine; our tools
(CodeCenter/ObjectCenter/AcquaProva) could catch these kinds of bugs
dynamically, and give you a sensible "your code isn't right" message
when the program ran.  Unfortunately, CenterLine is largely defunct; I'm
not sure if you can still get any of these tools, and, AFAIK, there's no
open-source equivalent -- yet.

Comment 27 Wolfgang Bangerth 2003-07-08 19:35:36 UTC
I would like to posit one additional comment here that I hope you don't
take personal: When you say

> Finding these types of violations is extremely time consuming and
> it is hard to convince a non-impacted developer that his/her code is
> incorrect [...]
> Unfortunately, without some form of warning about when the
> optimization is used it is next to impossible to find and fix all
> of the occurrences.

I would like to disagree. You have certainly noticed that even those of
us who are well-trained in these questions tip-toed around the code and
dared not say whether it is legal or not until it was really down to
about one page. Rather than putting the blame on gcc, I would like to
put it on the original authors of the code. They chose a wicked code
design involving evil constructs such as casting to void** and using
reinterpret_cast. It is certainly not only my personal view that this
obfuscated the code to a degree that none of us could easily see what
was going on. It is not up to me to judge how this might affect the
mozilla developers community. However, I think there are more pressing
things to do in gcc than to invent ways to track the course pointers take
just to catch problems like this one. After all, it is not only time
consuming to track down such bugs, but also to implement tracking them
inside the compiler. 

Putting it somewhat bluntly (please take this with a grain of salt):
If you choose to shoot yourself in the foot, gcc shouldn't interfere.

W.
Comment 28 Franz Sirl 2003-07-08 19:42:47 UTC
Filed as a mozilla bug http://bugzilla.mozilla.org/show_bug.cgi?id=212082 
Comment 29 kevin.hendricks 2003-07-08 19:50:52 UTC
Subject: Re:  [3.3/3.4 regression] mozilla-1.4 miscompiled

Hi,

> They chose a wicked code
> design involving evil constructs such as casting to void** and using
> reinterpret_cast. It is certainly not only my personal view that this
> obfuscated the code to a degree that none of us could easily see what
> was going on.

Since it is not my code, I completely agree with you!    ;-)
Give me a dumb pointer over a "smart pointer" any day.

I am a big fan of obviousness and really love C compared to C++
because I hate having anything done "implicitly" and anything
that impacts following the clear flow of the code (such as templates,
multiple inheritance, implicit constructors and destructors, etc)
gives me a headache.  That was why it took me so long to trim
down the testcase to something small, I could not figure out
why nsDerivedsafe and nsGetterType templates were being
passed around to begin with.

> It is not up to me to judge how this might affect the
> mozilla developers community.

Unfortunately, it is just not them.  Almost every major piece of
code I have played with closely (the JDK, OpenOffice.org, and
now Mozilla) seems to have problems.  I build all of OOo, the JDK,
and even Mozilla with -fno-strict-aliasing just to avoid having to track
down these problems.

> However, I think there are more pressing
> things to do in gcc than to invent ways to track the course pointers 
> take
> just to catch problems like this one. After all, it is not only time
> consuming to track down such bugs, but also to implement tracking them
> inside the compiler.

I am sure there are other more important issues but the
improvements in the strictness of the C++ parser in gcc 3.3
plus heavy use of -Wall has certainly helped OpenOffice.org
find and fix serious bugs.  So having a similar warning for aliasing
violations really would help improve code in the wild.

> Putting it somewhat bluntly (please take this with a grain of salt):
> If you choose to shoot yourself in the foot, gcc shouldn't interfere.

Agreed!  ... but it might be nice if it told you the gun was using 
teflon
coated steel jacketed shells when you thought your gun was actually
loaded with blanks.


----
Kevin B. Hendricks
Associate Professor of Operations and Information Technology
Richard Ivey School of Business, University of Western Ontario
London Ontario, CANADA  N6A 3K7
khendricks@ivey.uwo.ca

Comment 30 Andrew Pinski 2003-07-08 19:59:04 UTC
replying to comment 24:
The reason why x86 does not have this problem is because the schedular does not move around 
load and stores as much as the PPC schedular does, look at the gcc which is SPEC, it contains non 
alias safe code and it passes comparison test using the pentium 3 schedular but not with the G4 
PPC one, it fails.  Another reason why it effects PPC linux only and not PPC darwin is that the PIC 
code generation is different so you get different code for the PIC code, it might effect PPC darwin 
(in fact the simplified testcase does seg fault on PPC darwin).  So the scheduling of the asm causes 
the load at different times.
Comment 31 kevin.hendricks 2003-07-08 20:02:56 UTC
Subject: Re:  [3.3/3.4 regression] mozilla-1.4 miscompiled

Hi,

If it helps with the Mozilla bug report.  The exact same bug bites with
Apple's gcc 3.3 under Mac OSX 10.2 if you add -fstrict-aliasing
(Apple's gcc 3.3 disables this by default).

Kevin

On Tuesday, July 8, 2003, at 03:42 PM, sirl at gcc dot gnu dot org 
wrote:

> PLEASE REPLY TO gcc-bugzilla@gcc.gnu.org ONLY, *NOT* 
> gcc-bugs@gcc.gnu.org.
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11376
>
>
>
> ------- Additional Comments From sirl at gcc dot gnu dot org  
> 2003-07-08 19:42 -------
> Filed as a mozilla bug 
> http://bugzilla.mozilla.org/show_bug.cgi?id=212082
>
>
>
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug, or are watching someone who is.
>
>
----
Kevin B. Hendricks
Associate Professor of Operations and Information Technology
Richard Ivey School of Business, University of Western Ontario
London Ontario, CANADA  N6A 3K7
khendricks@ivey.uwo.ca

Comment 32 Wolfgang Bangerth 2003-07-08 20:10:51 UTC
> I am a big fan of obviousness and really love C compared to C++

It's not about C vs C++. C has the same aliasing rules, basically. In short,
they read: if you've got an object of some type, don't try to fool everybody
else (including the compiler) by pretending it has another type when you
access it. Type safety is not so hard to achieve, and there are big C++
projects out there that have banned the use of void* and reinterpret_cast.
I would certainly ask my collaborators to change their code if they tried
to introduce such concepts. There's no need to do this.

> So having a similar warning for aliasing
> violations really would help improve code in the wild.

This has been extensively discussed over and over again. I'd say for code
like the one in question here, there's little hope a compiler can track
what a certain pointer points to. (If it can, then just pack some accessor
functions into another translation unit, and things become an order of
magnitude harder). gcc has, between dot releases, switched off -fstrict-aliasing
when it was first introduced, to give developers more time to fix their code,
and because nobody could come up with a way to warn of invalid constructs.
But this was years ago. Besides, in most code, these problems are not so
hard to find -- as long as you avoid explicit casts, in particular
reinterpret_cast, there are not so many possibilities to introduce aliasing
violations. It is only if you try to outsmart C/C++'s type system that you
are in danger. But I don't see why people should do that, templates offer them
plenty of ways to always stay on the safe side, and the compiler will tell
you about illegal casts -- unless you use void* and reinterpret_cast.

So look out for these two constructs. Some perl-fu is more powerful than gcc
for this...

W.
Comment 33 Franz Sirl 2003-07-08 21:25:10 UTC
I've been asked to add a summary of this discussion to the mozilla bug 
http://bugzilla.mozilla.org/show_bug.cgi?id=212082 , but I'm not a good enough 
at c++ speak to do that. 
Would someone here please be so kind to do this? 
 
Comment 34 Wolfgang Bangerth 2003-07-08 21:57:00 UTC
I'll take care of that in a minute.
W.
Comment 35 David Baron 2003-07-08 22:00:32 UTC
I don't think such a summary is needed.  The people who would actually fix the
bug already understand the issues involved, even if the bug triagers who aren't
programmers don't.
Comment 36 David Baron 2003-07-08 23:26:02 UTC
In response to comment 32:
> It is only if you try to outsmart C/C++'s type system that you
> are in danger. But I don't see why people should do that, templates offer them
> plenty of ways to always stay on the safe side, and the compiler will tell
> you about illegal casts -- unless you use void* and reinterpret_cast.

Just to offer a little perspective from the Mozilla side of things, I think
there are two main reasons why people working on Mozilla don't do as you think
they should:

 1. Portability to compilers that exist now and that we want to support
generally ends up winning over portability to compilers that don't exist. 
Programmers generally don't understand all the rules (especially cases where
behavior is undefined) of the language standards.  Until late 2001, Mozilla
supported gcc 2.7.2.3 (because it was, until then, the only way to use Purify on
any Unix platform (Solaris), and valgrind didn't exist), which made using
templates very difficult.  Some of the other compilers that we still support
also cause problems, so introducing any new templates into Mozilla still
requires extensive testing and takes much more work than alternatives.

(This is even more true for many web standards.  While http://gcc.gnu.org/ is
valid HTML (I'm impressed), it's still violating a "should not"-level
requirement of the HTML specification since it uses tables for something other
than the presentation of tabular data.  Not that I blame you, since tables
remain the only portable way of achieving many types of presentation.)

 2. Compilers often don't optimize code the way the programmer wants.  (My most
recent experience with this was
http://bugzilla.mozilla.org/show_bug.cgi?id=206682#c25 , and I've been meaning
to file a gcc bug.)  nsCOMPtr is very heavily used in the Mozilla codebase and
has been hand-optimized for a number of different compilers.  (You miss all the
|#ifdef|s by looking at the preprocessor output.)  In this case, the aliasing
rules are being broken because of hand-optimization that I suspect was to
improve the size of the generated code (although one that's being used on all
platforms on which it compiles, which may not be necessary).


Don't take any of that to mean that we don't want to fix our aliasing bugs.  We do.
Comment 37 Wolfgang Bangerth 2003-07-08 23:55:47 UTC
I sympathize with your comments (since we have the same problem with
different compilers on different platforms, etc, although we don't have
to go back to gcc2.7...). Just one remark:

> Programmers generally don't understand all the rules (especially cases where
> behavior is undefined) of the language standards.

I certainly agree. On the other hand, note that it is not actually
trivial to trip over aliasing rules -- you have to actively cheat on the
compiler, by using reinterpret_casts etc. We get quite a number of reports 
with invalid code like that, so type safety doesn't seem to be very
high on the list for quite a number of programmers (how sad). I wonder
what cases these people are working around when you say you hand optimize
code that way. We'd certainly like to see such code in our "optimization"
section of bugzilla.

And don't forget to submit you gcc bug report :-)

Thanks
  Wolfgang
Comment 38 scc 2003-07-09 16:20:34 UTC
Howdy.  I'm the original author of the evil code in question :-)  Just checking
in to say I understand and appreciate the analysis.  The problem will be
extremely hard to fix within Mozilla in any `correct' way.  There are a couple
of different paths that will stop the symptoms from hurting the users at run-time.

I want to follow up to dbaron's earlier comment as well.

The purpose of |nsCOMPtr| is to provide sugar and glue to a clone of COM. 
Originally, |nsCOMPtr| was designed as a (almost) drop in replacement for raw
pointers to these reference counted COM objects.  The XPCOM inventors at the
time strongly resisted the introduction of a smart pointer class on the grounds
that it wasn't needed, that it couldn't work, that it would be a waste of space
and processing power, and that templates were broken in general.  The design had
to fit within the XPCOM rules without forcing itself on anyone.  That led to the
the primary hitch.  XPCOM functions that return pointers do so by assigning into
a |T**| result parameter.  This direct assignment interfered with any attempt to
manage refcounts automatically.  The answer was the ugly (and it turns out,
illegal) |getter_AddRefs( nsCOMPtr& )|, which |Releases| the current referent,
if any, and then provides a |T**| as expected by the called function for
assignment of the result.

This machinery allowed the same functions to be called with either a |T*| or an
|nsCOMPtr| in which to return their answer.

|nsCOMPtr|s began to dominate the code base because the people who didn't use
them pretty much always had leaks.  The people who did use them had far fewer. 
Had XPCOM's originators been more open minded about C++, we could have just
legislated that XPCOM object results are always returned through |nsCOMPtr&|s,
and no aliasing problem would exist; the |getter_AddRefs| glue could all be
thrown away.  I would like nothing more.  Such a change now would be very large.
 Not impossible, but certainly daunting.

Just a note of history to show how we got here, and to mention in passing that I
found nothing at which to take offense in any of the analyses and comments
above.  In fact, exactly the opposite: I really appreciate how deeply you guys
dug into the problem.  I hope I can come up with an answer that isn't just
another work-around on top of functionality that is, itself, a work-around.
Comment 39 Gwenole Beauchesne 2003-08-08 08:29:31 UTC
*** Bug 11846 has been marked as a duplicate of this bug. ***
Comment 40 Andrew Pinski 2005-05-07 22:30:27 UTC
*** Bug 21450 has been marked as a duplicate of this bug. ***
Comment 41 Andrew Pinski 2005-06-05 09:16:46 UTC
Reopening to ...
Comment 42 Andrew Pinski 2005-06-05 09:17:11 UTC
Mark as a dup of bug 21920.

*** This bug has been marked as a duplicate of 21920 ***