Bug 2692 - excessive compile time with optimization
Summary: excessive compile time with optimization
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 3.0
Importance: P3 normal
Target Milestone: 4.0.0
Assignee: Richard Henderson
URL:
Keywords: compile-time-hog
Depends on:
Blocks:
 
Reported: 2001-04-29 21:06 UTC by snyder
Modified: 2004-05-13 11:50 UTC
CC List: 4 users

See Also:
Host: i686-pc-linux-gnu
Target: i686-pc-linux-gnu
Build: i686-pc-linux-gnu
Known to work:
Known to fail:
Last reconfirmed: 2004-04-29 01:05:52


Attachments
Original C++ test case (1.94 KB, application/x-gzip )
2003-05-21 15:17 UTC, snyder
C version of test case (1.48 KB, application/gzip)
2004-01-17 21:13 UTC, Richard Henderson

Description snyder 2001-04-29 21:06:01 UTC
If I try to compile the source below with -O2, gcc runs for over 45 minutes
and grows to over 300 MB.  On my machine, it exhausts the available
swap space and dies before completing.

Without optimization, it completes in a couple of minutes.

gcc 2.95.2 dies immediately with an ICE on this source, regardless
of whether or not optimization is on, so I guess this isn't a regression.

Release:
3.0 20010429 (prerelease)

Environment:
System: Linux karma 2.2.16-22 #1 Tue Aug 22 16:49:06 EDT 2000 i686 unknown
Architecture: i686

	
host: i686-pc-linux-gnu
build: i686-pc-linux-gnu
target: i686-pc-linux-gnu
configured with: ../egcs/configure --prefix=/usr/local/egcs --enable-threads=posix --enable-long-long
Comment 1 Richard Henderson 2002-04-02 00:31:55 UTC
State-Changed-From-To: open->feedback
State-Changed-Why: No test case.
Comment 2 Richard Henderson 2002-04-02 01:34:53 UTC
Responsible-Changed-From-To: unassigned->rth
Responsible-Changed-Why: .
Comment 3 Richard Henderson 2002-04-02 01:34:53 UTC
State-Changed-From-To: feedback->open
State-Changed-Why: Got test case.
Comment 4 Richard Henderson 2002-04-02 01:49:27 UTC
From: Richard Henderson <rth@redhat.com>
To: Scott Snyder <snyder@fnal.gov>
Cc: rth@gcc.gnu.org, gcc-bugs@gcc.gnu.org, gcc-prs@gcc.gnu.org,
   nobody@gcc.gnu.org, gcc-gnats@gcc.gnu.org
Subject: Re: optimization/2692: excessive compile time with optimization
Date: Tue, 2 Apr 2002 01:49:27 -0800

 FWIW, 3.1 20020326 "only" took 12 minutes to compile this
 test case, and "only" used 45MB.  The top CPU users were:
 
  expand                : 438.54 (63%) usr   0.11 ( 4%) sys 438.62 (62%) wall
  reload CSE regs       :  93.51 (14%) usr   0.01 ( 0%) sys  93.50 (13%) wall
  global alloc          :  77.31 (11%) usr   0.05 ( 2%) sys  77.38 (11%) wall
  regmove               :  46.59 ( 7%) usr   0.01 ( 0%) sys  46.56 ( 7%) wall
 
 
 r~
Comment 5 snyder 2002-04-02 03:11:25 UTC
From: Scott Snyder <snyder@fnal.gov>
To: rth@gcc.gnu.org, gcc-bugs@gcc.gnu.org, gcc-prs@gcc.gnu.org,
        nobody@gcc.gnu.org, snyder@fnal.gov, gcc-gnats@gcc.gnu.org
Cc:  
Subject: Re: optimization/2692: excessive compile time with optimization
Date: 02 Apr 2002 03:11:25 -0600

 From: scott snyder <snyder@fnal.gov>
 Subject: optimization/2692: excessive compile time with optimization
 To: gcc-bugs@gcc.gnu.org
 Date: Fri, 18 May 2001 15:34:07 -0500
 
 
 hi -
 
 I just noticed that somehow the code sample for this report didn't make
 it into gnats.  I'm not sure what happened --- it's in my local copy
 of the report that I saved before sending it.
 
 Anyway, here's the complete version, including the code.
 
 sss
 
 To: gcc-gnats@gcc.gnu.org
 Subject: excessive compile time with optimization
 From: snyder@fnal.gov
 Reply-To: snyder@fnal.gov
 Cc: 
 X-send-pr-version: 3.113
 X-GNATS-Notify: 
 
 
 >Submitter-Id:	net
 >Originator:	scott snyder
 >Organization:
 >Confidential:	no
 >Synopsis:	excessive compile time with optimization
 >Severity:	serious
 >Priority:	low
 >Category:	optimization
 >Class:		sw-bug
 >Release:	3.0 20010429 (prerelease)
 >Environment:
 System: Linux karma 2.2.16-22 #1 Tue Aug 22 16:49:06 EDT 2000 i686 unknown
 Architecture: i686
 
 host: i686-pc-linux-gnu
 build: i686-pc-linux-gnu
 target: i686-pc-linux-gnu
 configured with: ../egcs/configure --prefix=/usr/local/egcs --enable-threads=posix --enable-long-long
 >Description:
 
 If I try to compile the source below with -O2, gcc runs for over 45 minutes
 and grows to over 300 MB.  On my machine, it exhausts the available
 swap space and dies before completing.
 
 Without optimization, it completes in a couple of minutes.
 
 gcc 2.95.2 dies immediately with an ICE on this source, regardless
 of whether or not optimization is on, so I guess this isn't a regression.
 
 >How-To-Repeat:
 
 
 namespace std
 {
 
 
   class dcomplex
   {
   public:
     typedef double value_type;
 
     dcomplex(double  =0.0, double =0.0);
         
     double real() const;
     double imag() const;
         
     dcomplex& operator=(double);
     
     dcomplex& operator=(const dcomplex&);
     dcomplex& operator+=(const dcomplex&);
     dcomplex& operator-=(const dcomplex&);
     dcomplex& operator*=(const dcomplex&);
     
   private:
     typedef __complex__ double _ComplexT;
     _ComplexT _M_value;
 
     dcomplex(_ComplexT __z) : _M_value(__z) { }
   };
 
 inline dcomplex
 operator*(const dcomplex& __x, const dcomplex& __y)
 { return dcomplex (__x) *= __y; }
 
 inline dcomplex
 operator-(const dcomplex& __x, const dcomplex& __y)
 { return dcomplex (__x) -= __y; }
 
 inline dcomplex
 operator+(const dcomplex& __x, const dcomplex& __y)
 { return dcomplex (__x) += __y; }
 
   inline double
   dcomplex::real() const
   { return __real__ _M_value; }
 
   inline double
   dcomplex::imag() const
   { return __imag__ _M_value; }
 
   inline
   dcomplex::dcomplex(double __r, double __i)
   {
     __real__ _M_value = __r;
     __imag__ _M_value = __i;
   }
 
   inline dcomplex&
   dcomplex::operator=(double __d)
   {
     __real__ _M_value = __d;
     __imag__ _M_value = 0.0;
     return *this;
   }
 
 
 
     inline dcomplex&
     dcomplex::operator=(const dcomplex& __z)
     {
       __real__ _M_value = __z.real();
       __imag__ _M_value = __z.imag();
       return *this;
     }
     
     inline dcomplex&
     dcomplex::operator+=(const dcomplex& __z)
     {
       __real__ _M_value += __z.real();
       __imag__ _M_value += __z.imag();
       return *this;
     }
 
     inline dcomplex&
     dcomplex::operator-=(const dcomplex& __z)
     {
       __real__ _M_value -= __z.real();
       __imag__ _M_value -= __z.imag();
       return *this;
     }
 
     inline dcomplex&
     dcomplex::operator*=(const dcomplex& __z)
     {
       _ComplexT __t;
       __real__ __t = __z.real();
       __imag__ __t = __z.imag();
       _M_value *= __t;
       return *this;
     }
 
 
 } // namespace std
 
 
 
 
 
 typedef std::dcomplex Complex8;
 
 Complex8 determinant(Complex8 _m[6][6])
 {
   Complex8 ret  ( 0.0, 0.0 );
   Complex8 ret5 ( 0.0, 0.0 );
   Complex8 ret4 ( 0.0, 0.0 );
   Complex8 ret3 ( 0.0, 0.0 );
 
   ret3 =
     ( _m[0][0] * _m[1][1] - _m[0][1] * _m[1][0] ) * _m[2][2]
   + ( _m[0][1] * _m[1][2] - _m[0][2] * _m[1][1] ) * _m[2][0]
   + ( _m[0][2] * _m[1][0] - _m[0][0] * _m[1][2] ) * _m[2][1];
 
   ret4 += _m[3][3] * ret3;
 
   ret3 =
     ( _m[0][3] * _m[1][2] - _m[0][2] * _m[1][3] ) * _m[2][1]
   + ( _m[0][1] * _m[1][3] - _m[0][3] * _m[1][1] ) * _m[2][2]
   + ( _m[0][2] * _m[1][1] - _m[0][1] * _m[1][2] ) * _m[2][3];
 
   ret4 += _m[3][0] * ret3;
 
   ret3 =
     ( _m[0][0] * _m[1][2] - _m[0][2] * _m[1][0] ) * _m[2][3]
   + ( _m[0][2] * _m[1][3] - _m[0][3] * _m[1][2] ) * _m[2][0]
   + ( _m[0][3] * _m[1][0] - _m[0][0] * _m[1][3] ) * _m[2][2];
 
   ret4 += _m[3][1] * ret3;
 
   ret3 =
     ( _m[0][0] * _m[1][3] - _m[0][3] * _m[1][0] ) * _m[2][1]
   + ( _m[0][1] * _m[1][0] - _m[0][0] * _m[1][1] ) * _m[2][3]
   + ( _m[0][3] * _m[1][1] - _m[0][1] * _m[1][3] ) * _m[2][0];
 
 
   ret4 += _m[3][2] * ret3;
 
   ret5 += _m[4][4] * ret4;
   ret4 = 0.0;
 
   ret3 =
     ( _m[0][1] * _m[1][3] - _m[0][3] * _m[1][1] ) * _m[2][4]
   + ( _m[0][3] * _m[1][4] - _m[0][4] * _m[1][3] ) * _m[2][1]
   + ( _m[0][4] * _m[1][1] - _m[0][1] * _m[1][4] ) * _m[2][3];
 
   ret4 += _m[3][2] * ret3;
 
   ret3 =
     ( _m[0][1] * _m[1][4] - _m[0][4] * _m[1][1] ) * _m[2][2]
   + ( _m[0][2] * _m[1][1] - _m[0][1] * _m[1][2] ) * _m[2][4]
   + ( _m[0][4] * _m[1][2] - _m[0][2] * _m[1][4] ) * _m[2][1];
 
   ret4 += _m[3][3] * ret3;
 
   ret3 =
     ( _m[0][2] * _m[1][3] - _m[0][3] * _m[1][2] ) * _m[2][1]
   + ( _m[0][3] * _m[1][1] - _m[0][1] * _m[1][3] ) * _m[2][2]
   + ( _m[0][1] * _m[1][2] - _m[0][2] * _m[1][1] ) * _m[2][3];
 
   ret4 += _m[3][4] * ret3;
 
   ret3 =
     ( _m[0][2] * _m[1][4] - _m[0][4] * _m[1][2] ) * _m[2][3]
   + ( _m[0][3] * _m[1][2] - _m[0][2] * _m[1][3] ) * _m[2][4]
   + ( _m[0][4] * _m[1][3] - _m[0][3] * _m[1][4] ) * _m[2][2];
 
   ret4 += _m[3][1] * ret3;
 
   ret5 += _m[4][0] * ret4;
   ret4 = 0.0;
 
   ret3 =
     ( _m[0][3] * _m[1][4] - _m[0][4] * _m[1][3] ) * _m[2][2]
   + ( _m[0][4] * _m[1][2] - _m[0][2] * _m[1][4] ) * _m[2][3]
   + ( _m[0][2] * _m[1][3] - _m[0][3] * _m[1][2] ) * _m[2][4];
 
   ret4 += _m[3][0] * ret3;
 
   ret3 =
     ( _m[0][3] * _m[1][0] - _m[0][0] * _m[1][3] ) * _m[2][4]
   + ( _m[0][4] * _m[1][3] - _m[0][3] * _m[1][4] ) * _m[2][0]
   + ( _m[0][0] * _m[1][4] - _m[0][4] * _m[1][0] ) * _m[2][3];
 
   ret4 += _m[3][2] * ret3;
 
   ret3 =
     ( _m[0][4] * _m[1][0] - _m[0][0] * _m[1][4] ) * _m[2][2]
   + ( _m[0][0] * _m[1][2] - _m[0][2] * _m[1][0] ) * _m[2][4]
   + ( _m[0][2] * _m[1][4] - _m[0][4] * _m[1][2] ) * _m[2][0];
 
   ret4 += _m[3][3] * ret3;
 
   ret3 =
     ( _m[0][3] * _m[1][2] - _m[0][2] * _m[1][3] ) * _m[2][0]
   + ( _m[0][0] * _m[1][3] - _m[0][3] * _m[1][0] ) * _m[2][2]
   + ( _m[0][2] * _m[1][0] - _m[0][0] * _m[1][2] ) * _m[2][3];
 
 
   ret4 += _m[3][4] * ret3;
 
   ret5 += _m[4][1] * ret4;
   ret4 = 0.0;
 
   ret3 =
     ( _m[0][0] * _m[1][1] - _m[0][1] * _m[1][0] ) * _m[2][3]
   + ( _m[0][1] * _m[1][3] - _m[0][3] * _m[1][1] ) * _m[2][0]
   + ( _m[0][3] * _m[1][0] - _m[0][0] * _m[1][3] ) * _m[2][1];
 
   ret4 += _m[3][4] * ret3;
 
   ret3 =
     ( _m[0][4] * _m[1][3] - _m[0][3] * _m[1][4] ) * _m[2][1]
   + ( _m[0][1] * _m[1][4] - _m[0][4] * _m[1][1] ) * _m[2][3]
   + ( _m[0][3] * _m[1][1] - _m[0][1] * _m[1][3] ) * _m[2][4];
 
   ret4 += _m[3][0] * ret3;
 
   ret3 =
     ( _m[0][0] * _m[1][3] - _m[0][3] * _m[1][0] ) * _m[2][4]
   + ( _m[0][3] * _m[1][4] - _m[0][4] * _m[1][3] ) * _m[2][0]
   + ( _m[0][4] * _m[1][0] - _m[0][0] * _m[1][4] ) * _m[2][3];
 
   ret4 += _m[3][1] * ret3;
 
   ret3 =
     ( _m[0][0] * _m[1][4] - _m[0][4] * _m[1][0] ) * _m[2][1]
   + ( _m[0][1] * _m[1][0] - _m[0][0] * _m[1][1] ) * _m[2][4]
   + ( _m[0][4] * _m[1][1] - _m[0][1] * _m[1][4] ) * _m[2][0];
 
 
   ret4 += _m[3][3] * ret3;
 
   ret5 += _m[4][2] * ret4;
   ret4 = 0.0;
 
   ret3 =
     ( _m[0][1] * _m[1][4] - _m[0][4] * _m[1][1] ) * _m[2][0]
   + ( _m[0][4] * _m[1][0] - _m[0][0] * _m[1][4] ) * _m[2][1]
   + ( _m[0][0] * _m[1][1] - _m[0][1] * _m[1][0] ) * _m[2][4];
 
   ret4 += _m[3][2] * ret3;
 
   ret3 =
     ( _m[0][1] * _m[1][0] - _m[0][0] * _m[1][1] ) * _m[2][2]
   + ( _m[0][2] * _m[1][1] - _m[0][1] * _m[1][2] ) * _m[2][0]
   + ( _m[0][0] * _m[1][2] - _m[0][2] * _m[1][0] ) * _m[2][1];
 
   ret4 += _m[3][4] * ret3;
 
   ret3 =
     ( _m[0][2] * _m[1][4] - _m[0][4] * _m[1][2] ) * _m[2][1]
   + ( _m[0][4] * _m[1][1] - _m[0][1] * _m[1][4] ) * _m[2][2]
   + ( _m[0][1] * _m[1][2] - _m[0][2] * _m[1][1] ) * _m[2][4];
 
   ret4 += _m[3][0] * ret3;
 
   ret3 =
     ( _m[0][2] * _m[1][0] - _m[0][0] * _m[1][2] ) * _m[2][4]
   + ( _m[0][4] * _m[1][2] - _m[0][2] * _m[1][4] ) * _m[2][0]
   + ( _m[0][0] * _m[1][4] - _m[0][4] * _m[1][0] ) * _m[2][2];
 
 
   ret4 += _m[3][1] * ret3;
 
   ret5 += _m[4][3] * ret4;
   ret4 = 0.0;
 
   ret += _m[5][5] * ret5;
   ret5 = 0.0;
 
 
   ret3 =
     ( _m[0][4] * _m[1][5] - _m[0][5] * _m[1][4] ) * _m[2][2]
   + ( _m[0][5] * _m[1][2] - _m[0][2] * _m[1][5] ) * _m[2][4]
   + ( _m[0][2] * _m[1][4] - _m[0][4] * _m[1][2] ) * _m[2][5];
 
   ret4 += _m[3][1] * ret3;
 
   ret3 =
     ( _m[0][4] * _m[1][1] - _m[0][1] * _m[1][4] ) * _m[2][5]
   + ( _m[0][5] * _m[1][4] - _m[0][4] * _m[1][5] ) * _m[2][1]
   + ( _m[0][1] * _m[1][5] - _m[0][5] * _m[1][1] ) * _m[2][4];
 
   ret4 += _m[3][2] * ret3;
 
   ret3 =
     ( _m[0][5] * _m[1][1] - _m[0][1] * _m[1][5] ) * _m[2][2]
   + ( _m[0][1] * _m[1][2] - _m[0][2] * _m[1][1] ) * _m[2][5]
   + ( _m[0][2] * _m[1][5] - _m[0][5] * _m[1][2] ) * _m[2][1];
 
   ret4 += _m[3][4] * ret3;
 
   ret3 =
     ( _m[0][4] * _m[1][2] - _m[0][2] * _m[1][4] ) * _m[2][1]
   + ( _m[0][1] * _m[1][4] - _m[0][4] * _m[1][1] ) * _m[2][2]
   + ( _m[0][2] * _m[1][1] - _m[0][1] * _m[1][2] ) * _m[2][4];
 
 
   ret4 += _m[3][5] * ret3;
 
   ret5 += _m[4][3] * ret4;
   ret4 = 0.0;
 
 
   ret3 =
     ( _m[0][1] * _m[1][2] - _m[0][2] * _m[1][1] ) * _m[2][3]
   + ( _m[0][2] * _m[1][3] - _m[0][3] * _m[1][2] ) * _m[2][1]
   + ( _m[0][3] * _m[1][1] - _m[0][1] * _m[1][3] ) * _m[2][2];
 
   ret4 += _m[3][5] * ret3;
 
   ret3 =
     ( _m[0][5] * _m[1][3] - _m[0][3] * _m[1][5] ) * _m[2][2]
   + ( _m[0][2] * _m[1][5] - _m[0][5] * _m[1][2] ) * _m[2][3]
   + ( _m[0][3] * _m[1][2] - _m[0][2] * _m[1][3] ) * _m[2][5];
 
   ret4 += _m[3][1] * ret3;
 
   ret3 =
     ( _m[0][1] * _m[1][3] - _m[0][3] * _m[1][1] ) * _m[2][5]
   + ( _m[0][3] * _m[1][5] - _m[0][5] * _m[1][3] ) * _m[2][1]
   + ( _m[0][5] * _m[1][1] - _m[0][1] * _m[1][5] ) * _m[2][3];
 
   ret4 += _m[3][2] * ret3;
 
   ret3 =
     ( _m[0][1] * _m[1][5] - _m[0][5] * _m[1][1] ) * _m[2][2]
   + ( _m[0][2] * _m[1][1] - _m[0][1] * _m[1][2] ) * _m[2][5]
   + ( _m[0][5] * _m[1][2] - _m[0][2] * _m[1][5] ) * _m[2][1];
 
 
   ret4 += _m[3][3] * ret3;
 
   ret5 += _m[4][4] * ret4;
   ret4 = 0.0;
 
   ret3 =
     ( _m[0][2] * _m[1][4] - _m[0][4] * _m[1][2] ) * _m[2][1]
   + ( _m[0][4] * _m[1][1] - _m[0][1] * _m[1][4] ) * _m[2][2]
   + ( _m[0][1] * _m[1][2] - _m[0][2] * _m[1][1] ) * _m[2][4];
 
   ret4 += _m[3][3] * ret3;
 
   ret3 =
     ( _m[0][2] * _m[1][1] - _m[0][1] * _m[1][2] ) * _m[2][3]
   + ( _m[0][3] * _m[1][2] - _m[0][2] * _m[1][3] ) * _m[2][1]
   + ( _m[0][1] * _m[1][3] - _m[0][3] * _m[1][1] ) * _m[2][2];
 
   ret4 += _m[3][4] * ret3;
 
   ret3 =
     ( _m[0][3] * _m[1][4] - _m[0][4] * _m[1][3] ) * _m[2][2]
   + ( _m[0][4] * _m[1][2] - _m[0][2] * _m[1][4] ) * _m[2][3]
   + ( _m[0][2] * _m[1][3] - _m[0][3] * _m[1][2] ) * _m[2][4];
 
   ret4 += _m[3][1] * ret3;
 
   ret3 =
     ( _m[0][3] * _m[1][1] - _m[0][1] * _m[1][3] ) * _m[2][4]
   + ( _m[0][4] * _m[1][3] - _m[0][3] * _m[1][4] ) * _m[2][1]
   + ( _m[0][1] * _m[1][4] - _m[0][4] * _m[1][1] ) * _m[2][3];
 
 
   ret4 += _m[3][2] * ret3;
 
   ret5 += _m[4][5] * ret4;
   ret4 = 0.0;
 
   ret3 =
     ( _m[0][4] * _m[1][5] - _m[0][5] * _m[1][4] ) * _m[2][3]
   + ( _m[0][5] * _m[1][3] - _m[0][3] * _m[1][5] ) * _m[2][4]
   + ( _m[0][3] * _m[1][4] - _m[0][4] * _m[1][3] ) * _m[2][5];
 
   ret4 += _m[3][2] * ret3;
 
   ret3 =
     ( _m[0][4] * _m[1][2] - _m[0][2] * _m[1][4] ) * _m[2][5]
   + ( _m[0][5] * _m[1][4] - _m[0][4] * _m[1][5] ) * _m[2][2]
   + ( _m[0][2] * _m[1][5] - _m[0][5] * _m[1][2] ) * _m[2][4];
 
   ret4 += _m[3][3] * ret3;
 
   ret3 =
     ( _m[0][5] * _m[1][2] - _m[0][2] * _m[1][5] ) * _m[2][3]
   + ( _m[0][2] * _m[1][3] - _m[0][3] * _m[1][2] ) * _m[2][5]
   + ( _m[0][3] * _m[1][5] - _m[0][5] * _m[1][3] ) * _m[2][2];
 
   ret4 += _m[3][4] * ret3;
 
   ret3 =
     ( _m[0][4] * _m[1][3] - _m[0][3] * _m[1][4] ) * _m[2][2]
   + ( _m[0][2] * _m[1][4] - _m[0][4] * _m[1][2] ) * _m[2][3]
   + ( _m[0][3] * _m[1][2] - _m[0][2] * _m[1][3] ) * _m[2][4];
 
 
   ret4 += _m[3][5] * ret3;
 
   ret5 += _m[4][1] * ret4;
   ret4 = 0.0;
 
 
   ret3 =
     ( _m[0][1] * _m[1][3] - _m[0][3] * _m[1][1] ) * _m[2][4]
   + ( _m[0][3] * _m[1][4] - _m[0][4] * _m[1][3] ) * _m[2][1]
   + ( _m[0][4] * _m[1][1] - _m[0][1] * _m[1][4] ) * _m[2][3];
 
   ret4 += _m[3][5] * ret3;
 
   ret3 =
     ( _m[0][5] * _m[1][4] - _m[0][4] * _m[1][5] ) * _m[2][3]
   + ( _m[0][3] * _m[1][5] - _m[0][5] * _m[1][3] ) * _m[2][4]
   + ( _m[0][4] * _m[1][3] - _m[0][3] * _m[1][4] ) * _m[2][5];
 
   ret4 += _m[3][1] * ret3;
 
   ret3 =
     ( _m[0][1] * _m[1][4] - _m[0][4] * _m[1][1] ) * _m[2][5]
   + ( _m[0][4] * _m[1][5] - _m[0][5] * _m[1][4] ) * _m[2][1]
   + ( _m[0][5] * _m[1][1] - _m[0][1] * _m[1][5] ) * _m[2][4];
 
   ret4 += _m[3][3] * ret3;
 
   ret3 =
     ( _m[0][1] * _m[1][5] - _m[0][5] * _m[1][1] ) * _m[2][3]
   + ( _m[0][3] * _m[1][1] - _m[0][1] * _m[1][3] ) * _m[2][5]
   + ( _m[0][5] * _m[1][3] - _m[0][3] * _m[1][5] ) * _m[2][1];
 
   ret4 += _m[3][4] * ret3;
 
   ret5 += _m[4][2] * ret4;
   ret4 = 0.0;
 
   ret += _m[5][0] * ret5;
   ret5 = 0.0;
 
 
   ret3 =
     ( _m[0][3] * _m[1][5] - _m[0][5] * _m[1][3] ) * _m[2][0]
   + ( _m[0][5] * _m[1][0] - _m[0][0] * _m[1][5] ) * _m[2][3]
   + ( _m[0][0] * _m[1][3] - _m[0][3] * _m[1][0] ) * _m[2][5];
 
   ret4 += _m[3][4] * ret3;
 
   ret3 =
     ( _m[0][3] * _m[1][0] - _m[0][0] * _m[1][3] ) * _m[2][4]
   + ( _m[0][4] * _m[1][3] - _m[0][3] * _m[1][4] ) * _m[2][0]
   + ( _m[0][0] * _m[1][4] - _m[0][4] * _m[1][0] ) * _m[2][3];
 
   ret4 += _m[3][5] * ret3;
 
   ret3 =
     ( _m[0][4] * _m[1][5] - _m[0][5] * _m[1][4] ) * _m[2][3]
   + ( _m[0][5] * _m[1][3] - _m[0][3] * _m[1][5] ) * _m[2][4]
   + ( _m[0][3] * _m[1][4] - _m[0][4] * _m[1][3] ) * _m[2][5];
 
   ret4 += _m[3][0] * ret3;
 
   ret3 =
     ( _m[0][4] * _m[1][0] - _m[0][0] * _m[1][4] ) * _m[2][5]
   + ( _m[0][5] * _m[1][4] - _m[0][4] * _m[1][5] ) * _m[2][0]
   + ( _m[0][0] * _m[1][5] - _m[0][5] * _m[1][0] ) * _m[2][4];
 
 
   ret4 += _m[3][3] * ret3;
 
   ret5 += _m[4][2] * ret4;
   ret4 = 0.0;
 
 
   ret3 =
     ( _m[0][5] * _m[1][0] - _m[0][0] * _m[1][5] ) * _m[2][4]
   + ( _m[0][0] * _m[1][4] - _m[0][4] * _m[1][0] ) * _m[2][5]
   + ( _m[0][4] * _m[1][5] - _m[0][5] * _m[1][4] ) * _m[2][0];
 
   ret4 += _m[3][2] * ret3;
 
   ret3 =
     ( _m[0][5] * _m[1][2] - _m[0][2] * _m[1][5] ) * _m[2][0]
   + ( _m[0][0] * _m[1][5] - _m[0][5] * _m[1][0] ) * _m[2][2]
   + ( _m[0][2] * _m[1][0] - _m[0][0] * _m[1][2] ) * _m[2][5];
 
   ret4 += _m[3][4] * ret3;
 
   ret3 =
     ( _m[0][0] * _m[1][2] - _m[0][2] * _m[1][0] ) * _m[2][4]
   + ( _m[0][2] * _m[1][4] - _m[0][4] * _m[1][2] ) * _m[2][0]
   + ( _m[0][4] * _m[1][0] - _m[0][0] * _m[1][4] ) * _m[2][2];
 
   ret4 += _m[3][5] * ret3;
 
   ret3 =
     ( _m[0][5] * _m[1][4] - _m[0][4] * _m[1][5] ) * _m[2][2]
   + ( _m[0][2] * _m[1][5] - _m[0][5] * _m[1][2] ) * _m[2][4]
   + ( _m[0][4] * _m[1][2] - _m[0][2] * _m[1][4] ) * _m[2][5];
 
 
   ret4 += _m[3][0] * ret3;
 
   ret5 += _m[4][3] * ret4;
   ret4 = 0.0;
 
 
   ret3 =
     ( _m[0][2] * _m[1][3] - _m[0][3] * _m[1][2] ) * _m[2][5]
   + ( _m[0][3] * _m[1][5] - _m[0][5] * _m[1][3] ) * _m[2][2]
   + ( _m[0][5] * _m[1][2] - _m[0][2] * _m[1][5] ) * _m[2][3];
 
   ret4 += _m[3][0] * ret3;
 
   ret3 =
     ( _m[0][0] * _m[1][5] - _m[0][5] * _m[1][0] ) * _m[2][3]
   + ( _m[0][3] * _m[1][0] - _m[0][0] * _m[1][3] ) * _m[2][5]
   + ( _m[0][5] * _m[1][3] - _m[0][3] * _m[1][5] ) * _m[2][0];
 
   ret4 += _m[3][2] * ret3;
 
   ret3 =
     ( _m[0][2] * _m[1][5] - _m[0][5] * _m[1][2] ) * _m[2][0]
   + ( _m[0][5] * _m[1][0] - _m[0][0] * _m[1][5] ) * _m[2][2]
   + ( _m[0][0] * _m[1][2] - _m[0][2] * _m[1][0] ) * _m[2][5];
 
   ret4 += _m[3][3] * ret3;
 
   ret3 =
     ( _m[0][2] * _m[1][0] - _m[0][0] * _m[1][2] ) * _m[2][3]
   + ( _m[0][3] * _m[1][2] - _m[0][2] * _m[1][3] ) * _m[2][0]
   + ( _m[0][0] * _m[1][3] - _m[0][3] * _m[1][0] ) * _m[2][2];
 
 
   ret4 += _m[3][5] * ret3;
 
   ret5 += _m[4][4] * ret4;
   ret4 = 0.0;
 
 
   ret3 =
     ( _m[0][3] * _m[1][0] - _m[0][0] * _m[1][3] ) * _m[2][2]
   + ( _m[0][0] * _m[1][2] - _m[0][2] * _m[1][0] ) * _m[2][3]
   + ( _m[0][2] * _m[1][3] - _m[0][3] * _m[1][2] ) * _m[2][0];
 
   ret4 += _m[3][4] * ret3;
 
   ret3 =
     ( _m[0][3] * _m[1][2] - _m[0][2] * _m[1][3] ) * _m[2][4]
   + ( _m[0][4] * _m[1][3] - _m[0][3] * _m[1][4] ) * _m[2][2]
   + ( _m[0][2] * _m[1][4] - _m[0][4] * _m[1][2] ) * _m[2][3];
 
   ret4 += _m[3][0] * ret3;
 
   ret3 =
     ( _m[0][4] * _m[1][0] - _m[0][0] * _m[1][4] ) * _m[2][3]
   + ( _m[0][0] * _m[1][3] - _m[0][3] * _m[1][0] ) * _m[2][4]
   + ( _m[0][3] * _m[1][4] - _m[0][4] * _m[1][3] ) * _m[2][0];
 
   ret4 += _m[3][2] * ret3;
 
   ret3 =
     ( _m[0][4] * _m[1][2] - _m[0][2] * _m[1][4] ) * _m[2][0]
   + ( _m[0][0] * _m[1][4] - _m[0][4] * _m[1][0] ) * _m[2][2]
   + ( _m[0][2] * _m[1][0] - _m[0][0] * _m[1][2] ) * _m[2][4];
 
 
   ret4 += _m[3][3] * ret3;
 
   ret5 += _m[4][5] * ret4;
   ret4 = 0.0;
 
 
   ret3 =
     ( _m[0][5] * _m[1][2] - _m[0][2] * _m[1][5] ) * _m[2][4]
   + ( _m[0][2] * _m[1][4] - _m[0][4] * _m[1][2] ) * _m[2][5]
   + ( _m[0][4] * _m[1][5] - _m[0][5] * _m[1][4] ) * _m[2][2];
 
   ret4 += _m[3][3] * ret3;
 
   ret3 =
     ( _m[0][5] * _m[1][3] - _m[0][3] * _m[1][5] ) * _m[2][2]
   + ( _m[0][2] * _m[1][5] - _m[0][5] * _m[1][2] ) * _m[2][3]
   + ( _m[0][3] * _m[1][2] - _m[0][2] * _m[1][3] ) * _m[2][5];
 
   ret4 += _m[3][4] * ret3;
 
   ret3 =
     ( _m[0][2] * _m[1][3] - _m[0][3] * _m[1][2] ) * _m[2][4]
   + ( _m[0][3] * _m[1][4] - _m[0][4] * _m[1][3] ) * _m[2][2]
   + ( _m[0][4] * _m[1][2] - _m[0][2] * _m[1][4] ) * _m[2][3];
 
   ret4 += _m[3][5] * ret3;
 
   ret3 =
     ( _m[0][5] * _m[1][4] - _m[0][4] * _m[1][5] ) * _m[2][3]
   + ( _m[0][3] * _m[1][5] - _m[0][5] * _m[1][3] ) * _m[2][4]
   + ( _m[0][4] * _m[1][3] - _m[0][3] * _m[1][4] ) * _m[2][5];
 
 
   ret4 += _m[3][2] * ret3;
 
   ret5 += _m[4][0] * ret4;
   ret4 = 0.0;
 
   ret += _m[5][1] * ret5;
   ret5 = 0.0;
 
   ret3 =
     ( _m[0][3] * _m[1][4] - _m[0][4] * _m[1][3] ) * _m[2][5]
   + ( _m[0][4] * _m[1][5] - _m[0][5] * _m[1][4] ) * _m[2][3]
   + ( _m[0][5] * _m[1][3] - _m[0][3] * _m[1][5] ) * _m[2][4];
 
   ret4 += _m[3][1] * ret3;
 
   ret3 =
     ( _m[0][1] * _m[1][5] - _m[0][5] * _m[1][1] ) * _m[2][4]
   + ( _m[0][4] * _m[1][1] - _m[0][1] * _m[1][4] ) * _m[2][5]
   + ( _m[0][5] * _m[1][4] - _m[0][4] * _m[1][5] ) * _m[2][1];
 
   ret4 += _m[3][3] * ret3;
 
   ret3 =
     ( _m[0][3] * _m[1][5] - _m[0][5] * _m[1][3] ) * _m[2][1]
   + ( _m[0][5] * _m[1][1] - _m[0][1] * _m[1][5] ) * _m[2][3]
   + ( _m[0][1] * _m[1][3] - _m[0][3] * _m[1][1] ) * _m[2][5];
 
   ret4 += _m[3][4] * ret3;
 
   ret3 =
     ( _m[0][3] * _m[1][1] - _m[0][1] * _m[1][3] ) * _m[2][4]
   + ( _m[0][4] * _m[1][3] - _m[0][3] * _m[1][4] ) * _m[2][1]
   + ( _m[0][1] * _m[1][4] - _m[0][4] * _m[1][1] ) * _m[2][3];
 
 
   ret4 += _m[3][5] * ret3;
 
   ret5 += _m[4][0] * ret4;
   ret4 = 0.0;
 
   ret3 =
     ( _m[0][4] * _m[1][0] - _m[0][0] * _m[1][4] ) * _m[2][3]
   + ( _m[0][0] * _m[1][3] - _m[0][3] * _m[1][0] ) * _m[2][4]
   + ( _m[0][3] * _m[1][4] - _m[0][4] * _m[1][3] ) * _m[2][0];
 
   ret4 += _m[3][5] * ret3;
 
   ret3 =
     ( _m[0][4] * _m[1][3] - _m[0][3] * _m[1][4] ) * _m[2][5]
   + ( _m[0][5] * _m[1][4] - _m[0][4] * _m[1][5] ) * _m[2][3]
   + ( _m[0][3] * _m[1][5] - _m[0][5] * _m[1][3] ) * _m[2][4];
 
   ret4 += _m[3][0] * ret3;
 
   ret3 =
     ( _m[0][5] * _m[1][0] - _m[0][0] * _m[1][5] ) * _m[2][4]
   + ( _m[0][0] * _m[1][4] - _m[0][4] * _m[1][0] ) * _m[2][5]
   + ( _m[0][4] * _m[1][5] - _m[0][5] * _m[1][4] ) * _m[2][0];
 
   ret4 += _m[3][3] * ret3;
 
   ret3 =
     ( _m[0][5] * _m[1][3] - _m[0][3] * _m[1][5] ) * _m[2][0]
   + ( _m[0][0] * _m[1][5] - _m[0][5] * _m[1][0] ) * _m[2][3]
   + ( _m[0][3] * _m[1][0] - _m[0][0] * _m[1][3] ) * _m[2][5];
 
 
   ret4 += _m[3][4] * ret3;
 
   ret5 += _m[4][1] * ret4;
   ret4 = 0.0;
 
   ret3 =
     ( _m[0][0] * _m[1][1] - _m[0][1] * _m[1][0] ) * _m[2][5]
   + ( _m[0][1] * _m[1][5] - _m[0][5] * _m[1][1] ) * _m[2][0]
   + ( _m[0][5] * _m[1][0] - _m[0][0] * _m[1][5] ) * _m[2][1];
 
   ret4 += _m[3][4] * ret3;
 
   ret3 =
     ( _m[0][0] * _m[1][4] - _m[0][4] * _m[1][0] ) * _m[2][1]
   + ( _m[0][1] * _m[1][0] - _m[0][0] * _m[1][1] ) * _m[2][4]
   + ( _m[0][4] * _m[1][1] - _m[0][1] * _m[1][4] ) * _m[2][0];
 
   ret4 += _m[3][5] * ret3;
 
   ret3 =
     ( _m[0][1] * _m[1][4] - _m[0][4] * _m[1][1] ) * _m[2][5]
   + ( _m[0][4] * _m[1][5] - _m[0][5] * _m[1][4] ) * _m[2][1]
   + ( _m[0][5] * _m[1][1] - _m[0][1] * _m[1][5] ) * _m[2][4];
 
   ret4 += _m[3][0] * ret3;
 
   ret3 =
     ( _m[0][0] * _m[1][5] - _m[0][5] * _m[1][0] ) * _m[2][4]
   + ( _m[0][4] * _m[1][0] - _m[0][0] * _m[1][4] ) * _m[2][5]
   + ( _m[0][5] * _m[1][4] - _m[0][4] * _m[1][5] ) * _m[2][0];
 
 
   ret4 += _m[3][1] * ret3;
 
   ret5 += _m[4][3] * ret4;
   ret4 = 0.0;
 
   ret3 =
     ( _m[0][3] * _m[1][5] - _m[0][5] * _m[1][3] ) * _m[2][0]
   + ( _m[0][5] * _m[1][0] - _m[0][0] * _m[1][5] ) * _m[2][3]
   + ( _m[0][0] * _m[1][3] - _m[0][3] * _m[1][0] ) * _m[2][5];
 
   ret4 += _m[3][1] * ret3;
 
   ret3 =
     ( _m[0][1] * _m[1][0] - _m[0][0] * _m[1][1] ) * _m[2][5]
   + ( _m[0][5] * _m[1][1] - _m[0][1] * _m[1][5] ) * _m[2][0]
   + ( _m[0][0] * _m[1][5] - _m[0][5] * _m[1][0] ) * _m[2][1];
 
   ret4 += _m[3][3] * ret3;
 
   ret3 =
     ( _m[0][3] * _m[1][0] - _m[0][0] * _m[1][3] ) * _m[2][1]
   + ( _m[0][0] * _m[1][1] - _m[0][1] * _m[1][0] ) * _m[2][3]
   + ( _m[0][1] * _m[1][3] - _m[0][3] * _m[1][1] ) * _m[2][0];
 
   ret4 += _m[3][5] * ret3;
 
   ret3 =
     ( _m[0][3] * _m[1][1] - _m[0][1] * _m[1][3] ) * _m[2][5]
   + ( _m[0][5] * _m[1][3] - _m[0][3] * _m[1][5] ) * _m[2][1]
   + ( _m[0][1] * _m[1][5] - _m[0][5] * _m[1][1] ) * _m[2][3];
 
 
   ret4 += _m[3][0] * ret3;
 
   ret5 += _m[4][4] * ret4;
   ret4 = 0.0;
 
   ret3 =
     ( _m[0][4] * _m[1][1] - _m[0][1] * _m[1][4] ) * _m[2][3]
   + ( _m[0][1] * _m[1][3] - _m[0][3] * _m[1][1] ) * _m[2][4]
   + ( _m[0][3] * _m[1][4] - _m[0][4] * _m[1][3] ) * _m[2][1];
 
   ret4 += _m[3][0] * ret3;
 
   ret3 =
     ( _m[0][4] * _m[1][3] - _m[0][3] * _m[1][4] ) * _m[2][0]
   + ( _m[0][0] * _m[1][4] - _m[0][4] * _m[1][0] ) * _m[2][3]
   + ( _m[0][3] * _m[1][0] - _m[0][0] * _m[1][3] ) * _m[2][4];
 
   ret4 += _m[3][1] * ret3;
 
   ret3 =
     ( _m[0][0] * _m[1][1] - _m[0][1] * _m[1][0] ) * _m[2][4]
   + ( _m[0][1] * _m[1][4] - _m[0][4] * _m[1][1] ) * _m[2][0]
   + ( _m[0][4] * _m[1][0] - _m[0][0] * _m[1][4] ) * _m[2][1];
 
   ret4 += _m[3][3] * ret3;
 
   ret3 =
     ( _m[0][0] * _m[1][3] - _m[0][3] * _m[1][0] ) * _m[2][1]
   + ( _m[0][1] * _m[1][0] - _m[0][0] * _m[1][1] ) * _m[2][3]
   + ( _m[0][3] * _m[1][1] - _m[0][1] * _m[1][3] ) * _m[2][0];
 
 
   ret4 += _m[3][4] * ret3;
 
   ret5 += _m[4][5] * ret4;
   ret4 = 0.0;
 
   ret += _m[5][2] * ret5;
   ret5 = 0.0;
 
 
   ret3 =
     ( _m[0][1] * _m[1][2] - _m[0][2] * _m[1][1] ) * _m[2][0]
   + ( _m[0][2] * _m[1][0] - _m[0][0] * _m[1][2] ) * _m[2][1]
   + ( _m[0][0] * _m[1][1] - _m[0][1] * _m[1][0] ) * _m[2][2];
 
   ret4 += _m[3][4] * ret3;
 
   ret3 =
     ( _m[0][1] * _m[1][4] - _m[0][4] * _m[1][1] ) * _m[2][2]
   + ( _m[0][2] * _m[1][1] - _m[0][1] * _m[1][2] ) * _m[2][4]
   + ( _m[0][4] * _m[1][2] - _m[0][2] * _m[1][4] ) * _m[2][1];
 
   ret4 += _m[3][0] * ret3;
 
   ret3 =
     ( _m[0][2] * _m[1][4] - _m[0][4] * _m[1][2] ) * _m[2][0]
   + ( _m[0][4] * _m[1][0] - _m[0][0] * _m[1][4] ) * _m[2][2]
   + ( _m[0][0] * _m[1][2] - _m[0][2] * _m[1][0] ) * _m[2][4];
 
   ret4 += _m[3][1] * ret3;
 
   ret3 =
     ( _m[0][1] * _m[1][0] - _m[0][0] * _m[1][1] ) * _m[2][4]
   + ( _m[0][4] * _m[1][1] - _m[0][1] * _m[1][4] ) * _m[2][0]
   + ( _m[0][0] * _m[1][4] - _m[0][4] * _m[1][0] ) * _m[2][1];
 
 
   ret4 += _m[3][2] * ret3;
 
   ret5 += _m[4][5] * ret4;
   ret4 = 0.0;
 
   ret3 =
     ( _m[0][4] * _m[1][5] - _m[0][5] * _m[1][4] ) * _m[2][1]
   + ( _m[0][5] * _m[1][1] - _m[0][1] * _m[1][5] ) * _m[2][4]
   + ( _m[0][1] * _m[1][4] - _m[0][4] * _m[1][1] ) * _m[2][5];
 
   ret4 += _m[3][2] * ret3;
 
   ret3 =
     ( _m[0][2] * _m[1][1] - _m[0][1] * _m[1][2] ) * _m[2][5]
   + ( _m[0][5] * _m[1][2] - _m[0][2] * _m[1][5] ) * _m[2][1]
   + ( _m[0][1] * _m[1][5] - _m[0][5] * _m[1][1] ) * _m[2][2];
 
   ret4 += _m[3][4] * ret3;
 
   ret3 =
     ( _m[0][4] * _m[1][1] - _m[0][1] * _m[1][4] ) * _m[2][2]
   + ( _m[0][1] * _m[1][2] - _m[0][2] * _m[1][1] ) * _m[2][4]
   + ( _m[0][2] * _m[1][4] - _m[0][4] * _m[1][2] ) * _m[2][1];
 
   ret4 += _m[3][5] * ret3;
 
   ret3 =
     ( _m[0][4] * _m[1][2] - _m[0][2] * _m[1][4] ) * _m[2][5]
   + ( _m[0][5] * _m[1][4] - _m[0][4] * _m[1][5] ) * _m[2][2]
   + ( _m[0][2] * _m[1][5] - _m[0][5] * _m[1][2] ) * _m[2][4];
 
 
   ret4 += _m[3][1] * ret3;
 
   ret5 += _m[4][0] * ret4;
   ret4 = 0.0;
 
   ret3 =
     ( _m[0][5] * _m[1][2] - _m[0][2] * _m[1][5] ) * _m[2][4]
   + ( _m[0][2] * _m[1][4] - _m[0][4] * _m[1][2] ) * _m[2][5]
   + ( _m[0][4] * _m[1][5] - _m[0][5] * _m[1][4] ) * _m[2][2];
 
   ret4 += _m[3][0] * ret3;
 
   ret3 =
     ( _m[0][5] * _m[1][4] - _m[0][4] * _m[1][5] ) * _m[2][0]
   + ( _m[0][0] * _m[1][5] - _m[0][5] * _m[1][0] ) * _m[2][4]
   + ( _m[0][4] * _m[1][0] - _m[0][0] * _m[1][4] ) * _m[2][5];
 
   ret4 += _m[3][2] * ret3;
 
   ret3 =
     ( _m[0][0] * _m[1][2] - _m[0][2] * _m[1][0] ) * _m[2][5]
   + ( _m[0][2] * _m[1][5] - _m[0][5] * _m[1][2] ) * _m[2][0]
   + ( _m[0][5] * _m[1][0] - _m[0][0] * _m[1][5] ) * _m[2][2];
 
   ret4 += _m[3][4] * ret3;
 
   ret3 =
     ( _m[0][0] * _m[1][4] - _m[0][4] * _m[1][0] ) * _m[2][2]
   + ( _m[0][2] * _m[1][0] - _m[0][0] * _m[1][2] ) * _m[2][4]
   + ( _m[0][4] * _m[1][2] - _m[0][2] * _m[1][4] ) * _m[2][0];
 
 
   ret4 += _m[3][5] * ret3;
 
   ret5 += _m[4][1] * ret4;
   ret4 = 0.0;
 
   ret3 =
     ( _m[0][1] * _m[1][4] - _m[0][4] * _m[1][1] ) * _m[2][0]
   + ( _m[0][4] * _m[1][0] - _m[0][0] * _m[1][4] ) * _m[2][1]
   + ( _m[0][0] * _m[1][1] - _m[0][1] * _m[1][0] ) * _m[2][4];
 
   ret4 += _m[3][5] * ret3;
 
   ret3 =
     ( _m[0][1] * _m[1][5] - _m[0][5] * _m[1][1] ) * _m[2][4]
   + ( _m[0][4] * _m[1][1] - _m[0][1] * _m[1][4] ) * _m[2][5]
   + ( _m[0][5] * _m[1][4] - _m[0][4] * _m[1][5] ) * _m[2][1];
 
   ret4 += _m[3][0] * ret3;
 
   ret3 =
     ( _m[0][4] * _m[1][5] - _m[0][5] * _m[1][4] ) * _m[2][0]
   + ( _m[0][5] * _m[1][0] - _m[0][0] * _m[1][5] ) * _m[2][4]
   + ( _m[0][0] * _m[1][4] - _m[0][4] * _m[1][0] ) * _m[2][5];
 
   ret4 += _m[3][1] * ret3;
 
   ret3 =
     ( _m[0][1] * _m[1][0] - _m[0][0] * _m[1][1] ) * _m[2][5]
   + ( _m[0][5] * _m[1][1] - _m[0][1] * _m[1][5] ) * _m[2][0]
   + ( _m[0][0] * _m[1][5] - _m[0][5] * _m[1][0] ) * _m[2][1];
 
 
   ret4 += _m[3][4] * ret3;
 
   ret5 += _m[4][2] * ret4;
   ret4 = 0.0;
 
   ret3 =
     ( _m[0][5] * _m[1][0] - _m[0][0] * _m[1][5] ) * _m[2][1]
   + ( _m[0][0] * _m[1][1] - _m[0][1] * _m[1][0] ) * _m[2][5]
   + ( _m[0][1] * _m[1][5] - _m[0][5] * _m[1][1] ) * _m[2][0];
 
   ret4 += _m[3][2] * ret3;
 
   ret3 =
     ( _m[0][2] * _m[1][1] - _m[0][1] * _m[1][2] ) * _m[2][0]
   + ( _m[0][0] * _m[1][2] - _m[0][2] * _m[1][0] ) * _m[2][1]
   + ( _m[0][1] * _m[1][0] - _m[0][0] * _m[1][1] ) * _m[2][2];
 
   ret4 += _m[3][5] * ret3;
 
   ret3 =
     ( _m[0][5] * _m[1][1] - _m[0][1] * _m[1][5] ) * _m[2][2]
   + ( _m[0][1] * _m[1][2] - _m[0][2] * _m[1][1] ) * _m[2][5]
   + ( _m[0][2] * _m[1][5] - _m[0][5] * _m[1][2] ) * _m[2][1];
 
   ret4 += _m[3][0] * ret3;
 
   ret3 =
     ( _m[0][5] * _m[1][2] - _m[0][2] * _m[1][5] ) * _m[2][0]
   + ( _m[0][0] * _m[1][5] - _m[0][5] * _m[1][0] ) * _m[2][2]
   + ( _m[0][2] * _m[1][0] - _m[0][0] * _m[1][2] ) * _m[2][5];
 
 
   ret4 += _m[3][1] * ret3;
 
   ret5 += _m[4][4] * ret4;
   ret4 = 0.0;
 
   ret += _m[5][3] * ret5;
   ret5 = 0.0;
 
   ret3 =
     ( _m[0][0] * _m[1][2] - _m[0][2] * _m[1][0] ) * _m[2][5]
   + ( _m[0][2] * _m[1][5] - _m[0][5] * _m[1][2] ) * _m[2][0]
   + ( _m[0][5] * _m[1][0] - _m[0][0] * _m[1][5] ) * _m[2][2];
 
   ret4 += _m[3][1] * ret3;
 
   ret3 =
     ( _m[0][0] * _m[1][5] - _m[0][5] * _m[1][0] ) * _m[2][1]
   + ( _m[0][1] * _m[1][0] - _m[0][0] * _m[1][1] ) * _m[2][5]
   + ( _m[0][5] * _m[1][1] - _m[0][1] * _m[1][5] ) * _m[2][0];
 
   ret4 += _m[3][2] * ret3;
 
   ret3 =
     ( _m[0][1] * _m[1][2] - _m[0][2] * _m[1][1] ) * _m[2][0]
   + ( _m[0][2] * _m[1][0] - _m[0][0] * _m[1][2] ) * _m[2][1]
   + ( _m[0][0] * _m[1][1] - _m[0][1] * _m[1][0] ) * _m[2][2];
 
   ret4 += _m[3][5] * ret3;
 
   ret3 =
     ( _m[0][1] * _m[1][5] - _m[0][5] * _m[1][1] ) * _m[2][2]
   + ( _m[0][2] * _m[1][1] - _m[0][1] * _m[1][2] ) * _m[2][5]
   + ( _m[0][5] * _m[1][2] - _m[0][2] * _m[1][5] ) * _m[2][1];
 
 
   ret4 += _m[3][0] * ret3;
 
   ret5 += _m[4][3] * ret4;
   ret4 = 0.0;
 
   ret3 =
     ( _m[0][2] * _m[1][3] - _m[0][3] * _m[1][2] ) * _m[2][1]
   + ( _m[0][3] * _m[1][1] - _m[0][1] * _m[1][3] ) * _m[2][2]
   + ( _m[0][1] * _m[1][2] - _m[0][2] * _m[1][1] ) * _m[2][3];
 
   ret4 += _m[3][0] * ret3;
 
   ret3 =
     ( _m[0][2] * _m[1][0] - _m[0][0] * _m[1][2] ) * _m[2][3]
   + ( _m[0][3] * _m[1][2] - _m[0][2] * _m[1][3] ) * _m[2][0]
   + ( _m[0][0] * _m[1][3] - _m[0][3] * _m[1][0] ) * _m[2][2];
 
   ret4 += _m[3][1] * ret3;
 
   ret3 =
     ( _m[0][3] * _m[1][0] - _m[0][0] * _m[1][3] ) * _m[2][1]
   + ( _m[0][0] * _m[1][1] - _m[0][1] * _m[1][0] ) * _m[2][3]
   + ( _m[0][1] * _m[1][3] - _m[0][3] * _m[1][1] ) * _m[2][0];
 
   ret4 += _m[3][2] * ret3;
 
   ret3 =
     ( _m[0][2] * _m[1][1] - _m[0][1] * _m[1][2] ) * _m[2][0]
   + ( _m[0][0] * _m[1][2] - _m[0][2] * _m[1][0] ) * _m[2][1]
   + ( _m[0][1] * _m[1][0] - _m[0][0] * _m[1][1] ) * _m[2][2];
 
 
   ret4 += _m[3][3] * ret3;
 
   ret5 += _m[4][5] * ret4;
   ret4 = 0.0;
 
   ret3 =
     ( _m[0][5] * _m[1][1] - _m[0][1] * _m[1][5] ) * _m[2][2]
   + ( _m[0][1] * _m[1][2] - _m[0][2] * _m[1][1] ) * _m[2][5]
   + ( _m[0][2] * _m[1][5] - _m[0][5] * _m[1][2] ) * _m[2][1];
 
   ret4 += _m[3][3] * ret3;
 
   ret3 =
     ( _m[0][3] * _m[1][2] - _m[0][2] * _m[1][3] ) * _m[2][1]
   + ( _m[0][1] * _m[1][3] - _m[0][3] * _m[1][1] ) * _m[2][2]
   + ( _m[0][2] * _m[1][1] - _m[0][1] * _m[1][2] ) * _m[2][3];
 
   ret4 += _m[3][5] * ret3;
 
   ret3 =
     ( _m[0][5] * _m[1][2] - _m[0][2] * _m[1][5] ) * _m[2][3]
   + ( _m[0][2] * _m[1][3] - _m[0][3] * _m[1][2] ) * _m[2][5]
   + ( _m[0][3] * _m[1][5] - _m[0][5] * _m[1][3] ) * _m[2][2];
 
   ret4 += _m[3][1] * ret3;
 
   ret3 =
     ( _m[0][5] * _m[1][3] - _m[0][3] * _m[1][5] ) * _m[2][1]
   + ( _m[0][1] * _m[1][5] - _m[0][5] * _m[1][1] ) * _m[2][3]
   + ( _m[0][3] * _m[1][1] - _m[0][1] * _m[1][3] ) * _m[2][5];
 
 
   ret4 += _m[3][2] * ret3;
 
   ret5 += _m[4][0] * ret4;
   ret4 = 0.0;
 
   ret3 =
     ( _m[0][0] * _m[1][3] - _m[0][3] * _m[1][0] ) * _m[2][5]
   + ( _m[0][3] * _m[1][5] - _m[0][5] * _m[1][3] ) * _m[2][0]
   + ( _m[0][5] * _m[1][0] - _m[0][0] * _m[1][5] ) * _m[2][3];
 
   ret4 += _m[3][2] * ret3;
 
   ret3 =
     ( _m[0][0] * _m[1][5] - _m[0][5] * _m[1][0] ) * _m[2][2]
   + ( _m[0][2] * _m[1][0] - _m[0][0] * _m[1][2] ) * _m[2][5]
   + ( _m[0][5] * _m[1][2] - _m[0][2] * _m[1][5] ) * _m[2][0];
 
   ret4 += _m[3][3] * ret3;
 
   ret3 =
     ( _m[0][2] * _m[1][3] - _m[0][3] * _m[1][2] ) * _m[2][0]
   + ( _m[0][3] * _m[1][0] - _m[0][0] * _m[1][3] ) * _m[2][2]
   + ( _m[0][0] * _m[1][2] - _m[0][2] * _m[1][0] ) * _m[2][3];
 
   ret4 += _m[3][5] * ret3;
 
   ret3 =
     ( _m[0][2] * _m[1][5] - _m[0][5] * _m[1][2] ) * _m[2][3]
   + ( _m[0][3] * _m[1][2] - _m[0][2] * _m[1][3] ) * _m[2][5]
   + ( _m[0][5] * _m[1][3] - _m[0][3] * _m[1][5] ) * _m[2][2];
 
 
   ret4 += _m[3][0] * ret3;
 
   ret5 += _m[4][1] * ret4;
   ret4 = 0.0;
 
   ret3 =
     ( _m[0][3] * _m[1][5] - _m[0][5] * _m[1][3] ) * _m[2][1]
   + ( _m[0][5] * _m[1][1] - _m[0][1] * _m[1][5] ) * _m[2][3]
   + ( _m[0][1] * _m[1][3] - _m[0][3] * _m[1][1] ) * _m[2][5];
 
   ret4 += _m[3][0] * ret3;
 
   ret3 =
     ( _m[0][3] * _m[1][0] - _m[0][0] * _m[1][3] ) * _m[2][5]
   + ( _m[0][5] * _m[1][3] - _m[0][3] * _m[1][5] ) * _m[2][0]
   + ( _m[0][0] * _m[1][5] - _m[0][5] * _m[1][0] ) * _m[2][3];
 
   ret4 += _m[3][1] * ret3;
 
   ret3 =
     ( _m[0][5] * _m[1][0] - _m[0][0] * _m[1][5] ) * _m[2][1]
   + ( _m[0][0] * _m[1][1] - _m[0][1] * _m[1][0] ) * _m[2][5]
   + ( _m[0][1] * _m[1][5] - _m[0][5] * _m[1][1] ) * _m[2][0];
 
   ret4 += _m[3][3] * ret3;
 
   ret3 =
     ( _m[0][3] * _m[1][1] - _m[0][1] * _m[1][3] ) * _m[2][0]
   + ( _m[0][0] * _m[1][3] - _m[0][3] * _m[1][0] ) * _m[2][1]
   + ( _m[0][1] * _m[1][0] - _m[0][0] * _m[1][1] ) * _m[2][3];
 
   ret4 += _m[3][5] * ret3;
   ret5 += _m[4][2] * ret4;
   ret  += _m[5][4] * ret5;
 
   return ret;
 }
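Each `ret3 = ...` assignment in the test case above appears to compute the determinant of a 3x3 submatrix (rows 0-2 and three chosen columns) as 2x2 minors times a third-row element. `ret4`, `ret5` and `ret` then weight those values by entries of rows 3, 4 and 5, which looks like a fully unrolled cofactor expansion. The following is a minimal sketch that isolates one such block. It is not taken from the report: it assumes `std::complex<double>` as the element type and introduces a hypothetical `minor3` helper.

```cpp
#include <complex>

// Assumption: std::complex<double> stands in for the test case's complex
// element type; the original declaration of _m is not shown in this report.
using dcomplex = std::complex<double>;

// Hypothetical helper, not part of the attached test case: the determinant
// of the 3x3 submatrix made of rows 0-2 and columns a, b, c, written in the
// same "2x2 minor times third-row element" form as each ret3 assignment.
dcomplex minor3(const dcomplex m[][6], int a, int b, int c)
{
  return ( m[0][a] * m[1][b] - m[0][b] * m[1][a] ) * m[2][c]
       + ( m[0][b] * m[1][c] - m[0][c] * m[1][b] ) * m[2][a]
       + ( m[0][c] * m[1][a] - m[0][a] * m[1][c] ) * m[2][b];
}
```

For instance, `minor3(m, 4, 2, 5)` contains the same three products as the `ret3` assignment whose first term is `( _m[0][4] * _m[1][2] - _m[0][2] * _m[1][4] ) * _m[2][5]`, summed in a different order, so floating-point results may differ in the last bits. The report's function strings many of these blocks together, and that large straight-line body is what the optimization passes in the following time reports spend their time on.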
 
 
 
Comment 6 Richard Henderson 2002-04-12 16:43:50 UTC
State-Changed-From-To: open->analyzed
State-Changed-Why: http://gcc.gnu.org/ml/gcc-patches/2002-04/msg00626.html
    Doesn't actually work, but highlights the problem.
Comment 7 Steven Bosscher 2003-07-25 10:49:09 UTC
This is still slow, but not as bad as it used to be.  Here are the time reports I
get on an Athlon XP2000 with 256MB RAM, using "g++-3.4 (GCC) 3.4 20030718
(experimental)":

$ g++-3.4 -c -ftime-report 2692.cc
 
Execution times (seconds)
 cfg construction      :   0.02 ( 2%) usr   0.00 ( 0%) sys   0.02 ( 2%) wall
 trivially dead code   :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall
 life analysis         :   0.10 ( 9%) usr   0.00 ( 0%) sys   0.10 ( 8%) wall
 life info update      :   0.03 ( 3%) usr   0.00 ( 0%) sys   0.03 ( 3%) wall
 register scan         :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall
 parser                :   0.12 (11%) usr   0.01 (17%) sys   0.14 (12%) wall
 name lookup           :   0.01 ( 1%) usr   0.02 (33%) sys   0.04 ( 3%) wall
 expand                :   0.09 ( 8%) usr   0.01 (17%) sys   0.10 ( 8%) wall
 integration           :   0.03 ( 3%) usr   0.00 ( 0%) sys   0.03 ( 3%) wall
 flow analysis         :   0.02 ( 2%) usr   0.00 ( 0%) sys   0.02 ( 2%) wall
 local alloc           :   0.11 (10%) usr   0.00 ( 0%) sys   0.11 ( 9%) wall
 global alloc          :   0.34 (31%) usr   0.00 ( 0%) sys   0.34 (29%) wall
 flow 2                :   0.03 ( 3%) usr   0.00 ( 0%) sys   0.03 ( 3%) wall
 shorten branches      :   0.04 ( 4%) usr   0.00 ( 0%) sys   0.04 ( 3%) wall
 reg stack             :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall
 final                 :   0.06 ( 5%) usr   0.01 (17%) sys   0.07 ( 6%) wall
 rest of compilation   :   0.08 ( 7%) usr   0.01 (17%) sys   0.09 ( 8%) wall
 TOTAL                 :   1.11             0.06             1.19
$ g++-3.4 -c -O -ftime-report 2692.cc
 
Execution times (seconds)
 garbage collection    :   0.17 ( 0%) usr   0.01 ( 3%) sys   0.68 ( 0%) wall
 cfg construction      :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall
 cfg cleanup           :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall
 trivially dead code   :   0.13 ( 0%) usr   0.01 ( 3%) sys   0.14 ( 0%) wall
 life analysis         :  99.39 (67%) usr   0.06 (17%) sys 105.98 (66%) wall
 life info update      :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 alias analysis        :   0.21 ( 0%) usr   0.01 ( 3%) sys   0.24 ( 0%) wall
 register scan         :   0.08 ( 0%) usr   0.01 ( 3%) sys   0.09 ( 0%) wall
 rebuild jump labels   :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall
 preprocessing         :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 parser                :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall
 name lookup           :   0.00 ( 0%) usr   0.03 ( 8%) sys   0.03 ( 0%) wall
 expand                :   0.52 ( 0%) usr   0.05 (14%) sys   0.60 ( 0%) wall
 varconst              :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 integration           :   0.47 ( 0%) usr   0.01 ( 3%) sys   0.48 ( 0%) wall
 jump                  :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall
 CSE                   :   2.65 ( 2%) usr   0.03 ( 8%) sys   3.05 ( 2%) wall
 loop analysis         :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 branch prediction     :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall
 flow analysis         :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 combiner              :   6.36 ( 4%) usr   0.00 ( 0%) sys   6.79 ( 4%) wall
 if-conversion         :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 local alloc           :   0.33 ( 0%) usr   0.01 ( 3%) sys   0.48 ( 0%) wall
 global alloc          :  34.50 (23%) usr   0.11 (31%) sys  39.30 (24%) wall
 reload CSE regs       :   1.60 ( 1%) usr   0.00 ( 0%) sys   1.81 ( 1%) wall
 flow 2                :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall
 rename registers      :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall
 shorten branches      :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall
 reg stack             :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 final                 :   0.09 ( 0%) usr   0.01 ( 3%) sys   0.13 ( 0%) wall
 rest of compilation   :   0.31 ( 0%) usr   0.00 ( 0%) sys   0.32 ( 0%) wall
 TOTAL                 : 147.62             0.36           161.10

So the expand hog is gone :-)

It's not a surprise that, for the test case for this PR, global alloc and life
analysis take so much time.  It would obviously be nice to have it faster, but
it is not the awful compile time hog anymore.

Richard, I have not reconfirmed this PR because I am not sure what's reasonable
here.  Do you think this report can be closed, or do you think these timings
still are unacceptable?

Gr.
Comment 8 Steven Bosscher 2003-07-25 15:35:34 UTC
Compiler:
GNU C++ version 3.4 20030725 (experimental) (i686-pc-linux-gnu)
        compiled by GNU C version 3.4 20030725 (experimental).
GGC heuristics: --param ggc-min-expand=47 --param ggc-min-heapsize=31916

Flags:
-O -quiet

File:
z.cc

Flat profile:
                                                                               
                       Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 10.73     40.70    40.70    19930     2.04     3.22  find_equiv_reg
  9.68     77.41    36.71 193175759     0.00     0.00  find_base_term
  9.53    113.58    36.17 178499488     0.00     0.00  refers_to_regno_p
  8.26    144.90    31.32 446734067     0.00     0.00  canon_rtx
  7.49    173.30    28.40 467218101     0.00     0.00  rtx_equal_p
  7.20    200.62    27.32 176913679     0.00     0.00  read_dependence
  6.26    224.38    23.75 73755132     0.00     0.00  addr_side_effect_eval
  5.96    246.99    22.61 295299380     0.00     0.00  true_regnum
  4.69    264.79    17.80 175470757     0.00     0.00  canon_true_dependence
  4.38    281.39    16.61 578096780     0.00     0.00  ix86_find_base_term
  4.36    297.94    16.54 178512386     0.00     0.00  reg_overlap_mentioned_p
  4.11    313.51    15.58 578101529     0.00     0.00  i386_output_dwarf_dtprel
  3.67    327.43    13.91 18646114     0.00     0.00  regno_clobbered_at_setjmp
  2.38    336.45     9.02 176829268     0.00     0.00  main
  1.29    341.36     4.91 175470757     0.00     0.00  anti_dependence
  1.07    345.41     4.05  3014519     0.00     0.00  propagate_block
  (all others <1%)

So find_equiv_reg is a bottleneck for this code.
Comment 9 Steven Bosscher 2003-07-25 16:56:52 UTC
Bug 10776 may be related to this one.
Comment 10 Andrew Pinski 2003-08-16 14:23:36 UTC
On powerpc-apple-darwin6.6, the combiner is where most of the work is done:
Execution times (seconds)
 garbage collection    :   1.12 ( 1%) usr   0.00 ( 0%) sys   2.16 ( 2%) wall
 cfg construction      :   0.18 ( 0%) usr   0.00 ( 0%) sys   0.57 ( 0%) wall
 cfg cleanup           :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 trivially dead code   :   0.30 ( 0%) usr   0.00 ( 0%) sys   0.33 ( 0%) wall
 life analysis         :   9.04 ( 9%) usr   0.00 ( 0%) sys  10.26 ( 8%) wall
 life info update      :   0.28 ( 0%) usr   0.00 ( 0%) sys   0.32 ( 0%) wall
 alias analysis        :   0.42 ( 0%) usr   0.00 ( 0%) sys   0.45 ( 0%) wall
 register scan         :   0.25 ( 0%) usr   0.00 ( 0%) sys   0.35 ( 0%) wall
 rebuild jump labels   :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall
 preprocessing         :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall
 parser                :   0.54 ( 1%) usr   0.00 ( 0%) sys   0.54 ( 0%) wall
 name lookup           :   0.66 ( 1%) usr   0.00 ( 0%) sys   0.73 ( 1%) wall
 expand                :   1.28 ( 1%) usr   0.00 ( 0%) sys   2.42 ( 2%) wall
 integration           :   1.10 ( 1%) usr   0.00 ( 0%) sys   3.75 ( 3%) wall
 jump                  :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall
 CSE                   :   5.04 ( 5%) usr   0.00 ( 0%) sys   6.21 ( 5%) wall
 loop analysis         :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 branch prediction     :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.23 ( 0%) wall
 flow analysis         :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 combiner              :  63.88 (66%) usr   0.00 ( 0%) sys  82.54 (65%) wall
 if-conversion         :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 local alloc           :   1.04 ( 1%) usr   0.00 ( 0%) sys   1.96 ( 2%) wall
 global alloc          :   5.34 ( 6%) usr   0.00 ( 0%) sys   6.33 ( 5%) wall
 reload CSE regs       :   3.57 ( 4%) usr   0.00 ( 0%) sys   4.07 ( 3%) wall
 flow 2                :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall
 rename registers      :   0.24 ( 0%) usr   0.00 ( 0%) sys   0.28 ( 0%) wall
 shorten branches      :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall
 final                 :   0.35 ( 0%) usr   0.00 ( 0%) sys   0.41 ( 0%) wall
 rest of compilation   :   0.71 ( 1%) usr   0.00 ( 0%) sys   1.76 ( 1%) wall
 TOTAL                 :  96.15             0.00           126.51
Comment 11 Andrew Pinski 2003-10-30 21:22:58 UTC
On the mainline (20031030), gcc ICEs in the webizer pass when compiling this code
with -O3 on powerpc-apple-darwin.
Comment 12 Andrew Pinski 2003-11-23 08:24:37 UTC
Interestingly, -O3 and -O2 are now faster than -O1 (unit-at-a-time causes this).
On the mainline, the time has migrated to register renaming (at least for -O3):
-O3 -fno-web:
Execution times (seconds)
 garbage collection    :   1.78 ( 2%) usr   0.01 ( 0%) sys   2.76 ( 2%) wall
 callgraph construction:   0.09 ( 0%) usr   0.02 ( 1%) sys   0.34 ( 0%) wall
 cfg construction      :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall
 cfg cleanup           :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 trivially dead code   :   0.34 ( 0%) usr   0.02 ( 1%) sys   0.70 ( 1%) wall
 life analysis         :   2.81 ( 3%) usr   0.00 ( 0%) sys   3.02 ( 3%) wall
 life info update      :   1.07 ( 1%) usr   0.02 ( 1%) sys   1.30 ( 1%) wall
 alias analysis        :   0.45 ( 0%) usr   0.05 ( 2%) sys   0.71 ( 1%) wall
 register scan         :   0.28 ( 0%) usr   0.00 ( 0%) sys   0.33 ( 0%) wall
 rebuild jump labels   :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall
 preprocessing         :   0.05 ( 0%) usr   0.04 ( 2%) sys   0.15 ( 0%) wall
 parser                :   0.65 ( 1%) usr   0.20 ( 9%) sys   1.54 ( 1%) wall
 name lookup           :   0.16 ( 0%) usr   0.44 (19%) sys   0.65 ( 1%) wall
 expand                :   0.91 ( 1%) usr   0.12 ( 5%) sys   3.07 ( 3%) wall
 integration           :   1.11 ( 1%) usr   0.09 ( 4%) sys   1.55 ( 1%) wall
 jump                  :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall
 CSE                   :   3.09 ( 3%) usr   0.07 ( 3%) sys   3.85 ( 3%) wall
 loop analysis         :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 CSE 2                 :   0.92 ( 1%) usr   0.02 ( 1%) sys   1.02 ( 1%) wall
 branch prediction     :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall
 flow analysis         :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 combiner              :  15.43 (16%) usr   0.13 ( 6%) sys  18.74 (16%) wall
 if-conversion         :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 regmove               :   0.40 ( 0%) usr   0.01 ( 0%) sys   0.41 ( 0%) wall
 scheduling            :   8.13 ( 8%) usr   0.46 (20%) sys   9.42 ( 8%) wall
 local alloc           :   1.21 ( 1%) usr   0.02 ( 1%) sys   1.30 ( 1%) wall
 global alloc          :   2.52 ( 3%) usr   0.08 ( 3%) sys   2.72 ( 2%) wall
 reload CSE regs       :   0.94 ( 1%) usr   0.00 ( 0%) sys   0.97 ( 1%) wall
 flow 2                :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall
 rename registers      :  51.52 (53%) usr   0.07 ( 3%) sys  56.65 (48%) wall
 scheduling 2          :   1.63 ( 2%) usr   0.39 (17%) sys   4.27 ( 4%) wall
 shorten branches      :   0.09 ( 0%) usr   0.01 ( 0%) sys   0.12 ( 0%) wall
 final                 :   0.19 ( 0%) usr   0.02 ( 1%) sys   0.22 ( 0%) wall
 rest of compilation   :   0.45 ( 0%) usr   0.01 ( 0%) sys   0.50 ( 0%) wall
 TOTAL                 :  96.78             2.33           116.99

-O0:
Execution times (seconds)
 garbage collection    :   0.13 ( 3%) usr   0.00 ( 0%) sys   0.14 ( 2%) wall
 cfg construction      :   0.04 ( 1%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 trivially dead code   :   0.03 ( 1%) usr   0.00 ( 0%) sys   0.07 ( 1%) wall
 life analysis         :   0.41 (10%) usr   0.00 ( 0%) sys   0.50 ( 8%) wall
 life info update      :   0.21 ( 5%) usr   0.00 ( 0%) sys   0.23 ( 4%) wall
 register scan         :   0.05 ( 1%) usr   0.00 ( 0%) sys   0.05 ( 1%) wall
 rebuild jump labels   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 1%) wall
 preprocessing         :   0.04 ( 1%) usr   0.07 ( 9%) sys   0.11 ( 2%) wall
 parser                :   0.68 (16%) usr   0.18 (22%) sys   0.78 (12%) wall
 name lookup           :   0.19 ( 5%) usr   0.45 (56%) sys   0.78 (12%) wall
 expand                :   0.22 ( 5%) usr   0.00 ( 0%) sys   0.26 ( 4%) wall
 integration           :   0.08 ( 2%) usr   0.00 ( 0%) sys   0.09 ( 1%) wall
 jump                  :   0.00 ( 0%) usr   0.01 ( 1%) sys   0.00 ( 0%) wall
 flow analysis         :   0.04 ( 1%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 local alloc           :   0.73 (18%) usr   0.02 ( 3%) sys   1.21 (19%) wall
 global alloc          :   0.76 (18%) usr   0.02 ( 3%) sys   1.11 (18%) wall
 flow 2                :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 shorten branches      :   0.07 ( 2%) usr   0.00 ( 0%) sys   0.07 ( 1%) wall
 final                 :   0.17 ( 4%) usr   0.01 ( 1%) sys   0.35 ( 6%) wall
 rest of compilation   :   0.24 ( 6%) usr   0.00 ( 0%) sys   0.35 ( 6%) wall
 TOTAL                 :   4.15             0.80             6.25

-O1:
Execution times (seconds)
 garbage collection    :   1.23 ( 1%) usr   0.00 ( 0%) sys   1.28 ( 1%) wall
 cfg construction      :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall
 cfg cleanup           :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 trivially dead code   :   0.32 ( 0%) usr   0.01 ( 1%) sys   0.30 ( 0%) wall
 life analysis         :   3.45 ( 3%) usr   0.02 ( 1%) sys   4.09 ( 4%) wall
 life info update      :   1.02 ( 1%) usr   0.00 ( 0%) sys   1.05 ( 1%) wall
 alias analysis        :   0.41 ( 0%) usr   0.04 ( 3%) sys   0.47 ( 0%) wall
 register scan         :   0.25 ( 0%) usr   0.00 ( 0%) sys   0.28 ( 0%) wall
 rebuild jump labels   :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall
 preprocessing         :   0.05 ( 0%) usr   0.07 ( 5%) sys   0.16 ( 0%) wall
 parser                :   0.61 ( 1%) usr   0.18 (13%) sys   0.75 ( 1%) wall
 name lookup           :   0.37 ( 0%) usr   0.34 (25%) sys   0.76 ( 1%) wall
 expand                :   0.66 ( 1%) usr   0.08 ( 6%) sys   0.74 ( 1%) wall
 integration           :   1.02 ( 1%) usr   0.02 ( 1%) sys   1.07 ( 1%) wall
 jump                  :   0.04 ( 0%) usr   0.01 ( 1%) sys   0.03 ( 0%) wall
 CSE                   :   3.27 ( 3%) usr   0.03 ( 2%) sys   3.39 ( 3%) wall
 loop analysis         :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 branch prediction     :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 flow analysis         :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 combiner              :  81.10 (79%) usr   0.20 (15%) sys  83.87 (77%) wall
 if-conversion         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 local alloc           :   1.62 ( 2%) usr   0.05 ( 4%) sys   1.67 ( 2%) wall
 global alloc          :   3.68 ( 4%) usr   0.23 (17%) sys   4.02 ( 4%) wall
 reload CSE regs       :   1.37 ( 1%) usr   0.01 ( 1%) sys   1.94 ( 2%) wall
 flow 2                :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall
 rename registers      :   0.47 ( 0%) usr   0.00 ( 0%) sys   0.60 ( 1%) wall
 shorten branches      :   0.17 ( 0%) usr   0.02 ( 1%) sys   0.23 ( 0%) wall
 final                 :   0.43 ( 0%) usr   0.01 ( 1%) sys   0.61 ( 1%) wall
 rest of compilation   :   0.75 ( 1%) usr   0.02 ( 1%) sys   0.80 ( 1%) wall
 TOTAL                 : 102.71             1.37           108.74

-O2:
Execution times (seconds)
 garbage collection    :   1.69 ( 4%) usr   0.00 ( 0%) sys   2.63 ( 4%) wall
 callgraph construction:   0.08 ( 0%) usr   0.02 ( 1%) sys   0.11 ( 0%) wall
 callgraph optimization:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 cfg construction      :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall
 cfg cleanup           :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 trivially dead code   :   0.34 ( 1%) usr   0.01 ( 0%) sys   0.38 ( 1%) wall
 life analysis         :   2.88 ( 6%) usr   0.03 ( 1%) sys   3.48 ( 6%) wall
 life info update      :   0.97 ( 2%) usr   0.00 ( 0%) sys   1.09 ( 2%) wall
 alias analysis        :   0.44 ( 1%) usr   0.07 ( 3%) sys   0.75 ( 1%) wall
 register scan         :   0.24 ( 1%) usr   0.01 ( 0%) sys   0.25 ( 0%) wall
 rebuild jump labels   :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall
 preprocessing         :   0.07 ( 0%) usr   0.08 ( 4%) sys   0.13 ( 0%) wall
 parser                :   0.55 ( 1%) usr   0.14 ( 6%) sys   0.82 ( 1%) wall
 name lookup           :   0.27 ( 1%) usr   0.43 (19%) sys   0.63 ( 1%) wall
 expand                :   0.73 ( 2%) usr   0.05 ( 2%) sys   0.81 ( 1%) wall
 integration           :   1.04 ( 2%) usr   0.14 ( 6%) sys   1.24 ( 2%) wall
 jump                  :   0.08 ( 0%) usr   0.02 ( 1%) sys   0.08 ( 0%) wall
 CSE                   :   3.13 ( 7%) usr   0.06 ( 3%) sys   4.16 ( 7%) wall
 loop analysis         :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 CSE 2                 :   0.89 ( 2%) usr   0.02 ( 1%) sys   0.99 ( 2%) wall
 branch prediction     :   0.03 ( 0%) usr   0.01 ( 0%) sys   0.04 ( 0%) wall
 flow analysis         :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 combiner              :  15.77 (35%) usr   0.11 ( 5%) sys  18.25 (31%) wall
 if-conversion         :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 regmove               :   0.40 ( 1%) usr   0.00 ( 0%) sys   0.48 ( 1%) wall
 scheduling            :   8.15 (18%) usr   0.47 (21%) sys   9.75 (16%) wall
 local alloc           :   1.28 ( 3%) usr   0.02 ( 1%) sys   1.44 ( 2%) wall
 global alloc          :   2.46 ( 5%) usr   0.10 ( 4%) sys   3.05 ( 5%) wall
 reload CSE regs       :   0.96 ( 2%) usr   0.02 ( 1%) sys   1.17 ( 2%) wall
 flow 2                :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall
 rename registers      :   0.21 ( 0%) usr   0.00 ( 0%) sys   0.55 ( 1%) wall
 scheduling 2          :   1.32 ( 3%) usr   0.41 (18%) sys   5.31 ( 9%) wall
 shorten branches      :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall
 final                 :   0.22 ( 0%) usr   0.02 ( 1%) sys   0.89 ( 1%) wall
 rest of compilation   :   0.46 ( 1%) usr   0.00 ( 0%) sys   0.52 ( 1%) wall
 TOTAL                 :  45.16             2.24            59.57

-O1 -funit-at-a-time:

Execution times (seconds)
 garbage collection    :   1.18 ( 4%) usr   0.00 ( 0%) sys   1.22 ( 4%) wall
 callgraph construction:   0.10 ( 0%) usr   0.01 ( 1%) sys   0.10 ( 0%) wall
 cfg construction      :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall
 cfg cleanup           :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 trivially dead code   :   0.22 ( 1%) usr   0.00 ( 0%) sys   0.21 ( 1%) wall
 life analysis         :   2.70 ( 9%) usr   0.02 ( 2%) sys   2.75 ( 8%) wall
 life info update      :   0.54 ( 2%) usr   0.00 ( 0%) sys   0.56 ( 2%) wall
 alias analysis        :   0.27 ( 1%) usr   0.01 ( 1%) sys   0.31 ( 1%) wall
 register scan         :   0.17 ( 1%) usr   0.00 ( 0%) sys   0.17 ( 1%) wall
 rebuild jump labels   :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall
 preprocessing         :   0.01 ( 0%) usr   0.12 (11%) sys   0.12 ( 0%) wall
 parser                :   0.51 ( 2%) usr   0.15 (14%) sys   0.81 ( 2%) wall
 name lookup           :   0.34 ( 1%) usr   0.38 (35%) sys   0.70 ( 2%) wall
 expand                :   0.71 ( 2%) usr   0.07 ( 6%) sys   0.78 ( 2%) wall
 integration           :   1.11 ( 4%) usr   0.08 ( 7%) sys   1.21 ( 4%) wall
 jump                  :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 CSE                   :   2.00 ( 6%) usr   0.05 ( 5%) sys   2.09 ( 6%) wall
 loop analysis         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 branch prediction     :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 flow analysis         :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 combiner              :  17.04 (54%) usr   0.07 ( 6%) sys  17.57 (53%) wall
 local alloc           :   0.75 ( 2%) usr   0.03 ( 3%) sys   0.80 ( 2%) wall
 global alloc          :   1.80 ( 6%) usr   0.04 ( 4%) sys   1.89 ( 6%) wall
 reload CSE regs       :   0.47 ( 2%) usr   0.00 ( 0%) sys   0.49 ( 1%) wall
 flow 2                :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 rename registers      :   0.23 ( 1%) usr   0.00 ( 0%) sys   0.23 ( 1%) wall
 shorten branches      :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall
 final                 :   0.18 ( 1%) usr   0.02 ( 2%) sys   0.21 ( 1%) wall
 rest of compilation   :   0.47 ( 2%) usr   0.01 ( 1%) sys   0.47 ( 1%) wall
 TOTAL                 :  31.27             1.09            33.21

-O0 -funit-at-a-time:
Execution times (seconds)
 garbage collection    :   0.14 ( 3%) usr   0.00 ( 0%) sys   0.14 ( 2%) wall
 callgraph construction:   0.10 ( 2%) usr   0.00 ( 0%) sys   0.11 ( 2%) wall
 cfg construction      :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 trivially dead code   :   0.03 ( 1%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 life analysis         :   0.42 (10%) usr   0.01 ( 1%) sys   1.09 (15%) wall
 life info update      :   0.21 ( 5%) usr   0.00 ( 0%) sys   0.23 ( 3%) wall
 register scan         :   0.05 ( 1%) usr   0.01 ( 1%) sys   0.04 ( 1%) wall
 rebuild jump labels   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 1%) wall
 preprocessing         :   0.07 ( 2%) usr   0.06 ( 9%) sys   0.09 ( 1%) wall
 parser                :   0.64 (15%) usr   0.15 (22%) sys   0.80 (11%) wall
 name lookup           :   0.25 ( 6%) usr   0.36 (54%) sys   0.71 (10%) wall
 expand                :   0.22 ( 5%) usr   0.01 ( 1%) sys   0.24 ( 3%) wall
 integration           :   0.07 ( 2%) usr   0.01 ( 1%) sys   0.07 ( 1%) wall
 jump                  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 flow analysis         :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 local alloc           :   0.74 (17%) usr   0.01 ( 1%) sys   0.80 (11%) wall
 global alloc          :   0.74 (17%) usr   0.02 ( 3%) sys   1.25 (17%) wall
 flow 2                :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 shorten branches      :   0.07 ( 2%) usr   0.00 ( 0%) sys   0.08 ( 1%) wall
 final                 :   0.20 ( 5%) usr   0.00 ( 0%) sys   1.01 (14%) wall
 rest of compilation   :   0.29 ( 7%) usr   0.00 ( 0%) sys   0.31 ( 4%) wall
 TOTAL                 :   4.37             0.67             7.16
Comment 13 Richard Henderson 2004-01-15 01:16:50 UTC
Current state on mainline, at least for x86 at -O2, is that we spend lots
of time in flow doing dead store elimination:

 life analysis         :  60.94 (76%) usr   0.00 ( 0%) sys  61.02 (75%) wall
 TOTAL                 :  80.58             0.39            81.01

If I tweak flow.c to not do *any* store elimination at all, I can pull
the total down to ~75 seconds.  I don't see anything easy to do to even
bridge the gap between these two times at this late stage of 3.4.

On tree-ssa branch, we do significantly better.

 TOTAL                 :  18.08             0.51            18.57

This is with the original C++ test case.  If I crop the std::dcomplex parts
and use the _Complex support in C, then I get

 TOTAL                 :   5.85             0.17             6.01

Clearly there's work to do yet in unraveling the abstraction, but either
compilation time is acceptable, so I'm going to suspend this PR as fixed
pending merge to mainline.
Comment 14 Richard Henderson 2004-01-17 21:13:18 UTC
Created attachment 5509 [details]
C version of test case

For comparison purposes, a C version of the test case using _Complex.
We *should* get the same code out of the C++ front end.  That we don't
is a missed optimization.
Comment 15 Andrew Pinski 2004-04-29 01:05:52 UTC
With my cast pass, compile time is cut in half compared with the current tree-ssa
compiler.  It also improves the generated code.
Comment 16 Giovanni Bajo 2004-04-29 01:34:24 UTC
The C vs C++ difference can probably be tracked in a new, cleaner PR.
Comment 17 Andrew Pinski 2004-04-29 03:29:01 UTC
I filed a bug which should help the code generation differences between C and C++, PR 15197.
Comment 18 Andrew Pinski 2004-05-13 11:50:39 UTC
Fixed for 3.5.0 by the merge of the tree-ssa branch.