If I try to compile the source below with -O2, gcc runs for over 45 minutes and grows to over 300 MB. On my machine, it exhausts the available swap space and dies before completing. Without optimization, it completes in a couple of minutes.

gcc 2.95.2 dies immediately with an ICE on this source, regardless of whether optimization is on, so I guess this isn't a regression.

Release: 3.0 20010429 (prerelease)

Environment:
System: Linux karma 2.2.16-22 #1 Tue Aug 22 16:49:06 EDT 2000 i686 unknown
Architecture: i686
host: i686-pc-linux-gnu
build: i686-pc-linux-gnu
target: i686-pc-linux-gnu
configured with: ../egcs/configure --prefix=/usr/local/egcs --enable-threads=posix --enable-long-long
State-Changed-From-To: open->feedback
State-Changed-Why: No test case.

Responsible-Changed-From-To: unassigned->rth
Responsible-Changed-Why: .

State-Changed-From-To: feedback->open
State-Changed-Why: Got test case.
From: Richard Henderson <rth@redhat.com>
To: Scott Snyder <snyder@fnal.gov>
Cc: rth@gcc.gnu.org, gcc-bugs@gcc.gnu.org, gcc-prs@gcc.gnu.org, nobody@gcc.gnu.org, gcc-gnats@gcc.gnu.org
Subject: Re: optimization/2692: excessive compile time with optimization
Date: Tue, 2 Apr 2002 01:49:27 -0800

FWIW, 3.1 20020326 "only" took 12 minutes to compile this test case, and "only" used 45MB. CPU users were

 expand          : 438.54 (63%) usr   0.11 ( 4%) sys 438.62 (62%) wall
 reload CSE regs :  93.51 (14%) usr   0.01 ( 0%) sys  93.50 (13%) wall
 global alloc    :  77.31 (11%) usr   0.05 ( 2%) sys  77.38 (11%) wall
 regmove         :  46.59 ( 7%) usr   0.01 ( 0%) sys  46.56 ( 7%) wall

r~
From: Scott Snyder <snyder@fnal.gov>
To: rth@gcc.gnu.org, gcc-bugs@gcc.gnu.org, gcc-prs@gcc.gnu.org, nobody@gcc.gnu.org, snyder@fnal.gov, gcc-gnats@gcc.gnu.org
Cc:
Subject: Re: optimization/2692: excessive compile time with optimization
Date: 02 Apr 2002 03:11:25 -0600

From: scott snyder <snyder@fnal.gov>
Subject: optimization/2692: excessive compile time with optimization
To: gcc-bugs@gcc.gnu.org
Date: Fri, 18 May 2001 15:34:07 -0500

hi -

I just noticed that somehow the code sample for this report didn't make it into gnats. I'm not sure what happened --- it's in my local copy of the report that I saved before sending it.

Anyway, here's the complete version, including the code.

sss

To: gcc-gnats@gcc.gnu.org
Subject: excessive compile time with optimization
From: snyder@fnal.gov
Reply-To: snyder@fnal.gov
Cc:
X-send-pr-version: 3.113
X-GNATS-Notify:

>Submitter-Id: net
>Originator: scott snyder
>Organization:
>Confidential: no
>Synopsis: excessive compile time with optimization
>Severity: serious
>Priority: low
>Category: optimization
>Class: sw-bug
>Release: 3.0 20010429 (prerelease)
>Environment:
System: Linux karma 2.2.16-22 #1 Tue Aug 22 16:49:06 EDT 2000 i686 unknown
Architecture: i686
host: i686-pc-linux-gnu
build: i686-pc-linux-gnu
target: i686-pc-linux-gnu
configured with: ../egcs/configure --prefix=/usr/local/egcs --enable-threads=posix --enable-long-long
>Description:
If I try to compile the source below with -O2, gcc runs for over 45 minutes and grows to over 300 MB. On my machine, it exhausts the available swap space and dies before completing. Without optimization, it completes in a couple of minutes.

gcc 2.95.2 dies immediately with an ICE on this source, regardless of whether optimization is on, so I guess this isn't a regression.
>How-To-Repeat:

namespace std
{
  class dcomplex
  {
  public:
    typedef double value_type;

    dcomplex(double =0.0, double =0.0);
    double real() const;
    double imag() const;
    dcomplex& operator=(double);
    dcomplex& operator=(const dcomplex&);
    dcomplex& operator+=(const dcomplex&);
    dcomplex& operator-=(const dcomplex&);
    dcomplex& operator*=(const dcomplex&);

  private:
    typedef __complex__ double _ComplexT;
    _ComplexT _M_value;

    dcomplex(_ComplexT __z) : _M_value(__z) { }
  };

  inline dcomplex
  operator*(const dcomplex& __x, const dcomplex& __y)
  { return dcomplex (__x) *= __y; }

  inline dcomplex
  operator-(const dcomplex& __x, const dcomplex& __y)
  { return dcomplex (__x) -= __y; }

  inline dcomplex
  operator+(const dcomplex& __x, const dcomplex& __y)
  { return dcomplex (__x) += __y; }

  inline double
  dcomplex::real() const
  { return __real__ _M_value; }

  inline double
  dcomplex::imag() const
  { return __imag__ _M_value; }

  inline
  dcomplex::dcomplex(double __r, double __i)
  {
    __real__ _M_value = __r;
    __imag__ _M_value = __i;
  }

  inline dcomplex&
  dcomplex::operator=(double __d)
  {
    __real__ _M_value = __d;
    __imag__ _M_value = 0.0;
    return *this;
  }

  inline dcomplex&
  dcomplex::operator=(const dcomplex& __z)
  {
    __real__ _M_value = __z.real();
    __imag__ _M_value = __z.imag();
    return *this;
  }

  inline dcomplex&
  dcomplex::operator+=(const dcomplex& __z)
  {
    __real__ _M_value += __z.real();
    __imag__ _M_value += __z.imag();
    return *this;
  }

  inline dcomplex&
  dcomplex::operator-=(const dcomplex& __z)
  {
    __real__ _M_value -= __z.real();
    __imag__ _M_value -= __z.imag();
    return *this;
  }

  inline dcomplex&
  dcomplex::operator*=(const dcomplex& __z)
  {
    _ComplexT __t;
    __real__ __t = __z.real();
    __imag__ __t = __z.imag();
    _M_value *= __t;
    return *this;
  }
} // namespace std

typedef std::dcomplex Complex8;

Complex8 determinant(Complex8 _m[6][6])
{
  Complex8 ret ( 0.0, 0.0 );
  Complex8 ret5 ( 0.0, 0.0 );
  Complex8 ret4 ( 0.0, 0.0 );
  Complex8 ret3 ( 0.0, 0.0 );
  ret3 = ( _m[0][0] * _m[1][1] - _m[0][1] * _m[1][0] ) * _m[2][2] + ( _m[0][1] *
_m[1][2] - _m[0][2] * _m[1][1] ) * _m[2][0] + ( _m[0][2] * _m[1][0] - _m[0][0] * _m[1][2] ) * _m[2][1]; ret4 += _m[3][3] * ret3; ret3 = ( _m[0][3] * _m[1][2] - _m[0][2] * _m[1][3] ) * _m[2][1] + ( _m[0][1] * _m[1][3] - _m[0][3] * _m[1][1] ) * _m[2][2] + ( _m[0][2] * _m[1][1] - _m[0][1] * _m[1][2] ) * _m[2][3]; ret4 += _m[3][0] * ret3; ret3 = ( _m[0][0] * _m[1][2] - _m[0][2] * _m[1][0] ) * _m[2][3] + ( _m[0][2] * _m[1][3] - _m[0][3] * _m[1][2] ) * _m[2][0] + ( _m[0][3] * _m[1][0] - _m[0][0] * _m[1][3] ) * _m[2][2]; ret4 += _m[3][1] * ret3; ret3 = ( _m[0][0] * _m[1][3] - _m[0][3] * _m[1][0] ) * _m[2][1] + ( _m[0][1] * _m[1][0] - _m[0][0] * _m[1][1] ) * _m[2][3] + ( _m[0][3] * _m[1][1] - _m[0][1] * _m[1][3] ) * _m[2][0]; ret4 += _m[3][2] * ret3; ret5 += _m[4][4] * ret4; ret4 = 0.0; ret3 = ( _m[0][1] * _m[1][3] - _m[0][3] * _m[1][1] ) * _m[2][4] + ( _m[0][3] * _m[1][4] - _m[0][4] * _m[1][3] ) * _m[2][1] + ( _m[0][4] * _m[1][1] - _m[0][1] * _m[1][4] ) * _m[2][3]; ret4 += _m[3][2] * ret3; ret3 = ( _m[0][1] * _m[1][4] - _m[0][4] * _m[1][1] ) * _m[2][2] + ( _m[0][2] * _m[1][1] - _m[0][1] * _m[1][2] ) * _m[2][4] + ( _m[0][4] * _m[1][2] - _m[0][2] * _m[1][4] ) * _m[2][1]; ret4 += _m[3][3] * ret3; ret3 = ( _m[0][2] * _m[1][3] - _m[0][3] * _m[1][2] ) * _m[2][1] + ( _m[0][3] * _m[1][1] - _m[0][1] * _m[1][3] ) * _m[2][2] + ( _m[0][1] * _m[1][2] - _m[0][2] * _m[1][1] ) * _m[2][3]; ret4 += _m[3][4] * ret3; ret3 = ( _m[0][2] * _m[1][4] - _m[0][4] * _m[1][2] ) * _m[2][3] + ( _m[0][3] * _m[1][2] - _m[0][2] * _m[1][3] ) * _m[2][4] + ( _m[0][4] * _m[1][3] - _m[0][3] * _m[1][4] ) * _m[2][2]; ret4 += _m[3][1] * ret3; ret5 += _m[4][0] * ret4; ret4 = 0.0; ret3 = ( _m[0][3] * _m[1][4] - _m[0][4] * _m[1][3] ) * _m[2][2] + ( _m[0][4] * _m[1][2] - _m[0][2] * _m[1][4] ) * _m[2][3] + ( _m[0][2] * _m[1][3] - _m[0][3] * _m[1][2] ) * _m[2][4]; ret4 += _m[3][0] * ret3; ret3 = ( _m[0][3] * _m[1][0] - _m[0][0] * _m[1][3] ) * _m[2][4] + ( _m[0][4] * _m[1][3] - _m[0][3] * _m[1][4] ) * _m[2][0] + ( 
_m[0][0] * _m[1][4] - _m[0][4] * _m[1][0] ) * _m[2][3]; ret4 += _m[3][2] * ret3; ret3 = ( _m[0][4] * _m[1][0] - _m[0][0] * _m[1][4] ) * _m[2][2] + ( _m[0][0] * _m[1][2] - _m[0][2] * _m[1][0] ) * _m[2][4] + ( _m[0][2] * _m[1][4] - _m[0][4] * _m[1][2] ) * _m[2][0]; ret4 += _m[3][3] * ret3; ret3 = ( _m[0][3] * _m[1][2] - _m[0][2] * _m[1][3] ) * _m[2][0] + ( _m[0][0] * _m[1][3] - _m[0][3] * _m[1][0] ) * _m[2][2] + ( _m[0][2] * _m[1][0] - _m[0][0] * _m[1][2] ) * _m[2][3]; ret4 += _m[3][4] * ret3; ret5 += _m[4][1] * ret4; ret4 = 0.0; ret3 = ( _m[0][0] * _m[1][1] - _m[0][1] * _m[1][0] ) * _m[2][3] + ( _m[0][1] * _m[1][3] - _m[0][3] * _m[1][1] ) * _m[2][0] + ( _m[0][3] * _m[1][0] - _m[0][0] * _m[1][3] ) * _m[2][1]; ret4 += _m[3][4] * ret3; ret3 = ( _m[0][4] * _m[1][3] - _m[0][3] * _m[1][4] ) * _m[2][1] + ( _m[0][1] * _m[1][4] - _m[0][4] * _m[1][1] ) * _m[2][3] + ( _m[0][3] * _m[1][1] - _m[0][1] * _m[1][3] ) * _m[2][4]; ret4 += _m[3][0] * ret3; ret3 = ( _m[0][0] * _m[1][3] - _m[0][3] * _m[1][0] ) * _m[2][4] + ( _m[0][3] * _m[1][4] - _m[0][4] * _m[1][3] ) * _m[2][0] + ( _m[0][4] * _m[1][0] - _m[0][0] * _m[1][4] ) * _m[2][3]; ret4 += _m[3][1] * ret3; ret3 = ( _m[0][0] * _m[1][4] - _m[0][4] * _m[1][0] ) * _m[2][1] + ( _m[0][1] * _m[1][0] - _m[0][0] * _m[1][1] ) * _m[2][4] + ( _m[0][4] * _m[1][1] - _m[0][1] * _m[1][4] ) * _m[2][0]; ret4 += _m[3][3] * ret3; ret5 += _m[4][2] * ret4; ret4 = 0.0; ret3 = ( _m[0][1] * _m[1][4] - _m[0][4] * _m[1][1] ) * _m[2][0] + ( _m[0][4] * _m[1][0] - _m[0][0] * _m[1][4] ) * _m[2][1] + ( _m[0][0] * _m[1][1] - _m[0][1] * _m[1][0] ) * _m[2][4]; ret4 += _m[3][2] * ret3; ret3 = ( _m[0][1] * _m[1][0] - _m[0][0] * _m[1][1] ) * _m[2][2] + ( _m[0][2] * _m[1][1] - _m[0][1] * _m[1][2] ) * _m[2][0] + ( _m[0][0] * _m[1][2] - _m[0][2] * _m[1][0] ) * _m[2][1]; ret4 += _m[3][4] * ret3; ret3 = ( _m[0][2] * _m[1][4] - _m[0][4] * _m[1][2] ) * _m[2][1] + ( _m[0][4] * _m[1][1] - _m[0][1] * _m[1][4] ) * _m[2][2] + ( _m[0][1] * _m[1][2] - _m[0][2] * _m[1][1] ) * 
_m[2][4]; ret4 += _m[3][0] * ret3; ret3 = ( _m[0][2] * _m[1][0] - _m[0][0] * _m[1][2] ) * _m[2][4] + ( _m[0][4] * _m[1][2] - _m[0][2] * _m[1][4] ) * _m[2][0] + ( _m[0][0] * _m[1][4] - _m[0][4] * _m[1][0] ) * _m[2][2]; ret4 += _m[3][1] * ret3; ret5 += _m[4][3] * ret4; ret4 = 0.0; ret += _m[5][5] * ret5; ret5 = 0.0; ret3 = ( _m[0][4] * _m[1][5] - _m[0][5] * _m[1][4] ) * _m[2][2] + ( _m[0][5] * _m[1][2] - _m[0][2] * _m[1][5] ) * _m[2][4] + ( _m[0][2] * _m[1][4] - _m[0][4] * _m[1][2] ) * _m[2][5]; ret4 += _m[3][1] * ret3; ret3 = ( _m[0][4] * _m[1][1] - _m[0][1] * _m[1][4] ) * _m[2][5] + ( _m[0][5] * _m[1][4] - _m[0][4] * _m[1][5] ) * _m[2][1] + ( _m[0][1] * _m[1][5] - _m[0][5] * _m[1][1] ) * _m[2][4]; ret4 += _m[3][2] * ret3; ret3 = ( _m[0][5] * _m[1][1] - _m[0][1] * _m[1][5] ) * _m[2][2] + ( _m[0][1] * _m[1][2] - _m[0][2] * _m[1][1] ) * _m[2][5] + ( _m[0][2] * _m[1][5] - _m[0][5] * _m[1][2] ) * _m[2][1]; ret4 += _m[3][4] * ret3; ret3 = ( _m[0][4] * _m[1][2] - _m[0][2] * _m[1][4] ) * _m[2][1] + ( _m[0][1] * _m[1][4] - _m[0][4] * _m[1][1] ) * _m[2][2] + ( _m[0][2] * _m[1][1] - _m[0][1] * _m[1][2] ) * _m[2][4]; ret4 += _m[3][5] * ret3; ret5 += _m[4][3] * ret4; ret4 = 0.0; ret3 = ( _m[0][1] * _m[1][2] - _m[0][2] * _m[1][1] ) * _m[2][3] + ( _m[0][2] * _m[1][3] - _m[0][3] * _m[1][2] ) * _m[2][1] + ( _m[0][3] * _m[1][1] - _m[0][1] * _m[1][3] ) * _m[2][2]; ret4 += _m[3][5] * ret3; ret3 = ( _m[0][5] * _m[1][3] - _m[0][3] * _m[1][5] ) * _m[2][2] + ( _m[0][2] * _m[1][5] - _m[0][5] * _m[1][2] ) * _m[2][3] + ( _m[0][3] * _m[1][2] - _m[0][2] * _m[1][3] ) * _m[2][5]; ret4 += _m[3][1] * ret3; ret3 = ( _m[0][1] * _m[1][3] - _m[0][3] * _m[1][1] ) * _m[2][5] + ( _m[0][3] * _m[1][5] - _m[0][5] * _m[1][3] ) * _m[2][1] + ( _m[0][5] * _m[1][1] - _m[0][1] * _m[1][5] ) * _m[2][3]; ret4 += _m[3][2] * ret3; ret3 = ( _m[0][1] * _m[1][5] - _m[0][5] * _m[1][1] ) * _m[2][2] + ( _m[0][2] * _m[1][1] - _m[0][1] * _m[1][2] ) * _m[2][5] + ( _m[0][5] * _m[1][2] - _m[0][2] * _m[1][5] ) * _m[2][1]; ret4 += 
_m[3][3] * ret3; ret5 += _m[4][4] * ret4; ret4 = 0.0; ret3 = ( _m[0][2] * _m[1][4] - _m[0][4] * _m[1][2] ) * _m[2][1] + ( _m[0][4] * _m[1][1] - _m[0][1] * _m[1][4] ) * _m[2][2] + ( _m[0][1] * _m[1][2] - _m[0][2] * _m[1][1] ) * _m[2][4]; ret4 += _m[3][3] * ret3; ret3 = ( _m[0][2] * _m[1][1] - _m[0][1] * _m[1][2] ) * _m[2][3] + ( _m[0][3] * _m[1][2] - _m[0][2] * _m[1][3] ) * _m[2][1] + ( _m[0][1] * _m[1][3] - _m[0][3] * _m[1][1] ) * _m[2][2]; ret4 += _m[3][4] * ret3; ret3 = ( _m[0][3] * _m[1][4] - _m[0][4] * _m[1][3] ) * _m[2][2] + ( _m[0][4] * _m[1][2] - _m[0][2] * _m[1][4] ) * _m[2][3] + ( _m[0][2] * _m[1][3] - _m[0][3] * _m[1][2] ) * _m[2][4]; ret4 += _m[3][1] * ret3; ret3 = ( _m[0][3] * _m[1][1] - _m[0][1] * _m[1][3] ) * _m[2][4] + ( _m[0][4] * _m[1][3] - _m[0][3] * _m[1][4] ) * _m[2][1] + ( _m[0][1] * _m[1][4] - _m[0][4] * _m[1][1] ) * _m[2][3]; ret4 += _m[3][2] * ret3; ret5 += _m[4][5] * ret4; ret4 = 0.0; ret3 = ( _m[0][4] * _m[1][5] - _m[0][5] * _m[1][4] ) * _m[2][3] + ( _m[0][5] * _m[1][3] - _m[0][3] * _m[1][5] ) * _m[2][4] + ( _m[0][3] * _m[1][4] - _m[0][4] * _m[1][3] ) * _m[2][5]; ret4 += _m[3][2] * ret3; ret3 = ( _m[0][4] * _m[1][2] - _m[0][2] * _m[1][4] ) * _m[2][5] + ( _m[0][5] * _m[1][4] - _m[0][4] * _m[1][5] ) * _m[2][2] + ( _m[0][2] * _m[1][5] - _m[0][5] * _m[1][2] ) * _m[2][4]; ret4 += _m[3][3] * ret3; ret3 = ( _m[0][5] * _m[1][2] - _m[0][2] * _m[1][5] ) * _m[2][3] + ( _m[0][2] * _m[1][3] - _m[0][3] * _m[1][2] ) * _m[2][5] + ( _m[0][3] * _m[1][5] - _m[0][5] * _m[1][3] ) * _m[2][2]; ret4 += _m[3][4] * ret3; ret3 = ( _m[0][4] * _m[1][3] - _m[0][3] * _m[1][4] ) * _m[2][2] + ( _m[0][2] * _m[1][4] - _m[0][4] * _m[1][2] ) * _m[2][3] + ( _m[0][3] * _m[1][2] - _m[0][2] * _m[1][3] ) * _m[2][4]; ret4 += _m[3][5] * ret3; ret5 += _m[4][1] * ret4; ret4 = 0.0; ret3 = ( _m[0][1] * _m[1][3] - _m[0][3] * _m[1][1] ) * _m[2][4] + ( _m[0][3] * _m[1][4] - _m[0][4] * _m[1][3] ) * _m[2][1] + ( _m[0][4] * _m[1][1] - _m[0][1] * _m[1][4] ) * _m[2][3]; ret4 += _m[3][5] * ret3; 
ret3 = ( _m[0][5] * _m[1][4] - _m[0][4] * _m[1][5] ) * _m[2][3] + ( _m[0][3] * _m[1][5] - _m[0][5] * _m[1][3] ) * _m[2][4] + ( _m[0][4] * _m[1][3] - _m[0][3] * _m[1][4] ) * _m[2][5]; ret4 += _m[3][1] * ret3; ret3 = ( _m[0][1] * _m[1][4] - _m[0][4] * _m[1][1] ) * _m[2][5] + ( _m[0][4] * _m[1][5] - _m[0][5] * _m[1][4] ) * _m[2][1] + ( _m[0][5] * _m[1][1] - _m[0][1] * _m[1][5] ) * _m[2][4]; ret4 += _m[3][3] * ret3; ret3 = ( _m[0][1] * _m[1][5] - _m[0][5] * _m[1][1] ) * _m[2][3] + ( _m[0][3] * _m[1][1] - _m[0][1] * _m[1][3] ) * _m[2][5] + ( _m[0][5] * _m[1][3] - _m[0][3] * _m[1][5] ) * _m[2][1]; ret4 += _m[3][4] * ret3; ret5 += _m[4][2] * ret4; ret4 = 0.0; ret += _m[5][0] * ret5; ret5 = 0.0; ret3 = ( _m[0][3] * _m[1][5] - _m[0][5] * _m[1][3] ) * _m[2][0] + ( _m[0][5] * _m[1][0] - _m[0][0] * _m[1][5] ) * _m[2][3] + ( _m[0][0] * _m[1][3] - _m[0][3] * _m[1][0] ) * _m[2][5]; ret4 += _m[3][4] * ret3; ret3 = ( _m[0][3] * _m[1][0] - _m[0][0] * _m[1][3] ) * _m[2][4] + ( _m[0][4] * _m[1][3] - _m[0][3] * _m[1][4] ) * _m[2][0] + ( _m[0][0] * _m[1][4] - _m[0][4] * _m[1][0] ) * _m[2][3]; ret4 += _m[3][5] * ret3; ret3 = ( _m[0][4] * _m[1][5] - _m[0][5] * _m[1][4] ) * _m[2][3] + ( _m[0][5] * _m[1][3] - _m[0][3] * _m[1][5] ) * _m[2][4] + ( _m[0][3] * _m[1][4] - _m[0][4] * _m[1][3] ) * _m[2][5]; ret4 += _m[3][0] * ret3; ret3 = ( _m[0][4] * _m[1][0] - _m[0][0] * _m[1][4] ) * _m[2][5] + ( _m[0][5] * _m[1][4] - _m[0][4] * _m[1][5] ) * _m[2][0] + ( _m[0][0] * _m[1][5] - _m[0][5] * _m[1][0] ) * _m[2][4]; ret4 += _m[3][3] * ret3; ret5 += _m[4][2] * ret4; ret4 = 0.0; ret3 = ( _m[0][5] * _m[1][0] - _m[0][0] * _m[1][5] ) * _m[2][4] + ( _m[0][0] * _m[1][4] - _m[0][4] * _m[1][0] ) * _m[2][5] + ( _m[0][4] * _m[1][5] - _m[0][5] * _m[1][4] ) * _m[2][0]; ret4 += _m[3][2] * ret3; ret3 = ( _m[0][5] * _m[1][2] - _m[0][2] * _m[1][5] ) * _m[2][0] + ( _m[0][0] * _m[1][5] - _m[0][5] * _m[1][0] ) * _m[2][2] + ( _m[0][2] * _m[1][0] - _m[0][0] * _m[1][2] ) * _m[2][5]; ret4 += _m[3][4] * ret3; ret3 = ( _m[0][0] 
* _m[1][2] - _m[0][2] * _m[1][0] ) * _m[2][4] + ( _m[0][2] * _m[1][4] - _m[0][4] * _m[1][2] ) * _m[2][0] + ( _m[0][4] * _m[1][0] - _m[0][0] * _m[1][4] ) * _m[2][2]; ret4 += _m[3][5] * ret3; ret3 = ( _m[0][5] * _m[1][4] - _m[0][4] * _m[1][5] ) * _m[2][2] + ( _m[0][2] * _m[1][5] - _m[0][5] * _m[1][2] ) * _m[2][4] + ( _m[0][4] * _m[1][2] - _m[0][2] * _m[1][4] ) * _m[2][5]; ret4 += _m[3][0] * ret3; ret5 += _m[4][3] * ret4; ret4 = 0.0; ret3 = ( _m[0][2] * _m[1][3] - _m[0][3] * _m[1][2] ) * _m[2][5] + ( _m[0][3] * _m[1][5] - _m[0][5] * _m[1][3] ) * _m[2][2] + ( _m[0][5] * _m[1][2] - _m[0][2] * _m[1][5] ) * _m[2][3]; ret4 += _m[3][0] * ret3; ret3 = ( _m[0][0] * _m[1][5] - _m[0][5] * _m[1][0] ) * _m[2][3] + ( _m[0][3] * _m[1][0] - _m[0][0] * _m[1][3] ) * _m[2][5] + ( _m[0][5] * _m[1][3] - _m[0][3] * _m[1][5] ) * _m[2][0]; ret4 += _m[3][2] * ret3; ret3 = ( _m[0][2] * _m[1][5] - _m[0][5] * _m[1][2] ) * _m[2][0] + ( _m[0][5] * _m[1][0] - _m[0][0] * _m[1][5] ) * _m[2][2] + ( _m[0][0] * _m[1][2] - _m[0][2] * _m[1][0] ) * _m[2][5]; ret4 += _m[3][3] * ret3; ret3 = ( _m[0][2] * _m[1][0] - _m[0][0] * _m[1][2] ) * _m[2][3] + ( _m[0][3] * _m[1][2] - _m[0][2] * _m[1][3] ) * _m[2][0] + ( _m[0][0] * _m[1][3] - _m[0][3] * _m[1][0] ) * _m[2][2]; ret4 += _m[3][5] * ret3; ret5 += _m[4][4] * ret4; ret4 = 0.0; ret3 = ( _m[0][3] * _m[1][0] - _m[0][0] * _m[1][3] ) * _m[2][2] + ( _m[0][0] * _m[1][2] - _m[0][2] * _m[1][0] ) * _m[2][3] + ( _m[0][2] * _m[1][3] - _m[0][3] * _m[1][2] ) * _m[2][0]; ret4 += _m[3][4] * ret3; ret3 = ( _m[0][3] * _m[1][2] - _m[0][2] * _m[1][3] ) * _m[2][4] + ( _m[0][4] * _m[1][3] - _m[0][3] * _m[1][4] ) * _m[2][2] + ( _m[0][2] * _m[1][4] - _m[0][4] * _m[1][2] ) * _m[2][3]; ret4 += _m[3][0] * ret3; ret3 = ( _m[0][4] * _m[1][0] - _m[0][0] * _m[1][4] ) * _m[2][3] + ( _m[0][0] * _m[1][3] - _m[0][3] * _m[1][0] ) * _m[2][4] + ( _m[0][3] * _m[1][4] - _m[0][4] * _m[1][3] ) * _m[2][0]; ret4 += _m[3][2] * ret3; ret3 = ( _m[0][4] * _m[1][2] - _m[0][2] * _m[1][4] ) * _m[2][0] + ( 
_m[0][0] * _m[1][4] - _m[0][4] * _m[1][0] ) * _m[2][2] + ( _m[0][2] * _m[1][0] - _m[0][0] * _m[1][2] ) * _m[2][4]; ret4 += _m[3][3] * ret3; ret5 += _m[4][5] * ret4; ret4 = 0.0; ret3 = ( _m[0][5] * _m[1][2] - _m[0][2] * _m[1][5] ) * _m[2][4] + ( _m[0][2] * _m[1][4] - _m[0][4] * _m[1][2] ) * _m[2][5] + ( _m[0][4] * _m[1][5] - _m[0][5] * _m[1][4] ) * _m[2][2]; ret4 += _m[3][3] * ret3; ret3 = ( _m[0][5] * _m[1][3] - _m[0][3] * _m[1][5] ) * _m[2][2] + ( _m[0][2] * _m[1][5] - _m[0][5] * _m[1][2] ) * _m[2][3] + ( _m[0][3] * _m[1][2] - _m[0][2] * _m[1][3] ) * _m[2][5]; ret4 += _m[3][4] * ret3; ret3 = ( _m[0][2] * _m[1][3] - _m[0][3] * _m[1][2] ) * _m[2][4] + ( _m[0][3] * _m[1][4] - _m[0][4] * _m[1][3] ) * _m[2][2] + ( _m[0][4] * _m[1][2] - _m[0][2] * _m[1][4] ) * _m[2][3]; ret4 += _m[3][5] * ret3; ret3 = ( _m[0][5] * _m[1][4] - _m[0][4] * _m[1][5] ) * _m[2][3] + ( _m[0][3] * _m[1][5] - _m[0][5] * _m[1][3] ) * _m[2][4] + ( _m[0][4] * _m[1][3] - _m[0][3] * _m[1][4] ) * _m[2][5]; ret4 += _m[3][2] * ret3; ret5 += _m[4][0] * ret4; ret4 = 0.0; ret += _m[5][1] * ret5; ret5 = 0.0; ret3 = ( _m[0][3] * _m[1][4] - _m[0][4] * _m[1][3] ) * _m[2][5] + ( _m[0][4] * _m[1][5] - _m[0][5] * _m[1][4] ) * _m[2][3] + ( _m[0][5] * _m[1][3] - _m[0][3] * _m[1][5] ) * _m[2][4]; ret4 += _m[3][1] * ret3; ret3 = ( _m[0][1] * _m[1][5] - _m[0][5] * _m[1][1] ) * _m[2][4] + ( _m[0][4] * _m[1][1] - _m[0][1] * _m[1][4] ) * _m[2][5] + ( _m[0][5] * _m[1][4] - _m[0][4] * _m[1][5] ) * _m[2][1]; ret4 += _m[3][3] * ret3; ret3 = ( _m[0][3] * _m[1][5] - _m[0][5] * _m[1][3] ) * _m[2][1] + ( _m[0][5] * _m[1][1] - _m[0][1] * _m[1][5] ) * _m[2][3] + ( _m[0][1] * _m[1][3] - _m[0][3] * _m[1][1] ) * _m[2][5]; ret4 += _m[3][4] * ret3; ret3 = ( _m[0][3] * _m[1][1] - _m[0][1] * _m[1][3] ) * _m[2][4] + ( _m[0][4] * _m[1][3] - _m[0][3] * _m[1][4] ) * _m[2][1] + ( _m[0][1] * _m[1][4] - _m[0][4] * _m[1][1] ) * _m[2][3]; ret4 += _m[3][5] * ret3; ret5 += _m[4][0] * ret4; ret4 = 0.0; ret3 = ( _m[0][4] * _m[1][0] - _m[0][0] * 
_m[1][4] ) * _m[2][3] + ( _m[0][0] * _m[1][3] - _m[0][3] * _m[1][0] ) * _m[2][4] + ( _m[0][3] * _m[1][4] - _m[0][4] * _m[1][3] ) * _m[2][0]; ret4 += _m[3][5] * ret3; ret3 = ( _m[0][4] * _m[1][3] - _m[0][3] * _m[1][4] ) * _m[2][5] + ( _m[0][5] * _m[1][4] - _m[0][4] * _m[1][5] ) * _m[2][3] + ( _m[0][3] * _m[1][5] - _m[0][5] * _m[1][3] ) * _m[2][4]; ret4 += _m[3][0] * ret3; ret3 = ( _m[0][5] * _m[1][0] - _m[0][0] * _m[1][5] ) * _m[2][4] + ( _m[0][0] * _m[1][4] - _m[0][4] * _m[1][0] ) * _m[2][5] + ( _m[0][4] * _m[1][5] - _m[0][5] * _m[1][4] ) * _m[2][0]; ret4 += _m[3][3] * ret3; ret3 = ( _m[0][5] * _m[1][3] - _m[0][3] * _m[1][5] ) * _m[2][0] + ( _m[0][0] * _m[1][5] - _m[0][5] * _m[1][0] ) * _m[2][3] + ( _m[0][3] * _m[1][0] - _m[0][0] * _m[1][3] ) * _m[2][5]; ret4 += _m[3][4] * ret3; ret5 += _m[4][1] * ret4; ret4 = 0.0; ret3 = ( _m[0][0] * _m[1][1] - _m[0][1] * _m[1][0] ) * _m[2][5] + ( _m[0][1] * _m[1][5] - _m[0][5] * _m[1][1] ) * _m[2][0] + ( _m[0][5] * _m[1][0] - _m[0][0] * _m[1][5] ) * _m[2][1]; ret4 += _m[3][4] * ret3; ret3 = ( _m[0][0] * _m[1][4] - _m[0][4] * _m[1][0] ) * _m[2][1] + ( _m[0][1] * _m[1][0] - _m[0][0] * _m[1][1] ) * _m[2][4] + ( _m[0][4] * _m[1][1] - _m[0][1] * _m[1][4] ) * _m[2][0]; ret4 += _m[3][5] * ret3; ret3 = ( _m[0][1] * _m[1][4] - _m[0][4] * _m[1][1] ) * _m[2][5] + ( _m[0][4] * _m[1][5] - _m[0][5] * _m[1][4] ) * _m[2][1] + ( _m[0][5] * _m[1][1] - _m[0][1] * _m[1][5] ) * _m[2][4]; ret4 += _m[3][0] * ret3; ret3 = ( _m[0][0] * _m[1][5] - _m[0][5] * _m[1][0] ) * _m[2][4] + ( _m[0][4] * _m[1][0] - _m[0][0] * _m[1][4] ) * _m[2][5] + ( _m[0][5] * _m[1][4] - _m[0][4] * _m[1][5] ) * _m[2][0]; ret4 += _m[3][1] * ret3; ret5 += _m[4][3] * ret4; ret4 = 0.0; ret3 = ( _m[0][3] * _m[1][5] - _m[0][5] * _m[1][3] ) * _m[2][0] + ( _m[0][5] * _m[1][0] - _m[0][0] * _m[1][5] ) * _m[2][3] + ( _m[0][0] * _m[1][3] - _m[0][3] * _m[1][0] ) * _m[2][5]; ret4 += _m[3][1] * ret3; ret3 = ( _m[0][1] * _m[1][0] - _m[0][0] * _m[1][1] ) * _m[2][5] + ( _m[0][5] * _m[1][1] - 
_m[0][1] * _m[1][5] ) * _m[2][0] + ( _m[0][0] * _m[1][5] - _m[0][5] * _m[1][0] ) * _m[2][1]; ret4 += _m[3][3] * ret3; ret3 = ( _m[0][3] * _m[1][0] - _m[0][0] * _m[1][3] ) * _m[2][1] + ( _m[0][0] * _m[1][1] - _m[0][1] * _m[1][0] ) * _m[2][3] + ( _m[0][1] * _m[1][3] - _m[0][3] * _m[1][1] ) * _m[2][0]; ret4 += _m[3][5] * ret3; ret3 = ( _m[0][3] * _m[1][1] - _m[0][1] * _m[1][3] ) * _m[2][5] + ( _m[0][5] * _m[1][3] - _m[0][3] * _m[1][5] ) * _m[2][1] + ( _m[0][1] * _m[1][5] - _m[0][5] * _m[1][1] ) * _m[2][3]; ret4 += _m[3][0] * ret3; ret5 += _m[4][4] * ret4; ret4 = 0.0; ret3 = ( _m[0][4] * _m[1][1] - _m[0][1] * _m[1][4] ) * _m[2][3] + ( _m[0][1] * _m[1][3] - _m[0][3] * _m[1][1] ) * _m[2][4] + ( _m[0][3] * _m[1][4] - _m[0][4] * _m[1][3] ) * _m[2][1]; ret4 += _m[3][0] * ret3; ret3 = ( _m[0][4] * _m[1][3] - _m[0][3] * _m[1][4] ) * _m[2][0] + ( _m[0][0] * _m[1][4] - _m[0][4] * _m[1][0] ) * _m[2][3] + ( _m[0][3] * _m[1][0] - _m[0][0] * _m[1][3] ) * _m[2][4]; ret4 += _m[3][1] * ret3; ret3 = ( _m[0][0] * _m[1][1] - _m[0][1] * _m[1][0] ) * _m[2][4] + ( _m[0][1] * _m[1][4] - _m[0][4] * _m[1][1] ) * _m[2][0] + ( _m[0][4] * _m[1][0] - _m[0][0] * _m[1][4] ) * _m[2][1]; ret4 += _m[3][3] * ret3; ret3 = ( _m[0][0] * _m[1][3] - _m[0][3] * _m[1][0] ) * _m[2][1] + ( _m[0][1] * _m[1][0] - _m[0][0] * _m[1][1] ) * _m[2][3] + ( _m[0][3] * _m[1][1] - _m[0][1] * _m[1][3] ) * _m[2][0]; ret4 += _m[3][4] * ret3; ret5 += _m[4][5] * ret4; ret4 = 0.0; ret += _m[5][2] * ret5; ret5 = 0.0; ret3 = ( _m[0][1] * _m[1][2] - _m[0][2] * _m[1][1] ) * _m[2][0] + ( _m[0][2] * _m[1][0] - _m[0][0] * _m[1][2] ) * _m[2][1] + ( _m[0][0] * _m[1][1] - _m[0][1] * _m[1][0] ) * _m[2][2]; ret4 += _m[3][4] * ret3; ret3 = ( _m[0][1] * _m[1][4] - _m[0][4] * _m[1][1] ) * _m[2][2] + ( _m[0][2] * _m[1][1] - _m[0][1] * _m[1][2] ) * _m[2][4] + ( _m[0][4] * _m[1][2] - _m[0][2] * _m[1][4] ) * _m[2][1]; ret4 += _m[3][0] * ret3; ret3 = ( _m[0][2] * _m[1][4] - _m[0][4] * _m[1][2] ) * _m[2][0] + ( _m[0][4] * _m[1][0] - _m[0][0] * 
_m[1][4] ) * _m[2][2] + ( _m[0][0] * _m[1][2] - _m[0][2] * _m[1][0] ) * _m[2][4]; ret4 += _m[3][1] * ret3; ret3 = ( _m[0][1] * _m[1][0] - _m[0][0] * _m[1][1] ) * _m[2][4] + ( _m[0][4] * _m[1][1] - _m[0][1] * _m[1][4] ) * _m[2][0] + ( _m[0][0] * _m[1][4] - _m[0][4] * _m[1][0] ) * _m[2][1]; ret4 += _m[3][2] * ret3; ret5 += _m[4][5] * ret4; ret4 = 0.0; ret3 = ( _m[0][4] * _m[1][5] - _m[0][5] * _m[1][4] ) * _m[2][1] + ( _m[0][5] * _m[1][1] - _m[0][1] * _m[1][5] ) * _m[2][4] + ( _m[0][1] * _m[1][4] - _m[0][4] * _m[1][1] ) * _m[2][5]; ret4 += _m[3][2] * ret3; ret3 = ( _m[0][2] * _m[1][1] - _m[0][1] * _m[1][2] ) * _m[2][5] + ( _m[0][5] * _m[1][2] - _m[0][2] * _m[1][5] ) * _m[2][1] + ( _m[0][1] * _m[1][5] - _m[0][5] * _m[1][1] ) * _m[2][2]; ret4 += _m[3][4] * ret3; ret3 = ( _m[0][4] * _m[1][1] - _m[0][1] * _m[1][4] ) * _m[2][2] + ( _m[0][1] * _m[1][2] - _m[0][2] * _m[1][1] ) * _m[2][4] + ( _m[0][2] * _m[1][4] - _m[0][4] * _m[1][2] ) * _m[2][1]; ret4 += _m[3][5] * ret3; ret3 = ( _m[0][4] * _m[1][2] - _m[0][2] * _m[1][4] ) * _m[2][5] + ( _m[0][5] * _m[1][4] - _m[0][4] * _m[1][5] ) * _m[2][2] + ( _m[0][2] * _m[1][5] - _m[0][5] * _m[1][2] ) * _m[2][4]; ret4 += _m[3][1] * ret3; ret5 += _m[4][0] * ret4; ret4 = 0.0; ret3 = ( _m[0][5] * _m[1][2] - _m[0][2] * _m[1][5] ) * _m[2][4] + ( _m[0][2] * _m[1][4] - _m[0][4] * _m[1][2] ) * _m[2][5] + ( _m[0][4] * _m[1][5] - _m[0][5] * _m[1][4] ) * _m[2][2]; ret4 += _m[3][0] * ret3; ret3 = ( _m[0][5] * _m[1][4] - _m[0][4] * _m[1][5] ) * _m[2][0] + ( _m[0][0] * _m[1][5] - _m[0][5] * _m[1][0] ) * _m[2][4] + ( _m[0][4] * _m[1][0] - _m[0][0] * _m[1][4] ) * _m[2][5]; ret4 += _m[3][2] * ret3; ret3 = ( _m[0][0] * _m[1][2] - _m[0][2] * _m[1][0] ) * _m[2][5] + ( _m[0][2] * _m[1][5] - _m[0][5] * _m[1][2] ) * _m[2][0] + ( _m[0][5] * _m[1][0] - _m[0][0] * _m[1][5] ) * _m[2][2]; ret4 += _m[3][4] * ret3; ret3 = ( _m[0][0] * _m[1][4] - _m[0][4] * _m[1][0] ) * _m[2][2] + ( _m[0][2] * _m[1][0] - _m[0][0] * _m[1][2] ) * _m[2][4] + ( _m[0][4] * _m[1][2] - 
_m[0][2] * _m[1][4] ) * _m[2][0]; ret4 += _m[3][5] * ret3; ret5 += _m[4][1] * ret4; ret4 = 0.0; ret3 = ( _m[0][1] * _m[1][4] - _m[0][4] * _m[1][1] ) * _m[2][0] + ( _m[0][4] * _m[1][0] - _m[0][0] * _m[1][4] ) * _m[2][1] + ( _m[0][0] * _m[1][1] - _m[0][1] * _m[1][0] ) * _m[2][4]; ret4 += _m[3][5] * ret3; ret3 = ( _m[0][1] * _m[1][5] - _m[0][5] * _m[1][1] ) * _m[2][4] + ( _m[0][4] * _m[1][1] - _m[0][1] * _m[1][4] ) * _m[2][5] + ( _m[0][5] * _m[1][4] - _m[0][4] * _m[1][5] ) * _m[2][1]; ret4 += _m[3][0] * ret3; ret3 = ( _m[0][4] * _m[1][5] - _m[0][5] * _m[1][4] ) * _m[2][0] + ( _m[0][5] * _m[1][0] - _m[0][0] * _m[1][5] ) * _m[2][4] + ( _m[0][0] * _m[1][4] - _m[0][4] * _m[1][0] ) * _m[2][5]; ret4 += _m[3][1] * ret3; ret3 = ( _m[0][1] * _m[1][0] - _m[0][0] * _m[1][1] ) * _m[2][5] + ( _m[0][5] * _m[1][1] - _m[0][1] * _m[1][5] ) * _m[2][0] + ( _m[0][0] * _m[1][5] - _m[0][5] * _m[1][0] ) * _m[2][1]; ret4 += _m[3][4] * ret3; ret5 += _m[4][2] * ret4; ret4 = 0.0; ret3 = ( _m[0][5] * _m[1][0] - _m[0][0] * _m[1][5] ) * _m[2][1] + ( _m[0][0] * _m[1][1] - _m[0][1] * _m[1][0] ) * _m[2][5] + ( _m[0][1] * _m[1][5] - _m[0][5] * _m[1][1] ) * _m[2][0]; ret4 += _m[3][2] * ret3; ret3 = ( _m[0][2] * _m[1][1] - _m[0][1] * _m[1][2] ) * _m[2][0] + ( _m[0][0] * _m[1][2] - _m[0][2] * _m[1][0] ) * _m[2][1] + ( _m[0][1] * _m[1][0] - _m[0][0] * _m[1][1] ) * _m[2][2]; ret4 += _m[3][5] * ret3; ret3 = ( _m[0][5] * _m[1][1] - _m[0][1] * _m[1][5] ) * _m[2][2] + ( _m[0][1] * _m[1][2] - _m[0][2] * _m[1][1] ) * _m[2][5] + ( _m[0][2] * _m[1][5] - _m[0][5] * _m[1][2] ) * _m[2][1]; ret4 += _m[3][0] * ret3; ret3 = ( _m[0][5] * _m[1][2] - _m[0][2] * _m[1][5] ) * _m[2][0] + ( _m[0][0] * _m[1][5] - _m[0][5] * _m[1][0] ) * _m[2][2] + ( _m[0][2] * _m[1][0] - _m[0][0] * _m[1][2] ) * _m[2][5]; ret4 += _m[3][1] * ret3; ret5 += _m[4][4] * ret4; ret4 = 0.0; ret += _m[5][3] * ret5; ret5 = 0.0; ret3 = ( _m[0][0] * _m[1][2] - _m[0][2] * _m[1][0] ) * _m[2][5] + ( _m[0][2] * _m[1][5] - _m[0][5] * _m[1][2] ) * _m[2][0] + ( 
_m[0][5] * _m[1][0] - _m[0][0] * _m[1][5] ) * _m[2][2]; ret4 += _m[3][1] * ret3; ret3 = ( _m[0][0] * _m[1][5] - _m[0][5] * _m[1][0] ) * _m[2][1] + ( _m[0][1] * _m[1][0] - _m[0][0] * _m[1][1] ) * _m[2][5] + ( _m[0][5] * _m[1][1] - _m[0][1] * _m[1][5] ) * _m[2][0]; ret4 += _m[3][2] * ret3; ret3 = ( _m[0][1] * _m[1][2] - _m[0][2] * _m[1][1] ) * _m[2][0] + ( _m[0][2] * _m[1][0] - _m[0][0] * _m[1][2] ) * _m[2][1] + ( _m[0][0] * _m[1][1] - _m[0][1] * _m[1][0] ) * _m[2][2]; ret4 += _m[3][5] * ret3; ret3 = ( _m[0][1] * _m[1][5] - _m[0][5] * _m[1][1] ) * _m[2][2] + ( _m[0][2] * _m[1][1] - _m[0][1] * _m[1][2] ) * _m[2][5] + ( _m[0][5] * _m[1][2] - _m[0][2] * _m[1][5] ) * _m[2][1]; ret4 += _m[3][0] * ret3; ret5 += _m[4][3] * ret4; ret4 = 0.0; ret3 = ( _m[0][2] * _m[1][3] - _m[0][3] * _m[1][2] ) * _m[2][1] + ( _m[0][3] * _m[1][1] - _m[0][1] * _m[1][3] ) * _m[2][2] + ( _m[0][1] * _m[1][2] - _m[0][2] * _m[1][1] ) * _m[2][3]; ret4 += _m[3][0] * ret3; ret3 = ( _m[0][2] * _m[1][0] - _m[0][0] * _m[1][2] ) * _m[2][3] + ( _m[0][3] * _m[1][2] - _m[0][2] * _m[1][3] ) * _m[2][0] + ( _m[0][0] * _m[1][3] - _m[0][3] * _m[1][0] ) * _m[2][2]; ret4 += _m[3][1] * ret3; ret3 = ( _m[0][3] * _m[1][0] - _m[0][0] * _m[1][3] ) * _m[2][1] + ( _m[0][0] * _m[1][1] - _m[0][1] * _m[1][0] ) * _m[2][3] + ( _m[0][1] * _m[1][3] - _m[0][3] * _m[1][1] ) * _m[2][0]; ret4 += _m[3][2] * ret3; ret3 = ( _m[0][2] * _m[1][1] - _m[0][1] * _m[1][2] ) * _m[2][0] + ( _m[0][0] * _m[1][2] - _m[0][2] * _m[1][0] ) * _m[2][1] + ( _m[0][1] * _m[1][0] - _m[0][0] * _m[1][1] ) * _m[2][2]; ret4 += _m[3][3] * ret3; ret5 += _m[4][5] * ret4; ret4 = 0.0; ret3 = ( _m[0][5] * _m[1][1] - _m[0][1] * _m[1][5] ) * _m[2][2] + ( _m[0][1] * _m[1][2] - _m[0][2] * _m[1][1] ) * _m[2][5] + ( _m[0][2] * _m[1][5] - _m[0][5] * _m[1][2] ) * _m[2][1]; ret4 += _m[3][3] * ret3; ret3 = ( _m[0][3] * _m[1][2] - _m[0][2] * _m[1][3] ) * _m[2][1] + ( _m[0][1] * _m[1][3] - _m[0][3] * _m[1][1] ) * _m[2][2] + ( _m[0][2] * _m[1][1] - _m[0][1] * _m[1][2] ) * 
_m[2][3];
ret4 += _m[3][5] * ret3;
ret3 = ( _m[0][5] * _m[1][2] - _m[0][2] * _m[1][5] ) * _m[2][3] + ( _m[0][2] * _m[1][3] - _m[0][3] * _m[1][2] ) * _m[2][5] + ( _m[0][3] * _m[1][5] - _m[0][5] * _m[1][3] ) * _m[2][2];
ret4 += _m[3][1] * ret3;
ret3 = ( _m[0][5] * _m[1][3] - _m[0][3] * _m[1][5] ) * _m[2][1] + ( _m[0][1] * _m[1][5] - _m[0][5] * _m[1][1] ) * _m[2][3] + ( _m[0][3] * _m[1][1] - _m[0][1] * _m[1][3] ) * _m[2][5];
ret4 += _m[3][2] * ret3;
ret5 += _m[4][0] * ret4;
ret4 = 0.0;
ret3 = ( _m[0][0] * _m[1][3] - _m[0][3] * _m[1][0] ) * _m[2][5] + ( _m[0][3] * _m[1][5] - _m[0][5] * _m[1][3] ) * _m[2][0] + ( _m[0][5] * _m[1][0] - _m[0][0] * _m[1][5] ) * _m[2][3];
ret4 += _m[3][2] * ret3;
ret3 = ( _m[0][0] * _m[1][5] - _m[0][5] * _m[1][0] ) * _m[2][2] + ( _m[0][2] * _m[1][0] - _m[0][0] * _m[1][2] ) * _m[2][5] + ( _m[0][5] * _m[1][2] - _m[0][2] * _m[1][5] ) * _m[2][0];
ret4 += _m[3][3] * ret3;
ret3 = ( _m[0][2] * _m[1][3] - _m[0][3] * _m[1][2] ) * _m[2][0] + ( _m[0][3] * _m[1][0] - _m[0][0] * _m[1][3] ) * _m[2][2] + ( _m[0][0] * _m[1][2] - _m[0][2] * _m[1][0] ) * _m[2][3];
ret4 += _m[3][5] * ret3;
ret3 = ( _m[0][2] * _m[1][5] - _m[0][5] * _m[1][2] ) * _m[2][3] + ( _m[0][3] * _m[1][2] - _m[0][2] * _m[1][3] ) * _m[2][5] + ( _m[0][5] * _m[1][3] - _m[0][3] * _m[1][5] ) * _m[2][2];
ret4 += _m[3][0] * ret3;
ret5 += _m[4][1] * ret4;
ret4 = 0.0;
ret3 = ( _m[0][3] * _m[1][5] - _m[0][5] * _m[1][3] ) * _m[2][1] + ( _m[0][5] * _m[1][1] - _m[0][1] * _m[1][5] ) * _m[2][3] + ( _m[0][1] * _m[1][3] - _m[0][3] * _m[1][1] ) * _m[2][5];
ret4 += _m[3][0] * ret3;
ret3 = ( _m[0][3] * _m[1][0] - _m[0][0] * _m[1][3] ) * _m[2][5] + ( _m[0][5] * _m[1][3] - _m[0][3] * _m[1][5] ) * _m[2][0] + ( _m[0][0] * _m[1][5] - _m[0][5] * _m[1][0] ) * _m[2][3];
ret4 += _m[3][1] * ret3;
ret3 = ( _m[0][5] * _m[1][0] - _m[0][0] * _m[1][5] ) * _m[2][1] + ( _m[0][0] * _m[1][1] - _m[0][1] * _m[1][0] ) * _m[2][5] + ( _m[0][1] * _m[1][5] - _m[0][5] * _m[1][1] ) * _m[2][0];
ret4 += _m[3][3] * ret3;
ret3 = ( _m[0][3] * _m[1][1] - _m[0][1] * _m[1][3] ) * _m[2][0] + ( _m[0][0] * _m[1][3] - _m[0][3] * _m[1][0] ) * _m[2][1] + ( _m[0][1] * _m[1][0] - _m[0][0] * _m[1][1] ) * _m[2][3];
ret4 += _m[3][5] * ret3;
ret5 += _m[4][2] * ret4;
ret += _m[5][4] * ret5;
return ret;
}
>Fix: <how to correct or work around the problem, if known (multiple lines)>
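For readers skimming the test case: each `ret3 = ...` statement above has the shape of a 3x3 determinant taken over rows 0 to 2 and three selected columns, expanded along the last row, and the `ret4`/`ret5` accumulations look like the outer levels of a fully unrolled cofactor expansion. The following is only a minimal sketch of that innermost building block, assuming `std::complex<double>` elements; the names `cplx`, `Mat3` and `det3` are hypothetical and do not come from the report.

```cpp
#include <array>
#include <cassert>
#include <complex>

// Hypothetical names, for illustration only; not taken from the test case.
using cplx = std::complex<double>;
using Mat3 = std::array<std::array<cplx, 3>, 3>;

// 3x3 determinant written in the same shape as one "ret3 = ..." statement:
// three 2x2 minors built from rows 0 and 1, each multiplied by an element
// of row 2.
cplx det3(const Mat3& m) {
    return (m[0][0] * m[1][1] - m[0][1] * m[1][0]) * m[2][2]
         + (m[0][1] * m[1][2] - m[0][2] * m[1][1]) * m[2][0]
         + (m[0][2] * m[1][0] - m[0][0] * m[1][2]) * m[2][1];
}
```

Unrolling this recursively for a 6x6 matrix produces a single basic block with hundreds of complex multiplications, which is presumably the kind of input that stresses the passes discussed in this report.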
State-Changed-From-To: open->analyzed State-Changed-Why: http://gcc.gnu.org/ml/gcc-patches/2002-04/msg00626.html Doesn't actually work, but highlights the problem.
This is still slow, but not as bad as it used to be. Here are time reports from what I get on an Athlon XP2000 with 256MB RAM, for "g++-3.4 (GCC) 3.4 20030718 (experimental)":

$ g++-3.4 -c -ftime-report 2692.cc

Execution times (seconds)
 cfg construction : 0.02 ( 2%) usr 0.00 ( 0%) sys 0.02 ( 2%) wall
 trivially dead code : 0.01 ( 1%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall
 life analysis : 0.10 ( 9%) usr 0.00 ( 0%) sys 0.10 ( 8%) wall
 life info update : 0.03 ( 3%) usr 0.00 ( 0%) sys 0.03 ( 3%) wall
 register scan : 0.01 ( 1%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall
 parser : 0.12 (11%) usr 0.01 (17%) sys 0.14 (12%) wall
 name lookup : 0.01 ( 1%) usr 0.02 (33%) sys 0.04 ( 3%) wall
 expand : 0.09 ( 8%) usr 0.01 (17%) sys 0.10 ( 8%) wall
 integration : 0.03 ( 3%) usr 0.00 ( 0%) sys 0.03 ( 3%) wall
 flow analysis : 0.02 ( 2%) usr 0.00 ( 0%) sys 0.02 ( 2%) wall
 local alloc : 0.11 (10%) usr 0.00 ( 0%) sys 0.11 ( 9%) wall
 global alloc : 0.34 (31%) usr 0.00 ( 0%) sys 0.34 (29%) wall
 flow 2 : 0.03 ( 3%) usr 0.00 ( 0%) sys 0.03 ( 3%) wall
 shorten branches : 0.04 ( 4%) usr 0.00 ( 0%) sys 0.04 ( 3%) wall
 reg stack : 0.01 ( 1%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall
 final : 0.06 ( 5%) usr 0.01 (17%) sys 0.07 ( 6%) wall
 rest of compilation : 0.08 ( 7%) usr 0.01 (17%) sys 0.09 ( 8%) wall
 TOTAL : 1.11 0.06 1.19

$ g++-3.4 -c -O -ftime-report 2692.cc

Execution times (seconds)
 garbage collection : 0.17 ( 0%) usr 0.01 ( 3%) sys 0.68 ( 0%) wall
 cfg construction : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall
 cfg cleanup : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
 trivially dead code : 0.13 ( 0%) usr 0.01 ( 3%) sys 0.14 ( 0%) wall
 life analysis : 99.39 (67%) usr 0.06 (17%) sys 105.98 (66%) wall
 life info update : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
 alias analysis : 0.21 ( 0%) usr 0.01 ( 3%) sys 0.24 ( 0%) wall
 register scan : 0.08 ( 0%) usr 0.01 ( 3%) sys 0.09 ( 0%) wall
 rebuild jump labels : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
 preprocessing : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
 parser : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall
 name lookup : 0.00 ( 0%) usr 0.03 ( 8%) sys 0.03 ( 0%) wall
 expand : 0.52 ( 0%) usr 0.05 (14%) sys 0.60 ( 0%) wall
 varconst : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
 integration : 0.47 ( 0%) usr 0.01 ( 3%) sys 0.48 ( 0%) wall
 jump : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
 CSE : 2.65 ( 2%) usr 0.03 ( 8%) sys 3.05 ( 2%) wall
 loop analysis : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
 branch prediction : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
 flow analysis : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
 combiner : 6.36 ( 4%) usr 0.00 ( 0%) sys 6.79 ( 4%) wall
 if-conversion : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
 local alloc : 0.33 ( 0%) usr 0.01 ( 3%) sys 0.48 ( 0%) wall
 global alloc : 34.50 (23%) usr 0.11 (31%) sys 39.30 (24%) wall
 reload CSE regs : 1.60 ( 1%) usr 0.00 ( 0%) sys 1.81 ( 1%) wall
 flow 2 : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
 rename registers : 0.14 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 0%) wall
 shorten branches : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
 reg stack : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
 final : 0.09 ( 0%) usr 0.01 ( 3%) sys 0.13 ( 0%) wall
 rest of compilation : 0.31 ( 0%) usr 0.00 ( 0%) sys 0.32 ( 0%) wall
 TOTAL : 147.62 0.36 161.10

So the expand hog is gone :-)

It's not a surprise that, for the test case for this PR, global alloc and life analysis take so much time. It would obviously be nice to have it faster, but it is not the awful compile time hog anymore.

Richard, I have not reconfirmed this PR because I am not sure what's reasonable here. Do you think this report can be closed, or do you think these timings still are unacceptable?

Gr.
Compiler: GNU C++ version 3.4 20030725 (experimental) (i686-pc-linux-gnu)
 compiled by GNU C version 3.4 20030725 (experimental).
GGC heuristics: --param ggc-min-expand=47 --param ggc-min-heapsize=31916
Flags: -O -quiet
File: z.cc

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self               self     total
 time   seconds   seconds     calls  ms/call  ms/call  name
 10.73     40.70    40.70      19930     2.04     3.22  find_equiv_reg
  9.68     77.41    36.71  193175759     0.00     0.00  find_base_term
  9.53    113.58    36.17  178499488     0.00     0.00  refers_to_regno_p
  8.26    144.90    31.32  446734067     0.00     0.00  canon_rtx
  7.49    173.30    28.40  467218101     0.00     0.00  rtx_equal_p
  7.20    200.62    27.32  176913679     0.00     0.00  read_dependence
  6.26    224.38    23.75   73755132     0.00     0.00  addr_side_effect_eval
  5.96    246.99    22.61  295299380     0.00     0.00  true_regnum
  4.69    264.79    17.80  175470757     0.00     0.00  canon_true_dependence
  4.38    281.39    16.61  578096780     0.00     0.00  ix86_find_base_term
  4.36    297.94    16.54  178512386     0.00     0.00  reg_overlap_mentioned_p
  4.11    313.51    15.58  578101529     0.00     0.00  i386_output_dwarf_dtprel
  3.67    327.43    13.91   18646114     0.00     0.00  regno_clobbered_at_setjmp
  2.38    336.45     9.02  176829268     0.00     0.00  main
  1.29    341.36     4.91  175470757     0.00     0.00  anti_dependence
  1.07    345.41     4.05    3014519     0.00     0.00  propagate_block
(all others <1%)

So find_equiv_reg is a bottleneck for this code.
Bug 10776 may be related to this one.
On powerpc-apple-darwin6.6, the combiner is where most of the work is done:

Execution times (seconds)
 garbage collection : 1.12 ( 1%) usr 0.00 ( 0%) sys 2.16 ( 2%) wall
 cfg construction : 0.18 ( 0%) usr 0.00 ( 0%) sys 0.57 ( 0%) wall
 cfg cleanup : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
 trivially dead code : 0.30 ( 0%) usr 0.00 ( 0%) sys 0.33 ( 0%) wall
 life analysis : 9.04 ( 9%) usr 0.00 ( 0%) sys 10.26 ( 8%) wall
 life info update : 0.28 ( 0%) usr 0.00 ( 0%) sys 0.32 ( 0%) wall
 alias analysis : 0.42 ( 0%) usr 0.00 ( 0%) sys 0.45 ( 0%) wall
 register scan : 0.25 ( 0%) usr 0.00 ( 0%) sys 0.35 ( 0%) wall
 rebuild jump labels : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall
 preprocessing : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.20 ( 0%) wall
 parser : 0.54 ( 1%) usr 0.00 ( 0%) sys 0.54 ( 0%) wall
 name lookup : 0.66 ( 1%) usr 0.00 ( 0%) sys 0.73 ( 1%) wall
 expand : 1.28 ( 1%) usr 0.00 ( 0%) sys 2.42 ( 2%) wall
 integration : 1.10 ( 1%) usr 0.00 ( 0%) sys 3.75 ( 3%) wall
 jump : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall
 CSE : 5.04 ( 5%) usr 0.00 ( 0%) sys 6.21 ( 5%) wall
 loop analysis : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
 branch prediction : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.23 ( 0%) wall
 flow analysis : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
 combiner : 63.88 (66%) usr 0.00 ( 0%) sys 82.54 (65%) wall
 if-conversion : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
 local alloc : 1.04 ( 1%) usr 0.00 ( 0%) sys 1.96 ( 2%) wall
 global alloc : 5.34 ( 6%) usr 0.00 ( 0%) sys 6.33 ( 5%) wall
 reload CSE regs : 3.57 ( 4%) usr 0.00 ( 0%) sys 4.07 ( 3%) wall
 flow 2 : 0.15 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 0%) wall
 rename registers : 0.24 ( 0%) usr 0.00 ( 0%) sys 0.28 ( 0%) wall
 shorten branches : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall
 final : 0.35 ( 0%) usr 0.00 ( 0%) sys 0.41 ( 0%) wall
 rest of compilation : 0.71 ( 1%) usr 0.00 ( 0%) sys 1.76 ( 1%) wall
 TOTAL : 96.15 0.00 126.51
On the mainline (20031030), compiling this code with -O3 makes gcc ICE on powerpc-apple-darwin in the webizer pass.
It is cool that -O3 and -O2 are faster than -O1 (unit-at-a-time causes this). On the mainline, the time has migrated to rename registers (for -O3 at least):

-O3 -fno-web:
Execution times (seconds)
 garbage collection : 1.78 ( 2%) usr 0.01 ( 0%) sys 2.76 ( 2%) wall
 callgraph construction: 0.09 ( 0%) usr 0.02 ( 1%) sys 0.34 ( 0%) wall
 cfg construction : 0.15 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall
 cfg cleanup : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
 trivially dead code : 0.34 ( 0%) usr 0.02 ( 1%) sys 0.70 ( 1%) wall
 life analysis : 2.81 ( 3%) usr 0.00 ( 0%) sys 3.02 ( 3%) wall
 life info update : 1.07 ( 1%) usr 0.02 ( 1%) sys 1.30 ( 1%) wall
 alias analysis : 0.45 ( 0%) usr 0.05 ( 2%) sys 0.71 ( 1%) wall
 register scan : 0.28 ( 0%) usr 0.00 ( 0%) sys 0.33 ( 0%) wall
 rebuild jump labels : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall
 preprocessing : 0.05 ( 0%) usr 0.04 ( 2%) sys 0.15 ( 0%) wall
 parser : 0.65 ( 1%) usr 0.20 ( 9%) sys 1.54 ( 1%) wall
 name lookup : 0.16 ( 0%) usr 0.44 (19%) sys 0.65 ( 1%) wall
 expand : 0.91 ( 1%) usr 0.12 ( 5%) sys 3.07 ( 3%) wall
 integration : 1.11 ( 1%) usr 0.09 ( 4%) sys 1.55 ( 1%) wall
 jump : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall
 CSE : 3.09 ( 3%) usr 0.07 ( 3%) sys 3.85 ( 3%) wall
 loop analysis : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
 CSE 2 : 0.92 ( 1%) usr 0.02 ( 1%) sys 1.02 ( 1%) wall
 branch prediction : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
 flow analysis : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
 combiner : 15.43 (16%) usr 0.13 ( 6%) sys 18.74 (16%) wall
 if-conversion : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
 regmove : 0.40 ( 0%) usr 0.01 ( 0%) sys 0.41 ( 0%) wall
 scheduling : 8.13 ( 8%) usr 0.46 (20%) sys 9.42 ( 8%) wall
 local alloc : 1.21 ( 1%) usr 0.02 ( 1%) sys 1.30 ( 1%) wall
 global alloc : 2.52 ( 3%) usr 0.08 ( 3%) sys 2.72 ( 2%) wall
 reload CSE regs : 0.94 ( 1%) usr 0.00 ( 0%) sys 0.97 ( 1%) wall
 flow 2 : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
 rename registers : 51.52 (53%) usr 0.07 ( 3%) sys 56.65 (48%) wall
 scheduling 2 : 1.63 ( 2%) usr 0.39 (17%) sys 4.27 ( 4%) wall
 shorten branches : 0.09 ( 0%) usr 0.01 ( 0%) sys 0.12 ( 0%) wall
 final : 0.19 ( 0%) usr 0.02 ( 1%) sys 0.22 ( 0%) wall
 rest of compilation : 0.45 ( 0%) usr 0.01 ( 0%) sys 0.50 ( 0%) wall
 TOTAL : 96.78 2.33 116.99

-O0:
Execution times (seconds)
 garbage collection : 0.13 ( 3%) usr 0.00 ( 0%) sys 0.14 ( 2%) wall
 cfg construction : 0.04 ( 1%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
 trivially dead code : 0.03 ( 1%) usr 0.00 ( 0%) sys 0.07 ( 1%) wall
 life analysis : 0.41 (10%) usr 0.00 ( 0%) sys 0.50 ( 8%) wall
 life info update : 0.21 ( 5%) usr 0.00 ( 0%) sys 0.23 ( 4%) wall
 register scan : 0.05 ( 1%) usr 0.00 ( 0%) sys 0.05 ( 1%) wall
 rebuild jump labels : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 1%) wall
 preprocessing : 0.04 ( 1%) usr 0.07 ( 9%) sys 0.11 ( 2%) wall
 parser : 0.68 (16%) usr 0.18 (22%) sys 0.78 (12%) wall
 name lookup : 0.19 ( 5%) usr 0.45 (56%) sys 0.78 (12%) wall
 expand : 0.22 ( 5%) usr 0.00 ( 0%) sys 0.26 ( 4%) wall
 integration : 0.08 ( 2%) usr 0.00 ( 0%) sys 0.09 ( 1%) wall
 jump : 0.00 ( 0%) usr 0.01 ( 1%) sys 0.00 ( 0%) wall
 flow analysis : 0.04 ( 1%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
 local alloc : 0.73 (18%) usr 0.02 ( 3%) sys 1.21 (19%) wall
 global alloc : 0.76 (18%) usr 0.02 ( 3%) sys 1.11 (18%) wall
 flow 2 : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
 shorten branches : 0.07 ( 2%) usr 0.00 ( 0%) sys 0.07 ( 1%) wall
 final : 0.17 ( 4%) usr 0.01 ( 1%) sys 0.35 ( 6%) wall
 rest of compilation : 0.24 ( 6%) usr 0.00 ( 0%) sys 0.35 ( 6%) wall
 TOTAL : 4.15 0.80 6.25

-O1:
Execution times (seconds)
 garbage collection : 1.23 ( 1%) usr 0.00 ( 0%) sys 1.28 ( 1%) wall
 cfg construction : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall
 cfg cleanup : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
 trivially dead code : 0.32 ( 0%) usr 0.01 ( 1%) sys 0.30 ( 0%) wall
 life analysis : 3.45 ( 3%) usr 0.02 ( 1%) sys 4.09 ( 4%) wall
 life info update : 1.02 ( 1%) usr 0.00 ( 0%) sys 1.05 ( 1%) wall
 alias analysis : 0.41 ( 0%) usr 0.04 ( 3%) sys 0.47 ( 0%) wall
 register scan : 0.25 ( 0%) usr 0.00 ( 0%) sys 0.28 ( 0%) wall
 rebuild jump labels : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
 preprocessing : 0.05 ( 0%) usr 0.07 ( 5%) sys 0.16 ( 0%) wall
 parser : 0.61 ( 1%) usr 0.18 (13%) sys 0.75 ( 1%) wall
 name lookup : 0.37 ( 0%) usr 0.34 (25%) sys 0.76 ( 1%) wall
 expand : 0.66 ( 1%) usr 0.08 ( 6%) sys 0.74 ( 1%) wall
 integration : 1.02 ( 1%) usr 0.02 ( 1%) sys 1.07 ( 1%) wall
 jump : 0.04 ( 0%) usr 0.01 ( 1%) sys 0.03 ( 0%) wall
 CSE : 3.27 ( 3%) usr 0.03 ( 2%) sys 3.39 ( 3%) wall
 loop analysis : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
 branch prediction : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
 flow analysis : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
 combiner : 81.10 (79%) usr 0.20 (15%) sys 83.87 (77%) wall
 if-conversion : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
 local alloc : 1.62 ( 2%) usr 0.05 ( 4%) sys 1.67 ( 2%) wall
 global alloc : 3.68 ( 4%) usr 0.23 (17%) sys 4.02 ( 4%) wall
 reload CSE regs : 1.37 ( 1%) usr 0.01 ( 1%) sys 1.94 ( 2%) wall
 flow 2 : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.20 ( 0%) wall
 rename registers : 0.47 ( 0%) usr 0.00 ( 0%) sys 0.60 ( 1%) wall
 shorten branches : 0.17 ( 0%) usr 0.02 ( 1%) sys 0.23 ( 0%) wall
 final : 0.43 ( 0%) usr 0.01 ( 1%) sys 0.61 ( 1%) wall
 rest of compilation : 0.75 ( 1%) usr 0.02 ( 1%) sys 0.80 ( 1%) wall
 TOTAL : 102.71 1.37 108.74

-O2:
Execution times (seconds)
 garbage collection : 1.69 ( 4%) usr 0.00 ( 0%) sys 2.63 ( 4%) wall
 callgraph construction: 0.08 ( 0%) usr 0.02 ( 1%) sys 0.11 ( 0%) wall
 callgraph optimization: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
 cfg construction : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall
 cfg cleanup : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
 trivially dead code : 0.34 ( 1%) usr 0.01 ( 0%) sys 0.38 ( 1%) wall
 life analysis : 2.88 ( 6%) usr 0.03 ( 1%) sys 3.48 ( 6%) wall
 life info update : 0.97 ( 2%) usr 0.00 ( 0%) sys 1.09 ( 2%) wall
 alias analysis : 0.44 ( 1%) usr 0.07 ( 3%) sys 0.75 ( 1%) wall
 register scan : 0.24 ( 1%) usr 0.01 ( 0%) sys 0.25 ( 0%) wall
 rebuild jump labels : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall
 preprocessing : 0.07 ( 0%) usr 0.08 ( 4%) sys 0.13 ( 0%) wall
 parser : 0.55 ( 1%) usr 0.14 ( 6%) sys 0.82 ( 1%) wall
 name lookup : 0.27 ( 1%) usr 0.43 (19%) sys 0.63 ( 1%) wall
 expand : 0.73 ( 2%) usr 0.05 ( 2%) sys 0.81 ( 1%) wall
 integration : 1.04 ( 2%) usr 0.14 ( 6%) sys 1.24 ( 2%) wall
 jump : 0.08 ( 0%) usr 0.02 ( 1%) sys 0.08 ( 0%) wall
 CSE : 3.13 ( 7%) usr 0.06 ( 3%) sys 4.16 ( 7%) wall
 loop analysis : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
 CSE 2 : 0.89 ( 2%) usr 0.02 ( 1%) sys 0.99 ( 2%) wall
 branch prediction : 0.03 ( 0%) usr 0.01 ( 0%) sys 0.04 ( 0%) wall
 flow analysis : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
 combiner : 15.77 (35%) usr 0.11 ( 5%) sys 18.25 (31%) wall
 if-conversion : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
 regmove : 0.40 ( 1%) usr 0.00 ( 0%) sys 0.48 ( 1%) wall
 scheduling : 8.15 (18%) usr 0.47 (21%) sys 9.75 (16%) wall
 local alloc : 1.28 ( 3%) usr 0.02 ( 1%) sys 1.44 ( 2%) wall
 global alloc : 2.46 ( 5%) usr 0.10 ( 4%) sys 3.05 ( 5%) wall
 reload CSE regs : 0.96 ( 2%) usr 0.02 ( 1%) sys 1.17 ( 2%) wall
 flow 2 : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
 rename registers : 0.21 ( 0%) usr 0.00 ( 0%) sys 0.55 ( 1%) wall
 scheduling 2 : 1.32 ( 3%) usr 0.41 (18%) sys 5.31 ( 9%) wall
 shorten branches : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall
 final : 0.22 ( 0%) usr 0.02 ( 1%) sys 0.89 ( 1%) wall
 rest of compilation : 0.46 ( 1%) usr 0.00 ( 0%) sys 0.52 ( 1%) wall
 TOTAL : 45.16 2.24 59.57

-O1 -funit-at-a-time:
Execution times (seconds)
 garbage collection : 1.18 ( 4%) usr 0.00 ( 0%) sys 1.22 ( 4%) wall
 callgraph construction: 0.10 ( 0%) usr 0.01 ( 1%) sys 0.10 ( 0%) wall
 cfg construction : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall
 cfg cleanup : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
 trivially dead code : 0.22 ( 1%) usr 0.00 ( 0%) sys 0.21 ( 1%) wall
 life analysis : 2.70 ( 9%) usr 0.02 ( 2%) sys 2.75 ( 8%) wall
 life info update : 0.54 ( 2%) usr 0.00 ( 0%) sys 0.56 ( 2%) wall
 alias analysis : 0.27 ( 1%) usr 0.01 ( 1%) sys 0.31 ( 1%) wall
 register scan : 0.17 ( 1%) usr 0.00 ( 0%) sys 0.17 ( 1%) wall
 rebuild jump labels : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
 preprocessing : 0.01 ( 0%) usr 0.12 (11%) sys 0.12 ( 0%) wall
 parser : 0.51 ( 2%) usr 0.15 (14%) sys 0.81 ( 2%) wall
 name lookup : 0.34 ( 1%) usr 0.38 (35%) sys 0.70 ( 2%) wall
 expand : 0.71 ( 2%) usr 0.07 ( 6%) sys 0.78 ( 2%) wall
 integration : 1.11 ( 4%) usr 0.08 ( 7%) sys 1.21 ( 4%) wall
 jump : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
 CSE : 2.00 ( 6%) usr 0.05 ( 5%) sys 2.09 ( 6%) wall
 loop analysis : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
 branch prediction : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
 flow analysis : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
 combiner : 17.04 (54%) usr 0.07 ( 6%) sys 17.57 (53%) wall
 local alloc : 0.75 ( 2%) usr 0.03 ( 3%) sys 0.80 ( 2%) wall
 global alloc : 1.80 ( 6%) usr 0.04 ( 4%) sys 1.89 ( 6%) wall
 reload CSE regs : 0.47 ( 2%) usr 0.00 ( 0%) sys 0.49 ( 1%) wall
 flow 2 : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
 rename registers : 0.23 ( 1%) usr 0.00 ( 0%) sys 0.23 ( 1%) wall
 shorten branches : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
 final : 0.18 ( 1%) usr 0.02 ( 2%) sys 0.21 ( 1%) wall
 rest of compilation : 0.47 ( 2%) usr 0.01 ( 1%) sys 0.47 ( 1%) wall
 TOTAL : 31.27 1.09 33.21

-O0 -funit-at-a-time:
Execution times (seconds)
 garbage collection : 0.14 ( 3%) usr 0.00 ( 0%) sys 0.14 ( 2%) wall
 callgraph construction: 0.10 ( 2%) usr 0.00 ( 0%) sys 0.11 ( 2%) wall
 cfg construction : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
 trivially dead code : 0.03 ( 1%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
 life analysis : 0.42 (10%) usr 0.01 ( 1%) sys 1.09 (15%) wall
 life info update : 0.21 ( 5%) usr 0.00 ( 0%) sys 0.23 ( 3%) wall
 register scan : 0.05 ( 1%) usr 0.01 ( 1%) sys 0.04 ( 1%) wall
 rebuild jump labels : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 1%) wall
 preprocessing : 0.07 ( 2%) usr 0.06 ( 9%) sys 0.09 ( 1%) wall
 parser : 0.64 (15%) usr 0.15 (22%) sys 0.80 (11%) wall
 name lookup : 0.25 ( 6%) usr 0.36 (54%) sys 0.71 (10%) wall
 expand : 0.22 ( 5%) usr 0.01 ( 1%) sys 0.24 ( 3%) wall
 integration : 0.07 ( 2%) usr 0.01 ( 1%) sys 0.07 ( 1%) wall
 jump : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
 flow analysis : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
 local alloc : 0.74 (17%) usr 0.01 ( 1%) sys 0.80 (11%) wall
 global alloc : 0.74 (17%) usr 0.02 ( 3%) sys 1.25 (17%) wall
 flow 2 : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
 shorten branches : 0.07 ( 2%) usr 0.00 ( 0%) sys 0.08 ( 1%) wall
 final : 0.20 ( 5%) usr 0.00 ( 0%) sys 1.01 (14%) wall
 rest of compilation : 0.29 ( 7%) usr 0.00 ( 0%) sys 0.31 ( 4%) wall
 TOTAL : 4.37 0.67 7.16
Current state on mainline, at least for x86 at -O2, is that we spend lots of time in flow doing dead store elimination:

 life analysis : 60.94 (76%) usr 0.00 ( 0%) sys 61.02 (75%) wall
 TOTAL : 80.58 0.39 81.01

If I tweak flow.c to not do *any* store elimination at all, I can pull the total down to ~75 seconds. I don't see anything easy to do to even bridge the gap between these two times at this late stage of 3.4.

On tree-ssa branch, we do significantly better:

 TOTAL : 18.08 0.51 18.57

This is with the original C++ test case. If I crop the std::dcomplex parts and use the _Complex support in C, then I get

 TOTAL : 5.85 0.17 6.01

Clearly there's work to do yet in unraveling the abstraction, but either compilation time is acceptable, so I'm going to suspend this PR as fixed pending merge to mainline.
Created attachment 5509
C version of test case

For comparison purposes, a C version of the test case using _Complex. We *should* get the same code out of the C++ front end. That we don't is a missed optimization.
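To make the "same code" expectation concrete, here is a small sketch (not the attachment itself) that writes one complex multiplication two ways in C++: through the `std::complex<double>` class, and directly on real and imaginary parts, which is roughly what built-in `_Complex` arithmetic in C expands to for finite values. The names `ReIm`, `mul_class` and `mul_parts` are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <complex>

// Library form: goes through the std::complex class interface.
std::complex<double> mul_class(std::complex<double> a, std::complex<double> b) {
    return a * b;
}

// Hypothetical plain struct standing in for a built-in complex type.
struct ReIm {
    double re;
    double im;
};

// Same product written out on the parts:
// (a.re + i*a.im) * (b.re + i*b.im)
ReIm mul_parts(ReIm a, ReIm b) {
    return { a.re * b.re - a.im * b.im,
             a.re * b.im + a.im * b.re };
}
```

Both functions describe the same arithmetic for ordinary finite inputs, so ideally the optimizer would reduce them to equivalent scalar code; the timings earlier in this report suggest the class-based form cost noticeably more to compile at the time.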
With my cast pass, compile time drops to half that of the current tree-ssa compiler. It improves the code too.
The C vs C++ difference can probably be tracked in a new, separate (cleaner) PR.
I filed a bug, PR 15197, which should help with the code generation differences between C and C++.
Fixed for 3.5.0 by the merge of tree-ssa.