This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

[Bug rtl-optimization/33928] [4.3/4.4/4.5/4.6/4.7 Regression] 30% performance slowdown in floating-point code caused by r118475

From: "lucier at math dot purdue.edu" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Sat, 2 Apr 2011 16:58:38 +0000
Subject: [Bug rtl-optimization/33928] [4.3/4.4/4.5/4.6/4.7 Regression] 30% performance slowdown in floating-point code caused by r118475
Auto-submitted: auto-generated
References: <bug-33928-4@http.gcc.gnu.org/bugzilla/>

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928

--- Comment #121 from lucier at math dot purdue.edu 2011-04-02 16:58:16 UTC ---
I'm inclined to close this as "Fixed" for 4.6.0.

I've taken the file mentioned in the previous comment and followed the
instructions in the readme.  The times for a forward FFT of 2^{25} complex
doubles on a 2.4 GHz Intel Core i5 on x86_64-apple-darwin10.7.0 are as follows:

With the usual compiler options of

-O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing
-fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp

4.5.2:

    2433 ms cpu time (2427 user, 6 system)

4.6.0:

    2158 ms cpu time (2154 user, 4 system)

Adding -fschedule-insns -march=native to the above:

4.5.2:

    2067 ms cpu time (2060 user, 7 system)

4.6.0:

    2016 ms cpu time (2012 user, 4 system)

The assembly for the main loop looks much better.
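
For reference, the relative improvement implied by the CPU times above can be
worked out with a few lines of Python. This is only a sketch: the helper name
percent_faster and the dictionary labels are made up for illustration, and the
only inputs are the total CPU times quoted in this comment.

```python
# Total CPU times in ms, copied from the measurements in this comment.
# The dictionary keys are illustrative labels, not GCC terminology.
timings = {
    "usual options": {"4.5.2": 2433, "4.6.0": 2158},
    "usual options + -fschedule-insns -march=native": {"4.5.2": 2067, "4.6.0": 2016},
}


def percent_faster(old_ms, new_ms):
    """Return the reduction in run time from old_ms to new_ms, as a percentage."""
    return (old_ms - new_ms) / old_ms * 100.0


for label, t in timings.items():
    gain = percent_faster(t["4.5.2"], t["4.6.0"])
    print(f"{label}: 4.6.0 is {gain:.1f}% faster than 4.5.2")
```

By this measure 4.6.0 is roughly 11.3% faster than 4.5.2 with the usual
options, and roughly 2.5% faster once -fschedule-insns -march=native is added.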
