This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug rtl-optimization/33928] [4.3/4.4/4.5/4.6/4.7 Regression] 30% performance slowdown in floating-point code caused by r118475
- From: "lucier at math dot purdue.edu" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Sat, 2 Apr 2011 16:58:38 +0000
- Subject: [Bug rtl-optimization/33928] [4.3/4.4/4.5/4.6/4.7 Regression] 30% performance slowdown in floating-point code caused by r118475
- Auto-submitted: auto-generated
- References: <bug-33928-4@http.gcc.gnu.org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
--- Comment #121 from lucier at math dot purdue.edu 2011-04-02 16:58:16 UTC ---
I'm inclined to close this as "Fixed" for 4.6.0.
I've taken the file mentioned in the previous comment and followed the
instructions in the readme. The times for a forward FFT of 2^{25} complex
doubles on a 2.4 GHz Intel Core i5 on x86_64-apple-darwin10.7.0 are as follows:
With the usual compiler options of
-O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing
-fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp
4.5.2:
2433 ms cpu time (2427 user, 6 system)
4.6.0:
2158 ms cpu time (2154 user, 4 system)
Adding -fschedule-insns -march=native to the above:
4.5.2:
2067 ms cpu time (2060 user, 7 system)
4.6.0:
2016 ms cpu time (2012 user, 4 system)
The assembly for the main loop looks much better.
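
For readers who want the relative change rather than raw milliseconds, the CPU times reported above can be converted into percentages. The following is only a small sketch of that arithmetic. The dictionary labels and the helper name percent_faster are made up for illustration. Only the four timing values come from this comment.

```python
# CPU times in ms, copied from the measurements reported in this comment.
# The labels are illustrative names, not anything GCC or the benchmark defines.
timings = {
    "base flags": {"4.5.2": 2433, "4.6.0": 2158},
    "base flags + -fschedule-insns -march=native": {"4.5.2": 2067, "4.6.0": 2016},
}


def percent_faster(old_ms, new_ms):
    """Reduction in CPU time going from old to new, as a percentage of old."""
    return 100.0 * (old_ms - new_ms) / old_ms


# Relative improvement of 4.6.0 over 4.5.2 for each set of options.
results = {
    label: percent_faster(t["4.5.2"], t["4.6.0"])
    for label, t in timings.items()
}

for label, pct in results.items():
    print(f"{label}: 4.6.0 uses {pct:.1f}% less CPU time than 4.5.2")
```

By this measure 4.6.0 needs about 11.3% less CPU time than 4.5.2 with the usual options, and about 2.5% less once -fschedule-insns -march=native are added. These are single-run figures from one machine, so small differences should be read with some caution.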