This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/54776] New: [4.8 Regression] tramp3d-v4: 20% performance regression using -O3
- From: "markus at trippelsdorf dot de" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 02 Oct 2012 08:21:55 +0000
- Auto-submitted: auto-generated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54776
Bug #: 54776
Summary: [4.8 Regression] tramp3d-v4: 20% performance regression using -O3
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: markus@trippelsdorf.de
With gcc-4.8 (--enable-checking=release):
markus@x4 ~ % time c++ -w -O3 tramp3d-v4.cpp
c++ -w -O3 tramp3d-v4.cpp 24.87s user 0.34s system 99% cpu 25.293 total
markus@x4 ~ % ./a.out --cartvis 1.0 0.0 --rhomin 1e-8 -n 20
...
Time spent in iteration: 7.35642
With gcc-4.7.2:
markus@x4 ~ % time c++ -w -O3 tramp3d-v4.cpp
c++ -w -O3 tramp3d-v4.cpp 25.15s user 0.33s system 99% cpu 25.568 total
markus@x4 ~ % ./a.out --cartvis 1.0 0.0 --rhomin 1e-8 -n 20
...
Time spent in iteration: 5.81199
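A quick check of the regression arithmetic, using only the two iteration times reported above (nothing is re-measured). The percentage depends on which compiler is taken as the baseline: gcc-4.8 takes about 26.6% longer than gcc-4.7.2, while gcc-4.7.2 is about 21% faster when measured against gcc-4.8's time. The helper name `relative_change` is illustrative only.

```python
# Iteration times in seconds, copied from the -O3 runs above.
GCC_472 = 5.81199  # gcc-4.7.2
GCC_48 = 7.35642   # gcc-4.8


def relative_change(baseline: float, value: float) -> float:
    """Percentage change of `value` relative to `baseline` (positive = slower)."""
    return (value - baseline) / baseline * 100.0


# Baseline gcc-4.7.2: how much longer does gcc-4.8 take?
print(f"gcc-4.8 vs gcc-4.7.2: {relative_change(GCC_472, GCC_48):+.1f}%")
# Baseline gcc-4.8: how much shorter is gcc-4.7.2?
print(f"gcc-4.7.2 vs gcc-4.8: {relative_change(GCC_48, GCC_472):+.1f}%")
```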
LTO doesn't help much (gcc-4.8):
markus@x4 ~ % time c++ -w -O3 -flto tramp3d-v4.cpp
c++ -w -O3 -flto tramp3d-v4.cpp 45.78s user 0.95s system 99% cpu 47.012 total
markus@x4 ~ % ./a.out --cartvis 1.0 0.0 --rhomin 1e-8 -n 20
...
Time spent in iteration: 7.2111
(For comparison, here are some clang results:
markus@x4 ~ % time clang++ -w -O3 tramp3d-v4.cpp
clang++ -w -O3 tramp3d-v4.cpp 14.67s user 0.12s system 99% cpu 14.874 total
markus@x4 ~ % ./a.out --cartvis 1.0 0.0 --rhomin 1e-8 -n 20
...
Time spent in iteration: 6.1923
markus@x4 ~ % time clang++ -w -O3 -flto tramp3d-v4.cpp
clang++ -w -O3 -flto tramp3d-v4.cpp 20.28s user 0.16s system 99% cpu 20.535 total
markus@x4 ~ % ./a.out --cartvis 1.0 0.0 --rhomin 1e-8 -n 20
...
Time spent in iteration: 4.47936
That's an almost 28% improvement due to -flto.)
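A small sketch comparing what -flto saves for each compiler, again using only the iteration times quoted above and measuring the saving as a share of the non-LTO time. With these numbers, -flto shortens the gcc-4.8 iteration by about 2.0% and the clang iteration by about 27.7%, which matches the "almost 28%" above. The dictionary layout and the `reduction_pct` helper are illustrative choices, not anything from the report.

```python
# Iteration times in seconds from the runs above: (-O3, -O3 -flto).
RUNS = {
    "gcc-4.8": (7.35642, 7.2111),
    "clang": (6.1923, 4.47936),
}


def reduction_pct(without_lto: float, with_lto: float) -> float:
    """Share of the non-LTO iteration time saved by enabling -flto, in percent."""
    return (without_lto - with_lto) / without_lto * 100.0


for compiler, (plain, lto) in RUNS.items():
    print(f"{compiler}: -flto saves {reduction_pct(plain, lto):.1f}% per iteration")
```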