This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.
gcc 4.3 generates less efficient code than gcc 4.1 or 4.2
- From: Vincent Lefevre <vincent+gcc at vinc17 dot org>
- To: gcc-help at gcc dot gnu dot org
- Date: Wed, 2 Jul 2008 16:29:32 +0200
- Subject: gcc 4.3 generates less efficient code than gcc 4.1 or 4.2
In one of my programs (which has many branches in its inner loop),
I've found that gcc 4.3.1 generates less efficient code than gcc 4.1.2.
However, I'm not sure I selected the right optimization options.
Below are the timings I got on several x86_64 machines.
Is there something else I should test? Could this be regarded as a bug
in gcc 4.3 (the generated code is correct, but the timing is unexpected)?
In the tables below, pgen=0 means a build without profile generation,
and pgen=8 means a first compilation with -fprofile-generate, a training
run on a subset, a second compilation with -fprofile-use, and then the
timing of the resulting binary.
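
To make the pgen=8 procedure concrete, here is a minimal sketch in Python that only builds the three command lines involved. The flags -fprofile-generate and -fprofile-use are the ones named above; the file names prog.c and prog and the --subset argument are hypothetical placeholders, not taken from the actual program.

```python
import shlex


def pgo_commands(cc, opt, source, exe, training_args):
    """Return, in order, the commands for one profile-guided build."""
    base = [cc, opt, source, "-o", exe]
    return [
        base + ["-fprofile-generate"],        # 1. instrumented build
        ["./" + exe] + list(training_args),   # 2. training run records the profile
        base + ["-fprofile-use"],             # 3. rebuild using the recorded profile
    ]


if __name__ == "__main__":
    # Hypothetical names: prog.c, prog and --subset are placeholders.
    for cmd in pgo_commands("gcc", "-O2", "prog.c", "prog", ["--subset"]):
        print(shlex.join(cmd))
```

The timed run on the final binary is a separate, fourth step and is not part of this sketch.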
AMD Opteron, 2.3 GHz
--------------------
CC / CFLAGS         -O1              -O2              -O3
               pgen=0  pgen=8   pgen=0  pgen=8   pgen=0  pgen=8
gcc 3.3.6      373.7            374.8            329.7
gcc 3.4.6      330.9 / 280.4    323.9 / 278.8    281.0 / 318.0 (!!!)
gcc 4.1.2      283.9 / 214.6    237.4 / 197.1    237.2 / 197.3
gcc 4.3.1      327.4 / 238.5    232.6 / 210.4    236.3 / 210.2
Core2 Q6600, 2.40 GHz
---------------------
CC / CFLAGS         -O1              -O2              -O3
               pgen=0  pgen=8   pgen=0  pgen=8   pgen=0  pgen=8
gcc 3.3.6      262.3            269.7            267.7
gcc 3.4.6      254.9 / 260.0    266.8 / 263.6    263.4 / 266.7 (!!!)
gcc 4.1.2      240.1 / 248.6    255.7 / 238.5    255.6 / 238.7
gcc 4.3.1      270.5 / 251.8    263.2 / 242.2    263.3 / 242.3
Core2 Q9450, 2.66 GHz
---------------------
CC / CFLAGS         -O1              -O2              -O3
               pgen=0  pgen=8   pgen=0  pgen=8   pgen=0  pgen=8
gcc 3.3.6      227.1            233.6            231.9
gcc 3.4.6      220.9 / 224.4    228.6 / 228.9    228.1 / 230.7 (!!!)
gcc 4.1.2      206.8 / 215.6    221.0 / 206.7    221.0 / 206.6
gcc 4.3.1      234.8 / 218.9    228.0 / 210.0    229.0 / 210.0
Pentium D, 3.0 GHz
------------------
CC / CFLAGS         -O1              -O2              -O3
               pgen=0  pgen=8   pgen=0  pgen=8   pgen=0  pgen=8
gcc 3.3.6      317.2            345.1            344.5
gcc 3.4.6      315.3 / 329.4    337.7 / 339.6    338.9 / 342.4
gcc 4.1.3      306.3 / 312.5    316.8 / 312.8    316.6 / 313.1
gcc 4.2.2      305.1 / 314.2    305.6 / 308.2    305.1 / 308.2
gcc 4.2.4      305.5 / 314.0    305.2 / 307.6    305.2 / 308.1
gcc 4.3.1      318.3 / 311.1    313.9 / 309.2    315.4 / 309.2
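
As a rough way to read these tables, the following Python sketch compares the best timing of gcc 4.1.x with the best timing of gcc 4.3.1 on each machine and expresses the difference as a percentage. The numbers are copied from the tables above (the Pentium D row uses gcc 4.1.3); picking the minimum over all six columns is an assumption about what "best" should mean here, and the helper name is illustrative.

```python
def slowdown_percent(reference, candidate):
    """How much slower candidate is than reference, in percent."""
    return (candidate - reference) / reference * 100.0


# Best timing per compiler over all -O levels and pgen settings:
# (gcc 4.1.x, gcc 4.3.1), copied from the tables.
best = {
    "AMD Opteron": (197.1, 210.2),
    "Core2 Q6600": (238.5, 242.2),
    "Core2 Q9450": (206.6, 210.0),
    "Pentium D": (306.3, 309.2),
}

for machine, (old, new) in best.items():
    print(f"{machine}: gcc 4.3.1 is {slowdown_percent(old, new):.1f}% slower")
```

Given the stated accuracy of about 1 second, differences of around 1% or less are close to the noise level of these measurements.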
Note: each test was run 3 times and I kept the median value.
The timing accuracy is about 1 second.
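
The median-of-three measurement can be sketched in Python as follows. This is only an illustration: it assumes wall-clock time is what gets measured, and the function names and the ./prog path in the usage note are hypothetical.

```python
import statistics
import subprocess
import time


def time_command(argv):
    """Wall-clock seconds taken by one run of argv (assumes wall-clock timing)."""
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def median_timing(run_once, runs=3):
    """Call run_once `runs` times and return the median of the results."""
    return statistics.median(run_once() for _ in range(runs))
```

A call such as `median_timing(lambda: time_command(["./prog"]))` would then give one table entry. Taking the median rather than the mean keeps a single disturbed run from skewing the result.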
Since this code is meant to run for millions of hours, efficiency
is really important.
--
Vincent Lefèvre <vincent@vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)