This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.



gcc 4.3 generates less efficient code than gcc 4.1 or 4.2


On one of my programs (which has many branches in its inner loop),
I've found that gcc 4.3.1 generates less efficient code than gcc 4.1.2.
However, I'm not sure I'm selecting the right optimization options.

Here are the timings I got on several x86_64 machines.
Is there something else I should test? Could this be regarded as a bug
in gcc 4.3 (the generated code is correct, but the timing is unexpected)?

In the tables below, pgen=0 means without profile generation, and
pgen=8 means a first compilation with -fprofile-generate, a test on
a subset, a second compilation with -fprofile-use, and the timing
on the obtained binary.
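
As a rough sketch of the pgen=8 procedure described above, the three steps
could be scripted as follows. The file names (prog.c, ./prog, subset.in) and
the helper names are hypothetical placeholders, not the actual program or
build setup; only the -fprofile-generate and -fprofile-use flags come from
the description.

```python
import subprocess


def pgo_commands(cc, opt, source="prog.c", exe="./prog", train_args=("subset.in",)):
    """Build the command lines for the two-pass (pgen=8) procedure.

    All default file names are hypothetical placeholders.
    """
    return [
        # 1. First compilation: instrument the binary so it records a profile.
        [cc, opt, "-fprofile-generate", source, "-o", exe],
        # 2. Training run on a subset of the workload; this writes profile data.
        [exe, *train_args],
        # 3. Second compilation: let the compiler use the recorded profile.
        [cc, opt, "-fprofile-use", source, "-o", exe],
    ]


def run_pgo(cc, opt, **kwargs):
    """Run the three steps in order, stopping at the first failure."""
    for cmd in pgo_commands(cc, opt, **kwargs):
        subprocess.run(cmd, check=True)
```

For example, run_pgo("gcc", "-O2") would produce the binary that is then
timed for the pgen=8 column, while a plain compilation without either flag
corresponds to pgen=0.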

AMD Opteron, 2.3 GHz
--------------------
CC / CFLAGS          -O1             -O2             -O3
                pgen=0 pgen=8   pgen=0 pgen=8   pgen=0 pgen=8
gcc 3.3.6       373.7           374.8           329.7
gcc 3.4.6       330.9 / 280.4   323.9 / 278.8   281.0 / 318.0 (!!!)
gcc 4.1.2       283.9 / 214.6   237.4 / 197.1   237.2 / 197.3
gcc 4.3.1       327.4 / 238.5   232.6 / 210.4   236.3 / 210.2

Core2 Q6600, 2.40 GHz
---------------------
CC / CFLAGS          -O1             -O2             -O3
                pgen=0 pgen=8   pgen=0 pgen=8   pgen=0 pgen=8
gcc 3.3.6       262.3           269.7           267.7
gcc 3.4.6       254.9 / 260.0   266.8 / 263.6   263.4 / 266.7 (!!!)
gcc 4.1.2       240.1 / 248.6   255.7 / 238.5   255.6 / 238.7
gcc 4.3.1       270.5 / 251.8   263.2 / 242.2   263.3 / 242.3

Core2 Q9450, 2.66 GHz
---------------------
CC / CFLAGS          -O1             -O2             -O3
                pgen=0 pgen=8   pgen=0 pgen=8   pgen=0 pgen=8
gcc 3.3.6       227.1           233.6           231.9
gcc 3.4.6       220.9 / 224.4   228.6 / 228.9   228.1 / 230.7 (!!!)
gcc 4.1.2       206.8 / 215.6   221.0 / 206.7   221.0 / 206.6
gcc 4.3.1       234.8 / 218.9   228.0 / 210.0   229.0 / 210.0

Pentium D, 3.0 GHz
------------------
CC / CFLAGS          -O1             -O2             -O3
                pgen=0 pgen=8   pgen=0 pgen=8   pgen=0 pgen=8
gcc 3.3.6       317.2           345.1           344.5
gcc 3.4.6       315.3 / 329.4   337.7 / 339.6   338.9 / 342.4
gcc 4.1.3       306.3 / 312.5   316.8 / 312.8   316.6 / 313.1
gcc 4.2.2       305.1 / 314.2   305.6 / 308.2   305.1 / 308.2
gcc 4.2.4       305.5 / 314.0   305.2 / 307.6   305.2 / 308.1
gcc 4.3.1       318.3 / 311.1   313.9 / 309.2   315.4 / 309.2

Note: each test was run 3 times and I kept the median value.
The timing accuracy is about 1 second.
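
The median-of-3 measurement mentioned in the note can be sketched like this.
It is only an illustration under assumptions: the function name median_time
and its parameters are made up here, and the original timings may well have
been taken with a different tool.

```python
import statistics
import time


def median_time(run, repeats=3, clock=time.perf_counter):
    """Call run() `repeats` times and return the median elapsed time.

    `clock` is injectable so the function can be checked without real timing.
    """
    samples = []
    for _ in range(repeats):
        start = clock()
        run()
        samples.append(clock() - start)
    # With 3 samples, the median discards both the fastest and the slowest run.
    return statistics.median(samples)
```

Here run would be a callable that launches the binary under test, for
instance a small wrapper around subprocess.run. Taking the median rather
than the mean keeps a single disturbed run from skewing the result.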

Since this is code meant to run for millions of hours, the efficiency
is really important.

-- 
Vincent Lefèvre <vincent@vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)

