Bug 42376 - [4.5 Regression] Performance regression of generated code
Status: RESOLVED WONTFIX
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.5.0
Importance: P3 normal
Target Milestone: 4.5.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2009-12-15 08:40 UTC by Martin Reinecke
Modified: 2010-01-07 16:13 UTC
CC List: 3 users

See Also:
Host: i686-pc-linux-gnu
Target: i686-pc-linux-gnu
Build: i686-pc-linux-gnu
Known to work: 4.4.3
Known to fail: 4.5.0
Last reconfirmed:


Attachments
test case (707 bytes, text/plain)
2009-12-15 08:41 UTC, Martin Reinecke
assembler generated by gcc 4.5 (1.51 KB, text/plain)
2009-12-15 08:42 UTC, Martin Reinecke
assembler generated by gcc 4.4 (1.50 KB, text/plain)
2009-12-15 08:43 UTC, Martin Reinecke
Proposed wwwdocs patch to explain the apparent performance regression (648 bytes, patch)
2010-01-07 15:16 UTC, Martin Reinecke

Description Martin Reinecke 2009-12-15 08:40:58 UTC
I have noticed a big performance decrease in one of my numerical codes
when switching from gcc 4.4 to gcc 4.5. A small test case is attached.
When this test case is compiled with "gcc -O3 perf.c -lm -std=c99"
and the resulting binary is executed, the CPU time is about 1.1s with
the head of the 4.4 branch and 2.1s with the head of the trunk.

This is on a Pentium D CPU. I have verified that both binaries produce
identical results.

Verbose output of gcc-4.4:

~/tmp/wigner3j>gcc -O3 perf.c -lm -std=c99 -save_temps -v
Using built-in specs.
gcc: unrecognized option '-save_temps'
Target: i686-pc-linux-gnu
Configured with: /scratch/martin/gcc44/configure --prefix=/scratch/martin/ugcc44 --enable-languages=c++,fortran --enable-target=all --disable-bootstrap --enable-checking=release
Thread model: posix
gcc version 4.4.3 20091130 (prerelease) [gcc-4_4-branch revision 154765] (GCC) 
COLLECT_GCC_OPTIONS='-O3' '-std=c99' '-save_temps' '-v' '-mtune=generic'
 /scratch/martin/ugcc44/libexec/gcc/i686-pc-linux-gnu/4.4.3/cc1 -quiet -v perf.c -quiet -dumpbase perf.c -mtune=generic -auxbase perf -O3 -std=c99 -version -o /tmp/cc3D10Yi.s
ignoring nonexistent directory "/scratch/martin/ugcc44/lib/gcc/i686-pc-linux-gnu/4.4.3/../../../../i686-pc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /scratch/martin/ugcc44/include
 /scratch/martin/ugcc44/lib/gcc/i686-pc-linux-gnu/4.4.3/include
 /scratch/martin/ugcc44/lib/gcc/i686-pc-linux-gnu/4.4.3/include-fixed
 /usr/include
End of search list.
GNU C (GCC) version 4.4.3 20091130 (prerelease) [gcc-4_4-branch revision 154765] (i686-pc-linux-gnu)
        compiled by GNU C version 4.2.3, GMP version 4.2.4, MPFR version 2.3.2.
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 0428a618e74de3f947d92ab031f86f8a
COLLECT_GCC_OPTIONS='-O3' '-std=c99' '-save_temps' '-v' '-mtune=generic'
 as -V -Qy -o /tmp/cc6AnZqy.o /tmp/cc3D10Yi.s
GNU assembler version 2.18 (i686-pc-linux-gnu) using BFD version (GNU Binutils) 2.18
COMPILER_PATH=/scratch/martin/ugcc44/libexec/gcc/i686-pc-linux-gnu/4.4.3/:/scratch/martin/ugcc44/libexec/gcc/i686-pc-linux-gnu/4.4.3/:/scratch/martin/ugcc44/libexec/gcc/i686-pc-linux-gnu/:/scratch/martin/ugcc44/lib/gcc/i686-pc-linux-gnu/4.4.3/:/scratch/martin/ugcc44/lib/gcc/i686-pc-linux-gnu/:/usr/libexec/gcc/i686-pc-linux-gnu/:/usr/lib/gcc/i686-pc-linux-gnu/
LIBRARY_PATH=/scratch/martin/ugcc44/lib/gcc/i686-pc-linux-gnu/4.4.3/:/scratch/martin/ugcc44/lib/gcc/i686-pc-linux-gnu/4.4.3/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-O3' '-std=c99' '-save_temps' '-v' '-mtune=generic'
 /scratch/martin/ugcc44/libexec/gcc/i686-pc-linux-gnu/4.4.3/collect2 --eh-frame-hdr -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 /usr/lib/crt1.o /usr/lib/crti.o /scratch/martin/ugcc44/lib/gcc/i686-pc-linux-gnu/4.4.3/crtbegin.o -L/scratch/martin/ugcc44/lib/gcc/i686-pc-linux-gnu/4.4.3 -L/scratch/martin/ugcc44/lib/gcc/i686-pc-linux-gnu/4.4.3/../../.. /tmp/cc6AnZqy.o -lm -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /scratch/martin/ugcc44/lib/gcc/i686-pc-linux-gnu/4.4.3/crtend.o /usr/lib/crtn.o

Verbose output of gcc-4.5:
~/tmp/wigner3j>gcc -O3 perf.c -lm -std=c99 -save-temps -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/afs/mpa/data/martin/ugcc/libexec/gcc/i686-pc-linux-gnu/4.5.0/lto-wrapper
Target: i686-pc-linux-gnu
Configured with: /scratch/martin/gcc/configure --enable-gold --prefix=/afs/mpa/data/martin/ugcc --with-mpfr=/afs/mpa/data/martin/numlibs --with-gmp=/afs/mpa/data/martin/numlibs --with-mpc=/afs/mpa/data/martin/numlibs --enable-languages=c++,fortran --enable-target=all --enable-checking=release
Thread model: posix
gcc version 4.5.0 20091214 (experimental) [trunk revision 155208] (GCC) 
COLLECT_GCC_OPTIONS='-O3' '-std=c99' '-save-temps' '-v' '-mtune=generic'
 /afs/mpa/data/martin/ugcc/libexec/gcc/i686-pc-linux-gnu/4.5.0/cc1 -E -quiet -v perf.c -mtune=generic -std=c99 -O3 -fpch-preprocess -o perf.i
ignoring nonexistent directory "/afs/mpa/data/martin/ugcc/lib/gcc/i686-pc-linux-gnu/4.5.0/../../../../i686-pc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /afs/mpa/data/martin/ugcc/include
 /afs/mpa/data/martin/ugcc/lib/gcc/i686-pc-linux-gnu/4.5.0/include
 /afs/mpa/data/martin/ugcc/lib/gcc/i686-pc-linux-gnu/4.5.0/include-fixed
 /usr/include
End of search list.
COLLECT_GCC_OPTIONS='-O3' '-std=c99' '-save-temps' '-v' '-mtune=generic'
 /afs/mpa/data/martin/ugcc/libexec/gcc/i686-pc-linux-gnu/4.5.0/cc1 -fpreprocessed perf.i -quiet -dumpbase perf.c -mtune=generic -auxbase perf -O3 -std=c99 -version -o perf.s
GNU C (GCC) version 4.5.0 20091214 (experimental) [trunk revision 155208] (i686-pc-linux-gnu)
        compiled by GNU C version 4.5.0 20091214 (experimental) [trunk revision 155208], GMP version 4.3.1, MPFR version 2.4.2, MPC version 0.8
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU C (GCC) version 4.5.0 20091214 (experimental) [trunk revision 155208] (i686-pc-linux-gnu)
        compiled by GNU C version 4.5.0 20091214 (experimental) [trunk revision 155208], GMP version 4.3.1, MPFR version 2.4.2, MPC version 0.8
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 9df7fe822ccb89478c9ff357db9be45e
COLLECT_GCC_OPTIONS='-O3' '-std=c99' '-save-temps' '-v' '-mtune=generic'
 as -V -Qy --32 -o perf.o perf.s
GNU assembler version 2.18 (i686-pc-linux-gnu) using BFD version (GNU Binutils) 2.18
COMPILER_PATH=/afs/mpa/data/martin/ugcc/libexec/gcc/i686-pc-linux-gnu/4.5.0/:/afs/mpa/data/martin/ugcc/libexec/gcc/i686-pc-linux-gnu/4.5.0/:/afs/mpa/data/martin/ugcc/libexec/gcc/i686-pc-linux-gnu/:/afs/mpa/data/martin/ugcc/lib/gcc/i686-pc-linux-gnu/4.5.0/:/afs/mpa/data/martin/ugcc/lib/gcc/i686-pc-linux-gnu/
LIBRARY_PATH=/afs/mpa/data/martin/ugcc/lib/gcc/i686-pc-linux-gnu/4.5.0/:/afs/mpa/data/martin/ugcc/lib/gcc/i686-pc-linux-gnu/4.5.0/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-O3' '-std=c99' '-save-temps' '-v' '-mtune=generic'
 /afs/mpa/data/martin/ugcc/libexec/gcc/i686-pc-linux-gnu/4.5.0/collect2 --eh-frame-hdr -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 /usr/lib/crt1.o /usr/lib/crti.o /afs/mpa/data/martin/ugcc/lib/gcc/i686-pc-linux-gnu/4.5.0/crtbegin.o -L/afs/mpa/data/martin/ugcc/lib/gcc/i686-pc-linux-gnu/4.5.0 -L/afs/mpa/data/martin/ugcc/lib/gcc/i686-pc-linux-gnu/4.5.0/../../.. perf.o -lm -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /afs/mpa/data/martin/ugcc/lib/gcc/i686-pc-linux-gnu/4.5.0/crtend.o /usr/lib/crtn.o

I attach the test case and the two generated assembler files.
Comment 1 Martin Reinecke 2009-12-15 08:41:51 UTC
Created attachment 19305
test case
Comment 2 Martin Reinecke 2009-12-15 08:42:20 UTC
Created attachment 19306
assembler generated by gcc 4.5
Comment 3 Martin Reinecke 2009-12-15 08:43:15 UTC
Created attachment 19307
assembler generated by gcc 4.4
Comment 4 Richard Biener 2009-12-15 13:15:54 UTC
This is because (quoting http://gcc.gnu.org/gcc-4.5/changes.html):

"GCC now supports handling floating-point excess precision arising from use of the x87 floating-point unit in a way that conforms to ISO C99. This is enabled with -fexcess-precision=standard and with standards conformance options such as -std=c99, and may be disabled using -fexcess-precision=fast."

GCC with -std=c99 makes sure to properly handle the i387 FPU excess precision.
With -fexcess-precision=fast the code is as fast (and as non-conforming) as
with GCC 4.4.  Using -std=gnu99 is also an option.
Comment 5 Martin Reinecke 2009-12-17 14:12:28 UTC
> GCC with -std=c99 makes sure to properly handle the i387 FPU excess precision.
> With -fexcess-precision=fast the code is as fast (and as non-conforming) as
> with GCC 4.4.  Using -std=gnu99 is also an option.

Thanks a lot for pointing this out! I was aware of the floating-point change but simply had not realized it would be switched on by -std=c99.
I imagine that this might catch many people by surprise once 4.5.0 is released,
and it might be politically advisable to mention it (and the "fix") in a place where users can't miss it.
Is there a plan to mention this (in a prominent place) in the release notes?
Or in the FAQ or the "non-bugs" section of bugs.html? I can prepare a documentation patch if this is desirable.
Comment 6 Richard Biener 2009-12-17 16:22:54 UTC
Documentation improvements are always welcome, especially if you looked for it but
missed the critical piece.
Comment 7 Martin Reinecke 2010-01-07 15:16:45 UTC
Created attachment 19499
Proposed wwwdocs patch to explain the apparent performance regression

Here is a proposed patch to gcc-4.5/changes.html, which mentions the apparent performance regression (and describes how to avoid it) in the "Caveats" section.

The FSF should have my copyright assignment; in any case I think this patch is
trivial enough.
Comment 8 Richard Biener 2010-01-07 15:27:33 UTC
Can you please post the patch to gcc-patches@gcc.gnu.org instead?  Thanks.