[Bug tree-optimization/18704] New: Inlining limits cause 340% performance regression
rguenth at tat dot physik dot uni-tuebingen dot de
gcc-bugzilla@gcc.gnu.org
Sun Nov 28 18:16:00 GMT 2004
Compared to 3.4, the default inlining limits in 4.0 cause a 340%
performance regression on the tramp3d-v3.cpp testcase here:
http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/tramp3d-v3.cpp.gz
The regression can be attributed to the inlining limits, as
patching both compilers with the leafify patch results in the same
performance.
Compilation options used are -Dleafify=fooblah -O2 -fpeel-loops -ffast-math
-march=pentium4 -mfpmath=sse -fno-exceptions. Binary size is
"improved" by about 9% with the current defaults.
Using --param max-inline-insns-single=1000 worsens the situation to
a
Playing with the inlining params gives
max-inline-insns-single large-function-growth inline-unit-growth regression
340%
1000 375%
500 348%
200 -36% (1% size regression)
175 -35% (4% size improvement)
165 -12%
150 -12% (!?)
100 232%
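For anyone who wants to repeat this kind of parameter sweep, here is a minimal Python sketch. It only builds the compiler command lines from the options quoted above and computes a regression percentage from two timings. The compiler name (g++), the source and output file names, the swept values, and the helper names are assumptions for illustration. The formula is one plausible reading of "regression", not necessarily how the figures above were computed.

```python
import shlex

# Flags quoted in the report. Compiler name and file names below are assumptions.
BASE_FLAGS = [
    "-Dleafify=fooblah", "-O2", "-fpeel-loops", "-ffast-math",
    "-march=pentium4", "-mfpmath=sse", "-fno-exceptions",
]


def build_command(params, source="tramp3d-v3.cpp", output="tramp3d",
                  compiler="g++"):
    """Return the argv list for one compile with the given --param overrides."""
    cmd = [compiler, *BASE_FLAGS]
    # Sort so the generated command line is deterministic.
    for name, value in sorted(params.items()):
        cmd += ["--param", f"{name}={value}"]
    cmd += [source, "-o", output]
    return cmd


def regression_percent(baseline_seconds, candidate_seconds):
    """Slowdown of candidate relative to baseline in percent (negative = faster)."""
    return (candidate_seconds - baseline_seconds) / baseline_seconds * 100.0


if __name__ == "__main__":
    # Illustrative sweep over one parameter; the values are hypothetical.
    for growth in (100, 150, 200):
        print(shlex.join(build_command({"inline-unit-growth": growth})))
```

Each printed line can be run by hand (or passed to subprocess.run) and the resulting binary timed; regression_percent then turns a pair of run times into a figure comparable to the ones in the list above.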
So I guess limiting overall unit growth is bad. Can we disable limiting at
-Os, or provide a higher default value? The "correct" value will be different
depending on the application. Also, the documented default value for
inline-unit-growth is not what it actually seems to be (it is 50 according to
params.def; large-function-growth is also documented incorrectly).
If we make the documented values the default, we get a 68% compile-time
regression and a 3.7% code-size regression in exchange for a 71% performance
improvement (this includes "correcting" the large-function-growth limit, which
seems to hurt rather than help).
--
Summary: Inlining limits cause 340% performance regression
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rguenth at tat dot physik dot uni-tuebingen dot de
CC: gcc-bugs at gcc dot gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704