This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug ipa/65701] r221530 makes 187.facerec drop with -Ofast -flto
- From: "hubicka at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 09 Apr 2015 19:40:51 +0000
- Subject: [Bug ipa/65701] r221530 makes 187.facerec drop with -Ofast -flto
- Auto-submitted: auto-generated
- References: <bug-65701-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65701
Jan Hubicka <hubicka at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenther at suse dot de,
                   |                            |vmakarov at redhat dot com
--- Comment #10 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
This is on clean mainline and a bdver1 machine.
With the patch reverted, the runtime is:
real 0m50.714s
user 0m50.402s
sys 0m0.356s
and now with different inliner settings:
(talos4)$ sh compile
real 1m4.636s
user 1m4.270s
sys 0m0.420s
(talos4)$ sh compile --param large-function-insns=1000
real 0m51.063s
user 0m50.742s
sys 0m0.364s
(talos4)$ sh compile --param large-function-insns=100000 --param large-stack-frame=100000
real 1m1.369s
user 1m1.012s
sys 0m0.407s
(talos4)$ sh compile -fno-tree-vectorize
real 1m0.629s
user 1m0.299s
sys 0m0.381s
(talos4)$ sh compile -fno-tree-vectorize --param large-function-insns=1000
real 0m53.375s
user 0m53.053s
sys 0m0.367s
(talos4)$ sh compile -fno-tree-vectorize --param large-function-insns=100000 --param large-stack-frame=100000
real 0m55.131s
user 0m54.826s
sys 0m0.351s
--param large-function-insns=1000 is thus the winner, but apparently by
accident.
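To make the comparison easier to read, here is a minimal sketch that expresses
each "real" time above as a slowdown relative to the patch-reverted run. The
numbers are copied from the measurements in this comment; the dictionary
labels and the helper name slowdown_pct are illustrative only and are not part
of the compile script.

```python
# "real" wall-clock times from the runs above, converted to seconds.
BASELINE = 50.714  # GCC with the patch reverted

# Labels are shorthand for the flags passed to "sh compile" (illustrative).
runs = {
    "(no extra flags)": 64.636,
    "large-function-insns=1000": 51.063,
    "large-function-insns=100000 large-stack-frame=100000": 61.369,
    "-fno-tree-vectorize": 60.629,
    "-fno-tree-vectorize large-function-insns=1000": 53.375,
    "-fno-tree-vectorize large-function-insns=100000 "
    "large-stack-frame=100000": 55.131,
}


def slowdown_pct(seconds, baseline=BASELINE):
    """Return the slowdown versus the baseline, in percent."""
    return (seconds / baseline - 1.0) * 100.0


# Print the configurations from fastest to slowest.
for name, secs in sorted(runs.items(), key=lambda kv: kv[1]):
    print(f"{secs:7.3f}s  {slowdown_pct(secs):+6.1f}%  {name}")
```

These are single runs, so small differences between configurations should be
read with some caution.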
It seems that the tree vectorizer actually makes the code slower when more
inlining and SRA happen. Richard, perhaps with your vect-costmodel-fu, you can
take a look? It may also be just an RA issue, but I do not see particularly
many spills in the internal loops.
To completely flatten the whole benchmark, one also needs to bump up
max-inline-insns-auto. This seems to further degrade performance both with
and without the vectorizer, so it may also be just a register pressure and
IRA issue.
Richard, since this is the second time we have run into a lower
large-function-insns being beneficial, I wonder if you can patch frescobaldi
or czerny (so we have a C++ benchmark and LTO SPEC covered) with a changed
parameter value? The current value was never really tuned; it is quite
possibly just too large.
I will see if I can get anything useful out of firefox benchmarks.