[Bug ipa/65701] r221530 makes 187.facerec drop with -Ofast -flto

hubicka at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Thu Apr 9 19:40:00 GMT 2015


Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
                 CC|                            |rguenther at suse dot de,
                   |                            |vmakarov at redhat dot com

--- Comment #10 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
This is on clean mainline and a bdver1 machine.

Runtime with GCC with the patch reverted is:
real    0m50.714s
user    0m50.402s
sys     0m0.356s

and now with different inliner settings:

(talos4)$ sh compile

real    1m4.636s
user    1m4.270s
sys     0m0.420s
(talos4)$ sh compile --param large-function-insns=1000

real    0m51.063s
user    0m50.742s
sys     0m0.364s
(talos4)$ sh compile --param large-function-insns=100000 --param

real    1m1.369s
user    1m1.012s
sys     0m0.407s
(talos4)$ sh compile -fno-tree-vectorize

real    1m0.629s
user    1m0.299s
sys     0m0.381s
(talos4)$ sh compile -fno-tree-vectorize --param large-function-insns=1000

real    0m53.375s
user    0m53.053s
sys     0m0.367s
(talos4)$ sh compile -fno-tree-vectorize --param large-function-insns=100000
--param large-stack-frame=100000

real    0m55.131s
user    0m54.826s
sys     0m0.351s
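
To make the comparison easier to read, the "real" times above can be turned
into slowdowns relative to the patch-reverted run. This is only a minimal
sketch: the run labels and the slowdown_percent helper are illustrative names,
not part of the benchmark setup, and the second --param of the
large-function-insns=100000 run is left unnamed because that command line is
cut off above.

```python
# Wall-clock ("real") times from the runs above, converted to seconds.
# The labels are illustrative; they only summarize the flags that were shown.
runs = {
    "patch reverted (baseline)": 50.714,
    "no extra flags": 64.636,
    "large-function-insns=1000": 51.063,
    "large-function-insns=100000 (+ truncated --param)": 61.369,
    "-fno-tree-vectorize": 60.629,
    "-fno-tree-vectorize, large-function-insns=1000": 53.375,
    "-fno-tree-vectorize, large-function-insns=100000, large-stack-frame=100000": 55.131,
}


def slowdown_percent(seconds, baseline):
    """Return how much slower `seconds` is than `baseline`, in percent."""
    return (seconds / baseline - 1.0) * 100.0


baseline = runs["patch reverted (baseline)"]
for label, seconds in runs.items():
    pct = slowdown_percent(seconds, baseline)
    print(f"{label:<76} {seconds:7.3f}s {pct:+6.1f}%")
```

With these numbers the run without extra flags is roughly 27% slower than the
baseline, while large-function-insns=1000 is within about 1% of it.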

param large-function-insns=1000 is thus a winner, but apparently by an

It seems that the tree vectorizer actually makes code slower when more inlining and
SRA happens. Richard, perhaps with your vect-costmodel-fu, you can take a look?
It also may be just an RA issue, but I do not see particularly many spills in
the internal loops.

To completely flatten the whole benchmark, one needs to also bump up
max-inline-insns-auto. This seems to further degrade performance both with and
without the vectorizer, so it also may be just a register pressure and
IRA issue.

Richard, since this is the second time we have run into large-function-insns being
beneficial, I wonder if you can patch frescobaldi or czerny (so we have a C++
benchmark and LTO SPEC covered) with a change of the parameter value?

The current value was never really tuned; it is quite possibly just too large.
I will see if I can get anything useful out of firefox benchmarks.
