without -flto: 106856 bytes
with -flto: 156312 bytes

http://www.tux.org/~mayer/linux/nbench-byte-2.2.3.tar.gz

CFLAGS = -s -Wall -O3 -g0 -march=core2 -fomit-frame-pointer -funroll-loops -ffast-math -mssse3 -fno-PIE -fno-exceptions -fno-stack-protector
When using the linker-plugin? That is, with -fwhole-program?
156312 bytes with -s -Wall -O3 -g0 -march=core2 -fomit-frame-pointer -funroll-loops -ffast-math -mssse3 -fno-PIE -fno-exceptions -fno-stack-protector -flto -fwhole-program?
(In reply to comment #0)
> without -flto 106856 bytes
> with -flto 156312 bytes

But is it faster?
One of the tests is faster.
branch 4.9

without lto: 101462 bytes
with -flto -fwhole-program:
  157243 bytes - linker bfd
  155488 bytes - linker gold

other CFLAGS = -O3 -g0 -march=corei7 -fomit-frame-pointer -funroll-loops -ffast-math -fno-PIE -fno-exceptions -fno-stack-protector
The executable is smaller with lto when I add -fno-inline-functions: 95928 vs 93880 bytes.
-fno-inline-functions makes some tests 12% or 6% slower with lto/gold:

NUMERIC SORT : 1689.2 : 43.32 : 14.23
NUMERIC SORT : 1483.2 : 38.04 : 12.49

IDEA : 9932 : 151.91 : 45.10
IDEA : 9360 : 143.16 : 42.50
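The 12% and 6% figures can be checked against the first numeric field of each row (iterations per second in nbench output). A minimal Python sketch, assuming the first row of each pair is the run without -fno-inline-functions and the second is the run with it; the comment does not label the rows, so that order is an assumption, and the function name is only illustrative:

```python
def slowdown_percent(baseline, slower):
    """Relative drop in iterations/sec, as a percentage of the baseline."""
    return (baseline - slower) / baseline * 100.0

# Values copied from the rows above; which row is which run is assumed.
numeric_sort = slowdown_percent(1689.2, 1483.2)
idea = slowdown_percent(9932, 9360)

print(f"NUMERIC SORT: {numeric_sort:.1f}% slower")
print(f"IDEA: {idea:.1f}% slower")
```

The slowdown is taken relative to the faster run, which is the reading that matches the rounded 12% and 6% quoted in the comment.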
lto/gold

-finline-limit=43
99960 bytes
NUMERIC SORT : 1471.2 : 37.73 : 12.39

-finline-limit=44
149136 bytes
NUMERIC SORT : 1705.2 : 43.73 : 14.36
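To put the step from -finline-limit=43 to -finline-limit=44 in proportion, the size and NUMERIC SORT numbers can be compared directly. A rough Python sketch using only the values reported above; the variable names are illustrative:

```python
# Executable size in bytes and NUMERIC SORT iterations/sec, as reported above.
size_43, size_44 = 99960, 149136
speed_43, speed_44 = 1471.2, 1705.2

# Growth relative to the -finline-limit=43 build.
size_growth = (size_44 - size_43) / size_43 * 100.0
speed_gain = (speed_44 - speed_43) / speed_43 * 100.0

print(f"size:  +{size_growth:.1f}%")
print(f"speed: +{speed_gain:.1f}%")
```

This only restates the measurements as ratios; it says nothing about which inlining decision changes at the threshold.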
Btw, with -O3 you essentially say you do not care for program size (IPA cloning decisions blow up the unit without limits I think - unlike inlining which is limited by unit-growth for large units).
There is also a difference with -O2 on branch 4.9.

size in bytes
57199  -O2
55222  -O2 -flto
60681  -O2 -finline-functions
75301  -O2 -flto -finline-functions
67083  -O2 -flto -finline-functions --param large-unit-insns=1000
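The -O2 sizes are easier to compare as changes relative to the plain -O2 build. A small Python sketch over the numbers above; taking plain -O2 as the baseline is a choice made here for illustration, not something stated in the comment:

```python
# Sizes in bytes, keyed by the flags used, copied from the list above.
sizes = {
    "-O2": 57199,
    "-O2 -flto": 55222,
    "-O2 -finline-functions": 60681,
    "-O2 -flto -finline-functions": 75301,
    "-O2 -flto -finline-functions --param large-unit-insns=1000": 67083,
}

baseline = sizes["-O2"]

# Percentage change of each build relative to plain -O2.
deltas = {
    flags: (size - baseline) / baseline * 100.0
    for flags, size in sizes.items()
}

for flags, delta in deltas.items():
    print(f"{delta:+6.1f}%  {flags}")
```

Read this way, -flto alone shrinks the binary slightly, while -flto combined with -finline-functions grows it far more than -finline-functions does without lto.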