[PATCH] Increase min-lto-partition.
Martin Liška
mliska@suse.cz
Fri Mar 13 15:25:45 GMT 2020
On 3/13/20 4:11 PM, Jan Hubicka wrote:
>>> $ time g++ -O2 /tmp/gimple-match.ii -c -flto -fno-checking
>>> real 0m8.709s
>>> user 0m8.543s
>>>
>>> WPA+LTRANS:
>>>
>>> $ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -r -o gimple-match2.o --param lto-partitions=4 -fno-checking
>>> real 0m11.220s
>>> user 0m33.067s
>>>
>>> $ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -r -o gimple-match2.o --param lto-partitions=6 -fno-checking
>>> real 0m9.880s
>>> user 0m35.599s
>>>
>>> $ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -r -o gimple-match2.o --param lto-partitions=8 -fno-checking
>>> real 0m6.681s
>>> user 0m39.746s
>>>
>>> default:
>>> $ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -r -o gimple-match2.o -fno-checking
>>> real 0m6.065s
>>> user 1m22.698s
>
> I did
> /aux/hubicka/trunk-git/build2/./prev-gcc/xg++ -B/aux/hubicka/trunk-git/build2/./prev-gcc/ -B/usr/local/x86_64-pc-linux-gnu/bin/ -nostdinc++ -B/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -B/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -I/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu -I/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/include -I/aux/hubicka/trunk-git/libstdc++-v3/libsupc++ -L/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -L/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -fno-PIE -c -g -O2 -fchecking=0 -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -Wno-unused -DHAVE_CONFIG_H -I. -I. -I../../gcc -I../../gcc/. -I../../gcc/../include -I../../gcc/../libcpp/include -I/aux/hubicka/trunk-git/build2/./gmp -I/aux/hubicka/trunk-git/gmp -I/aux/hubicka/trunk-git/build2/./mpfr/src -I/aux/hubicka/trunk-git/mpfr/src -I/aux/hubicka/trunk-git/mpc/src -I../../gcc/../libdecnumber -I../../gcc/../libdecnumber/bid -I../libdecnumber -I../../gcc/../libbacktrace -I/aux/hubicka/trunk-git/build2/./isl/include -I/aux/hubicka/trunk-git/isl/include -o gimple-match.o -MT gimple-match.o -MMD -MP -MF ./.deps/gimple-match.TPo gimple-match.c -flto
>
> (copying from build disabling checking and adding -flto) and I get:
> hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=128 -r
>
> real 0m10.394s
> user 2m13.809s
> sys 0m3.896s
> hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=8 -r
>
> real 0m21.033s
> user 2m3.063s
> sys 0m2.539s
> hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=6 -r
>
> real 0m23.975s
> user 1m56.139s
> sys 0m2.595s
> hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=4 -r
>
> real 0m32.383s
> user 1m39.411s
> sys 0m2.213s
>
> With debug info disabled (like you do, but I guess in a less realistic
> setting) I get:
>
> hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=128 -r
>
> real 0m10.905s
> user 1m55.065s
> sys 0m2.956s
> hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=8 -r
>
> real 0m17.297s
> user 1m26.513s
> sys 0m1.626s
> hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=6 -r
>
> real 0m22.365s
> user 1m30.969s
> sys 0m1.386s
> hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=4 -r
>
> real 0m26.534s
> user 1m21.593s
> sys 0m0.902s
>
> So I do not see such a notable difference in user times (but they are
> consistently worse than yours). Perhaps, can you try to perf it
> including the system profile? It may give us some idea why things behave
> differently.
That's strange. So let's take my gimple-match.ii:
https://drive.google.com/file/d/1B8d3bIvz1KA_ksIo8h-JgkaJTCRiSPR4/view?usp=sharing
For gcc9 package (LTO+PGO) I get:
$ time g++ -O2 gimple-match.ii -c -flto
real 0m8.180s
user 0m7.992s
$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=4 -r
real 0m9.041s
user 0m28.157s
sys 0m0.493s
$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=128 -r
real 0m6.011s
user 1m20.326s
sys 0m2.147s
$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking -r
real 0m6.303s
user 1m18.789s
sys 0m2.244s
$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=8 -r
real 0m5.875s
user 0m38.938s
sys 0m0.784s
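To make the trade-off in the numbers above easier to compare, here is a small Python sketch that tabulates wall time against CPU time. The timings are copied from the four runs above; the user/real ratio is only a rough indicator of how much parallel work each setting generates, and the helper name summarize is made up for this illustration.

```python
# Timings copied from the runs above: (label, real seconds, user seconds).
# "default" is the run without an explicit --param lto-partitions.
runs = [
    ("lto-partitions=4", 9.041, 28.157),
    ("lto-partitions=8", 5.875, 38.938),
    ("lto-partitions=128", 6.011, 80.326),
    ("default", 6.303, 78.789),
]


def summarize(runs):
    """Return (label, real, user, user/real) rows, fastest wall time first."""
    rows = [(label, real, user, user / real) for label, real, user in runs]
    return sorted(rows, key=lambda row: row[1])


for label, real, user, ratio in summarize(runs):
    print(f"{label:20s} real={real:6.3f}s  user={user:7.3f}s  user/real={ratio:5.2f}")
```

On these numbers, 8 partitions gives the shortest wall time while burning roughly half the CPU time of the default run.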
For default I get:
perf report --stdio | head -n30
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 351K of event 'cycles:u'
# Event count (approx.): 341558047686
#
# Overhead Command Shared Object Symbol
# ........ ............... ........................... ............................................................................
#
3.61% lto1-ltrans lto1 [.] df_worklist_dataflow
1.93% lto1-ltrans lto1 [.] cleanup_cfg
1.15% lto1-ltrans lto1 [.] init_alias_analysis
1.02% lto1-ltrans lto1 [.] pre_and_rev_post_order_compute_fn
0.93% lto1-ltrans lto1 [.] calculate_dominance_info
0.84% lto1-ltrans lto1 [.] inverted_post_order_compute
0.75% lto1-ltrans lto1 [.] post_order_compute
0.71% lto1-ltrans libc-2.31.so [.] _int_malloc
0.69% lto1-ltrans lto1 [.] constrain_operands
0.68% lto1-ltrans lto1 [.] df_bb_refs_record
0.59% lto1-ltrans lto1 [.] side_effects_p
0.53% lto1-ltrans lto1 [.] delete_unreachable_blocks
0.53% lto1-ltrans lto1 [.] rewrite_update_dom_walker::before_dom_children
0.49% lto1-ltrans lto1 [.] bitmap_set_bit
0.47% lto1-ltrans lto1 [.] record_temporary_equivalences
0.46% lto1-ltrans lto1 [.] single_def_use_dom_walker::before_dom_children
0.46% lto1-ltrans lto1 [.] df_compact_blocks
0.45% lto1-ltrans lto1 [.] substitute_and_fold_engine::substitute_and_fold
0.45% lto1-ltrans libc-2.31.so [.] _int_free
Martin
>
> Compiler binary I use is profiledbootstrapped with LTO.
>
> Honza
>>>
>>> So I would recommend setting the param value to 75000, which leads to 6 partitions. That would be:
>>>
>>> 9+10s = 19s vs. 40s (total real time 44s). That seems reasonable to me.
>>>
>>> Thoughts?
>>> Thanks,
>>> Martin
>>>
>>> gcc/ChangeLog:
>>>
>>> 2020-03-13 Martin Liska <mliska@suse.cz>
>>>
>>> * params.opt: Bump min-lto-partition in order to not create
>>> too many LTRANS.
>>> ---
>>> gcc/params.opt | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>>
>>
>>> diff --git a/gcc/params.opt b/gcc/params.opt
>>> index e39216aa7d0..49fafac20af 100644
>>> --- a/gcc/params.opt
>>> +++ b/gcc/params.opt
>>> @@ -363,7 +363,7 @@ Common Joined UInteger Var(param_max_lto_streaming_parallelism) Init(32) Integer
>>> maximal number of LTO partitions streamed in parallel.
>>>
>>> -param=lto-min-partition=
>>> -Common Joined UInteger Var(param_min_partition_size) Init(10000) Param
>>> +Common Joined UInteger Var(param_min_partition_size) Init(75000) Param
>>> Minimal size of a partition for LTO (in estimated instructions).
>>>
>>> -param=lto-partitions=
>>>
>>
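As a rough illustration of why raising lto-min-partition lowers the number of LTRANS units, here is a simplified Python model. It is not GCC's actual partitioning code: it only assumes that the target partition size is the unit's total estimated size divided by lto-partitions, clamped from below by lto-min-partition. The total size of 450000 estimated instructions is a hypothetical value, picked so that a minimum of 75000 gives the 6 partitions mentioned in the patch description; the real figure for gimple-match is not stated in this thread.

```python
def estimated_partitions(total_size, max_partitions=128, min_partition_size=10000):
    """Estimate the partition count under a simplified model (an assumption,
    not GCC's real algorithm): split total_size into max_partitions pieces,
    but never let a piece be smaller than min_partition_size."""
    target = max(total_size // max_partitions, min_partition_size)
    # Ceiling division, so a trailing partial partition still counts as one.
    return max(1, -(-total_size // target))


# Hypothetical total size in estimated instructions (not taken from the thread).
HYPOTHETICAL_TOTAL = 450000

for min_size in (10000, 75000):
    count = estimated_partitions(HYPOTHETICAL_TOTAL, 128, min_size)
    print(f"lto-min-partition={min_size}: about {count} partitions")
```

Under this model the minimum only matters when the requested partition count would make partitions smaller than it; with lto-partitions=4 the same unit still ends up in 4 partitions.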