[PATCH] Increase min-lto-partition.

Martin Liška mliska@suse.cz
Fri Mar 13 15:25:45 GMT 2020


On 3/13/20 4:11 PM, Jan Hubicka wrote:
>>> $ time g++ -O2 /tmp/gimple-match.ii -c -flto -fno-checking
>>> real	0m8.709s
>>> user	0m8.543s
>>>
>>> WPA+LTRANS:
>>>
>>> $ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o  -r -o gimple-match2.o --param lto-partitions=4  -fno-checking
>>> real	0m11.220s
>>> user	0m33.067s
>>>
>>> $ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o  -r -o gimple-match2.o --param lto-partitions=6  -fno-checking
>>> real	0m9.880s
>>> user	0m35.599s
>>>
>>> $ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o  -r -o gimple-match2.o --param lto-partitions=8  -fno-checking
>>> real	0m6.681s
>>> user	0m39.746s
>>>
>>> default:
>>> $ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o  -r -o gimple-match2.o -fno-checking
>>> real	0m6.065s
>>> user	1m22.698s
> 
> I did
> /aux/hubicka/trunk-git/build2/./prev-gcc/xg++ -B/aux/hubicka/trunk-git/build2/./prev-gcc/ -B/usr/local/x86_64-pc-linux-gnu/bin/ -nostdinc++ -B/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -B/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -I/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu -I/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/include -I/aux/hubicka/trunk-git/libstdc++-v3/libsupc++ -L/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -L/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -fno-PIE -c   -g -O2 -fchecking=0  -DIN_GCC     -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -Wno-unused -DHAVE_CONFIG_H -I. -I. -I../../gcc -I../../gcc/.  -I../../gcc/../include -I../../gcc/../libcpp/include -I/aux/hubicka/trunk-git/build2/./gmp -I/aux/hubicka/trunk-git/gmp -I/aux/hubicka/trunk-git/build2/./mpfr/src -I/aux/hubicka/trunk-git/mpfr/src -I/aux/hubicka/trunk-git/mpc/src -I../../gcc/../libdecnumber -I../../gcc/../libdecnumber/bid -I../libdecnumber -I../../gcc/../libbacktrace -I/aux/hubicka/trunk-git/build2/./isl/include -I/aux/hubicka/trunk-git/isl/include  -o gimple-match.o -MT gimple-match.o -MMD -MP -MF ./.deps/gimple-match.TPo gimple-match.c -flto
> 
> (copied from the build, with checking disabled and -flto added) and I get:
> hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=128 -r
> 
> real    0m10.394s
> user    2m13.809s
> sys     0m3.896s
> hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=8 -r
> 
> real    0m21.033s
> user    2m3.063s
> sys     0m2.539s
> hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=6 -r
> 
> real    0m23.975s
> user    1m56.139s
> sys     0m2.595s
> hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=4 -r
> 
> real    0m32.383s
> user    1m39.411s
> sys     0m2.213s
> 
> With debug info disabled (as you do, though I guess that is a less
> realistic setting) I get:
> 
> hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=128 -r
> 
> real    0m10.905s
> user    1m55.065s
> sys     0m2.956s
> hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=8 -r
> 
> real    0m17.297s
> user    1m26.513s
> sys     0m1.626s
> hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=6 -r
> 
> real    0m22.365s
> user    1m30.969s
> sys     0m1.386s
> hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=4 -r
> 
> real    0m26.534s
> user    1m21.593s
> sys     0m0.902s
> 
> So I do not see such a notable difference in user times (but they are
> consistently worse than yours). Could you try to perf it, including
> the system profile? It may give us some idea why things behave
> differently.
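
To put numbers on the quoted runs with debug info, here is a small sketch that converts the `time` output to seconds and reports how user and real time change between 4 and 128 partitions. The helper names `to_seconds` and `growth` and the dictionary layout are illustrative only; the timings are copied from the quoted output.

```python
import re

def to_seconds(stamp):
    """Convert a `time`-style value such as '2m13.809s' to seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", stamp)
    if m is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    minutes = int(m.group(1) or 0)
    return minutes * 60 + float(m.group(2))

# (real, user) per --param lto-partitions value, from the quoted runs with -g.
jan_debug = {
    4: ("0m32.383s", "1m39.411s"),
    6: ("0m23.975s", "1m56.139s"),
    8: ("0m21.033s", "2m3.063s"),
    128: ("0m10.394s", "2m13.809s"),
}

def growth(runs, base, other):
    """Return (user-time ratio, real-time ratio) of `other` relative to `base`."""
    base_real, base_user = (to_seconds(t) for t in runs[base])
    real, user = (to_seconds(t) for t in runs[other])
    return user / base_user, real / base_real

user_ratio, real_ratio = growth(jan_debug, 4, 128)
print(f"user time x{user_ratio:.2f}, real time x{real_ratio:.2f}")
```

On these numbers, going from 4 to 128 partitions costs roughly a third more user time while real time drops to about a third.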

That's strange. So let's take my gimple-match.ii:
https://drive.google.com/file/d/1B8d3bIvz1KA_ksIo8h-JgkaJTCRiSPR4/view?usp=sharing

With the gcc9 package (built with LTO+PGO) I get:

$ time g++ -O2 gimple-match.ii -c -flto
real	0m8.180s
user	0m7.992s

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=4 -r

real	0m9.041s
user	0m28.157s
sys	0m0.493s

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=128 -r

real	0m6.011s
user	1m20.326s
sys	0m2.147s

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking -r

real	0m6.303s
user	1m18.789s
sys	0m2.244s

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=8 -r

real	0m5.875s
user	0m38.938s
sys	0m0.784s
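
As a rough way to read the four runs above, the sketch below tabulates user/real (approximately how many cores were kept busy) and the user time relative to the 4-partition run. The values are converted by hand from the `time` output; the function name `summarize` is illustrative, and sys time is ignored, so this is only an approximation.

```python
# (real seconds, user seconds), converted by hand from the `time` output above.
runs = {
    "4": (9.041, 28.157),
    "8": (5.875, 38.938),
    "128": (6.011, 80.326),
    "default": (6.303, 78.789),
}

def summarize(runs, baseline="4"):
    """Return (name, real, user, user/real, user relative to baseline) rows."""
    base_user = runs[baseline][1]
    rows = []
    for name, (real, user) in runs.items():
        rows.append((name, real, user, user / real, user / base_user))
    return rows

for name, real, user, busy, growth in summarize(runs):
    print(f"{name:>8}  real {real:6.3f}s  user {user:7.3f}s  "
          f"user/real {busy:5.2f}  user vs. 4 partitions x{growth:.2f}")
```

Read this way, 8 partitions gives about the same real time as 128 for roughly half the user time.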

For the default setting, perf reports:

$ perf report --stdio | head -n30
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 351K of event 'cycles:u'
# Event count (approx.): 341558047686
#
# Overhead  Command          Shared Object                Symbol
# ........  ...............  ...........................  ............................................................................
#
      3.61%  lto1-ltrans      lto1                         [.] df_worklist_dataflow
      1.93%  lto1-ltrans      lto1                         [.] cleanup_cfg
      1.15%  lto1-ltrans      lto1                         [.] init_alias_analysis
      1.02%  lto1-ltrans      lto1                         [.] pre_and_rev_post_order_compute_fn
      0.93%  lto1-ltrans      lto1                         [.] calculate_dominance_info
      0.84%  lto1-ltrans      lto1                         [.] inverted_post_order_compute
      0.75%  lto1-ltrans      lto1                         [.] post_order_compute
      0.71%  lto1-ltrans      libc-2.31.so                 [.] _int_malloc
      0.69%  lto1-ltrans      lto1                         [.] constrain_operands
      0.68%  lto1-ltrans      lto1                         [.] df_bb_refs_record
      0.59%  lto1-ltrans      lto1                         [.] side_effects_p
      0.53%  lto1-ltrans      lto1                         [.] delete_unreachable_blocks
      0.53%  lto1-ltrans      lto1                         [.] rewrite_update_dom_walker::before_dom_children
      0.49%  lto1-ltrans      lto1                         [.] bitmap_set_bit
      0.47%  lto1-ltrans      lto1                         [.] record_temporary_equivalences
      0.46%  lto1-ltrans      lto1                         [.] single_def_use_dom_walker::before_dom_children
      0.46%  lto1-ltrans      lto1                         [.] df_compact_blocks
      0.45%  lto1-ltrans      lto1                         [.] substitute_and_fold_engine::substitute_and_fold
      0.45%  lto1-ltrans      libc-2.31.so                 [.] _int_free


Martin

> 
> The compiler binary I use is profiledbootstrapped with LTO.
> 
> Honza
>>>
>>> So I would recommend setting the param value to 75000, which leads to 6 partitions. That would be:
>>>
>>> 9+10s = 19s vs. 40s (total real time 44s). That seems reasonable to me.
>>>
>>> Thoughts?
>>> Thanks,
>>> Martin
>>>
>>> gcc/ChangeLog:
>>>
>>> 2020-03-13  Martin Liska  <mliska@suse.cz>
>>>
>>> 	* params.opt: Bump lto-min-partition in order to not create
>>> 	too many LTRANS partitions.
>>> ---
>>>   gcc/params.opt | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>>
>>
>>> diff --git a/gcc/params.opt b/gcc/params.opt
>>> index e39216aa7d0..49fafac20af 100644
>>> --- a/gcc/params.opt
>>> +++ b/gcc/params.opt
>>> @@ -363,7 +363,7 @@ Common Joined UInteger Var(param_max_lto_streaming_parallelism) Init(32) Integer
>>>   maximal number of LTO partitions streamed in parallel.
>>>   
>>>   -param=lto-min-partition=
>>> -Common Joined UInteger Var(param_min_partition_size) Init(10000) Param
>>> +Common Joined UInteger Var(param_min_partition_size) Init(75000) Param
>>>   Minimal size of a partition for LTO (in estimated instructions).
>>>   
>>>   -param=lto-partitions=
>>>
>>
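
For readers wondering how a minimum partition size turns into a partition count, here is a simplified model. It assumes the partitioner aims for the total size divided by the requested number of partitions and never goes below the minimum size. This is a sketch under that assumption, not GCC's code: the function `estimate_partitions` is hypothetical, and the total of 450000 estimated instructions is not measured; it is picked only so that 75000 yields the 6 partitions mentioned in the thread.

```python
def estimate_partitions(total_size, requested, min_partition_size):
    """Rough LTRANS partition count under a simplified sizing model.

    Assumption: the target partition size is total_size / requested,
    raised to min_partition_size when it would otherwise be smaller.
    """
    target = max(total_size // requested, min_partition_size)
    # Ceiling division: a partial last partition still counts as one.
    return -(-total_size // target)

TOTAL = 450_000  # hypothetical size in estimated instructions, not measured

for min_size in (10_000, 75_000):
    count = estimate_partitions(TOTAL, requested=128, min_partition_size=min_size)
    print(f"lto-min-partition={min_size}: about {count} partitions")
```

In this model the minimum only matters when the requested count would make partitions smaller than it; with `--param lto-partitions=4` the same input still gives 4 partitions.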


