This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] gcc parallel make check
- From: Mike Stump <mikestump at comcast dot net>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: VandeVondele Joost <joost dot vandevondele at mat dot ethz dot ch>, David Malcolm <dmalcolm at redhat dot com>, "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>, "fortran at gcc dot gnu dot org" <fortran at gcc dot gnu dot org>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, "libstdc++ at gcc dot gnu dot org" <libstdc++ at gcc dot gnu dot org>
- Date: Fri, 12 Sep 2014 16:42:25 -0700
- Subject: Re: [PATCH] gcc parallel make check
- References: <908103EDB4893A42920B21D3568BFD93150F816B at MBX23 dot d dot ethz dot ch> <229476F6-B901-4C6E-AE0B-3A53521AE996 at comcast dot net> <1410381512 dot 28338 dot 9 dot camel at surprise> <20140910210822 dot GK17454 at tucnak dot redhat dot com> <20140910212334 dot GL17454 at tucnak dot redhat dot com> <20140911075123 dot GN17454 at tucnak dot redhat dot com> <20140911080640 dot GP17454 at tucnak dot redhat dot com> <20140911145300 dot GR17454 at tucnak dot redhat dot com> <908103EDB4893A42920B21D3568BFD93150F876D at MBX23 dot d dot ethz dot ch> <908103EDB4893A42920B21D3568BFD93150FE8D2 at MBX13 dot d dot ethz dot ch> <20140912163241 dot GC17454 at tucnak dot redhat dot com>
On Sep 12, 2014, at 9:32 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> Here is my latest version of the patch.
I did a timing test:
Before:
real 0m57.198s
user 1m24.736s
sys 0m19.816s
After:
real 0m28.224s
user 1m27.823s
sys 0m22.374s
This is a -j70 run of check-objc on a 64-core POWER7. I picked an obscure test suite that I had no reason to believe was anything other than ignored, that certainly wasn't engineered for this patch, and that is small enough that the overhead would penalize it. Even so: 50.66% less wall-clock time. There is still room for improvement:
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
7 0 0 99046848 8515072 16748672 0 0 0 0 1 1 0 0 100 0 0
7 0 0 99050432 8515072 16748736 0 0 0 0 7501 9022 13 3 84 0 0
7 0 0 99029376 8515072 16749248 0 0 0 0 7320 8777 10 2 88 0 0
7 0 0 99070656 8515072 16749440 0 0 0 1524 7162 8156 9 2 88 1 0
7 0 0 99034560 8515072 16749824 0 0 0 0 8096 10363 7 2 91 0 0
7 0 0 99030080 8515072 16750720 0 0 0 0 8798 11673 8 3 90 0 0
9 0 0 99037376 8515072 16750080 0 0 0 0 9151 12598 9 3 87 0 0
7 0 0 99024128 8515136 16750656 0 0 0 0 9078 13168 7 3 90 0 0
10 0 0 99034496 8515136 16751488 0 0 0 1800 8633 11675 8 3 88 1 0
8 0 0 98986304 8515136 16751296 0 0 0 0 10159 14553 7 3 90 0 0
7 0 0 99010112 8515520 16765824 0 0 0 0 8814 12036 10 3 87 0 0
4 0 0 99014016 8515648 16773568 0 0 0 0 8091 10445 8 3 90 0 0
4 0 0 99064832 8515712 16773120 0 0 0 0 5416 5071 9 2 89 0 0
3 0 0 99118976 8515712 16773184 0 0 0 12716 4743 3533 4 1 92 2 0
3 0 0 99077504 8515840 16773248 0 0 0 0 4525 3988 3 1 96 0 0
2 0 0 99121152 8515840 16773824 0 0 0 0 4687 3757 3 1 97 0 0
2 0 0 99117056 8515840 16773632 0 0 0 0 4334 3156 3 1 96 0 0
2 0 0 99105728 8515840 16774336 0 0 0 0 4355 3246 3 1 96 0 0
3 0 0 99069120 8515904 16773632 0 0 0 648 4902 4037 2 1 97 0 0
1 0 0 99153664 8515968 16774592 0 0 0 0 3776 2711 2 1 97 0 0
1 0 0 99151232 8515968 16774400 0 0 0 0 877 205 4 0 96 0 0
1 0 0 99151424 8516032 16774528 0 0 0 236 774 466 2 0 97 0 0
2 0 0 99148032 8516032 16774656 0 0 0 0 853 350 2 0 98 0 0
2 0 0 99146176 8516032 16774656 0 0 0 1208 1630 1363 1 0 99 0 0
1 0 0 99156032 8516352 16777152 0 0 0 0 1919 2104 1 0 99 0 0
0 0 0 99189376 8516416 16776512 0 0 0 0 1181 799 2 0 98 0 0
0 0 0 99189312 8516416 16776512 0 0 0 0 118 18 0 0 100 0 0
0 0 0 99189312 8516416 16776512 0 0 0 0 90 18 0 0 100 0 0
0 0 0 99187968 8516416 16776512 0 0 0 5468 196 42 0 0 100 0 0
0 0 0 99187968 8516416 16776512 0 0 0 0 92 24 0 0 100 0 0
0 0 0 99188032 8516416 16776512 0 0 0 0 146 37 0 0 100 0 0
0 0 0 99188160 8516416 16776512 0 0 0 128 91 36 0 0 100 0 0
1 0 0 99188160 8516416 16776512 0 0 0 0 74 16 0 0 100 0 0
0 0 0 99188160 8516416 16776512 0 0 0 0 72 20 0 0 100 0 0
0 0 0 99188224 8516416 16776512 0 0 0 0 76 22 0 0 100 0 0
0 0 0 99188224 8516416 16776512 0 0 0 0 118 29 0 0 100 0 0
which averages to 95% idle. I changed:
check_objc_parallelize = 6
to
check_objc_parallelize = 70
to try and get it to go faster:
real 0m21.252s
user 3m21.035s
sys 1m9.937s
:-( Only 7 seconds (24.7%) faster, but it consumes 146% more CPU time (user + sys) to get that benefit.
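The percentages in this message compare wall-clock (`real`) time and total CPU time (`user` + `sys`) between two runs. A small Python helper that reproduces them from `time` output; the function names are hypothetical and nothing here is part of the patch:

```python
def seconds(field):
    """Convert a bash `time` field such as '1m27.823s' to seconds."""
    minutes, rest = field.rstrip("s").split("m")
    return int(minutes) * 60 + float(rest)


def compare(before, after):
    """Compare two `time` results, each a (real, user, sys) tuple of strings.

    Returns (percent less wall-clock time, percent more CPU time), where
    CPU time is user + sys summed over all processes.
    """
    real_before, real_after = seconds(before[0]), seconds(after[0])
    cpu_before = seconds(before[1]) + seconds(before[2])
    cpu_after = seconds(after[1]) + seconds(after[2])
    faster = 100 * (real_before - real_after) / real_before
    more_cpu = 100 * (cpu_after - cpu_before) / cpu_before
    return faster, more_cpu


if __name__ == "__main__":
    base = ("0m57.198s", "1m24.736s", "0m19.816s")     # check-objc, unpatched
    patched = ("0m28.224s", "1m27.823s", "0m22.374s")  # patched, parallelize 6
    p70 = ("0m21.252s", "3m21.035s", "1m9.937s")       # patched, parallelize 70
    print("%.2f%% faster, %.1f%% more CPU" % compare(base, patched))
    print("%.2f%% faster, %.1f%% more CPU" % compare(patched, p70))
```

By this measure the patched run at parallelize 6 costs only about 5% more CPU than the unpatched one, while going to 70 more than doubles it. Note that "50.66% faster" means 50.66% less wall-clock time; as a speedup ratio it is 57.198 / 28.224, about 2.03x.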
With the filesystem update set to 2 (instead of 10):
real 0m22.478s
user 4m38.564s
sys 1m25.293s
and filesystem update 5:
real 0m21.665s
user 3m51.615s
sys 1m16.005s
and filesystem update 20:
real 0m22.681s
user 3m2.746s
sys 1m5.576s
a -j1 filesystem update 20 for comparison:
real 1m48.127s
user 1m17.953s
sys 0m17.191s
a -j1 check_objc_parallelize 6 filesystem update 10 for comparison:
real 1m47.552s
user 1m17.410s
sys 0m16.909s
a -j70 check_objc_parallelize 10000 filesystem update 10 for comparison:
real 0m21.292s
user 3m17.368s
sys 1m10.106s
a -j70 check_objc_parallelize 10000 filesystem update 2 for comparison:
real 0m21.976s
user 4m37.600s
sys 1m26.598s
a -j70 check_objc_parallelize 10000 filesystem update 200 for comparison:
real 1m12.319s
user 2m49.975s
sys 1m4.537s
a -j70 check_objc_parallelize 12 filesystem update 10 for comparison:
real 0m23.176s
user 1m33.100s
sys 0m25.722s
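The "filesystem update" setting varied above appears to control how many consecutive tests a runtest instance takes each time it checks the shared filesystem. A minimal sketch of that style of work sharing, written in Python rather than the testsuite's Tcl and assuming that each worker walks the same ordered test list and claims a chunk by atomically creating a numbered marker file; `claim_chunks`, `claim_dir` and `chunk_size` are hypothetical names, not the patch's:

```python
import os
import tempfile


def claim_chunks(tests, claim_dir, chunk_size):
    """Yield the tests this worker should run (hypothetical sketch).

    Every worker iterates over the same ordered `tests` list. Chunk k
    (tests k*chunk_size .. (k+1)*chunk_size-1) belongs to whichever worker
    first creates the marker file `claim_dir/k`; O_CREAT | O_EXCL makes
    that creation atomic, so exactly one worker wins each chunk.
    """
    for start in range(0, len(tests), chunk_size):
        marker = os.path.join(claim_dir, str(start // chunk_size))
        try:
            fd = os.open(marker, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        except FileExistsError:
            continue  # another worker already claimed this chunk
        os.close(fd)
        # Claims are lazy: the next chunk is only attempted after the caller
        # has consumed every test in this one.
        yield from tests[start:start + chunk_size]


if __name__ == "__main__":
    tests = [f"t{i}.m" for i in range(25)]
    with tempfile.TemporaryDirectory() as d:
        a = claim_chunks(tests, d, 10)
        b = claim_chunks(tests, d, 10)
        ran_a = [next(a)]   # worker A claims chunk 0
        ran_b = [next(b)]   # worker B skips chunk 0, claims chunk 1
        ran_a += list(a)    # A finishes chunk 0, then claims chunk 2
        ran_b += list(b)    # B finishes chunk 1; nothing is left
        print(len(ran_a), len(ran_b))  # 15 10
```

Under this model a small `chunk_size` means more claim attempts per worker, which would be consistent with the higher user and sys times at 2, while a large one leaves coarse chunks that can finish late, which would be consistent with the 1m12s real time at 200.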
=======================================================
Switching over to check-c…
-j70 before (94.4% idle):
real 22m38.331s
user 67m11.810s
sys 13m40.974s
-j70 after (71.28% idle):
real 10m41.448s
user 160m24.871s
sys 36m5.220s
143% more resource-intensive (user + sys) for a check that is 52.8% faster. I still see a long tail on the test suite run (30 seconds per line):
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 0 96997696 8707392 18756352 0 0 0 0 0 0 0 0 100 0 0
70 6 0 95642688 8709824 18719232 0 0 0 1231 23366 47068 54 23 18 4 0
66 10 0 95591872 8711744 18734976 0 0 0 3131 19437 37168 69 19 6 6 0
66 9 0 94251520 8716352 18780096 0 0 0 3304 18211 34222 70 18 7 6 0
60 16 0 94398400 8732288 18857152 0 0 0 2654 15808 29888 74 16 5 5 0
60 14 0 95059008 8749056 18973888 0 0 0 5678 17521 33177 72 17 6 5 0
60 12 0 94594880 8766656 18981376 0 0 0 2874 15686 28166 72 16 6 6 0
12 2 0 95515520 8773184 18997760 0 0 0 2109 14987 23655 48 9 39 4 0
6 1 0 96211264 8774144 19010560 0 0 0 2111 5049 4993 14 1 85 0 0
3 0 0 96441408 8774336 19016640 0 0 0 529 1870 980 7 0 93 0 0
2 0 0 96493248 8774336 19016128 0 0 0 359 462 79 3 0 97 0 0
2 0 0 96540992 8774400 19016000 0 0 0 417 458 89 3 0 97 0 0
1 0 0 96564736 8774400 19012864 0 0 0 277 482 164 2 0 98 0 0
1 0 0 96566080 8774400 19012928 0 0 0 16 194 31 2 0 98 0 0
1 0 0 96574208 8774400 19012928 0 0 0 9 185 27 2 0 98 0 0
1 0 0 96576192 8774400 19012672 0 0 0 9 197 32 2 0 98 0 0
1 0 0 96584384 8774400 19012736 0 0 0 9 185 26 2 0 98 0 0
1 0 0 96588608 8774400 19012480 0 0 0 9 187 27 2 0 98 0 0
1 0 0 96583872 8774400 19012672 0 0 0 18 183 27 2 0 98 0 0
1 0 0 96579072 8774528 19017472 0 0 0 32 230 55 2 0 98 0 0
1 0 0 96603264 8774592 19016832 0 0 0 92 373 219 2 0 98 0 0
1 0 0 96606528 8774592 19017984 0 0 0 111 357 241 2 0 98 0 0
About 3 minutes of using the machine, then 7 minutes of it sitting mostly idle. The worst offenders are:
gcc.dg/atomic/atomic.exp completed in 522 seconds
gcc.dg/compat/struct-layout-1.exp completed in 253 seconds
gcc.c-torture/compile/compile.exp completed in 252 seconds
gcc.c-torture/compile/compile.exp completed in 252 seconds
gcc.c-torture/execute/builtins/builtins.exp completed in 193 seconds
gcc.c-torture/execute/builtins/builtins.exp completed in 177 seconds
gcc.dg/atomic/atomic.exp completed in 141 seconds
gcc.c-torture/execute/execute.exp completed in 134 seconds
gcc.c-torture/compile/compile.exp completed in 128 seconds
gcc.dg/guality/guality.exp completed in 112 seconds
gcc.dg/ubsan/ubsan.exp completed in 111 seconds
gcc.dg/torture/dg-torture.exp completed in 109 seconds
gcc.dg/guality/guality.exp completed in 108 seconds
gcc.dg/dg.exp completed in 103 seconds
(all that are over 100 seconds).
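The idle figures in this message (95% for the check-objc trace, and the "3 minutes busy, then 7 minutes idle" reading of the check-c trace) can be reproduced from vmstat's `id` column. A small Python sketch; the function names and the 50% busy threshold are illustrative assumptions:

```python
import statistics


def idle_samples(vmstat_text):
    """Return the `id` (idle %) column of a vmstat trace.

    The first data row is dropped because vmstat's first report is an
    average since boot rather than a sample taken during the test run.
    """
    idle, col = [], None
    for line in vmstat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "r":  # column-name header row
            col = fields.index("id")
        elif col is not None and fields and fields[0].isdigit():
            idle.append(int(fields[col]))
    return idle[1:]


def summarize(idle, interval, busy_below=50):
    """Return (mean idle %, seconds busy, seconds mostly idle).

    A sample counts as busy when its idle % is below `busy_below`; the
    default of 50 is an arbitrary threshold chosen for illustration.
    """
    busy = sum(1 for x in idle if x < busy_below)
    return statistics.mean(idle), busy * interval, (len(idle) - busy) * interval
```

Applied to the check-c trace above (30-second samples), 7 samples fall below the threshold and 14 do not, i.e. about 210 s busy followed by 420 s mostly idle. For the check-objc trace the mean is about 94.9% idle with the since-boot line dropped, and exactly 95% if it is kept.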
Curiously, when I run only atomic.exp=stdatom\*.c:
gcc.dg/atomic/atomic.exp completed in 30 seconds.
By contrast, atomic.exp=c\*.c takes 522 seconds, with 3, 2, 5 and 4 being the worst offenders.
I worry a little about how the overhead of this scheme scales. The bin packing method I was thinking of would start with a larger number of bins and then pack them into n bins using the actual time each one took to test; any bin that is too large we'd just split in two. I kinda expected a -j70 run of atomic.exp to use more than one core.
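One way to read the bin-packing idea: measure the time of each small group of tests, halve any group that exceeds the per-job share, then pack the groups into n bins. The sketch below uses the greedy longest-processing-time-first (LPT) heuristic for the packing step; that choice, the `limit` rule, and the group names and times are assumptions for illustration, not part of the patch:

```python
import heapq


def split_large(groups, limit):
    """Recursively halve any group whose total time exceeds `limit`.

    `groups` is a list of (name, [seconds per test]) pairs. Assumes the
    tests inside a group can be run independently of each other.
    """
    pending, result = list(groups), []
    while pending:
        name, times = pending.pop()
        if sum(times) > limit and len(times) > 1:
            mid = len(times) // 2
            pending.append((name + ".0", times[:mid]))
            pending.append((name + ".1", times[mid:]))
        else:
            result.append((name, times))
    return result


def pack(groups, n):
    """Greedy LPT: assign groups, longest first, to the least-loaded of n bins.

    Returns a list of (total seconds, bin index, [group names]), one per bin.
    """
    bins = [(0, i, []) for i in range(n)]  # sorted, so already a valid heap
    for name, times in sorted(groups, key=lambda g: sum(g[1]), reverse=True):
        load, i, members = heapq.heappop(bins)
        members.append(name)
        heapq.heappush(bins, (load + sum(times), i, members))
    return sorted(bins, key=lambda b: b[1])


if __name__ == "__main__":
    groups = [("a", [40, 40, 40, 40]), ("b", [30]), ("c", [20, 10])]
    n = 4
    limit = sum(sum(t) for _, t in groups) / n  # ideal share per bin: 55 s
    for load, i, members in pack(split_large(groups, limit), n):
        print(i, load, members)
```

In this example the single 160-second group would set the finish time on its own; after it is halved twice, all four bins finish within 70 seconds. LPT is not optimal in general, but it is simple and its makespan is always within 4/3 of the optimum.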