This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
RE: [PATCH] RE: gcc parallel make check
- From: "VandeVondele Joost" <joost dot vandevondele at mat dot ethz dot ch>
- To: Jakub Jelinek <jakub at redhat dot com>, Yury Gribov <y dot gribov at samsung dot com>
- Cc: "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>, "fortran at gcc dot gnu dot org" <fortran at gcc dot gnu dot org>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 9 Sep 2014 10:57:09 +0000
- Subject: RE: [PATCH] RE: gcc parallel make check
- References: <908103EDB4893A42920B21D3568BFD93150F4103 at MBX23 dot d dot ethz dot ch> <20140905143740 dot GL17454 at tucnak dot redhat dot com> <908103EDB4893A42920B21D3568BFD93150F414C at MBX23 dot d dot ethz dot ch> <20140905145304 dot GM17454 at tucnak dot redhat dot com> <908103EDB4893A42920B21D3568BFD93150F7F45 at MBX23 dot d dot ethz dot ch> <540ED02A dot 9080002 at samsung dot com>,<20140909101043 dot GQ17454 at tucnak dot redhat dot com>
> No. As I wrote earlier, splitting on filenames and test counts only is only
> very rough split, all the splits really need to be backed out by real timing
> data from popular targets.
I'm actually doing quite a bit of testing to get a reasonable balance, checking the 'completed in' times in all *.log.sep files. However, it is important that the procedure is semi-automatic, otherwise few people will be interested in doing it. Furthermore, for parallel performance it is not so important that the times are distributed evenly (it is unlikely anyway that the number of goals is an exact multiple of the N in -jN), but rather that the goals are executed in order from slow to fast, i.e. slowest first (similar to omp schedule guided). Most of the real bottlenecks are single-letter patterns (e.g. p*, since prxxxx is such a common filename), and this is ultimately the limiting factor.
In the project I'm working on (CP2K), we also parallelize testing over directories, but we keep a list of approximate runtimes per directory and keep that (global) list sorted. Testing follows that list. As a result, we get near-perfect parallel speedup, despite (or because of) per-directory timings ranging from a few hundred seconds down to 1 s.
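To illustrate why the ordering matters more than an even split, here is a minimal sketch in Python. It is not the CP2K or GCC machinery: the goal names and runtimes are made up, and `makespan` is a hypothetical helper that simulates a -jN job server in which each free worker picks up the next goal in list order.

```python
import heapq

def makespan(durations, workers):
    """Wall time when each free worker takes the next goal in list order."""
    finish = [0] * workers          # min-heap of per-worker finish times
    heapq.heapify(finish)
    for d in durations:
        start = heapq.heappop(finish)       # earliest free worker
        heapq.heappush(finish, start + d)
    return max(finish)

# Hypothetical runtimes in seconds per goal (illustrative numbers only).
runtimes = {"a*": 1, "b*": 2, "c*": 3, "d*": 4, "p*": 10}

alphabetical = [runtimes[g] for g in sorted(runtimes)]
slow_first = sorted(runtimes.values(), reverse=True)

print(makespan(alphabetical, 2))   # slow p* goal starts last
print(makespan(slow_first, 2))     # slow p* goal starts first
```

With these numbers the total work is 20 s, so 10 s is the best possible with -j2. Running the slow goal first reaches that bound, while alphabetical order leaves one worker idle at the end.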
> Also, I'm afraid of some tests being left out
> unintentionally (e.g. the wildcards created at some point, then a new test
> is added with a weird starting character that hasn't been used before and
> suddenly it will not be tested with make -j?).
I agree this is an issue. It is partially addressed by no longer having to write the patterns by hand (a script does this) and by having the script check its input. There are something like 10 test names that do not fall in [0-9A-Za-z], as mentioned in a previous email.
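The kind of input check meant here could look roughly like the sketch below. It is only an illustration, not the script from the patch: it assumes the patterns behave like shell-style globs, and the test names, patterns and the `uncovered` helper are hypothetical.

```python
from fnmatch import fnmatchcase

def uncovered(test_names, patterns):
    """Return the test names that no pattern matches (they would be skipped)."""
    return [name for name in test_names
            if not any(fnmatchcase(name, pat) for pat in patterns)]

# Hypothetical patterns and test names, for illustration only.
patterns = ["[0-9]*", "[A-Za-z]*"]
tests = ["pr12345.c", "20010101-1.c", "_mangle.c"]

missing = uncovered(tests, patterns)
if missing:
    print("not covered by any pattern:", missing)
```

A script that generates the patterns could run such a check and refuse to continue while the list is non-empty, so a new test with an unusual first character is noticed rather than silently dropped.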