While testing some mods to the current pre-4.5 tree, I ran into sporadic "make install" failures when running the install as a parallel make (i.e., "make -jN" where N > 1). The host is an x86_64 with 4 CPU cores, being built in the default multi-lib mode. The failures are sporadic, occurring in only about 2% of the cases. Here is an example of the failures:
    mv: cannot stat `rls/usr/local/lib/../lib64/./libiberty.an': No such file or directory
    /usr/bin/install: cannot change permissions of `rls/usr/local/lib/../lib64/./libiberty.an': No such file or directory
    /usr/bin/install: cannot create regular file `rls/usr/local/lib/gcc/x86_64-unknown-linux-gnu/4.5.0/include/ssp/ssp.h': File exists
Looking at libssp/Makefile.am, there is this definition:
    nobase_libsubinclude_HEADERS = ssp/ssp.h ssp/string.h ssp/stdio.h ssp/unistd.h
The ssp.h failure above, I think, may be the result of make install running in both the 64-bit and 32-bit libssp build directories at the same time. The install for the nobase_libsubinclude_HEADERS target is run twice, and apparently in parallel, which seems to lead to a race condition.
The libiberty failure is unrelated to the use of automake (because automake isn't used there). I didn't research the cause of that failure - it was the most frequently occurring failure, fyi.
Several libraries use automake and the libsubinclude_HEADERS header definition: libgomp, libmudflap, and libssp, which would seem to make them susceptible to this parallel install failure.
Interestingly, libgcc appears to run make in non-parallel mode?
    make[3]: Entering directory `x86_64-unknown-linux-gnu/libgcc'
    make[3]: warning: jobserver unavailable: using -j1. Add `+' to parent make rule.
Related to PR 33119. I don't know of many people who install with -j.
(In reply to comment #1)
> Related to PR 33119. I don't know of many people who install with -j.
It resulted from "make install" being invoked from a Makefile, where the overall make was run in parallel, and then a last step did this:
    $(MAKE) install
It inherited the -j setting from the top-level make invocation. I checked the install docs, and didn't see where they warn against doing a parallel make install, fyi.
The automake aspect of this bug seems like a bug in the way automake handles certain install rules in a multi-lib setting. Certain rules should only be run once, and not re-run in the multi-lib'd directories.
Can you make a bit more of the output of such a failed install available, say, about 50 lines around each of the two different failures? Wrt. the failure with headers, it seems GNU coreutils install does not allow concurrent installs of the same file. I wasn't aware of this limitation, will ask on bug-coreutils whether that is on purpose. And no, the two issues mentioned in this bug do not look like they are the same as in PR 33119.
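A minimal sketch of one way to make concurrent installs of the same header tolerant of each other, assuming a POSIX shell with `mktemp`, `cp`, and `mv` available: copy to a uniquely named temporary file in the destination directory, then rename it into place. The helper name `atomic_install` and the stand-in paths are hypothetical, and this is not the fix proposed in this bug.

```shell
#!/bin/sh
# Hypothetical helper (not part of GCC or automake): install SRC as DEST
# by way of a unique temporary file followed by a rename.
atomic_install() {
    src=$1
    dest=$2
    # mktemp creates a uniquely named file next to DEST, so two concurrent
    # callers never write to the same temporary path.
    tmp=$(mktemp "$(dirname "$dest")/.inst.XXXXXX") || return 1
    # mv within one directory is a rename, which replaces DEST atomically.
    if cp "$src" "$tmp" && chmod 644 "$tmp" && mv -f "$tmp" "$dest"; then
        return 0
    fi
    rm -f "$tmp"
    return 1
}

# Stand-in for two multilib directories installing the same header at once.
workdir=$(mktemp -d)
mkdir -p "$workdir/include"
echo '/* stand-in for ssp/ssp.h */' > "$workdir/ssp.h"
atomic_install "$workdir/ssp.h" "$workdir/include/ssp.h" &
atomic_install "$workdir/ssp.h" "$workdir/include/ssp.h" &
wait
ls -A "$workdir/include"
```

Whichever rename happens last wins, which is harmless when both multilibs install identical content. It would not help if the two copies differed.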
Created attachment 19827 [details] Log excerpts of parallel make install failures
Excerpts from parallel install (i.e., "make -j6 install") failures. Each file is separated by "=== <file> ===" strings. Search for the word "cannot" to find the point of failure. These are mainly libiberty failures, with a single libssp failure. We saw a libgomp failure similar to the libssp failure, but apparently didn't save the log file.
The problem can be replicated by creating a loop that repeatedly runs "make -j6 install", and that saves the log when a failure is detected. The -j6 is arbitrary - this turns out to be the "sweet spot" for parallel makes on our 4 core system. Something like this:
    #!/bin/sh
    # Repeat the parallel install and keep the log only when it fails.
    dest=`pwd`/rls
    for i in `seq 1 1000`; do
      for j in 6; do
        echo "$i: make -j$j install"
        rm -rf "$dest"
        logfile="install-${i}-${j}.log"
        make -j$j -C bld DESTDIR="$dest" install > "$logfile" 2>&1
        if [ $? -ne 0 ]; then
          echo "$i: make -j$j install failed."
        else
          rm -f "$logfile"
        fi
      done
    done
Thanks for the logs. I don't understand the libiberty installation failure yet. Can you please run the following, and provide the log file and the number of runs needed, in case it provokes a failure?
    while make -j install-target-libiberty SHELL="/bin/sh -x" > log-file 2>&1; do
      echo $((n++))
    done
Otherwise, you might need to just interrupt this after a while. Thanks.
Created attachment 19936 [details] parallel "make -j6 install" failure logs Attached, a collection of install logs run with SHELL="/bin/sh -x", where the install failed. The first log is for a libgomp install failure, and the rest are for libiberty. Note that it is difficult to reproduce the failures. I found that "make -j6" led to more failures than "make -j" on our system. Also, couldn't get the install-target-libiberty target to fail, but perhaps didn't wait long enough. Instead, ran "make -j6 install". Also, tried running this test on a "ram fs", but couldn't make the installs fail, likely due to different interlocks. Similarly, the installs were more likely to fail on our RAID-ed disk subsystem, populated with high speed drives, than on a file system on a single drive. (These tests were run using the gcc-4.5-20100114 (gcc core) snapshot.)
Thank you very much for the logs, and the note that install-target-libiberty alone wasn't sufficient to provoke a race, that provided the needed clue. Please try the following three patches (split up because they're likely to go into the tree at different times) to ensure there are no more likely issues left. Thanks.
Created attachment 19953 [details] properly propagate (parallel) make flags in libgcc install rule
Created attachment 19954 [details] Fix parallel libiberty install failure This patch should fix the bulk of the failures, and is hopefully simple enough to go in before 4.6.
Created attachment 19955 [details] Changes proposed for automake-generated files (pending upstream fix) This patch is just a diff against files generated by automake, more precisely, a fix that hasn't gone upstream yet (discussion to follow on gcc-patches). After you apply this, due to PR 43171, you either need to start a new build tree or remove x86_64-unknown-linux-gnu/*/Makefile and x86_64-unknown-linux-gnu/32/*/Makefile before rerunning make.
Patches posted at <http://gcc.gnu.org/ml/gcc-patches/2010-02/msg01236.html>
*** Bug 38388 has been marked as a duplicate of this bug. ***
Subject: Bug 42980
Author: rwild
Date: Tue Mar 2 06:09:56 2010
New Revision: 157159
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=157159
Log:
Small multilib rule fixups.
libgcc/:
    PR other/42980
    * Makefile.in (install): Use $(MAKE) string in rule, for parallel make.
libiberty/:
    * Makefile.in (all): Do not use exec.
Modified:
    trunk/libgcc/ChangeLog
    trunk/libgcc/Makefile.in
    trunk/libiberty/ChangeLog
    trunk/libiberty/Makefile.in
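The libgcc change concerns how GNU make recognizes a recursive invocation: a recipe line that uses `$(MAKE)` is treated as a sub-make and is handed the parallel job settings, while a literal `make` is not. A small sketch of the difference, assuming GNU make is installed. The scratch directory and the target names `with_var`, `literal`, and `show` are hypothetical and do not come from the GCC tree.

```shell
#!/bin/sh
# Sketch: compare $(MAKE) with a literal "make" in a recipe under -j4.
demo=$(mktemp -d)

# printf emits the tab characters that make requires before recipe lines;
# single quotes keep the shell from expanding $(MAKE) and $(MAKEFLAGS).
printf 'with_var:\n\t@$(MAKE) --no-print-directory show\n'  >  "$demo/Makefile"
printf 'literal:\n\t@make --no-print-directory show\n'      >> "$demo/Makefile"
printf 'show:\n\t@echo "flags seen by child: $(MAKEFLAGS)"\n' >> "$demo/Makefile"

# Child started through $(MAKE): it should see the parent's -j settings.
with_var=$(cd "$demo" && make -j4 with_var 2>&1)
# Child started through a literal "make": not recognized as a sub-make.
literal=$(cd "$demo" && make -j4 literal 2>&1)

echo "$with_var"
echo "$literal"
```

The exact flag text differs between GNU make versions, so no expected output is shown. For the literal variant, make may print a "jobserver unavailable" warning like the one quoted for libgcc at the top of this report.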
Seems fixed. If so, please close. If not, please summarize remaining issues.
(In reply to comment #14)
> Seems fixed. If so, please close. If not, please summarize remaining issues.
The patches in comments #8 and #10 are essentially unfixed, IIRC. #10 needs an update to Automake, which then needs to be propagated to GCC. This update changes multilib handling semantics slightly.
I'm not sure if there are other problems with parallel install, but at that time I didn't see any others at least.
(In reply to comment #15)
Ralf W. wrote (in part)
> I'm not sure if there are other problems with parallel install, but at that
> time I didn't see any others at least.
Agreed. Off list, I wrote the following.
On 03/01/10 07:48:09, Gary Funck wrote:
> Ralf,
>
> I ran 1000 installs and they're all clean.
> Typically, something would've failed in
> that many installs.
I now get this when bootstrapping today's trunk and then "make -j4 install":
    /usr/bin/install: cannot create regular file `/home/janne/src/gfortran/trunk/install/bin/gcc-ar': No such file or directory
    make[2]: *** [install-gcc-ar] Error 1
As a result, my install/bin directory is empty. Running "make install" fixes the problem, suggesting a race somewhere. Perhaps due to Rainer's recent move of configury logic to libgcc?
My configure line:
    nice ../$SRCDIR/configure --enable-checking \
      --prefix=/home/janne/src/gfortran/$GCCDIR/install --enable-languages=fortran \
      --enable-maintainer-mode \
      --enable-__cxa_atexit \
      --disable-bootstrap \
      --enable-threads=posix \
      --disable-multilib
Target & host: x86_64-unknown-linux-gnu
I've observed the same problem with my glibc buildbot, where installing GCC (GCC 6 branch) with parallel make tried to install omp.h in the same directory from more than one multilib in parallel.
    install: cannot create regular file '/scratch/jmyers/glibc-bot/install/compilers/mips64el-linux-gnu-nan2008/lib/gcc/mips64el-glibc-linux-gnu/6.3.1/include/omp.h': File exists
    Makefile:849: recipe for target 'install-nodist_libsubincludeHEADERS' failed
    make[9]: *** [install-nodist_libsubincludeHEADERS] Error 1
    make[9]: Leaving directory '/scratch/jmyers/glibc-bot/build/compilers/mips64el-linux-gnu-nan2008/gcc/mips64el-glibc-linux-gnu/32/libgomp'
    Makefile:1040: recipe for target 'install-am' failed
    make[8]: *** [install-am] Error 2