Bug 42980 - GCC parallel "make install" failures
Summary: GCC parallel "make install" failures
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: bootstrap
Version: 4.5.0
Importance: P3 normal
Target Milestone: ---
Assignee: Ralf Wildenhues
URL:
Keywords: build
Depends on:
Blocks: 84402
Reported: 2010-02-05 21:46 UTC by Gary Funck
Modified: 2023-10-12 13:07 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2010-02-24 22:15:05


Attachments
Log excerpts of parallel make install failures (4.62 KB, text/plain)
2010-02-08 23:11 UTC, Gary Funck
parallel "make -j6 install" failure logs (321.76 KB, application/octet-stream)
2010-02-22 17:48 UTC, Gary Funck
properly propagate (parallel) make flags in libgcc install rule (616 bytes, patch)
2010-02-24 22:17 UTC, Ralf Wildenhues
Fix parallel libiberty install failure (664 bytes, patch)
2010-02-24 22:18 UTC, Ralf Wildenhues
Changes proposed for automake-generated files (pending upstream fix) (1.10 KB, patch)
2010-02-24 22:22 UTC, Ralf Wildenhues

Description Gary Funck 2010-02-05 21:46:14 UTC
While testing some mods to the current pre-4.5 tree,
I ran into sporadic "make install" failures, when
running the make install as a parallel make (ie,
"make -jN" where N > 1).

The host is an x86_64 with 4 CPU cores, being built in the default
multi-lib mode.  The failures are sporadic, occurring
in only about 2% of the cases.

Here is an example of the failures:

mv: cannot stat `rls/usr/local/lib/../lib64/./libiberty.an': No such file
or directory
/usr/bin/install: cannot change permissions of
`rls/usr/local/lib/../lib64/./libiberty.an': No such file or directory
/usr/bin/install: cannot create regular file
`rls/usr/local/lib/gcc/x86_64-unknown-linux-gnu/4.5.0/include/ssp/ssp.h':
File exists

Looking at libssp/Makefile.am, there is this definition:
nobase_libsubinclude_HEADERS = ssp/ssp.h ssp/string.h ssp/stdio.h ssp/unistd.h

The failure above may, I think, be the result of make install running
in both the 64-bit and 32-bit libssp build directories at the
same time.  The install for the nobase_libsubinclude_HEADERS
target is run twice, and apparently in parallel, which seems
to lead to a race condition.

The libiberty failure is unrelated to automake (libiberty
does not use it).  I didn't research the cause of that
failure; FYI, it was the most frequently occurring one.

Several libraries use automake and the libsubinclude_HEADERS
header definition (libgomp, libmudflap, and libssp), which
would seem to make them susceptible to this parallel
install failure.

Interestingly, libgcc appears to run make in non-parallel
mode?

make[3]: Entering directory `x86_64-unknown-linux-gnu/libgcc'
make[3]: warning: jobserver unavailable: using -j1.  Add `+' to parent make rule.
Comment 1 Andrew Pinski 2010-02-05 22:37:59 UTC
Related to PR 33119.  I don't know many people who install with -j.
Comment 2 Gary Funck 2010-02-05 23:03:48 UTC
(In reply to comment #1)
> Related to PR 33119.  I don't know many people who install with -j.
> 

It resulted from "make install" being invoked from a Makefile, where
the overall make was run in parallel, and then a last step did this:
  $(MAKE) install
It inherited the -j setting from the top-level make invocation.

I checked the install docs, and didn't see where it warned away from
doing a parallel make install, fyi.

The automake aspect of this bug seems like a bug in the way
automake handles certain install rules in a multi-lib setting.  Certain
rules should only be run once, and not re-run in the multi-lib'd directories.
Comment 3 Ralf Wildenhues 2010-02-08 18:18:07 UTC
Can you make a bit more of the output of such a failed install available,
say, about 50 lines around each of the two different failures?

Wrt. the failure with headers, it seems GNU coreutils install does not
allow concurrent installs of the same file.  I wasn't aware of this
limitation, will ask on bug-coreutils whether that is on purpose.

And no, the two issues mentioned in this bug do not look like they are the
same as in PR 33119.
Comment 4 Gary Funck 2010-02-08 23:11:31 UTC
Created attachment 19827 [details]
Log excerpts of parallel make install failures

Excerpts from parallel install (i.e., "make -j6 install") failures.  Each file is separated by "=== <file> ===" strings.  Search for the word "cannot" to find the point of failure.  These are mainly libiberty failures, with a single libssp failure.  We saw a libgomp failure similar to the libssp failure, but apparently didn't save the log file.  The problem can be replicated by creating a loop that repeatedly runs "make -j6 install" and saves the log when a failure is detected.  The -j6 is arbitrary; it turns out to be the "sweet spot" for parallel makes on our 4-core system.

Something like this:

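#!/bin/sh
# Repeatedly run a parallel "make install" into a fresh DESTDIR,
# keeping the log only for runs that fail.
dest=`pwd`/rls
for i in `seq 1 1000`; do
  for j in 6; do
    echo "$i: make -j$j install"
    rm -rf "$dest"
    logfile="install-${i}-${j}.log"
    # Use POSIX redirection; ">&" is not portable under /bin/sh.
    make -j"$j" -C bld DESTDIR="$dest" install > "$logfile" 2>&1
    if [ $? -ne 0 ]; then
      echo "$i: make -j$j install failed."
    else
      rm -f "$logfile"
    fi
  done
done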
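
The comment above suggests searching the saved logs for the word "cannot" to find the point of failure. A minimal sketch of that step might look like the following; the function name show_failures is hypothetical and not part of the original script, and the header format simply mirrors the "=== <file> ===" separators used in the attachment.

```sh
# Print the first "cannot" line (with its line number) from each log
# that contains one.  show_failures is an illustrative name only.
show_failures() {
    for log in "$@"; do
        # Skip arguments that are not regular files (e.g. an unmatched glob).
        [ -f "$log" ] || continue
        if grep -q 'cannot' "$log"; then
            echo "=== $log ==="
            # grep -n prefixes each match with its line number;
            # head keeps only the first match.
            grep -n 'cannot' "$log" | head -n 1
        fi
    done
}
```

It could be called as show_failures install-*.log once the loop has finished; logs from successful runs have already been deleted by the loop, so only failing runs are reported.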
Comment 5 Ralf Wildenhues 2010-02-21 16:13:14 UTC
Thanks for the logs.  I don't understand the libiberty installation failure
yet. Can you please run the following, and provide the log file and the number
of runs needed, in case it provokes a failure?

  n=0
  while make -j install-target-libiberty SHELL="/bin/sh -x" > log-file 2>&1; do
    n=$((n + 1)); echo "$n"
  done

Otherwise, you might need to just interrupt this after a while.  Thanks.
Comment 6 Gary Funck 2010-02-22 17:48:27 UTC
Created attachment 19936 [details]
parallel "make -j6 install" failure logs

Attached, a collection of install logs run with SHELL="/bin/sh -x", where the install failed.  The first log is for a libgomp install failure, and the rest are for libiberty.  Note that it is difficult to reproduce the failures.  I found that "make -j6" led to more failures than "make -j" on our system.  Also, couldn't get the install-target-libiberty target to fail, but perhaps didn't wait long enough.  Instead, ran "make -j6 install".  Also, tried running this test on a "ram fs", but couldn't make the installs fail, likely due to different interlocks.  Similarly, the installs were more likely to fail on our RAID-ed disk subsystem, populated with high speed drives, than on a file system on a single drive. (These tests were run using the gcc-4.5-20100114 (gcc core) snapshot.)
Comment 7 Ralf Wildenhues 2010-02-24 22:15:05 UTC
Thank you very much for the logs, and for the note that install-target-libiberty alone wasn't sufficient to provoke a race; that provided the needed clue.

Please try the following three patches (split up because they're likely to go into the tree at different times) to ensure there are no more likely issues left.  Thanks.
Comment 8 Ralf Wildenhues 2010-02-24 22:17:26 UTC
Created attachment 19953 [details]
properly propagate (parallel) make flags in libgcc install rule
Comment 9 Ralf Wildenhues 2010-02-24 22:18:47 UTC
Created attachment 19954 [details]
Fix parallel libiberty install failure

This patch should fix the bulk of the failures, and is hopefully simple enough to go in before 4.6.
Comment 10 Ralf Wildenhues 2010-02-24 22:22:05 UTC
Created attachment 19955 [details]
Changes proposed for automake-generated files (pending upstream fix)

This patch is just a diff against files generated by automake, more precisely, a fix that hasn't gone upstream yet (discussion to follow on gcc-patches).

After you apply this, due to PR 43171, you either need to start a new build tree or remove x86_64-unknown-linux-gnu/*/Makefile and x86_64-unknown-linux-gnu/32/*/Makefile before rerunning make.
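
The cleanup step described above could be scripted roughly as follows. This is only a sketch: the function name and its two arguments are hypothetical, and the default target triplet is taken from the paths quoted in this comment.

```sh
# Remove the generated per-library Makefiles for the default and 32-bit
# multilibs so that they are regenerated on the next make run.
# Arguments (both illustrative): build directory, target triplet.
remove_target_makefiles() {
    builddir=${1:-.}
    target=${2:-x86_64-unknown-linux-gnu}
    # rm -f ignores patterns that match nothing, so a missing
    # multilib directory is not an error.
    rm -f "$builddir/$target"/*/Makefile "$builddir/$target"/32/*/Makefile
}
```

Running remove_target_makefiles from the top of the build tree, followed by make, would correspond to the second option mentioned above; starting a new build tree remains the simpler alternative.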
Comment 11 Ralf Wildenhues 2010-02-28 09:57:21 UTC
Patches posted at <http://gcc.gnu.org/ml/gcc-patches/2010-02/msg01236.html>
Comment 12 Pawel Sikora 2010-02-28 10:41:24 UTC
*** Bug 38388 has been marked as a duplicate of this bug. ***
Comment 13 Ralf Wildenhues 2010-03-02 06:10:28 UTC
Subject: Bug 42980

Author: rwild
Date: Tue Mar  2 06:09:56 2010
New Revision: 157159

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=157159
Log:
Small multilib rule fixups.

libgcc/:
        PR other/42980
        * Makefile.in (install): Use $(MAKE) string in rule, for
        parallel make.

libiberty/:
        * Makefile.in (all): Do not use exec.

Modified:
    trunk/libgcc/ChangeLog
    trunk/libgcc/Makefile.in
    trunk/libiberty/ChangeLog
    trunk/libiberty/Makefile.in

Comment 14 Hans-Peter Nilsson 2011-06-20 16:34:28 UTC
Seems fixed.  If so, please close.  If not, please summarize remaining issues.
Comment 15 Ralf Wildenhues 2011-06-22 21:35:10 UTC
(In reply to comment #14)
> Seems fixed.  If so, please close.  If not, please summarize remaining issues.

The patches in comments #8 and #10 are essentially unfixed, IIRC.  #10 needs an update to Automake, which then needs to be propagated to GCC.  This update changes multilib handling semantics slightly.

I'm not sure if there are other problems with parallel install, but at that time I didn't see any others at least.
Comment 16 Gary Funck 2011-06-22 22:19:07 UTC
(In reply to comment #15) Ralf W. wrote (in part)
> I'm not sure if there are other problems with parallel install, but at that
> time I didn't see any others at least.

Agreed.  Off list, I wrote the following.

On 03/01/10 07:48:09, Gary Funck wrote:
> Ralf,
> 
> I ran 1000 installs and they're all clean.
> Typically, something would've failed in
> that many installs.
Comment 17 Janne Blomqvist 2011-11-07 15:07:09 UTC
I now get this when bootstrapping today's trunk and then "make -j4 install":

/usr/bin/install: cannot create regular file `/home/janne/src/gfortran/trunk/install/bin/gcc-ar': No such file or directory
make[2]: *** [install-gcc-ar] Error 1

As a result, my install/bin directory is empty. Running "make install" fixes the problem, suggesting a race somewhere. Perhaps due to Rainer's recent move of configury logic to libgcc?

My configure line:

nice ../$SRCDIR/configure --enable-checking \
    --prefix=/home/janne/src/gfortran/$GCCDIR/install --enable-languages=fortran \
    --enable-maintainer-mode \
    --enable-__cxa_atexit \
    --disable-bootstrap \
    --enable-threads=posix \
    --disable-multilib

Target & host: x86_64-unknown-linux-gnu
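
Since a plain "make install" succeeds here where "make -j4 install" fails, one possible stopgap is to force the install step to a single job. This is an assumption on the editor's part, not a fix proposed in this report, and the function name serial_install is hypothetical.

```sh
# Run the install target with exactly one job, even when invoked from
# a parallel build.  MAKE may be set by the caller (for example by a
# parent makefile); it defaults to "make".
serial_install() {
    # Extra arguments such as DESTDIR=... are passed through unchanged.
    ${MAKE:-make} -j1 "$@" install
}
```

With GNU make, an explicit -j1 on the command line is expected to take precedence over a job count inherited through MAKEFLAGS, possibly with a warning about the jobserver. This only hides the race; it does not remove it.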
Comment 18 Joseph S. Myers 2017-01-18 17:50:47 UTC
I've observed the same problem with my glibc buildbot, where installing GCC (GCC 6 branch) with parallel make tried to install omp.h in the same directory from more than one multilib in parallel.

install: cannot create regular file '/scratch/jmyers/glibc-bot/install/compilers/mips64el-linux-gnu-nan2008/lib/gcc/mips64el-glibc-linux-gnu/6.3.1/include/omp.h': File exists
Makefile:849: recipe for target 'install-nodist_libsubincludeHEADERS' failed
make[9]: *** [install-nodist_libsubincludeHEADERS] Error 1
make[9]: Leaving directory '/scratch/jmyers/glibc-bot/build/compilers/mips64el-linux-gnu-nan2008/gcc/mips64el-glibc-linux-gnu/32/libgomp'
Makefile:1040: recipe for target 'install-am' failed
make[8]: *** [install-am] Error 2