Bug 63827 - parallel make in libjava broken at r217383
Summary: parallel make in libjava broken at r217383
Status: RESOLVED WORKSFORME
Alias: None
Product: gcc
Classification: Unclassified
Component: bootstrap
Version: 5.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-11-11 23:45 UTC by Jack Howarth
Modified: 2015-10-24 19:20 UTC
CC List: 2 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description Jack Howarth 2014-11-11 23:45:55 UTC
At r217383, the parallel make of libjava is broken and fails at...

libtool: compile:  /sw/src/fink.build/gcc50-5.0.0-1000/darwin_objdir/./gcc/gcj -B/sw/src/fink.build/gcc50-5.0.0-1000/darwin_objdir/x86_64-apple-darwin13.4.0/libjava/ -B/sw/src/fink.build/gcc50-5.0.0-1000/darwin_objdir/./gcc/ -B/sw/lib/gcc5.0/x86_64-apple-darwin13.4.0/bin/ -B/sw/lib/gcc5.0/x86_64-apple-darwin13.4.0/lib/ -isystem /sw/lib/gcc5.0/x86_64-apple-darwin13.4.0/include -isystem /sw/lib/gcc5.0/x86_64-apple-darwin13.4.0/sys-include -fomit-frame-pointer -Usun -fclasspath= -fbootclasspath=../../../gcc-5-20141111/libjava/classpath/lib --encoding=UTF-8 -Wno-deprecated -fbootstrap-classes -g -O2 -fsource-filename=/sw/src/fink.build/gcc50-5.0.0-1000/darwin_objdir/x86_64-apple-darwin13.4.0/libjava/classpath/lib/classes -fjni -findirect-dispatch -fno-indirect-classes -c @gnu-xml-libxmlj.list  -fno-common -o .libs/gnu-xml-libxmlj.o
libtool: compile:  /sw/src/fink.build/gcc50-5.0.0-1000/darwin_objdir/./gcc/gcj -B/sw/src/fink.build/gcc50-5.0.0-1000/darwin_objdir/x86_64-apple-darwin13.4.0/libjava/ -B/sw/src/fink.build/gcc50-5.0.0-1000/darwin_objdir/./gcc/ -B/sw/lib/gcc5.0/x86_64-apple-darwin13.4.0/bin/ -B/sw/lib/gcc5.0/x86_64-apple-darwin13.4.0/lib/ -isystem /sw/lib/gcc5.0/x86_64-apple-darwin13.4.0/include -isystem /sw/lib/gcc5.0/x86_64-apple-darwin13.4.0/sys-include -fomit-frame-pointer -Usun -fclasspath= -fbootclasspath=../../../gcc-5-20141111/libjava/classpath/lib --encoding=UTF-8 -Wno-deprecated -fbootstrap-classes -g -O2 -fsource-filename=/sw/src/fink.build/gcc50-5.0.0-1000/darwin_objdir/x86_64-apple-darwin13.4.0/libjava/classpath/lib/classes -fjni -findirect-dispatch -fno-indirect-classes -c @gnu-xml-pipeline.list  -fno-common -o .libs/gnu-xml-pipeline.o
Makefile:10240: recipe for target 'all-recursive' failed
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory '/sw/src/fink.build/gcc50-5.0.0-1000/darwin_objdir/x86_64-apple-darwin13.4.0/libjava'
Makefile:17122: recipe for target 'all-target-libjava' failed
make[1]: *** [all-target-libjava] Error 2
make[1]: Leaving directory '/sw/src/fink.build/gcc50-5.0.0-1000/darwin_objdir'
Makefile:20722: recipe for target 'bootstrap' failed
make: *** [bootstrap] Error 2
make: INTERNAL: Exiting with 1 jobserver tokens available; should be 16!
### execution of /tmp/fink.XbwhN failed, exit code 2
### execution of /tmp/fink.hta2O failed, exit code 2

I suspect this is from r217374.
Comment 1 Jack Howarth 2014-11-12 02:31:41 UTC
This is for a build on x86_64-apple-darwin13 at r217383 with...

 ../gcc-5-20141111/configure --prefix=/sw --prefix=/sw/lib/gcc5.0 --mandir=/sw/share/man --infodir=/sw/lib/gcc5.0/info --enable-languages=c,c++,fortran,lto,objc,obj-c++,java --with-gmp=/sw --with-libiconv-prefix=/sw --with-isl=/sw --with-mpc=/sw --with-system-zlib --x-includes=/usr/X11R6/include --x-libraries=/usr/X11R6/lib --program-suffix=-fsf-5.0

make -j 16 bootstrap
Comment 2 Jack Howarth 2014-11-12 03:46:01 UTC
Parallel make works in r217382.
Comment 3 Richard Biener 2014-11-12 09:58:03 UTC
I've also seen parallel bootstrap break once last week.  Of course hard to track...
Comment 4 Manuel López-Ibáñez 2014-11-12 12:23:18 UTC
I haven't tried to reproduce this yet, but I don't see how that patch could lead to this. What is actually the error that triggers that failure in make?
Comment 5 Jack Howarth 2014-11-12 12:38:05 UTC
This is extremely reproducible at r217383 on darwin and no other breakage in the parallel make has been seen this week prior to this commit. The accumulated error messages in the failing build are...

make[3]: *** read jobs pipe: No such file or directory.  Stop.
make[3]: *** Waiting for unfinished jobs....
...
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory '/sw/src/fink.build/gcc50-5.0.0-1000/darwin_objdir/x86_64-apple-darwin13.4.0/libjava'
Makefile:17122: recipe for target 'all-target-libjava' failed
make[1]: *** [all-target-libjava] Error 2
make[1]: Leaving directory '/sw/src/fink.build/gcc50-5.0.0-1000/darwin_objdir'
Makefile:20722: recipe for target 'bootstrap' failed
make: *** [bootstrap] Error 2
make: INTERNAL: Exiting with 1 jobserver tokens available; should be 16!

It looks like the failure occurs when the java classes are being compiled. Can we revert r217383 until the flaw in its handling of the parallel make is resolved? Having to build gcc serially is very painful.
Comment 6 Jack Howarth 2014-11-12 12:44:25 UTC
(In reply to Manuel López-Ibáñez from comment #4)
> I haven't tried to reproduce this yet, but I don't see how that patch could
> lead to this. What is actually the error that triggers that failure in make?

The only "trigger" I see is r217383 and 'make -j 16 bootstrap'. I have been building darwin bootstraps about 10 times a day for the past week and this failure never appeared prior to r217383. I do have GNU make 4.1 installed on this machine. I'll remove that in favor of the darwin system GNU make 3.81 and see if that suppresses the failure.
Comment 7 Manuel López-Ibáñez 2014-11-12 12:53:00 UTC
(In reply to howarth from comment #5)
> It looks like the failure occurs when the java classes are being compiled.
> Can we revert r217383 until the flaw in its handling of the parallel make is
> resolved? Having to build gcc serially is very painful.

Can you try reverting it locally and making a diff of the build logs? It doesn't make sense that a patch that only affects Fortran and does not change any build files breaks libjava, but I guess everything is possible. The errors that you mention do not clarify what is "missing/broken". Isn't there any error earlier produced by gcc or libtool?
Comment 8 Jack Howarth 2014-11-12 13:01:20 UTC
(In reply to Manuel López-Ibáñez from comment #7)
> (In reply to howarth from comment #5)
> > It looks like the failure occurs when the java classes are being compiled.
> > Can we revert r217383 until the flaw in its handling of the parallel make is
> > resolved? Having to build gcc serially is very painful.
> 
> Can you try reverting it locally and making a diff of the build logs? It
> doesn't make sense that a patch that only affects Fortran and does not
> change any build files breaks libjava, but I guess everything is possible.
> The errors that you mention do not clarify what is "missing/broken". Isn't
> there any error earlier produced by gcc or libtool?

Sorry about accidentally cc'ing you on this PR. This is breakage in the libjava parallel make introduced by r217383.
Comment 9 Jack Howarth 2014-11-12 13:11:24 UTC
Now I see why I accidentally cc'd Manu. This breakage occurred in the commit just prior to the jit commit which, as a Fortran commit, indeed doesn't make much sense.
Comment 10 David Malcolm 2014-11-12 13:33:43 UTC
(In reply to howarth from comment #9)
> Now I see why I accidentally cc'd Manu. This breakage occurred in the commit
> just prior to the jit commit which, as a Fortran commit, indeed doesn't make
> much sense.

Sorry, I'm getting confused.  Can you clarify:
* the latest revision on trunk you know of that succeeds, and
* the earliest revision on trunk you know of that fails?
to narrow things down.

[alternatively, do you still think this is the fault of my jit commit? (r217374)]

Thanks
Comment 11 Jack Howarth 2014-11-12 13:48:57 UTC
(In reply to dmalcolm from comment #10)
> (In reply to howarth from comment #9)
> > Now I see why I accidentally cc'd Manu. This breakage occurred in the commit
> > just prior to the jit commit which, as a Fortran commit, indeed doesn't make
> > much sense.
> 
> Sorry, I'm getting confused.  Can you clarify:
> * the latest revision on trunk you know of that succeeds, and
> * the earliest revision on trunk you know of that fails?
> to narrow things down.
> 
> [alternatively, do you still think this is the fault of my jit commit?
> (r217374)]
> 
> Thanks

I am rerunning my tests. I can definitely say the bootstrap is broken post-r217814 in the libjava parallel make. Retesting if r217813 is in play. Also trying to make sure this problem is seen in a reduced build of --enable-languages=c,c++,java compared with the --enable-languages=c,c++,fortran,lto,objc,obj-c++,java which I normally test.
Comment 12 Manuel López-Ibáñez 2014-11-12 15:29:06 UTC
(In reply to howarth from comment #11)
> I am rerunning my tests. I can definitely say the bootstrap is broken
> post-r217814 in the libjava parallel make. Retesting if r217813 is in play.
> Also trying to make sure this problem is seen in a reduced build of
> --enable-languages=c,c++,java compared with the
> --enable-languages=c,c++,fortran,lto,objc,obj-c++,java which I normally test.

If you are using --enable-checking=release, perhaps you are seeing: https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01240.html
Comment 13 Jack Howarth 2014-11-12 15:37:47 UTC
(In reply to Manuel López-Ibáñez from comment #12)
> (In reply to howarth from comment #11)
>
> If you are using --enable-checking=release, perhaps you are seeing:
> https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01240.html

I'm not using --enable-checking= at all, so for trunk that should default to checking, no? So far, at r217382 and r217383 for the reduced language set of c,c++,java, I don't see the bootstrap failure. I am currently testing c,c++,fortran,java, but my suspicion is that I may need the full language set of c,c++,fortran,lto,objc,obj-c++,java to reproduce this here. If so, then I would imagine I am seeing a slight shift in the timing of the build that tickles a latent parallel make issue in libjava. Certainly every failure I have seen in...

make: INTERNAL: Exiting with 1 jobserver tokens available; should be 16!

appears associated with the build in the libjava directory.
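
For readers unfamiliar with the token-count message above, here is a toy model of a pipe-backed job-token pool. It is only a sketch of the general idea (one byte in a pipe per job slot, taken before a job and written back afterwards); the `TokenPool` class and `run` helper are hypothetical names, and GNU make's real bookkeeping differs in detail.

```python
import os


class TokenPool:
    """Toy model of a pipe-backed job-token pool (hypothetical; not GNU make's code)."""

    def __init__(self, tokens):
        self.expected = tokens
        self.rfd, self.wfd = os.pipe()
        # Preload the pipe with one byte per job slot.
        os.write(self.wfd, b"+" * tokens)

    def acquire(self):
        # A worker takes one token before starting a job.
        return os.read(self.rfd, 1)

    def release(self, token):
        # A worker hands its token back when the job ends.
        os.write(self.wfd, token)

    def drain(self):
        # At exit, close the write end and count what is left in the pipe.
        os.close(self.wfd)
        available = 0
        while True:
            chunk = os.read(self.rfd, 4096)
            if not chunk:
                break
            available += len(chunk)
        os.close(self.rfd)
        return available


def run(tokens, lost):
    """Acquire every token, then return all but `lost` of them."""
    pool = TokenPool(tokens)
    held = [pool.acquire() for _ in range(tokens)]
    for token in held[lost:]:
        pool.release(token)
    return pool.drain(), pool.expected
```

In this model, `run(16, 0)` ends with all 16 tokens back in the pipe, while `run(16, 15)` ends with only 1 of the expected 16, which is the kind of mismatch a sub-make that stops without returning its tokens would leave behind.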
Comment 14 Jack Howarth 2014-11-12 20:22:50 UTC
Closing as the parallel make issue seems to have gone latent again in today's svn pulls of gcc trunk.
Comment 15 Jack Howarth 2015-10-24 19:20:24 UTC
(In reply to Jack Howarth from comment #5)
> This is extremely reproducible at r217383 on darwin and no other breakage in
> the parallel make has been seen this week prior to this commit. The
> accumulated error messages in the failing build are...
> 
> make[3]: *** read jobs pipe: No such file or directory.  Stop.
> make[3]: *** Waiting for unfinished jobs....
> ...
> make[2]: *** [all-recursive] Error 1
> make[2]: Leaving directory
> '/sw/src/fink.build/gcc50-5.0.0-1000/darwin_objdir/x86_64-apple-darwin13.4.0/
> libjava'
> Makefile:17122: recipe for target 'all-target-libjava' failed
> make[1]: *** [all-target-libjava] Error 2
> make[1]: Leaving directory
> '/sw/src/fink.build/gcc50-5.0.0-1000/darwin_objdir'
> Makefile:20722: recipe for target 'bootstrap' failed
> make: *** [bootstrap] Error 2
> make: INTERNAL: Exiting with 1 jobserver tokens available; should be 16!
> 
> It looks like the failure occurs when the java classes are being compiled.
> Can we revert r217383 until the flaw in its handling of the parallel make is
> resolved? Having to build gcc serially is very painful.

This issue has reappeared on OS X 10.11 for make 4.1 built with NLS support when executed under the fink package manager using the system perl. The cause appears to be the indirect linkage of the CoreFoundation framework via libintl. The CoreFoundation framework's sources don't contain any EINTR handling for interruptible system calls like read(), etc., so there will be potential race conditions for programs using fork()/exec() like make.

radr://23248551 "The CoreFoundation framework and associated libraries aren't fork()/exec() safe"
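
As an illustration of the EINTR handling referred to above, the usual pattern is to retry an interrupted read() rather than treat it as a failure. The sketch below is a minimal Python rendering of that pattern, not code from make or CoreFoundation; `read_retrying` and its `read` parameter are hypothetical names, and the parameter exists only so that an interruption can be simulated.

```python
import os


def read_retrying(fd, nbytes, read=os.read):
    """Read up to nbytes from fd, retrying if the call is interrupted.

    `read` is injectable (an assumption of this sketch) so that a
    simulated EINTR can be exercised without sending real signals.
    """
    while True:
        try:
            return read(fd, nbytes)
        except InterruptedError:
            # EINTR: a signal arrived before any data was transferred,
            # so the call is simply issued again.
            continue
```

Python 3.5 and later already retry `os.read` internally when a signal handler returns without raising, so the explicit loop is here only to show the shape of the check. In C, the equivalent is to loop while read() returns -1 and errno is EINTR.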