I recently upgraded to gcc-4.3 and I am having trouble running my MPI programs. When I run an MPI program with mpiexec, it sometimes terminates correctly and sometimes fails with varying error messages, such as:

  rank 2 in job 1 mahmoud-desktop_33023 caused collective abort of all ranks
    exit status of rank 2: killed by signal 9

or

  [cli_1]: aborting job:
  Fatal error in MPI_Allreduce: Other MPI error, error stack:
  MPI_Allreduce(696)........................: MPI_Allreduce(sbuf=0x8103344, rbuf=0x8103348, count=1, MPI_UNSIGNED, MPI_SUM, MPI_COMM_WORLD) failed
  MPIR_Allreduce(285).......................:
  MPIC_Sendrecv(161)........................:
  MPIC_Wait(321)............................:
  MPIDI_CH3_Progress_wait(199)..............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
  MPIDI_CH3I_Progress_handle_sock_event(422):
  MPIDU_Socki_handle_read(649)..............: connection failure (set=0,sock=3,errno=104:(strerror() not found))
  [cli_1]: aborting job:
  Fatal error in MPI_Finalize: Other MPI error, error stack:
  MPI_Finalize(220).........................: MPI_Finalize failed
  MPI_Finalize(146).........................:
  MPID_Finalize(206)........................: an error occurred while the device was waiting for all open connections to close
  MPIDI_CH3_Progress_wait(199)..............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
  MPIDI_CH3I_Progress_handle_sock_event(422):
  MPIDU_Socki_handle_read(649)..............: connection failure (set=0,sock=4,errno=104:(strerror() not found))
  rank 3 in job 4 mahmoud-desktop_33023 caused collective abort of all ranks
    exit status of rank 3: killed by signal 9
  rank 0 in job 4 mahmoud-desktop_33023 caused collective abort of all ranks
    exit status of rank 0: killed by signal 11

I cannot find a reason for this, as my programs work fine with gcc 4.2. If this is a known bug that has already been fixed, please tell me how to fix it on my own machine.

Best regards,
Yours faithfully.
Sorry, I forgot to mention that I am using Ubuntu 8.10 (Linux) on a DELL XPS desktop. The compiler was built with the following options:

  Target: i486-linux-gnu
  Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.3.2-1ubuntu11' --with-bugurl=file:///usr/share/doc/gcc-4.3/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.3 --program-suffix=-4.3 --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-mpfr --enable-targets=all --enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu
  Thread model: posix
  gcc version 4.3.2 (Ubuntu 4.3.2-1ubuntu11)
Without a self-contained testcase, it is hard to decide whether this is a bug in GCC or in MPICH2.
Feedback not forthcoming.