The new gcc.dg/atomic/c11-atomic-exec-5.c execution test fails on x86_64-apple-darwin13 at all optimization levels:

Running /sw/src/fink.build/gcc49-4.9.0-1000/gcc-4.9-20131126/gcc/testsuite/gcc.dg/atomic/atomic.exp ...
WARNING: program timed out.
FAIL: gcc.dg/atomic/c11-atomic-exec-5.c -O0 execution test
WARNING: program timed out.
FAIL: gcc.dg/atomic/c11-atomic-exec-5.c -O1 execution test
WARNING: program timed out.
FAIL: gcc.dg/atomic/c11-atomic-exec-5.c -O2 execution test
WARNING: program timed out.
FAIL: gcc.dg/atomic/c11-atomic-exec-5.c -O3 -fomit-frame-pointer execution test
WARNING: program timed out.
FAIL: gcc.dg/atomic/c11-atomic-exec-5.c -O3 -fomit-frame-pointer -funroll-loops execution test
WARNING: program timed out.
FAIL: gcc.dg/atomic/c11-atomic-exec-5.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions execution test
WARNING: program timed out.
FAIL: gcc.dg/atomic/c11-atomic-exec-5.c -O3 -g execution test
WARNING: program timed out.
FAIL: gcc.dg/atomic/c11-atomic-exec-5.c -Os execution test
WARNING: program timed out.
FAIL: gcc.dg/atomic/c11-atomic-exec-5.c -O2 -flto -flto-partition=none execution test
WARNING: program timed out.
FAIL: gcc.dg/atomic/c11-atomic-exec-5.c -O2 -flto execution test

This is with gcc trunk at r205392, built with:

Using built-in specs.
COLLECT_GCC=gcc-fsf-4.9
COLLECT_LTO_WRAPPER=/sw/lib/gcc4.9/libexec/gcc/x86_64-apple-darwin13.0.0/4.9.0/lto-wrapper
Target: x86_64-apple-darwin13.0.0
Configured with: ../gcc-4.9-20131126/configure --prefix=/sw --prefix=/sw/lib/gcc4.9 --mandir=/sw/share/man --infodir=/sw/lib/gcc4.9/info --enable-languages=c,c++,fortran,lto,objc,obj-c++,java --with-gmp=/sw --with-libiconv-prefix=/sw --with-isl=/sw --with-cloog=/sw --with-mpc=/sw --with-system-zlib --enable-checking=yes --x-includes=/usr/X11R6/include --x-libraries=/usr/X11R6/lib --program-suffix=-fsf-4.9
Thread model: posix
gcc version 4.9.0 20131126 (experimental) (GCC)
I see these failures on darwin10 as well when doing serial regtesting. They disappear if the tests are run in parallel (-j2 on a Core2Duo and -j8 on a Core i7). If I run the test c11-atomic-exec-5.c manually, the run time varies from ~10 s on a loaded machine up to more than 50 minutes on an idle one (hence the timeout).

Note that the test fails on powerpc*-*-*, e.g., on powerpc-apple-darwin9:

float_add_invalid (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
float_add_invalid_prev (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
float_add_overflow (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
float_add_overflow_prev (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
float_add_overflow_double (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
float_add_overflow_long_double (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
float_add_inexact (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
float_add_inexact_int (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
float_preinc_inexact (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
float_postinc_inexact (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
long_add_float_inexact (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
complex_float_add_overflow (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
float_sub_invalid (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
float_sub_overflow (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
float_sub_inexact (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
float_sub_inexact_int (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
float_predec_inexact (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
float_postdec_inexact (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
long_sub_float_inexact (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
complex_float_sub_overflow (a) 4999 pass, 0 fail; (b) 5000 pass, 1 fail
float_mul_invalid (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
float_mul_overflow (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
float_mul_underflow (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
float_mul_inexact (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
float_mul_inexact_int (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
long_mul_float_inexact (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
complex_float_mul_overflow (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
float_div_invalid_divbyzero (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
float_div_overflow (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
float_div_underflow (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
float_div_inexact (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
float_div_inexact_int (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
int_div_float_inexact (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
complex_float_div_overflow (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
double_add_invalid (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
double_add_overflow (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
double_add_overflow_long_double (a) 5000 pass, 0 fail; (b) 4999 pass, 1 fail
double_add_inexact (a) 5000 pass, 0 fail; (b) 4999 pass, 1 fail
double_add_inexact_int (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
double_preinc_inexact (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
double_postinc_inexact (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
long_long_add_double_inexact (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
complex_double_add_overflow (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
double_sub_invalid (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
double_sub_overflow (a) 5001 pass, 0 fail; (b) 4999 pass, 0 fail
double_sub_inexact (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
double_sub_inexact_int (a) 5000 pass, 0 fail; (b) 4999 pass, 1 fail
double_predec_inexact (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
double_postdec_inexact (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
long_long_sub_double_inexact (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
complex_double_sub_overflow (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
double_mul_invalid (a) 4999 pass, 1 fail; (b) 5000 pass, 0 fail
double_mul_overflow (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
double_mul_overflow_float (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
double_mul_underflow (a) 4999 pass, 0 fail; (b) 5000 pass, 1 fail
double_mul_inexact (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
double_mul_inexact_int (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
long_long_mul_double_inexact (a) 5000 pass, 0 fail; (b) 4999 pass, 1 fail
complex_double_mul_overflow (a) 4999 pass, 0 fail; (b) 5000 pass, 1 fail
double_div_invalid_divbyzero (a) 4999 pass, 1 fail; (b) 5000 pass, 0 fail
double_div_overflow (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
double_div_underflow (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
double_div_inexact (a) 5001 pass, 0 fail; (b) 4999 pass, 0 fail
double_div_inexact_int (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
int_div_double_inexact (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
complex_double_div_overflow (a) 4999 pass, 0 fail; (b) 5001 pass, 0 fail
long_double_add_invalid (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
long_double_add_overflow (a) 5000 pass, 0 fail; (b) 0 pass, 5000 fail
complex_long_double_add_overflow (a) 5000 pass, 0 fail; (b) 0 pass, 5000 fail
long_double_sub_invalid (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
long_double_sub_overflow (a) 5001 pass, 0 fail; (b) 0 pass, 4999 fail
complex_long_double_sub_overflow (a) 5000 pass, 0 fail; (b) 0 pass, 5000 fail
long_double_mul_invalid (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
long_double_mul_overflow (a) 5001 pass, 0 fail; (b) 4999 pass, 0 fail
long_double_mul_overflow_float (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
long_double_mul_overflow_double (a) 5000 pass, 0 fail; (b) 4999 pass, 1 fail
long_double_mul_underflow (a) 4999 pass, 0 fail; (b) 5000 pass, 1 fail
complex_long_double_mul_overflow (a) 5000 pass, 0 fail; (b) 4999 pass, 1 fail
long_double_div_invalid_divbyzero (a) 4999 pass, 1 fail; (b) 5000 pass, 0 fail
long_double_div_overflow (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
long_double_div_underflow (a) 5000 pass, 0 fail; (b) 4999 pass, 1 fail
long_double_div_inexact (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
long_double_div_inexact_int (a) 4999 pass, 0 fail; (b) 5000 pass, 1 fail
int_div_long_double_inexact (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
complex_long_double_div_overflow (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
For powerpc see http://gcc.gnu.org/ml/gcc/2013-11/msg00131.html; the failures indicate the architecture maintainers have not yet added this hook.
Created attachment 31451 [details]
preprocessed source for gcc.dg/atomic/c11-atomic-exec-5.c -O0 on darwin12
Created attachment 31452 [details]
assembly file for gcc.dg/atomic/c11-atomic-exec-5.c -O0 on darwin12
Added preprocessed source and assembly file, generated with

/sw/src/fink.build/gcc49-4.9.0-1000/darwin_objdir/gcc/xgcc -B/sw/src/fink.build/gcc49-4.9.0-1000/darwin_objdir/gcc/ /sw/src/fink.build/gcc49-4.9.0-1000/gcc-4.9-20131216/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c -B/sw/src/fink.build/gcc49-4.9.0-1000/darwin_objdir/x86_64-apple-darwin12.5.0/i386/libatomic/ -L/sw/src/fink.build/gcc49-4.9.0-1000/darwin_objdir/x86_64-apple-darwin12.5.0/i386/libatomic/.libs -latomic -fno-diagnostics-show-caret -fdiagnostics-color=never -O0 -std=c11 -pedantic-errors -pthread -D_POSIX_C_SOURCE=200809L -lm -m32 -o ./c11-atomic-exec-5.exe --save-temps

on x86_64-apple-darwin12.

Can someone confirm that we have both support for floating-point exceptions and the required hook on darwin? If so, I suspect we are tickling a pthread bug on darwin.
> on x86_64-apple-darwin12. Can someone confirm that we have both support
> for floating-point exceptions and the required hook on darwin?

I cannot answer these questions.

> If so, I suspect we are tickling a pthread bug on darwin.

As said in comment 2, the test succeeds on a loaded machine, i.e., with -j n, where n is the number of available threads, so I share the suspicion.
Created attachment 31458 [details]
reduced test case

c11-atomic-exec-5.c reduced to the complex_long_double_add_overflow test only. It takes between 35 and 40 s on an unloaded machine (the full test takes less than 1 s on a fully loaded machine).
Created attachment 31459 [details]
test without the complex instances

The running time fluctuates between 1.6 s and 7.5 s on an unloaded machine.
I see the same issue on some Solaris 10/SPARC systems on UltraSPARC T2. The 32-bit -O0 test execution takes between 12 min and an hour:

             yoda     apoc    mayon
real     30:46.09    11.80    39.49
user   1:01:31.87    20.12  1:18.84
sys          0.25     1.91     0.07

The first set of numbers (yoda) is for a 1.2 GHz UltraSPARC T2 running Solaris 10, the second (apoc) for a 1.35 GHz UltraSPARC IV running Solaris 11, and the third (mayon) for a 1.2 GHz UltraSPARC T2 running Solaris 11.

When I run the tests manually, I see that only a few tests are very slow: complex_double_add_overflow, complex_double_sub_overflow, complex_double_mul_overflow, complex_double_div_overflow, and some more. Even if I reduce the test to just the complex_double_add_overflow case, it takes

             yoda    mayon
real      2:04.11     0.33
user      4:08.14     0.60
sys          0.03     0.01

Timing on mayon is quite varied, though, some runs taking 12.83 s. Running the test under plockstat to check for locking issues, the picture changes again: yoda between 5.48 s and 1:10.00 min, mayon between 10.14 s and 56.27 s. Not very conclusive yet; maybe something changes with adapting libatomic/config/posix/lock.c (PAGE_SIZE and CACHLINE_SIZE) to values appropriate for Solaris/SPARC.

Rainer
(In reply to Rainer Orth from comment #9)
> I see the same issue on some Solaris 10/SPARC systems on UltraSPARC T2:

Do you use the default mutex-based implementation for libatomic? (I suspect that this is where the darwin slowness originates.)

If I configure --with-cpu=core2 (which allows 16-byte exchanges), the time drops from ~50 min to ~5 min, with the complex double tests dominating, as you see.
> --- Comment #10 from Iain Sandoe <iains at gcc dot gnu.org> ---
> (In reply to Rainer Orth from comment #9)
>> I see the same issue on some Solaris 10/SPARC systems on UltraSPARC T2:
>
> do you use the default mutex-based implementation for lib atomic?

I do, since this is the only option on SPARC.

> (I suspect that this is where the darwin slowness originates)
>
> if I configure --with-cpu=core2 (which allows 16b exchanges) the time drops
> from ~50m => 5m with the complex double tests dominating as you have.

Even that seems to require ifunc support, which isn't supported on Solaris even with gld.

Rainer
(In reply to ro@CeBiTec.Uni-Bielefeld.DE from comment #11)
> > --- Comment #10 from Iain Sandoe <iains at gcc dot gnu.org> ---
> > (In reply to Rainer Orth from comment #9)
> >> I see the same issue on some Solaris 10/SPARC systems on UltraSPARC T2:
> >
> > do you use the default mutex-based implementation for lib atomic?
>
> I do, since this is the only option on SPARC.

Do you reproduce the findings we see on Darwin, where a heavily loaded system does not exhibit the slow-down?

I was part-way through investigating whether spin locks would be a better solution for these very short code segments; essentially, each lock should only be held for a few insns. Available time is ever the killer.
> --- Comment #12 from Iain Sandoe <iains at gcc dot gnu.org> ---
> [...]
> Do you repeat the findings we see on Darwin, where a heavily loaded system does
> not exhibit the slow-down?

No, I see it both on unloaded and heavily loaded systems. Even on an idle system, the runtime varies by an order of magnitude or more.

Rainer
(In reply to ro@CeBiTec.Uni-Bielefeld.DE from comment #13)
> > --- Comment #12 from Iain Sandoe <iains at gcc dot gnu.org> ---
> > [...]
> > Do you repeat the findings we see on Darwin, where a heavily loaded system does
> > not exhibit the slow-down?
>
> no, I see it both on unloaded and heavily loaded systems. Even on an
> idle system, the runtime varies by a magnitude or more.

So the open question is whether there's a fault in the fall-back solution, or whether it's fundamentally incapable of delivering reasonable performance (at least on some non-linux platforms).
> --- Comment #14 from Iain Sandoe <iains at gcc dot gnu.org> ---
> (In reply to ro@CeBiTec.Uni-Bielefeld.DE from comment #13)
> [...]
> so the open question is whether there's a fault in the fall-back solution - or
> whether it's fundamentally incapable of delivering reasonable performance (at
> least on some non-linux platforms).

I don't think so: on identical hardware, the test performs reasonably well on Solaris 11, but is sometimes slow as molasses on Solaris 10. It rather looks like a libc bug there.

Rainer
> Even that seems to require ifunc support, which isn't supported on Solaris
> even with gld.

AFAICR ifunc is not supported on darwin either. I have posted the results (serial tests) for x86_64-apple-darwin13 configured with '--with-arch=core2 --with-cpu=core' at http://gcc.gnu.org/ml/gcc-testresults/2014-02/msg00228.html, and the failures are still there.
Created attachment 32056 [details]
sampling of one of the runs
sparc-sun-solaris2.10 is a primary arch, so making this P1 for now. As sparc implements the hook Joseph mentions, is this merely a testsuite issue (sparc being "slow")?
(In reply to Richard Biener from comment #18)
> sparc-sun-solaris2.10 is a primary arch, making P1 for now. As sparc
> implements
> the hook Joseph mentions is this merely a testsuite issue (sparc being
> "slow")?

In Darwin's case, I don't believe it is (simply) a testsuite issue; rather, it is connected with the implementation of pthread-based locking in libatomic when entities larger than those natively supported are used. For example, if libatomic is configured for a machine supporting cmpxchg16b, then test time drops from 50 min to 1 min (cf. configuring without cmpxchg16b). Probing the stalled cases shows that things are stuck in mutex code.

I started looking at the (default) posix implementation of the locking in libatomic (partly to see if there was a more BSD-esque way to do it). However, I'm out of time for the next couple of weeks.

Two things in the posix libatomic implementation might bear more examination:

1) Adjacent entities that happen to fall within one cache line (which would apply to two 32-byte numbers stored consecutively, for x86) get the same hash ID. I wonder if that can introduce a vulnerability.

2) If the alignment of an entity is smaller than its size, AFAICT the entity could span two hash IDs without this being detected (the evaluation is carried out modulo size without considering alignment).

On darwin it's possible to resolve the issue by replacing the pthread_mutex_lock() calls with

  while ((err = pthread_mutex_trylock (…)) != 0)
    if (err == …)
      abort ();

which might indicate an underlying issue with the implementation of pthreads (or it might simply modify the behaviour enough to cause some other vulnerability to be hidden).

I don't know if the same approach (spinning on trylock) would resolve the issue on Solaris, or (particularly) how to interpret the findings yet.
> sparc-sun-solaris2.10 is a primary arch, making P1 for now. As sparc
> implements
> the hook Joseph mentions is this merely a testsuite issue (sparc being
> "slow")?

Yes, it passes on my machines, except if they are under heavy load, so please downgrade this back to P3.
Then it's P4 for x86_64-apple-darwin13. Please file a separate bug for the sparc case then.
GCC 4.9.0 has been released.
GCC 4.9.1 has been released.
For the record, this PR is not fixed by the patch posted at https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01127.html.
GCC 4.9.2 has been released.
GCC 4.9.3 has been released.
I've noticed timeouts on aarch64 too, where the hook is implemented IIUC. Run-time varies a lot too: from 5 minutes to ~1h, with the same binary, same machine, where I am the only user.
(In reply to Christophe Lyon from comment #27)
> I've noticed timeouts on aarch64 too, where the hook is implemented IIUC.
>
> Run-time varies a lot too: from 5 minutes to ~1h, with the same binary, same
> machine, where I am the only user.

Me too on a ThunderX; I thought it was due to a hardware erratum (where load-acquire was not a memory barrier after a store-release).
(In reply to Andrew Pinski from comment #28)
> Me too on a ThunderX, I thought it was due to an hardware errata too (where
> load acquire was not a memory barrier after a store release).

The problem turns out to be that pthread_mutex_lock/unlock is not fair. What is happening is that the newly created thread (which does the stores) happens to get the lock more often than the other thread, which is doing the arithmetic operations and is also the timed thread that keeps count.

There are a few ways of fixing this. One is to loop on trylock a few thousand times before falling through to the full mutex_lock (really, this should be done this way in libc). The other is to use spin locks (which does not fix darwin, as darwin does not have pthread spinlocks).
GCC 4.9 branch is being closed