Since r245745 I see the following failure on x86_64-apple-darwin16 with -m32/64 FAIL: libgomp.fortran/strassen.f90 -O execution test This is a latent bug exposed by this revisions and which appeared between revisions r242391 (2016-11-14, OK) and r242872 (2016-11-25, Illegal instruction) when the test is compiled with -finline-matmul-limit<32 (I didn't test the 32 values, but only 0, 16, and 31). % /opt/gcc/gcc7p-242872p3/bin/gfortran -O2 strassen_red.f90 -finline-matmul-limit=31 -fopenmp % ./a.out Illegal instruction The test succeeds with OMP_NUM_THREADS set to 1 on a machine with four cores, eight threads. I don't see the failure on another machine with darwin10 and two cores/threads. The test strassen_red.f90 is program strassen_matmul use omp_lib integer, parameter :: N = 1024 double precision, save :: A(N,N), B(N,N), C(N,N), D(N,N) double precision :: start, end call random_seed call random_number (A) contains recursive subroutine strassen (A, B, C, N) integer, intent(in) :: N double precision, intent(in) :: A(N,N), B(N,N) double precision, intent(out) :: C(N,N) double precision :: T(N/2,N/2,7) integer :: K, L if (iand (N,1) .ne. 0 .or. N < 64) then C = matmul (A, B) return end if K = N / 2 L = N / 2 + 1 !$omp task shared (A, B, T) call strassen (A(:K,:K) + A(L:,L:), B(:K,:K) + B(L:,L:), T(:,:,1), K) !$omp end task !$omp task shared (A, B, T) call strassen (A(L:,:K) + A(L:,L:), B(:K,:K), T(:,:,2), K) !$omp end task !$omp task shared (A, B, T) call strassen (A(:K,:K), B(:K,L:) - B(L:,L:), T(:,:,3), K) !$omp end task !$omp task shared (A, B, T) call strassen (A(L:,L:), B(L:,:K) - B(:K,:K), T(:,:,4), K) !$omp end task !$omp task shared (A, B, T) call strassen (A(:K,:K) + A(:K,L:), B(L:,L:), T(:,:,5), K) !$omp end task !$omp task shared (A, B, T) call strassen (A(L:,:K) - A(:K,:K), B(:K,:K) + B(:K,L:), T(:,:,6), K) !$omp end task !$omp task shared (A, B, T) call strassen (A(:K,L:) - A(L:,L:), B(L:,:K) + B(L:,L:), T(:,:,7), K) !$omp end task !$omp taskwait C(:K,:K) = T(:,:,1) + T(:,:,4) - T(:,:,5) + T(:,:,7) C(L:,:K) = T(:,:,2) + T(:,:,4) C(:K,L:) = T(:,:,3) + T(:,:,5) C(L:,L:) = T(:,:,1) - T(:,:,2) + T(:,:,3) + T(:,:,6) end subroutine strassen end
Revisions r242462 and r242518 are within the range. The failure triggers also if the tests are compiled with -O0 or with -fno-frontend-optimize.
(In reply to Dominique d'Humieres from comment #1) > Revisions r242462 and r242518 are within the range. > > The failure triggers also if the tests are compiled with -O0 or with > -fno-frontend-optimize. A few questions: - Still happening after r245839 (which fixed a potential race condition)? - What / where is the illegal instruction happening? Is it in matmul_*_avx, matmul_*_avx2, ...? - What CPU are you running? - If you print __cpu_model.__cpu_vendor and __cpu_model.__cpu_features[0] from a debugger, what is the output? - What is the stack size for each thread? Are you maybe running into a stack limitation?
On my Linux system, I can get a crash with OMP_STACKSIZE=500k ./a.out and successfull execution with OMP_STACKSIZE=1M ./a.out What happens if you try these commands?
> Le 6 mars 2017 à 16:55, tkoenig at gcc dot gnu.org <gcc-bugzilla@gcc.gnu.org> a écrit : > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79876 > > --- Comment #3 from Thomas Koenig <tkoenig at gcc dot gnu.org> --- > On my Linux system, I can get a crash with > > OMP_STACKSIZE=500k ./a.out > > and successfull execution with > > OMP_STACKSIZE=1M ./a.out > > What happens if you try these commands? I see the same behavior: crash up to 799k, successful execution above 800k. Dominique > > -- > You are receiving this mail because: > You reported the bug.
Does Darwin have so ridiculously low stack size limits by default, or is that just some setting you've done?
> Does Darwin have so ridiculously low stack size limits by default, > or is that just some setting you've done? On darwin1(0|6) ulimit gives stack size (kbytes, -s) 65532 and the environment variable OMP_STACKSIZE is empty. Note that on darwin10 (revision r245900) the reduced test succeeds with OMP_STACKSIZE set to 80k and for smaller values I get an error libgomp: Stack size larger than system limit at run time.
stack size (kbytes, -s) 65532 is 64M, that should be more than enough. Of course unless darwin pthread_create does something dumb, like using far smaller stack sizes for threads by default. In that case, the question is if we shouldn't override that on darwin to something larger, or just use ! { dg-set-target-env-var OMP_STACKSIZE "8M" } on the test. Note that dg-set-target-env-var doesn't (yet?) support target specification as 3rd argument, so I'd prefer not to add that. Or we could add -finline-matmul-limit=64 or whatever works.
I have read people complaining about very low OMP stack sizes on OSX. Should we set this to a more reasonable default value in libgfortran? Less than 800k is quite ridiculous. Alternatively, is there a way to detect stack overrun in a case like this, and abort with something more helpful?
> I have read people complaining about very low OMP stack sizes > on OSX. What is setting the limit? > Should we set this to a more reasonable default value in libgfortran? > Less than 800k is quite ridiculous. If possible, it would be a great idea. > Alternatively, is there a way to detect stack overrun in a case like > this, and abort with something more helpful? Indeed, but is it worth the pain? Final note, it would be interesting to understand why r242391 is OK, but not r242872.
(In reply to Dominique d'Humieres from comment #9) > > I have read people complaining about very low OMP stack sizes > > on OSX. > > What is setting the limit? Probably OSX thread library? Can you try something like: #define _GNU_SOURCE #include <limits.h> #include <stdlib.h> #include <stdio.h> #include <sys/time.h> #include <sys/resource.h> #include <pthread.h> static void * tf (void *arg) { #ifdef __APPLE__ fprintf (stderr, "other thread pthread_get_stacksize_np %lld\n", pthread_get_stacksize_np (pthread_self ())); #else pthread_attr_t at; pthread_getattr_np (pthread_self (), &at); size_t sz; pthread_attr_getstacksize (&at, &sz); fprintf (stderr, "other thread pthread_getattr_np + " "pthread_attr_getstacksize %zd\n", sz); pthread_attr_destroy (&at); #endif return NULL; } int main () { struct rlimit rl; if (getrlimit (RLIMIT_STACK, &rl)) fprintf (stderr, "err1\n"); else fprintf (stderr, "RLIMIT_STACK cur %lld max %lld\n", (long long) rl.rlim_cur, (long long) rl.rlim_max); fprintf (stderr, "PTHREAD_STACK_MIN %lld\n", (long long) PTHREAD_STACK_MIN); #ifdef __APPLE__ fprintf (stderr, "main thread pthread_get_stacksize_np %lld\n", pthread_get_stacksize_np (pthread_self ())); #else pthread_attr_t at; pthread_getattr_np (pthread_self (), &at); size_t sz; pthread_attr_getstacksize (&at, &sz); fprintf (stderr, "main thread pthread_getattr_np + " "pthread_attr_getstacksize %zd\n", sz); pthread_attr_destroy (&at); #endif pthread_t th; pthread_create (&th, NULL, tf, NULL); pthread_join (th, NULL); pthread_attr_t at2; pthread_attr_init (&at2); pthread_attr_setstacksize (&at2, 2048 * 1024); pthread_create (&th, &at2, tf, NULL); pthread_join (th, NULL); pthread_attr_destroy (&at2); return 0; } and see what it prints?
> and see what it prints? RLIMIT_STACK cur 67104768 max 67104768 PTHREAD_STACK_MIN 8192 main thread pthread_get_stacksize_np 67104768 other thread pthread_get_stacksize_np 524288 other thread pthread_get_stacksize_np 2097152 and it does not depend on OMP_STACKSIZE.
Of course, the testcase was not using libgomp at all and OMP_STACKSIZE is an env var used by libgomp only. So, this means whatever darwin libpthread are using is using extremely small default for the pthread_create stack size (.5MB) unless overridden (i.e. when pthread_create is called with NULL as attribute; maybe also when called with pthread_attr_init created attr without pthread_attr_setstacksize - you could test this by commenting out pthread_attr_setstacksize (&at2, 2048 * 1024); line, recompile/relink/rerun the testcase. Can you try some other darwin versions if you have access to them?
Seems libgomp has a bug that without {,G}OMP_STACKSIZE and with OMP_DISPLAY_ENV it displays random (uninitialized) value for the stack size, so I need to test something like: 2017-03-30 Jakub Jelinek <jakub@redhat.com> * env.c (initialize_env): Initialize stacksize to 0. --- libgomp/env.c.jj 2017-01-01 12:45:52.000000000 +0100 +++ libgomp/env.c 2017-03-30 16:07:29.800247913 +0200 @@ -1187,7 +1187,7 @@ handle_omp_display_env (unsigned long st static void __attribute__((constructor)) initialize_env (void) { - unsigned long thread_limit_var, stacksize; + unsigned long thread_limit_var, stacksize = 0; int wait_policy; /* Do a compile time check that mkomp_h.pl did good job. */ If we figure out what Darwin versions have so insanely low defaults and agree on some more reasonable one (1M or 2M?), then we could change this to something like: stacksize = GOMP_DEFAULT_STACKSIZE; and later on where it parse_stacksize add || stacksize to the condition last, then define GOMP_DEFAULT_STACKSIZE 0 by default and let some config/darwin/ header override it.
Created attachment 41098 [details] gcc7-pr79876.patch Found https://developer.apple.com/library/content/qa/qa1419/_index.html that documents the default to be 512KB which is way too low (though it is unclear if it is the same on all Darwin versions). Anyway, here is an untested patch to bump the default on Darwin only.
> Anyway, here is an untested patch to bump the default on Darwin only. The patch fixes the issue on darwin16. Testing on darwin10 in progress.
Author: jakub Date: Tue Apr 4 10:41:33 2017 New Revision: 246675 URL: https://gcc.gnu.org/viewcvs?rev=246675&root=gcc&view=rev Log: PR libgomp/79876 * config/posix/thread-stacksize.h: New file. * config/darwin/thread-stacksize.h: New file. * config/nvptx/thread-stacksize.h: New file. * env.c: Include thread-stacksize.h. (initialize_env): Initialize stacksize to GOMP_DEFAULT_STACKSIZE instead of 0. Call pthread_attr_setstacksize even if GOMP_DEFAULT_STACKSIZE is non-zero. Added: trunk/libgomp/config/darwin/thread-stacksize.h trunk/libgomp/config/nvptx/thread-stacksize.h trunk/libgomp/config/posix/thread-stacksize.h Modified: trunk/libgomp/ChangeLog trunk/libgomp/env.c
Fixed.