[gomp4] Target fallback ICV handling, ICV fixes

Jakub Jelinek jakub@redhat.com
Fri Oct 4 20:15:00 GMT 2013


Hi!

I've committed the following patch to gomp-4.0-branch.

The omp-low.c changes are to fix some bugs with if clause on
#pragma omp target{, data, update}.
The c-cppbuiltin.c is to finally announce OpenMP 4.0 support for C/C++.

The libgomp changes are:
1) as required by OpenMP 4.0, thread_limit_var is now a per-data-environment
   ICV, rather than global var
2) gomp_remaining_threads_count has been removed, instead as required by the
   spec ThreadsBusy from the spec is tracked per contention group inside
   of thread_pool; if there is just one contention group, then the new
   thr->thread_pool->threads_busy should be the difference between
   icv->thread_limit_var and the old gomp_remaining_threads_count;
   so, in gomp_resolve_num_threads now we add nthreads - 1 to it
   rather than subtracting it
3) apparently the old OMP_THREAD_LIMIT code was buggy, because
   GOMP_parallel_end was also subtracting from gomp_remaining_threads_count
   rather than adding to it (with the new code it is correct to subtract;
   when I get spare time I'll write a small alternative patch for the
   release branches together with thread-limit-1.c testcase)
4) as the threads_busy count is now per-contention group, if a parallel
   isn't nested, we actually don't need to atomically update the counter,
   because there is just one thread in the contention group
5) gomp_managed_threads counter remains to be a global var, that is used
   to decide about spinning length, that is desirable to be global and
   is not user observable thing covered by the standard; I've just
   renamed the mutex guarding it
6) for GOMP_target host fallback, the function will create a new initial
   thread by making a copy of the old TLS *gomp_thread () and clearing it
   (except for affinity place and reinitializing it's place var to the
   whole place list), then restoring back
7) I've noticed that &thr->release semaphore is never used for the master
   threads, so there is no point initializing it; we were initializing
   it just for the first initial thread, e.g. not in subsequent user
   pthread_create created threads that encounter #pragma omp constructs;
   and the semaphore wasn't ever destroyed
8) GOMP_teams is now implemented for the host fallback just by adjusting
   icv->thread_limit_var
9) on the target-7.c testcase I found several issues in the var remapping
   code (some fields could be uninitialized in certain cases)

Tested on x86_64-linux, committed.

2013-10-04  Jakub Jelinek  <jakub@redhat.com>

	* omp-low.c (expand_omp_target): When handling IF clause on
	#pragma omp target, split new_bb rather than entry_bb.  If
	not GF_OMP_TARGET_KIND_REGION, split new_bb right before
	the GOMP_TARGET stmt, rather than after labels.
gcc/c-family/
	* c-cppbuiltin.c (c_cpp_builtins): Predefine _OPENMP to
	201307 instead of 201107.
libgomp/
	* libgomp.h (struct gomp_task_icv): Add thread_limit_var.
	(gomp_thread_limit_var, gomp_remaining_threads_count,
	gomp_remaining_threads_lock): Remove.
	(gomp_managed_threads_lock): New variable.
	(struct gomp_thread_pool): Add threads_busy field.
	(gomp_free_thread): New prototype.
	* parallel.c (gomp_resolve_num_threads): Adjust for
	thread_limit now being in icv->thread_limit_var.  Use
	UINT_MAX instead of ULONG_MAX as infinity.  If not nested,
	just return minimum of max_num_threads and icv->thread_limit_var
	and if thr->thread_pool, set threads_busy to the returned value.
	Otherwise, don't update atomically gomp_remaining_threads_count,
	but instead thr->thread_pool->threads_busy.
	(GOMP_parallel_end): Adjust for thread_limit now being in
	icv->thread_limit_var.  Use UINT_MAX instead of ULONG_MAX as
	infinity.  Adjust threads_busy in the pool rather than
	gomp_remaining_threads_count.  Remember team->nthreads and call
	gomp_team_end before adjusting threads_busy, if not nested
	afterwards, just set it to 1 non-atomically.
	* team.c (gomp_thread_start): Clear thr->thread_pool and
	thr->task before returning.
	(gomp_free_pool_helper): Clear thr->thread_pool and thr->task
	before calling pthread_exit.
	(gomp_free_thread): No longer static.  Use
	gomp_managed_threads_lock instead of gomp_remaining_threads_lock.
	(gomp_team_start): Set thr->thread_pool->threads_busy to
	nthreads immediately after creating new pool.  Use
	gomp_managed_threads_lock instead of gomp_remaining_threads_lock.
	(gomp_team_end): Use gomp_managed_threads_lock instead of
	gomp_remaining_threads_lock.
	(initialize_team): Don't call gomp_sem_init here.
	* env.c (gomp_global_icv): Initialize thread_limit_var field.
	(gomp_thread_limit_var, gomp_remaining_threads_count,
	gomp_remaining_threads_lock): Remove.
	(gomp_managed_threads_locks): New variable.
	(handle_omp_display_env): Adjust for thread_limit var being
	in gomp_global_icv.
	(initialize_env): Likewise.  If user provided value is > INT_MAX,
	set gomp_global_icv.thread_limit_var to UINT_MAX (infinity).
	Initialize gomp_managed_threads_lock instead of
	gomp_remaining_threads_lock.
	(omp_get_thread_limit): Adjust for thread_limit var in
	icv->thread_limit_var.
	* target.c: Include limits.h.
	(resolve_device): If device_id is < -1, return NULL.
	(gomp_map_vars): Make sure tgt->array, tgt->to_free, tgt->tgt_start,
	tgt->tgt_end and k->copy_from aren't left uninitialized.
	(GOMP_target): Arrange for host callback to be performed in a
	separate initial thread and contention group, inheriting ICVs from
	gomp_global_icv etc.
	(GOMP_teams): Adjust thread_limit_var ICV.
	* testsuite/libgomp.c/affinity-1.c (main): Call omp_set_dynamic (0).
	* testsuite/libgomp.c/target-3.c: New test.
	* testsuite/libgomp.c/target-4.c: New test.
	* testsuite/libgomp.c/target-5.c: New test.
	* testsuite/libgomp.c/target-6.c: New test.
	* testsuite/libgomp.c/target-7.c: New test.
	* testsuite/libgomp.c/thread-limit-1.c: New test.
	* testsuite/libgomp.c/thread-limit-2.c: New test.
	* testsuite/libgomp.c/thread-limit-3.c: New test.

--- gcc/omp-low.c.jj	2013-09-26 09:53:03.000000000 +0200
+++ gcc/omp-low.c	2013-10-04 21:39:19.927947940 +0200
@@ -7881,9 +7881,16 @@ expand_omp_target (struct omp_region *re
       tree tmp_var;
 
       tmp_var = create_tmp_var (TREE_TYPE (device), NULL);
-      e = split_block (entry_bb, NULL);
+      if (kind != GF_OMP_TARGET_KIND_REGION)
+	{
+	  gsi = gsi_last_bb (new_bb);
+	  gsi_prev (&gsi);
+	  e = split_block (new_bb, gsi_stmt (gsi));
+	}
+      else
+	e = split_block (new_bb, NULL);
       cond_bb = e->src;
-      entry_bb = e->dest;
+      new_bb = e->dest;
       remove_edge (e);
 
       then_bb = create_empty_bb (cond_bb);
@@ -7892,7 +7899,7 @@ expand_omp_target (struct omp_region *re
       set_immediate_dominator (CDI_DOMINATORS, else_bb, cond_bb);
 
       stmt = gimple_build_cond_empty (cond);
-      gsi = gsi_start_bb (cond_bb);
+      gsi = gsi_last_bb (cond_bb);
       gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
 
       gsi = gsi_start_bb (then_bb);
@@ -7911,8 +7918,8 @@ expand_omp_target (struct omp_region *re
 	  add_bb_to_loop (then_bb, cond_bb->loop_father);
 	  add_bb_to_loop (else_bb, cond_bb->loop_father);
 	}
-      make_edge (then_bb, entry_bb, EDGE_FALLTHRU);
-      make_edge (else_bb, entry_bb, EDGE_FALLTHRU);
+      make_edge (then_bb, new_bb, EDGE_FALLTHRU);
+      make_edge (else_bb, new_bb, EDGE_FALLTHRU);
 
       device = tmp_var;
     }
--- gcc/c-family/c-cppbuiltin.c.jj	2013-06-26 12:13:47.000000000 +0200
+++ gcc/c-family/c-cppbuiltin.c	2013-10-04 21:10:43.884067723 +0200
@@ -896,7 +896,7 @@ c_cpp_builtins (cpp_reader *pfile)
     cpp_define (pfile, "__SSP__=1");
 
   if (flag_openmp)
-    cpp_define (pfile, "_OPENMP=201107");
+    cpp_define (pfile, "_OPENMP=201307");
 
   if (int128_integer_type_node != NULL_TREE)
     builtin_define_type_sizeof ("__SIZEOF_INT128__",
--- libgomp/libgomp.h.jj	2013-10-04 10:28:53.000000000 +0200
+++ libgomp/libgomp.h	2013-10-04 13:48:39.460550732 +0200
@@ -232,6 +232,7 @@ struct gomp_task_icv
   enum gomp_schedule_type run_sched_var;
   int run_sched_modifier;
   int default_device_var;
+  unsigned int thread_limit_var;
   bool dyn_var;
   bool nest_var;
   char bind_var;
@@ -240,10 +241,8 @@ struct gomp_task_icv
 };
 
 extern struct gomp_task_icv gomp_global_icv;
-extern unsigned long gomp_thread_limit_var;
-extern unsigned long gomp_remaining_threads_count;
 #ifndef HAVE_SYNC_BUILTINS
-extern gomp_mutex_t gomp_remaining_threads_lock;
+extern gomp_mutex_t gomp_managed_threads_lock;
 #endif
 extern unsigned long gomp_max_active_levels_var;
 extern bool gomp_cancel_var;
@@ -431,6 +430,8 @@ struct gomp_thread_pool
   unsigned threads_size;
   unsigned threads_used;
   struct gomp_team *last_team;
+  /* Number of threads running in this contention group.  */
+  unsigned long threads_busy;
 
   /* This barrier holds and releases threads waiting in threads.  */
   gomp_barrier_t threads_dock;
@@ -580,6 +581,7 @@ extern struct gomp_team *gomp_new_team (
 extern void gomp_team_start (void (*) (void *), void *, unsigned,
 			     unsigned, struct gomp_team *);
 extern void gomp_team_end (void);
+extern void gomp_free_thread (void *);
 
 /* target.c */
 
--- libgomp/parallel.c.jj	2013-10-02 16:01:09.000000000 +0200
+++ libgomp/parallel.c	2013-10-04 14:54:47.843337279 +0200
@@ -37,18 +37,19 @@
 unsigned
 gomp_resolve_num_threads (unsigned specified, unsigned count)
 {
-  struct gomp_thread *thread = gomp_thread();
+  struct gomp_thread *thr = gomp_thread ();
   struct gomp_task_icv *icv;
   unsigned threads_requested, max_num_threads, num_threads;
-  unsigned long remaining;
+  unsigned long busy;
+  struct gomp_thread_pool *pool;
 
   icv = gomp_icv (false);
 
   if (specified == 1)
     return 1;
-  else if (thread->ts.active_level >= 1 && !icv->nest_var)
+  else if (thr->ts.active_level >= 1 && !icv->nest_var)
     return 1;
-  else if (thread->ts.active_level >= gomp_max_active_levels_var)
+  else if (thr->ts.active_level >= gomp_max_active_levels_var)
     return 1;
 
   /* If NUM_THREADS not specified, use nthreads_var.  */
@@ -72,30 +73,46 @@ gomp_resolve_num_threads (unsigned speci
 	max_num_threads = count;
     }
 
-  /* ULONG_MAX stands for infinity.  */
-  if (__builtin_expect (gomp_thread_limit_var == ULONG_MAX, 1)
+  /* UINT_MAX stands for infinity.  */
+  if (__builtin_expect (icv->thread_limit_var == UINT_MAX, 1)
       || max_num_threads == 1)
     return max_num_threads;
 
+  /* The threads_busy counter lives in thread_pool, if there
+     isn't a thread_pool yet, there must be just one thread
+     in the contention group.  If thr->team is NULL, this isn't
+     nested parallel, so there is just one thread in the
+     contention group as well, no need to handle it atomically.  */
+  pool = thr->thread_pool;
+  if (thr->ts.team == NULL)
+    {
+      num_threads = max_num_threads;
+      if (num_threads > icv->thread_limit_var)
+	num_threads = icv->thread_limit_var;
+      if (pool)
+	pool->threads_busy = num_threads;
+      return num_threads;
+    }
+
 #ifdef HAVE_SYNC_BUILTINS
   do
     {
-      remaining = gomp_remaining_threads_count;
+      busy = pool->threads_busy;
       num_threads = max_num_threads;
-      if (num_threads > remaining)
-	num_threads = remaining + 1;
+      if (icv->thread_limit_var - busy + 1 < num_threads)
+	num_threads = icv->thread_limit_var - busy + 1;
     }
-  while (__sync_val_compare_and_swap (&gomp_remaining_threads_count,
-				      remaining, remaining - num_threads + 1)
-	 != remaining);
+  while (__sync_val_compare_and_swap (&pool->threads_busy,
+				      busy, busy + num_threads - 1)
+	 != busy);
 #else
-  gomp_mutex_lock (&gomp_remaining_threads_lock);
+  gomp_mutex_lock (&gomp_managed_threads_lock);
   num_threads = max_num_threads;
-  remaining = gomp_remaining_threads_count;
-  if (num_threads > remaining)
-    num_threads = remaining + 1;
-  gomp_remaining_threads_count -= num_threads - 1;
-  gomp_mutex_unlock (&gomp_remaining_threads_lock);
+  busy = pool->threads_busy;
+  if (icv->thread_limit_var - busy + 1 < num_threads)
+    num_threads = icv->thread_limit_var - busy + 1;
+  pool->threads_busy += num_threads - 1;
+  gomp_mutex_unlock (&gomp_managed_threads_lock);
 #endif
 
   return num_threads;
@@ -111,23 +128,34 @@ GOMP_parallel_start (void (*fn) (void *)
 void
 GOMP_parallel_end (void)
 {
-  if (__builtin_expect (gomp_thread_limit_var != ULONG_MAX, 0))
+  struct gomp_task_icv *icv = gomp_icv (false);
+  if (__builtin_expect (icv->thread_limit_var != UINT_MAX, 0))
     {
       struct gomp_thread *thr = gomp_thread ();
       struct gomp_team *team = thr->ts.team;
-      if (team && team->nthreads > 1)
+      unsigned int nthreads = team ? team->nthreads : 1;
+      gomp_team_end ();
+      if (nthreads > 1)
 	{
+	  /* If not nested, there is just one thread in the
+	     contention group left, no need for atomicity.  */
+	  if (thr->ts.team == NULL)
+	    thr->thread_pool->threads_busy = 1;
+	  else
+	    {
 #ifdef HAVE_SYNC_BUILTINS
-	  __sync_fetch_and_add (&gomp_remaining_threads_count,
-				1UL - team->nthreads);
+	      __sync_fetch_and_add (&thr->thread_pool->threads_busy,
+				    1UL - nthreads);
 #else
-	  gomp_mutex_lock (&gomp_remaining_threads_lock);
-	  gomp_remaining_threads_count -= team->nthreads - 1;
-	  gomp_mutex_unlock (&gomp_remaining_threads_lock);
+	      gomp_mutex_lock (&gomp_managed_threads_lock);
+	      thr->thread_pool->threads_busy -= nthreads - 1;
+	      gomp_mutex_unlock (&gomp_managed_threads_lock);
 #endif
+	    }
 	}
     }
-  gomp_team_end ();
+  else
+    gomp_team_end ();
 }
 ialias (GOMP_parallel_end)
 
--- libgomp/team.c.jj	2013-10-03 16:27:34.000000000 +0200
+++ libgomp/team.c	2013-10-04 18:44:31.100494492 +0200
@@ -128,6 +128,8 @@ gomp_thread_start (void *xdata)
     }
 
   gomp_sem_destroy (&thr->release);
+  thr->thread_pool = NULL;
+  thr->task = NULL;
   return NULL;
 }
 
@@ -204,16 +206,19 @@ static struct gomp_thread_pool *gomp_new
 static void
 gomp_free_pool_helper (void *thread_pool)
 {
+  struct gomp_thread *thr = gomp_thread ();
   struct gomp_thread_pool *pool
     = (struct gomp_thread_pool *) thread_pool;
   gomp_barrier_wait_last (&pool->threads_dock);
-  gomp_sem_destroy (&gomp_thread ()->release);
+  gomp_sem_destroy (&thr->release);
+  thr->thread_pool = NULL;
+  thr->task = NULL;
   pthread_exit (NULL);
 }
 
 /* Free a thread pool and release its threads. */
 
-static void
+void
 gomp_free_thread (void *arg __attribute__((unused)))
 {
   struct gomp_thread *thr = gomp_thread ();
@@ -241,9 +246,9 @@ gomp_free_thread (void *arg __attribute_
 	  __sync_fetch_and_add (&gomp_managed_threads,
 				1L - pool->threads_used);
 #else
-	  gomp_mutex_lock (&gomp_remaining_threads_lock);
+	  gomp_mutex_lock (&gomp_managed_threads_lock);
 	  gomp_managed_threads -= pool->threads_used - 1L;
-	  gomp_mutex_unlock (&gomp_remaining_threads_lock);
+	  gomp_mutex_unlock (&gomp_managed_threads_lock);
 #endif
 	}
       free (pool->threads);
@@ -285,6 +290,7 @@ gomp_team_start (void (*fn) (void *), vo
   if (__builtin_expect (thr->thread_pool == NULL, 0))
     {
       thr->thread_pool = gomp_new_thread_pool ();
+      thr->thread_pool->threads_busy = nthreads;
       pthread_setspecific (gomp_thread_destructor, thr);
     }
   pool = thr->thread_pool;
@@ -678,9 +684,9 @@ gomp_team_start (void (*fn) (void *), vo
 #ifdef HAVE_SYNC_BUILTINS
       __sync_fetch_and_add (&gomp_managed_threads, diff);
 #else
-      gomp_mutex_lock (&gomp_remaining_threads_lock);
+      gomp_mutex_lock (&gomp_managed_threads_lock);
       gomp_managed_threads += diff;
-      gomp_mutex_unlock (&gomp_remaining_threads_lock);
+      gomp_mutex_unlock (&gomp_managed_threads_lock);
 #endif
     }
 
@@ -822,9 +828,9 @@ gomp_team_start (void (*fn) (void *), vo
 #ifdef HAVE_SYNC_BUILTINS
       __sync_fetch_and_add (&gomp_managed_threads, diff);
 #else
-      gomp_mutex_lock (&gomp_remaining_threads_lock);
+      gomp_mutex_lock (&gomp_managed_threads_lock);
       gomp_managed_threads += diff;
-      gomp_mutex_unlock (&gomp_remaining_threads_lock);
+      gomp_mutex_unlock (&gomp_managed_threads_lock);
 #endif
     }
   if (__builtin_expect (affinity_thr != NULL, 0)
@@ -871,9 +877,9 @@ gomp_team_end (void)
 #ifdef HAVE_SYNC_BUILTINS
       __sync_fetch_and_add (&gomp_managed_threads, 1L - team->nthreads);
 #else
-      gomp_mutex_lock (&gomp_remaining_threads_lock);
+      gomp_mutex_lock (&gomp_managed_threads_lock);
       gomp_managed_threads -= team->nthreads - 1L;
-      gomp_mutex_unlock (&gomp_remaining_threads_lock);
+      gomp_mutex_unlock (&gomp_managed_threads_lock);
 #endif
       /* This barrier has gomp_barrier_wait_last counterparts
 	 and ensures the team can be safely destroyed.  */
@@ -914,8 +920,6 @@ gomp_team_end (void)
 static void __attribute__((constructor))
 initialize_team (void)
 {
-  struct gomp_thread *thr;
-
 #ifndef HAVE_TLS
   static struct gomp_thread initial_thread_tls_data;
 
@@ -925,13 +929,6 @@ initialize_team (void)
 
   if (pthread_key_create (&gomp_thread_destructor, gomp_free_thread) != 0)
     gomp_fatal ("could not create thread pool destructor.");
-
-#ifdef HAVE_TLS
-  thr = &gomp_tls_data;
-#else
-  thr = &initial_thread_tls_data;
-#endif
-  gomp_sem_init (&thr->release, 0);
 }
 
 static void __attribute__((destructor))
--- libgomp/env.c.jj	2013-10-04 10:28:53.000000000 +0200
+++ libgomp/env.c	2013-10-04 14:54:47.843337279 +0200
@@ -54,6 +54,7 @@
 
 struct gomp_task_icv gomp_global_icv = {
   .nthreads_var = 1,
+  .thread_limit_var = UINT_MAX,
   .run_sched_var = GFS_DYNAMIC,
   .run_sched_modifier = 1,
   .default_device_var = 0,
@@ -64,11 +65,9 @@ struct gomp_task_icv gomp_global_icv = {
 };
 
 unsigned long gomp_max_active_levels_var = INT_MAX;
-unsigned long gomp_thread_limit_var = ULONG_MAX;
 bool gomp_cancel_var = false;
-unsigned long gomp_remaining_threads_count;
 #ifndef HAVE_SYNC_BUILTINS
-gomp_mutex_t gomp_remaining_threads_lock;
+gomp_mutex_t gomp_managed_threads_lock;
 #endif
 unsigned long gomp_available_cpus = 1, gomp_managed_threads = 1;
 unsigned long long gomp_spin_count_var, gomp_throttled_spin_count_var;
@@ -1126,8 +1125,8 @@ handle_omp_display_env (unsigned long st
   /* GOMP's default value is actually neither active nor passive.  */
   fprintf (stderr, "  OMP_WAIT_POLICY = '%s'\n",
 	   wait_policy > 0 ? "ACTIVE" : "PASSIVE");
-  fprintf (stderr, "  OMP_THREAD_LIMIT = '%lu'\n",
-	   gomp_thread_limit_var);
+  fprintf (stderr, "  OMP_THREAD_LIMIT = '%u'\n",
+	   gomp_global_icv.thread_limit_var);
   fprintf (stderr, "  OMP_MAX_ACTIVE_LEVELS = '%lu'\n",
 	   gomp_max_active_levels_var);
 
@@ -1156,7 +1155,7 @@ handle_omp_display_env (unsigned long st
 static void __attribute__((constructor))
 initialize_env (void)
 {
-  unsigned long stacksize;
+  unsigned long thread_limit_var, stacksize;
   int wait_policy;
 
   /* Do a compile time check that mkomp_h.pl did good job.  */
@@ -1169,11 +1168,13 @@ initialize_env (void)
   parse_int ("OMP_DEFAULT_DEVICE", &gomp_global_icv.default_device_var, true);
   parse_unsigned_long ("OMP_MAX_ACTIVE_LEVELS", &gomp_max_active_levels_var,
 		       true);
-  parse_unsigned_long ("OMP_THREAD_LIMIT", &gomp_thread_limit_var, false);
-  if (gomp_thread_limit_var != ULONG_MAX)
-    gomp_remaining_threads_count = gomp_thread_limit_var - 1;
+  if (parse_unsigned_long ("OMP_THREAD_LIMIT", &thread_limit_var, false))
+    {
+      gomp_global_icv.thread_limit_var
+	= thread_limit_var > INT_MAX ? UINT_MAX : thread_limit_var;
+    }
 #ifndef HAVE_SYNC_BUILTINS
-  gomp_mutex_init (&gomp_remaining_threads_lock);
+  gomp_mutex_init (&gomp_managed_threads_lock);
 #endif
   gomp_init_num_threads ();
   gomp_available_cpus = gomp_global_icv.nthreads_var;
@@ -1325,7 +1326,8 @@ omp_get_max_threads (void)
 int
 omp_get_thread_limit (void)
 {
-  return gomp_thread_limit_var > INT_MAX ? INT_MAX : gomp_thread_limit_var;
+  struct gomp_task_icv *icv = gomp_icv (false);
+  return icv->thread_limit_var > INT_MAX ? INT_MAX : icv->thread_limit_var;
 }
 
 void
--- libgomp/target.c.jj	2013-09-23 16:07:25.000000000 +0200
+++ libgomp/target.c	2013-10-04 21:20:22.013998048 +0200
@@ -26,6 +26,7 @@
    creation and termination.  */
 
 #include "libgomp.h"
+#include <limits.h>
 #include <stdbool.h>
 #include <stdlib.h>
 #include <string.h>
@@ -144,8 +145,9 @@ resolve_device (int device_id)
       struct gomp_task_icv *icv = gomp_icv (false);
       device_id = icv->default_device_var;
     }
-  if (device_id >= gomp_get_num_devices ()
-      && device_id != 257)
+  if (device_id < 0
+      || (device_id >= gomp_get_num_devices ()
+	  && device_id != 257))
     return NULL;
 
   /* FIXME: Temporary hack for testing non-shared address spaces on host.  */
@@ -239,11 +241,18 @@ gomp_map_vars (struct gomp_device_descr
       tgt->tgt_start = (tgt->tgt_start + tgt_align - 1) & ~(tgt_align - 1);
       tgt->tgt_end = tgt->tgt_start + tgt_size;
     }
+  else
+    {
+      tgt->to_free = NULL;
+      tgt->tgt_start = 0;
+      tgt->tgt_end = 0;
+    }
 
   tgt_size = 0;
   if (is_target)
     tgt_size = mapnum * sizeof (void *);
 
+  tgt->array = NULL;
   if (not_found_cnt)
     {
       tgt->array = gomp_malloc (not_found_cnt * sizeof (*tgt->array));
@@ -273,6 +282,7 @@ gomp_map_vars (struct gomp_device_descr
 		k->tgt = tgt;
 		k->tgt_offset = tgt_size;
 		tgt_size += k->host_end - k->host_start;
+		k->copy_from = false;
 		if ((kinds[i] & 7) == 2 || (kinds[i] & 7) == 3)
 		  k->copy_from = true;
 		k->refcount = 1;
@@ -475,13 +485,33 @@ GOMP_target (int device, void (*fn) (voi
   if (devicep == NULL)
     {
       /* Host fallback.  */
+      struct gomp_thread old_thr, *thr = gomp_thread ();
+      old_thr = *thr;
+      memset (thr, '\0', sizeof (*thr));
+      if (gomp_places_list)
+	{
+	  thr->place = old_thr.place;
+	  thr->ts.place_partition_len = gomp_places_list_len;
+	}
       fn (hostaddrs);
+      gomp_free_thread (thr);
+      *thr = old_thr;
       return;
     }
 
   struct target_mem_desc *tgt
     = gomp_map_vars (devicep, mapnum, hostaddrs, sizes, kinds, true);
+  struct gomp_thread old_thr, *thr = gomp_thread ();
+  old_thr = *thr;
+  memset (thr, '\0', sizeof (*thr));
+  if (gomp_places_list)
+    {
+      thr->place = old_thr.place;
+      thr->ts.place_partition_len = gomp_places_list_len;
+    }
   fn ((void *) tgt->tgt_start);
+  gomp_free_thread (thr);
+  *thr = old_thr;
   gomp_unmap_vars (tgt);
 }
 
@@ -541,6 +571,13 @@ GOMP_target_update (int device, const vo
 void
 GOMP_teams (unsigned int num_teams, unsigned int thread_limit)
 {
+  if (thread_limit)
+    {
+      struct gomp_task_icv *icv = gomp_icv (true);
+      icv->thread_limit_var
+	= thread_limit > INT_MAX ? UINT_MAX : thread_limit;
+    }
+  (void) num_teams;
 }
 
 #ifdef PLUGIN_SUPPORT
--- libgomp/testsuite/libgomp.c/affinity-1.c.jj	2013-10-04 09:51:49.000000000 +0200
+++ libgomp/testsuite/libgomp.c/affinity-1.c	2013-10-04 15:58:13.639326486 +0200
@@ -244,6 +244,7 @@ main ()
   print_affinity (places_array[test_places].places[0]);
   printf ("\n");
   omp_set_nested (1);
+  omp_set_dynamic (0);
 
   #pragma omp parallel if (0)
   {
--- libgomp/testsuite/libgomp.c/target-3.c.jj	2013-10-04 15:55:43.360132627 +0200
+++ libgomp/testsuite/libgomp.c/target-3.c	2013-10-04 17:14:07.560142600 +0200
@@ -0,0 +1,17 @@
+#include <omp.h>
+#include <stdlib.h>
+
+int
+main ()
+{
+  if (omp_get_level ())
+    abort ();
+  #pragma omp target if (0)
+  if (omp_get_level ())
+    abort ();
+  #pragma omp target if (0)
+  #pragma omp teams
+  if (omp_get_level ())
+    abort ();
+  return 0;
+}
--- libgomp/testsuite/libgomp.c/target-4.c.jj	2013-10-04 15:55:43.360132627 +0200
+++ libgomp/testsuite/libgomp.c/target-4.c	2013-10-04 15:57:32.422553332 +0200
@@ -0,0 +1,14 @@
+#include <omp.h>
+#include <stdlib.h>
+
+int
+main ()
+{
+  omp_set_dynamic (0);
+  #pragma omp parallel num_threads (4)
+  #pragma omp target if (0)
+  #pragma omp single
+  if (omp_get_num_threads () != 1)
+    abort ();
+  return 0;
+}
--- libgomp/testsuite/libgomp.c/target-5.c.jj	2013-10-04 15:55:43.360132627 +0200
+++ libgomp/testsuite/libgomp.c/target-5.c	2013-10-04 17:05:23.632931518 +0200
@@ -0,0 +1,83 @@
+#include <omp.h>
+#include <stdlib.h>
+
+int
+main ()
+{
+  int d_o = omp_get_dynamic ();
+  int n_o = omp_get_nested ();
+  omp_sched_t s_o;
+  int c_o;
+  omp_get_schedule (&s_o, &c_o);
+  int m_o = omp_get_max_threads ();
+  omp_set_dynamic (1);
+  omp_set_nested (1);
+  omp_set_schedule (omp_sched_static, 2);
+  omp_set_num_threads (4);
+  int d = omp_get_dynamic ();
+  int n = omp_get_nested ();
+  omp_sched_t s;
+  int c;
+  omp_get_schedule (&s, &c);
+  int m = omp_get_max_threads ();
+  if (!omp_is_initial_device ())
+    abort ();
+  #pragma omp target if (0)
+  {
+    omp_sched_t s_c;
+    int c_c;
+    omp_get_schedule (&s_c, &c_c);
+    if (d_o != omp_get_dynamic ()
+	|| n_o != omp_get_nested ()
+	|| s_o != s_c
+	|| c_o != c_c
+	|| m_o != omp_get_max_threads ())
+      abort ();
+    omp_set_dynamic (0);
+    omp_set_nested (0);
+    omp_set_schedule (omp_sched_dynamic, 4);
+    omp_set_num_threads (2);
+    if (!omp_is_initial_device ())
+      abort ();
+  }
+  if (!omp_is_initial_device ())
+    abort ();
+  omp_sched_t s_c;
+  int c_c;
+  omp_get_schedule (&s_c, &c_c);
+  if (d != omp_get_dynamic ()
+      || n != omp_get_nested ()
+      || s != s_c
+      || c != c_c
+      || m != omp_get_max_threads ())
+    abort ();
+  #pragma omp target if (0)
+  #pragma omp teams
+  {
+    omp_sched_t s_c;
+    int c_c;
+    omp_get_schedule (&s_c, &c_c);
+    if (d_o != omp_get_dynamic ()
+	|| n_o != omp_get_nested ()
+	|| s_o != s_c
+	|| c_o != c_c
+	|| m_o != omp_get_max_threads ())
+      abort ();
+    omp_set_dynamic (0);
+    omp_set_nested (0);
+    omp_set_schedule (omp_sched_dynamic, 4);
+    omp_set_num_threads (2);
+    if (!omp_is_initial_device ())
+      abort ();
+  }
+  if (!omp_is_initial_device ())
+    abort ();
+  omp_get_schedule (&s_c, &c_c);
+  if (d != omp_get_dynamic ()
+      || n != omp_get_nested ()
+      || s != s_c
+      || c != c_c
+      || m != omp_get_max_threads ())
+    abort ();
+  return 0;
+}
--- libgomp/testsuite/libgomp.c/target-6.c.jj	2013-10-04 15:55:43.360132627 +0200
+++ libgomp/testsuite/libgomp.c/target-6.c	2013-10-04 18:12:54.097493969 +0200
@@ -0,0 +1,68 @@
+#include <omp.h>
+#include <stdlib.h>
+
+int
+main ()
+{
+  omp_set_dynamic (0);
+  omp_set_nested (1);
+  if (omp_in_parallel ())
+    abort ();
+  #pragma omp parallel num_threads (3)
+    if (omp_get_thread_num () == 2)
+      {
+	if (!omp_in_parallel ())
+	  abort ();
+	#pragma omp parallel num_threads (3)
+	  if (omp_get_thread_num () == 1)
+	    {
+	      if (!omp_in_parallel ()
+		  || omp_get_level () != 2
+		  || omp_get_ancestor_thread_num (0) != 0
+		  || omp_get_ancestor_thread_num (1) != 2
+		  || omp_get_ancestor_thread_num (2) != 1
+		  || omp_get_ancestor_thread_num (3) != -1)
+		abort ();
+	      #pragma omp target if (0)
+		{
+		  if (omp_in_parallel ()
+		      || omp_get_level () != 0
+		      || omp_get_ancestor_thread_num (0) != 0
+		      || omp_get_ancestor_thread_num (1) != -1)
+		    abort ();
+		  #pragma omp parallel num_threads (2)
+		  {
+		    if (!omp_in_parallel ()
+			|| omp_get_level () != 1
+			|| omp_get_ancestor_thread_num (0) != 0
+			|| omp_get_ancestor_thread_num (1)
+			   != omp_get_thread_num ()
+			|| omp_get_ancestor_thread_num (2) != -1)
+		      abort ();
+		  }
+		}
+	      #pragma omp target if (0)
+		{
+		  #pragma omp teams thread_limit (2)
+		    {
+		      if (omp_in_parallel ()
+			  || omp_get_level () != 0
+			  || omp_get_ancestor_thread_num (0) != 0
+			  || omp_get_ancestor_thread_num (1) != -1)
+			abort ();
+		      #pragma omp parallel num_threads (2)
+		      {
+			if (!omp_in_parallel ()
+			    || omp_get_level () != 1
+			    || omp_get_ancestor_thread_num (0) != 0
+			    || omp_get_ancestor_thread_num (1)
+			       != omp_get_thread_num ()
+			    || omp_get_ancestor_thread_num (2) != -1)
+			  abort ();
+		      }
+		    }
+		}
+	    }
+      }
+  return 0;
+}
--- libgomp/testsuite/libgomp.c/target-7.c.jj	2013-10-04 19:05:13.452909674 +0200
+++ libgomp/testsuite/libgomp.c/target-7.c	2013-10-04 21:47:21.814387228 +0200
@@ -0,0 +1,111 @@
+#include <omp.h>
+#include <stdlib.h>
+
+volatile int v;
+
+void
+foo (int f)
+{
+  int d = f ? omp_get_num_devices () : omp_get_default_device ();
+  int h = 5;
+  #pragma omp target device (d)
+  if (omp_get_level () != 0)
+    abort ();
+  #pragma omp target if (v > 1)
+  if (omp_get_level () != 0 || !omp_is_initial_device ())
+    abort ();
+  #pragma omp target device (d) if (v > 1)
+  if (omp_get_level () != 0 || !omp_is_initial_device ())
+    abort ();
+  #pragma omp target if (v <= 1)
+  if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
+    abort ();
+  #pragma omp target device (d) if (v <= 1)
+  if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
+    abort ();
+  #pragma omp target if (0)
+  if (omp_get_level () != 0 || !omp_is_initial_device ())
+    abort ();
+  #pragma omp target device (d) if (0)
+  if (omp_get_level () != 0 || !omp_is_initial_device ())
+    abort ();
+  #pragma omp target if (1)
+  if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
+    abort ();
+  #pragma omp target device (d) if (1)
+  if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
+    abort ();
+  #pragma omp target data device (d) map (to: h)
+  {
+    #pragma omp target device (d)
+    if (omp_get_level () != 0 || (f && !omp_is_initial_device ()) || h++ != 5)
+      abort ();
+    #pragma omp target update device (d) from (h)
+  }
+  #pragma omp target data if (v > 1) map (to: h)
+  {
+    #pragma omp target if (v > 1)
+    if (omp_get_level () != 0 || !omp_is_initial_device () || h++ != 6)
+      abort ();
+    #pragma omp target update if (v > 1) from (h)
+  }
+  #pragma omp target data device (d) if (v > 1) map (to: h)
+  {
+    #pragma omp target device (d) if (v > 1)
+    if (omp_get_level () != 0 || !omp_is_initial_device () || h++ != 7)
+      abort ();
+    #pragma omp target update device (d) if (v > 1) from (h)
+  }
+  #pragma omp target data if (v <= 1) map (to: h)
+  {
+    #pragma omp target if (v <= 1)
+    if (omp_get_level () != 0 || (f && !omp_is_initial_device ()) || h++ != 8)
+      abort ();
+    #pragma omp target update if (v <= 1) from (h)
+  }
+  #pragma omp target data device (d) if (v <= 1) map (to: h)
+  {
+    #pragma omp target device (d) if (v <= 1)
+    if (omp_get_level () != 0 || (f && !omp_is_initial_device ()) || h++ != 9)
+      abort ();
+    #pragma omp target update device (d) if (v <= 1) from (h)
+  }
+  #pragma omp target data if (0) map (to: h)
+  {
+    #pragma omp target if (0)
+    if (omp_get_level () != 0 || !omp_is_initial_device () || h++ != 10)
+      abort ();
+    #pragma omp target update if (0) from (h)
+  }
+  #pragma omp target data device (d) if (0) map (to: h)
+  {
+    #pragma omp target device (d) if (0)
+    if (omp_get_level () != 0 || !omp_is_initial_device () || h++ != 11)
+      abort ();
+    #pragma omp target update device (d) if (0) from (h)
+  }
+  #pragma omp target data if (1) map (to: h)
+  {
+    #pragma omp target if (1)
+    if (omp_get_level () != 0 || (f && !omp_is_initial_device ()) || h++ != 12)
+      abort ();
+    #pragma omp target update if (1) from (h)
+  }
+  #pragma omp target data device (d) if (1) map (to: h)
+  {
+    #pragma omp target device (d) if (1)
+    if (omp_get_level () != 0 || (f && !omp_is_initial_device ()) || h++ != 13)
+      abort ();
+    #pragma omp target update device (d) if (1) from (h)
+  }
+  if (h != 14)
+    abort ();
+}
+
+int
+main ()
+{
+  foo (0);
+  foo (1);
+  return 0;
+}
--- libgomp/testsuite/libgomp.c/thread-limit-1.c.jj	2013-10-04 15:36:59.363996164 +0200
+++ libgomp/testsuite/libgomp.c/thread-limit-1.c	2013-10-04 15:41:48.479547631 +0200
@@ -0,0 +1,35 @@
+/* { dg-do run } */
+/* { dg-set-target-env-var OMP_THREAD_LIMIT "6" } */
+
+#include <stdlib.h>
+#include <unistd.h>
+
+int
+main ()
+{
+  if (omp_get_thread_limit () != 6)
+    return 0;
+  omp_set_dynamic (0);
+  omp_set_nested (1);
+  #pragma omp parallel num_threads (3)
+  if (omp_get_num_threads () != 3)
+    abort ();
+  #pragma omp parallel num_threads (3)
+  if (omp_get_num_threads () != 3)
+    abort ();
+  #pragma omp parallel num_threads (8)
+  if (omp_get_num_threads () > 6)
+    abort ();
+  #pragma omp parallel num_threads (6)
+  if (omp_get_num_threads () != 6)
+    abort ();
+  int cnt = 0;
+  #pragma omp parallel num_threads (5)
+  #pragma omp parallel num_threads (5)
+  #pragma omp parallel num_threads (2)
+  #pragma omp atomic
+  cnt++;
+  if (cnt > 6)
+    abort ();
+  return 0;
+}
--- libgomp/testsuite/libgomp.c/thread-limit-2.c.jj	2013-10-04 15:42:52.229212860 +0200
+++ libgomp/testsuite/libgomp.c/thread-limit-2.c	2013-10-04 15:48:28.230486490 +0200
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-set-target-env-var OMP_THREAD_LIMIT "9" } */
+
+#include <stdlib.h>
+#include <unistd.h>
+
+int
+main ()
+{
+  if (omp_get_thread_limit () != 9)
+    return 0;
+  omp_set_dynamic (0);
+  #pragma omp parallel num_threads (8)
+  if (omp_get_num_threads () != 8)
+    abort ();
+  #pragma omp parallel num_threads (16)
+  if (omp_get_num_threads () > 9)
+    abort ();
+  #pragma omp target if (0)
+  #pragma omp teams thread_limit (6)
+  {
+    if (omp_get_thread_limit () > 6)
+      abort ();
+    if (omp_get_thread_limit () == 6)
+      {
+	omp_set_dynamic (0);
+	omp_set_nested (1);
+	#pragma omp parallel num_threads (3)
+	if (omp_get_num_threads () != 3)
+	  abort ();
+	#pragma omp parallel num_threads (3)
+	if (omp_get_num_threads () != 3)
+	  abort ();
+	#pragma omp parallel num_threads (8)
+	if (omp_get_num_threads () > 6)
+	  abort ();
+	#pragma omp parallel num_threads (6)
+	if (omp_get_num_threads () != 6)
+	  abort ();
+	int cnt = 0;
+	#pragma omp parallel num_threads (5)
+	#pragma omp parallel num_threads (5)
+	#pragma omp parallel num_threads (2)
+	#pragma omp atomic
+	cnt++;
+	if (cnt > 6)
+	  abort ();
+      }
+  }
+  return 0;
+}
--- libgomp/testsuite/libgomp.c/thread-limit-3.c.jj	2013-10-04 15:54:02.289657212 +0200
+++ libgomp/testsuite/libgomp.c/thread-limit-3.c	2013-10-04 15:55:04.677326917 +0200
@@ -0,0 +1,12 @@
+#include <stdlib.h>
+#include <omp.h>
+
+int
+main ()
+{
+  #pragma omp target if (0)
+  #pragma omp teams thread_limit (1)
+  if (omp_get_thread_limit () != 1)
+    abort ();
+  return 0;
+}

	Jakub



More information about the Gcc-patches mailing list