[gomp4] OpenMP 4.0 affinity - OMP_PLACES and OMP_PROC_BIND


Hi!

I've committed the following patch to implement OpenMP 4.0 affinity
(as amended by ticket #268).  For non-Linux targets (and
--disable-linux-futex builds) the affinity support is just stubbed out;
interested maintainers can add similar support if their OSes provide
something suitable.
As the env var parsing needs to work with cpu lists (affinity bitmasks),
but I didn't want to create yet another data structure for a generic
representation that would then be transformed into target specific
affinity masks, all the details are hidden inside of affinity.c: the
gomp_affinity_alloc function returns a void * array of COUNT places if
successful, and the generic code just passes the void * pointers back
to affinity.c routines to set cpus, remove a cpu, copy with adjustment,
or install the affinity mask; the generic code may reorder the pointers.
It is expected that affinity.c allocates whatever data structures it
needs behind the void * pointers, in the payload after the array, so
that a simple free will release it all (if needed, we could add some
gomp_affinity_free to destruct it instead).
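
To make the layout concrete, here is a simplified sketch of what the
Linux gomp_affinity_alloc below effectively does (alloc_places and
mask_size are just illustrative names, not part of the patch):

  #include <stdlib.h>

  /* One allocation holds COUNT pointers followed by COUNT fixed-size
     affinity masks; each places[i] points into that payload, so a
     single free (places) releases everything.  */
  static void **
  alloc_places (unsigned long count, size_t mask_size)
  {
    unsigned long i;
    char *payload;
    void **places = malloc (count * sizeof (void *) + count * mask_size);
    if (places == NULL)
      return NULL;
    payload = (char *) (places + count);
    for (i = 0; i < count; i++, payload += mask_size)
      places[i] = payload;
    return places;
  }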

GOMP_CPU_AFFINITY is rewritten to treat each of the cpus in the list as
a single place containing just that CPU, and omp_proc_bind_true is
implemented similarly to omp_proc_bind_close, except that if we don't
find an old thread at the right place, we don't make a big deal of it
and just reuse it anyway (unless its place is outside of its
subpartition's place list).
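
For example (my reading of the rewritten parse_affinity below; the
values are purely illustrative):

  GOMP_CPU_AFFINITY="0 3 1-2"

is now handled internally as the list of unit-sized places
{0},{3},{1},{2}, with the team-start code treating the binding as
omp_proc_bind_true.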

Tested on x86_64-linux; additionally tested with HAVE_PTHREAD_AFFINITY_NP
undefined in config.h and with --disable-linux-futex.

2013-10-04  Jakub Jelinek  <jakub@redhat.com>

	* libgomp.h (struct gomp_team_state): Add place_partition_off
	and place_partition_len fields.
	(struct gomp_task_icv): Add bind_var field.
	(gomp_bind_var_list, gomp_bind_var_list_len, gomp_places_list,
	gomp_places_list_len): New extern decls.
	(struct gomp_thread): Add place field.
	(gomp_cpu_affinity, gomp_cpu_affinity_len): Remove.
	(gomp_init_thread_affinity): Add place argument.
	(gomp_affinity_alloc, gomp_affinity_init_place, gomp_affinity_add_cpus,
	gomp_affinity_remove_cpu, gomp_affinity_copy_place,
	gomp_affinity_same_place, gomp_affinity_finalize_place_list,
	gomp_affinity_init_level, gomp_affinity_print_place): New prototypes.
	(gomp_team_start): Add flags argument.
	* team.c (struct gomp_thread_start_data): Add place field.
	(gomp_thread_start): Initialize thr->place.
	(gomp_team_start): Add flags argument.  Handle OpenMP 4.0 affinity.
	* env.c (gomp_global_icv): Initialize bind_var field.
	(gomp_cpu_affinity, gomp_cpu_affinity_len): Remove.
	(gomp_bind_var_list, gomp_bind_var_list_len, gomp_places_list,
	gomp_places_list_len): New variables.
	(parse_bind_var, parse_one_place, parse_places_var): New functions.
	(parse_affinity): Rewritten to construct OMP_PLACES list with unit
	sized places.
	(handle_omp_display_env): Remove proc_bind argument.  Set display
	to true for OMP_DISPLAY_ENV=true.  Print 201307 instead of 201107
	as _OPENMP.  Enhance printing of OMP_PROC_BIND, add printing
	of OMP_PLACES, don't print anything for GOMP_CPU_AFFINITY.
	(initialize_env): Use parse_bind_var to parse OMP_PROC_BIND
	instead of parse_boolean.  Use parse_places_var for OMP_PLACES
	parsing.  Don't call parse_affinity if OMP_PLACES has been
	successfully parsed (and call gomp_init_affinity in that case).
	Adjust handle_omp_display_env caller.
	(omp_get_proc_bind): Return bind_var ICV.
	* config/posix/affinity.c (gomp_init_thread_affinity): Add place
	argument.
	(gomp_affinity_alloc, gomp_affinity_init_place, gomp_affinity_add_cpus,
	gomp_affinity_remove_cpu, gomp_affinity_copy_place,
	gomp_affinity_same_place, gomp_affinity_finalize_place_list,
	gomp_affinity_init_level, gomp_affinity_print_place): New stubs.
	* config/linux/proc.c (gomp_cpuset_popcount): Fix up check extern decl.
	(gomp_init_num_threads): If HAVE_PTHREAD_AFFINITY_NP isn't defined,
	fix up sizeof argument.  Free and clear gomp_cpusetp if it didn't
	contain any logical CPUs.
	(get_num_procs): Check gomp_places_list instead of gomp_cpu_affinity.
	* config/linux/affinity.c: Include errno.h, stdio.h and string.h.
	(affinity_counter): Remove.
	(CPU_CLR_S): Define if CPU_ALLOC_SIZE is not defined.
	(gomp_init_affinity): Rewritten.  If gomp_places_list is NULL, try
	to silently create OMP_PLACES=threads; if it is non-NULL afterwards,
	bind the current thread to the first place.
	(gomp_init_thread_affinity): Rewritten.  Add place argument, just
	call pthread_attr_setaffinity_np with gomp_places_list[place].
	(gomp_affinity_alloc, gomp_affinity_init_place, gomp_affinity_add_cpus,
	gomp_affinity_remove_cpu, gomp_affinity_copy_place,
	gomp_affinity_same_place, gomp_affinity_finalize_place_list,
	gomp_affinity_init_level, gomp_affinity_print_place): New functions.
	* parallel.c (GOMP_parallel_start): Adjust gomp_team_start caller.
	(GOMP_parallel): Likewise, pass through flags parameter to it.
	* sections.c (GOMP_parallel_sections_start): Adjust gomp_team_start
	caller.
	(GOMP_parallel_sections): Likewise, pass through flags parameter to it.
	* loop.c (gomp_parallel_loop_start): Add flags argument, pass it through
	to gomp_team_start.
	(GOMP_parallel_loop_static_start, GOMP_parallel_loop_dynamic_start,
	GOMP_parallel_loop_guided_start, GOMP_parallel_loop_runtime_start):
	Adjust gomp_parallel_loop_start callers.
	(GOMP_parallel_loop_static, GOMP_parallel_loop_dynamic,
	GOMP_parallel_loop_guided, GOMP_parallel_loop_runtime): Likewise, pass
	through flags parameter to it.
	* testsuite/libgomp.c/affinity-1.c: New test.
	* testsuite/libgomp.c++/affinity-1.C: New test.

--- libgomp/libgomp.h.jj	2013-09-27 12:04:12.000000000 +0200
+++ libgomp/libgomp.h	2013-10-02 19:29:56.584401613 +0200
@@ -202,6 +202,10 @@ struct gomp_team_state
   /* Active nesting level.  Only active parallel regions are counted.  */
   unsigned active_level;
 
+  /* Place-partition-var, offset and length into gomp_places_list array.  */
+  unsigned place_partition_off;
+  unsigned place_partition_len;
+
 #ifdef HAVE_SYNC_BUILTINS
   /* Number of single stmts encountered.  */
   unsigned long single_count;
@@ -230,6 +234,7 @@ struct gomp_task_icv
   int default_device_var;
   bool dyn_var;
   bool nest_var;
+  char bind_var;
   /* Internal ICV.  */
   struct target_mem_desc *target_data;
 };
@@ -245,6 +250,10 @@ extern bool gomp_cancel_var;
 extern unsigned long long gomp_spin_count_var, gomp_throttled_spin_count_var;
 extern unsigned long gomp_available_cpus, gomp_managed_threads;
 extern unsigned long *gomp_nthreads_var_list, gomp_nthreads_var_list_len;
+extern char *gomp_bind_var_list;
+extern unsigned long gomp_bind_var_list_len;
+extern void **gomp_places_list;
+extern unsigned long gomp_places_list_len;
 
 enum gomp_task_kind
 {
@@ -405,7 +414,11 @@ struct gomp_thread
   /* This semaphore is used for ordered loops.  */
   gomp_sem_t release;
 
-  /* user pthread thread pool */
+  /* Place this thread is bound to plus one, or zero if not bound
+     to any place.  */
+  unsigned int place;
+
+  /* User pthread thread pool */
   struct gomp_thread_pool *thread_pool;
 };
 
@@ -467,17 +480,22 @@ static inline struct gomp_task_icv *gomp
 /* The attributes to be used during thread creation.  */
 extern pthread_attr_t gomp_thread_attr;
 
-/* Other variables.  */
-
-extern unsigned short *gomp_cpu_affinity;
-extern size_t gomp_cpu_affinity_len;
-
 /* Function prototypes.  */
 
 /* affinity.c */
 
 extern void gomp_init_affinity (void);
-extern void gomp_init_thread_affinity (pthread_attr_t *);
+extern void gomp_init_thread_affinity (pthread_attr_t *, unsigned int);
+extern void **gomp_affinity_alloc (unsigned long, bool);
+extern void gomp_affinity_init_place (void *);
+extern bool gomp_affinity_add_cpus (void *, unsigned long, unsigned long,
+				    long, bool);
+extern bool gomp_affinity_remove_cpu (void *, unsigned long);
+extern bool gomp_affinity_copy_place (void *, void *, long);
+extern bool gomp_affinity_same_place (void *, void *);
+extern bool gomp_affinity_finalize_place_list (bool);
+extern bool gomp_affinity_init_level (int, unsigned long, bool);
+extern void gomp_affinity_print_place (void *);
 
 /* alloc.c */
 
@@ -560,7 +578,7 @@ gomp_finish_task (struct gomp_task *task
 
 extern struct gomp_team *gomp_new_team (unsigned);
 extern void gomp_team_start (void (*) (void *), void *, unsigned,
-			     struct gomp_team *);
+			     unsigned, struct gomp_team *);
 extern void gomp_team_end (void);
 
 /* target.c */
--- libgomp/team.c.jj	2013-09-24 12:52:53.000000000 +0200
+++ libgomp/team.c	2013-10-03 16:27:34.522387742 +0200
@@ -53,6 +53,7 @@ struct gomp_thread_start_data
   struct gomp_team_state ts;
   struct gomp_task *task;
   struct gomp_thread_pool *thread_pool;
+  unsigned int place;
   bool nested;
 };
 
@@ -84,6 +85,7 @@ gomp_thread_start (void *xdata)
   thr->thread_pool = data->thread_pool;
   thr->ts = data->ts;
   thr->task = data->task;
+  thr->place = data->place;
 
   thr->ts.team->ordered_release[thr->ts.team_id] = &thr->release;
 
@@ -262,7 +264,7 @@ gomp_free_thread (void *arg __attribute_
 
 void
 gomp_team_start (void (*fn) (void *), void *data, unsigned nthreads,
-		 struct gomp_team *team)
+		 unsigned flags, struct gomp_team *team)
 {
   struct gomp_thread_start_data *start_data;
   struct gomp_thread *thr, *nthr;
@@ -273,6 +275,10 @@ gomp_team_start (void (*fn) (void *), vo
   unsigned i, n, old_threads_used = 0;
   pthread_attr_t thread_attr, *attr;
   unsigned long nthreads_var;
+  char bind, bind_var;
+  unsigned int s = 0, rest = 0, p = 0, k = 0;
+  unsigned int affinity_count = 0;
+  struct gomp_thread **affinity_thr = NULL;
 
   thr = gomp_thread ();
   nested = thr->ts.team != NULL;
@@ -284,6 +290,8 @@ gomp_team_start (void (*fn) (void *), vo
   pool = thr->thread_pool;
   task = thr->task;
   icv = task ? &task->icv : &gomp_global_icv;
+  if (__builtin_expect (gomp_places_list != NULL, 0) && thr->place == 0)
+    gomp_init_affinity ();
 
   /* Always save the previous state, even if this isn't a nested team.
      In particular, we should save any work share state from an outer
@@ -306,14 +314,95 @@ gomp_team_start (void (*fn) (void *), vo
   if (__builtin_expect (gomp_nthreads_var_list != NULL, 0)
       && thr->ts.level < gomp_nthreads_var_list_len)
     nthreads_var = gomp_nthreads_var_list[thr->ts.level];
+  bind_var = icv->bind_var;
+  if (bind_var != omp_proc_bind_false && (flags & 7) != omp_proc_bind_false)
+    bind_var = flags & 7;
+  bind = bind_var;
+  if (__builtin_expect (gomp_bind_var_list != NULL, 0)
+      && thr->ts.level < gomp_bind_var_list_len)
+    bind_var = gomp_bind_var_list[thr->ts.level];
   gomp_init_task (thr->task, task, icv);
   team->implicit_task[0].icv.nthreads_var = nthreads_var;
+  team->implicit_task[0].icv.bind_var = bind_var;
 
   if (nthreads == 1)
     return;
 
   i = 1;
 
+  if (__builtin_expect (gomp_places_list != NULL, 0))
+    {
+      if (bind == omp_proc_bind_false)
+	bind = omp_proc_bind_true;
+      /* Depending on chosen proc_bind model, set subpartition
+	 for the master thread and initialize helper variables
+	 P and optionally S, K and/or REST used by later place
+	 computation for each additional thread.  */
+      p = thr->place - 1;
+      switch (bind)
+	{
+	case omp_proc_bind_false:
+	  bind = omp_proc_bind_true;
+	  /* FALLTHRU */
+	case omp_proc_bind_true:
+	case omp_proc_bind_close:
+	  if (nthreads > thr->ts.place_partition_len)
+	    {
+	      /* T > P.  S threads will be placed in each place,
+		 and the final REM threads placed one by one
+		 into the already occupied places.  */
+	      s = nthreads / thr->ts.place_partition_len;
+	      rest = nthreads % thr->ts.place_partition_len;
+	    }
+	  else
+	    s = 1;
+	  k = 1;
+	  break;
+	case omp_proc_bind_master:
+	  /* Each thread will be bound to master's place.  */
+	  break;
+	case omp_proc_bind_spread:
+	  if (nthreads <= thr->ts.place_partition_len)
+	    {
+	      /* T <= P.  Each subpartition will have in between s
+		 and s+1 places (subpartitions starting at or
+		 after rest will have s places, earlier s+1 places),
+		 each thread will be bound to the first place in
+		 its subpartition (except for the master thread
+		 that can be bound to another place in its
+		 subpartition).  */
+	      s = thr->ts.place_partition_len / nthreads;
+	      rest = thr->ts.place_partition_len % nthreads;
+	      rest = (s + 1) * rest + thr->ts.place_partition_off;
+	      if (p < rest)
+		{
+		  p -= (p - thr->ts.place_partition_off) % (s + 1);
+		  thr->ts.place_partition_len = s + 1;
+		}
+	      else
+		{
+		  p -= (p - rest) % s;
+		  thr->ts.place_partition_len = s;
+		}
+	      thr->ts.place_partition_off = p;
+	    }
+	  else
+	    {
+	      /* T > P.  Each subpartition will have just a single
+		 place and we'll place between s and s+1
+		 threads into each subpartition.  */
+	      s = nthreads / thr->ts.place_partition_len;
+	      rest = nthreads % thr->ts.place_partition_len;
+	      thr->ts.place_partition_off = p;
+	      thr->ts.place_partition_len = 1;
+	      k = 1;
+	    }
+	  break;
+	}
+    }
+  else
+    bind = omp_proc_bind_false;
+
   /* We only allow the reuse of idle threads for non-nested PARALLEL
      regions.  This appears to be implied by the semantics of
      threadprivate variables, but perhaps that's reading too much into
@@ -344,47 +433,244 @@ gomp_team_start (void (*fn) (void *), vo
 	 team will exit.  */
       pool->threads_used = nthreads;
 
+      /* If necessary, expand the size of the gomp_threads array.  It is
+	 expected that changes in the number of threads are rare, thus we
+	 make no effort to expand gomp_threads_size geometrically.  */
+      if (nthreads >= pool->threads_size)
+	{
+	  pool->threads_size = nthreads + 1;
+	  pool->threads
+	    = gomp_realloc (pool->threads,
+			    pool->threads_size
+			    * sizeof (struct gomp_thread_data *));
+	}
+
       /* Release existing idle threads.  */
       for (; i < n; ++i)
 	{
-	  nthr = pool->threads[i];
+	  unsigned int place_partition_off = thr->ts.place_partition_off;
+	  unsigned int place_partition_len = thr->ts.place_partition_len;
+	  unsigned int place = 0;
+	  if (__builtin_expect (gomp_places_list != NULL, 0))
+	    {
+	      switch (bind)
+		{
+		case omp_proc_bind_true:
+		case omp_proc_bind_close:
+		  if (k == s)
+		    {
+		      ++p;
+		      if (p == (team->prev_ts.place_partition_off
+				+ team->prev_ts.place_partition_len))
+			p = team->prev_ts.place_partition_off;
+		      k = 1;
+		      if (i == nthreads - rest)
+			s = 1;
+		    }
+		  else
+		    ++k;
+		  break;
+		case omp_proc_bind_master:
+		  break;
+		case omp_proc_bind_spread:
+		  if (k == 0)
+		    {
+		      /* T <= P.  */
+		      if (p < rest)
+			p += s + 1;
+		      else
+			p += s;
+		      if (p == (team->prev_ts.place_partition_off
+				+ team->prev_ts.place_partition_len))
+			p = team->prev_ts.place_partition_off;
+		      place_partition_off = p;
+		      if (p < rest)
+			place_partition_len = s + 1;
+		      else
+			place_partition_len = s;
+		    }
+		  else
+		    {
+		      /* T > P.  */
+		      if (k == s)
+			{
+			  ++p;
+			  if (p == (team->prev_ts.place_partition_off
+				    + team->prev_ts.place_partition_len))
+			    p = team->prev_ts.place_partition_off;
+			  k = 1;
+			  if (i == nthreads - rest)
+			    s = 1;
+			}
+		      else
+			++k;
+		      place_partition_off = p;
+		      place_partition_len = 1;
+		    }
+		  break;
+		}
+	      if (affinity_thr != NULL
+		  || (bind != omp_proc_bind_true
+		      && pool->threads[i]->place != p + 1)
+		  || pool->threads[i]->place <= place_partition_off
+		  || pool->threads[i]->place > (place_partition_off
+						+ place_partition_len))
+		{
+		  unsigned int l;
+		  if (affinity_thr == NULL)
+		    {
+		      unsigned int j;
+
+		      if (team->prev_ts.place_partition_len > 64)
+			affinity_thr
+			  = gomp_malloc (team->prev_ts.place_partition_len
+					 * sizeof (struct gomp_thread *));
+		      else
+			affinity_thr
+			  = gomp_alloca (team->prev_ts.place_partition_len
+					 * sizeof (struct gomp_thread *));
+		      memset (affinity_thr, '\0',
+			      team->prev_ts.place_partition_len
+			      * sizeof (struct gomp_thread *));
+		      for (j = i; j < old_threads_used; j++)
+			{
+			  if (pool->threads[j]->place
+			      > team->prev_ts.place_partition_off
+			      && (pool->threads[j]->place
+				  <= (team->prev_ts.place_partition_off
+				      + team->prev_ts.place_partition_len)))
+			    {
+			      l = pool->threads[j]->place - 1
+				  - team->prev_ts.place_partition_off;
+			      pool->threads[j]->data = affinity_thr[l];
+			      affinity_thr[l] = pool->threads[j];
+			    }
+			  pool->threads[j] = NULL;
+			}
+		      if (nthreads > old_threads_used)
+			memset (&pool->threads[old_threads_used],
+				'\0', ((nthreads - old_threads_used)
+				       * sizeof (struct gomp_thread *)));
+		      n = nthreads;
+		      affinity_count = old_threads_used - i;
+		    }
+		  if (affinity_count == 0)
+		    break;
+		  l = p;
+		  if (affinity_thr[l - team->prev_ts.place_partition_off]
+		      == NULL)
+		    {
+		      if (bind != omp_proc_bind_true)
+			continue;
+		      for (l = place_partition_off;
+			   l < place_partition_off + place_partition_len;
+			   l++)
+			if (affinity_thr[l - team->prev_ts.place_partition_off]
+			    != NULL)
+			  break;
+		      if (l == place_partition_off + place_partition_len)
+			continue;
+		    }
+		  nthr = affinity_thr[l - team->prev_ts.place_partition_off];
+		  affinity_thr[l - team->prev_ts.place_partition_off]
+		    = (struct gomp_thread *) nthr->data;
+		  affinity_count--;
+		  pool->threads[i] = nthr;
+		}
+	      else
+		nthr = pool->threads[i];
+	      place = p + 1;
+	    }
+	  else
+	    nthr = pool->threads[i];
 	  nthr->ts.team = team;
 	  nthr->ts.work_share = &team->work_shares[0];
 	  nthr->ts.last_work_share = NULL;
 	  nthr->ts.team_id = i;
 	  nthr->ts.level = team->prev_ts.level + 1;
 	  nthr->ts.active_level = thr->ts.active_level;
+	  nthr->ts.place_partition_off = place_partition_off;
+	  nthr->ts.place_partition_len = place_partition_len;
 #ifdef HAVE_SYNC_BUILTINS
 	  nthr->ts.single_count = 0;
 #endif
 	  nthr->ts.static_trip = 0;
 	  nthr->task = &team->implicit_task[i];
+	  nthr->place = place;
 	  gomp_init_task (nthr->task, task, icv);
 	  team->implicit_task[i].icv.nthreads_var = nthreads_var;
+	  team->implicit_task[i].icv.bind_var = bind_var;
 	  nthr->fn = fn;
 	  nthr->data = data;
 	  team->ordered_release[i] = &nthr->release;
 	}
 
+      if (__builtin_expect (affinity_thr != NULL, 0))
+	{
+	  /* If AFFINITY_THR is non-NULL just because we had to
+	     permute some threads in the pool, but we've managed
+	     to find exactly as many old threads as we'd find
+	     without affinity, we don't need to handle this
+	     specially anymore.  */
+	  if (nthreads <= old_threads_used
+	      ? (affinity_count == old_threads_used - nthreads)
+	      : (i == old_threads_used))
+	    {
+	      if (team->prev_ts.place_partition_len > 64)
+		free (affinity_thr);
+	      affinity_thr = NULL;
+	      affinity_count = 0;
+	    }
+	  else
+	    {
+	      i = 1;
+	      /* We are going to compute the places/subpartitions
+		 again from the beginning.  So, we need to reinitialize
+		 vars modified by the switch (bind) above inside
+		 of the loop, to the state they had after the initial
+		 switch (bind).  */
+	      switch (bind)
+		{
+		case omp_proc_bind_true:
+		case omp_proc_bind_close:
+		  if (nthreads > thr->ts.place_partition_len)
+		    /* T > P.  S has been changed, so needs
+		       to be recomputed.  */
+		    s = nthreads / thr->ts.place_partition_len;
+		  k = 1;
+		  p = thr->place - 1;
+		  break;
+		case omp_proc_bind_master:
+		  /* No vars have been changed.  */
+		  break;
+		case omp_proc_bind_spread:
+		  p = thr->ts.place_partition_off;
+		  if (k != 0)
+		    {
+		      /* T > P.  */
+		      s = nthreads / team->prev_ts.place_partition_len;
+		      k = 1;
+		    }
+		  break;
+		}
+
+	      /* Increase the barrier threshold to make sure all new
+		 threads and all the threads we're going to let die
+		 arrive before the team is released.  */
+	      if (affinity_count)
+		gomp_barrier_reinit (&pool->threads_dock,
+				     nthreads + affinity_count);
+	    }
+	}
+
       if (i == nthreads)
 	goto do_release;
 
-      /* If necessary, expand the size of the gomp_threads array.  It is
-	 expected that changes in the number of threads are rare, thus we
-	 make no effort to expand gomp_threads_size geometrically.  */
-      if (nthreads >= pool->threads_size)
-	{
-	  pool->threads_size = nthreads + 1;
-	  pool->threads
-	    = gomp_realloc (pool->threads,
-			    pool->threads_size
-			    * sizeof (struct gomp_thread_data *));
-	}
     }
 
-  if (__builtin_expect (nthreads > old_threads_used, 0))
+  if (__builtin_expect (nthreads + affinity_count > old_threads_used, 0))
     {
-      long diff = (long) nthreads - (long) old_threads_used;
+      long diff = (long) (nthreads + affinity_count) - (long) old_threads_used;
 
       if (old_threads_used == 0)
 	--diff;
@@ -399,7 +685,7 @@ gomp_team_start (void (*fn) (void *), vo
     }
 
   attr = &gomp_thread_attr;
-  if (__builtin_expect (gomp_cpu_affinity != NULL, 0))
+  if (__builtin_expect (gomp_places_list != NULL, 0))
     {
       size_t stacksize;
       pthread_attr_init (&thread_attr);
@@ -413,11 +699,78 @@ gomp_team_start (void (*fn) (void *), vo
 			    * (nthreads-i));
 
   /* Launch new threads.  */
-  for (; i < nthreads; ++i, ++start_data)
+  for (; i < nthreads; ++i)
     {
       pthread_t pt;
       int err;
 
+      start_data->ts.place_partition_off = thr->ts.place_partition_off;
+      start_data->ts.place_partition_len = thr->ts.place_partition_len;
+      start_data->place = 0;
+      if (__builtin_expect (gomp_places_list != NULL, 0))
+	{
+	  switch (bind)
+	    {
+	    case omp_proc_bind_true:
+	    case omp_proc_bind_close:
+	      if (k == s)
+		{
+		  ++p;
+		  if (p == (team->prev_ts.place_partition_off
+			    + team->prev_ts.place_partition_len))
+		    p = team->prev_ts.place_partition_off;
+		  k = 1;
+		  if (i == nthreads - rest)
+		    s = 1;
+		}
+	      else
+		++k;
+	      break;
+	    case omp_proc_bind_master:
+	      break;
+	    case omp_proc_bind_spread:
+	      if (k == 0)
+		{
+		  /* T <= P.  */
+		  if (p < rest)
+		    p += s + 1;
+		  else
+		    p += s;
+		  if (p == (team->prev_ts.place_partition_off
+			    + team->prev_ts.place_partition_len))
+		    p = team->prev_ts.place_partition_off;
+		  start_data->ts.place_partition_off = p;
+		  if (p < rest)
+		    start_data->ts.place_partition_len = s + 1;
+		  else
+		    start_data->ts.place_partition_len = s;
+		}
+	      else
+		{
+		  /* T > P.  */
+		  if (k == s)
+		    {
+		      ++p;
+		      if (p == (team->prev_ts.place_partition_off
+				+ team->prev_ts.place_partition_len))
+			p = team->prev_ts.place_partition_off;
+		      k = 1;
+		      if (i == nthreads - rest)
+			s = 1;
+		    }
+		  else
+		    ++k;
+		  start_data->ts.place_partition_off = p;
+		  start_data->ts.place_partition_len = 1;
+		}
+	      break;
+	    }
+	  start_data->place = p + 1;
+	  if (affinity_thr != NULL && pool->threads[i] != NULL)
+	    continue;
+	  gomp_init_thread_affinity (attr, p);
+	}
+
       start_data->fn = fn;
       start_data->fn_data = data;
       start_data->ts.team = team;
@@ -433,18 +786,16 @@ gomp_team_start (void (*fn) (void *), vo
       start_data->task = &team->implicit_task[i];
       gomp_init_task (start_data->task, task, icv);
       team->implicit_task[i].icv.nthreads_var = nthreads_var;
+      team->implicit_task[i].icv.bind_var = bind_var;
       start_data->thread_pool = pool;
       start_data->nested = nested;
 
-      if (gomp_cpu_affinity != NULL)
-	gomp_init_thread_affinity (attr);
-
-      err = pthread_create (&pt, attr, gomp_thread_start, start_data);
+      err = pthread_create (&pt, attr, gomp_thread_start, start_data++);
       if (err != 0)
 	gomp_fatal ("Thread creation failed: %s", strerror (err));
     }
 
-  if (__builtin_expect (gomp_cpu_affinity != NULL, 0))
+  if (__builtin_expect (gomp_places_list != NULL, 0))
     pthread_attr_destroy (&thread_attr);
 
  do_release:
@@ -453,11 +804,19 @@ gomp_team_start (void (*fn) (void *), vo
   /* Decrease the barrier threshold to match the number of threads
      that should arrive back at the end of this team.  The extra
      threads should be exiting.  Note that we arrange for this test
-     to never be true for nested teams.  */
-  if (__builtin_expect (nthreads < old_threads_used, 0))
+     to never be true for nested teams.  If AFFINITY_COUNT is non-zero,
+     the barrier as well as gomp_managed_threads was temporarily
+     set to NTHREADS + AFFINITY_COUNT.  For NTHREADS < OLD_THREADS_COUNT,
+     AFFINITY_COUNT if non-zero will be always at least
+     OLD_THREADS_COUNT - NTHREADS.  */
+  if (__builtin_expect (nthreads < old_threads_used, 0)
+      || __builtin_expect (affinity_count, 0))
     {
       long diff = (long) nthreads - (long) old_threads_used;
 
+      if (affinity_count)
+	diff = -affinity_count;
+
       gomp_barrier_reinit (&pool->threads_dock, nthreads);
 
 #ifdef HAVE_SYNC_BUILTINS
@@ -468,6 +827,9 @@ gomp_team_start (void (*fn) (void *), vo
       gomp_mutex_unlock (&gomp_remaining_threads_lock);
 #endif
     }
+  if (__builtin_expect (affinity_thr != NULL, 0)
+      && team->prev_ts.place_partition_len > 64)
+    free (affinity_thr);
 }
 
 
--- libgomp/env.c.jj	2013-09-24 12:52:53.000000000 +0200
+++ libgomp/env.c	2013-10-03 21:24:12.126526840 +0200
@@ -59,11 +59,10 @@ struct gomp_task_icv gomp_global_icv = {
   .default_device_var = 0,
   .dyn_var = false,
   .nest_var = false,
+  .bind_var = omp_proc_bind_false,
   .target_data = NULL
 };
 
-unsigned short *gomp_cpu_affinity;
-size_t gomp_cpu_affinity_len;
 unsigned long gomp_max_active_levels_var = INT_MAX;
 unsigned long gomp_thread_limit_var = ULONG_MAX;
 bool gomp_cancel_var = false;
@@ -74,6 +73,10 @@ gomp_mutex_t gomp_remaining_threads_lock
 unsigned long gomp_available_cpus = 1, gomp_managed_threads = 1;
 unsigned long long gomp_spin_count_var, gomp_throttled_spin_count_var;
 unsigned long *gomp_nthreads_var_list, gomp_nthreads_var_list_len;
+char *gomp_bind_var_list;
+unsigned long gomp_bind_var_list_len;
+void **gomp_places_list;
+unsigned long gomp_places_list_len;
 
 /* Parse the OMP_SCHEDULE environment variable.  */
 
@@ -298,6 +301,412 @@ parse_unsigned_long_list (const char *na
   return false;
 }
 
+/* Parse environment variable set to a boolean or list of omp_proc_bind_t
+   enum values.  Return true if one was present and it was successfully
+   parsed.  */
+
+static bool
+parse_bind_var (const char *name, char *p1stvalue,
+		char **pvalues, unsigned long *pnvalues)
+{
+  char *env;
+  char value, *values = NULL;
+  int i;
+  static struct proc_bind_kinds
+  {
+    const char name[7];
+    const char len;
+    omp_proc_bind_t kind;
+  } kinds[] =
+  {
+    { "false", 5, omp_proc_bind_false },
+    { "true", 4, omp_proc_bind_true },
+    { "master", 6, omp_proc_bind_master },
+    { "close", 5, omp_proc_bind_close },
+    { "spread", 6, omp_proc_bind_spread }
+  };
+
+  env = getenv (name);
+  if (env == NULL)
+    return false;
+
+  while (isspace ((unsigned char) *env))
+    ++env;
+  if (*env == '\0')
+    goto invalid;
+
+  for (i = 0; i < 5; i++)
+    if (strncasecmp (env, kinds[i].name, kinds[i].len) == 0)
+      {
+	value = kinds[i].kind;
+	env += kinds[i].len;
+	break;
+      }
+  if (i == 5)
+    goto invalid;
+
+  while (isspace ((unsigned char) *env))
+    ++env;
+  if (*env != '\0')
+    {
+      if (*env == ',')
+	{
+	  unsigned long nvalues = 0, nalloced = 0;
+
+	  if (value == omp_proc_bind_false
+	      || value == omp_proc_bind_true)
+	    goto invalid;
+
+	  do
+	    {
+	      env++;
+	      if (nvalues == nalloced)
+		{
+		  char *n;
+		  nalloced = nalloced ? nalloced * 2 : 16;
+		  n = realloc (values, nalloced);
+		  if (n == NULL)
+		    {
+		      free (values);
+		      gomp_error ("Out of memory while trying to parse"
+				  " environment variable %s", name);
+		      return false;
+		    }
+		  values = n;
+		  if (nvalues == 0)
+		    values[nvalues++] = value;
+		}
+
+	      while (isspace ((unsigned char) *env))
+		++env;
+	      if (*env == '\0')
+		goto invalid;
+
+	      for (i = 2; i < 5; i++)
+		if (strncasecmp (env, kinds[i].name, kinds[i].len) == 0)
+		  {
+		    value = kinds[i].kind;
+		    env += kinds[i].len;
+		    break;
+		  }
+	      if (i == 5)
+		goto invalid;
+
+	      values[nvalues++] = value;
+	      while (isspace ((unsigned char) *env))
+		++env;
+	      if (*env == '\0')
+		break;
+	      if (*env != ',')
+		goto invalid;
+	    }
+	  while (1);
+	  *p1stvalue = values[0];
+	  *pvalues = values;
+	  *pnvalues = nvalues;
+	  return true;
+	}
+      goto invalid;
+    }
+
+  *p1stvalue = value;
+  return true;
+
+ invalid:
+  free (values);
+  gomp_error ("Invalid value for environment variable %s", name);
+  return false;
+}
+
+static bool
+parse_one_place (char **envp, bool *negatep, unsigned long *lenp,
+		 long *stridep)
+{
+  char *env = *envp, *start;
+  void *p = gomp_places_list ? gomp_places_list[gomp_places_list_len] : NULL;
+  unsigned long len = 1;
+  long stride = 1;
+  int pass;
+  bool any_negate = false;
+  *negatep = false;
+  while (isspace ((unsigned char) *env))
+    ++env;
+  if (*env == '!')
+    {
+      *negatep = true;
+      ++env;
+      while (isspace ((unsigned char) *env))
+	++env;
+    }
+  if (*env != '{')
+    return false;
+  ++env;
+  while (isspace ((unsigned char) *env))
+    ++env;
+  start = env;
+  for (pass = 0; pass < (any_negate ? 2 : 1); pass++)
+    {
+      env = start;
+      do
+	{
+	  unsigned long this_num, this_len = 1;
+	  long this_stride = 1;
+	  bool this_negate = (*env == '!');
+	  if (this_negate)
+	    {
+	      if (gomp_places_list)
+		any_negate = true;
+	      ++env;
+	      while (isspace ((unsigned char) *env))
+		++env;
+	    }
+
+	  errno = 0;
+	  this_num = strtoul (env, &env, 10);
+	  if (errno)
+	    return false;
+	  while (isspace ((unsigned char) *env))
+	    ++env;
+	  if (*env == ':')
+	    {
+	      ++env;
+	      while (isspace ((unsigned char) *env))
+		++env;
+	      errno = 0;
+	      this_len = strtoul (env, &env, 10);
+	      if (errno || this_len == 0)
+		return false;
+	      while (isspace ((unsigned char) *env))
+		++env;
+	      if (*env == ':')
+		{
+		  ++env;
+		  while (isspace ((unsigned char) *env))
+		    ++env;
+		  errno = 0;
+		  this_stride = strtol (env, &env, 10);
+		  if (errno)
+		    return false;
+		  while (isspace ((unsigned char) *env))
+		    ++env;
+		}
+	    }
+	  if (this_negate && this_len != 1)
+	    return false;
+	  if (gomp_places_list && pass == this_negate)
+	    {
+	      if (this_negate)
+		{
+		  if (!gomp_affinity_remove_cpu (p, this_num))
+		    return false;
+		}
+	      else if (!gomp_affinity_add_cpus (p, this_num, this_len,
+						this_stride, false))
+		return false;
+	    }
+	  if (*env == '}')
+	    break;
+	  if (*env != ',')
+	    return false;
+	  ++env;
+	}
+      while (1);
+    }
+
+  ++env;
+  while (isspace ((unsigned char) *env))
+    ++env;
+  if (*env == ':')
+    {
+      ++env;
+      while (isspace ((unsigned char) *env))
+	++env;
+      errno = 0;
+      len = strtoul (env, &env, 10);
+      if (errno || len == 0 || len >= 65536)
+	return false;
+      while (isspace ((unsigned char) *env))
+	++env;
+      if (*env == ':')
+	{
+	  ++env;
+	  while (isspace ((unsigned char) *env))
+	    ++env;
+	  errno = 0;
+	  stride = strtol (env, &env, 10);
+	  if (errno)
+	    return false;
+	  while (isspace ((unsigned char) *env))
+	    ++env;
+	}
+    }
+  if (*negatep && len != 1)
+    return false;
+  *envp = env;
+  *lenp = len;
+  *stridep = stride;
+  return true;
+}
+
+static bool
+parse_places_var (const char *name)
+{
+  char *env = getenv (name), *end;
+  bool any_negate = false;
+  int level = 0;
+  unsigned long count = 0;
+  if (env == NULL)
+    return false;
+
+  while (isspace ((unsigned char) *env))
+    ++env;
+  if (*env == '\0')
+    goto invalid;
+
+  if (strncasecmp (env, "threads", 7) == 0)
+    {
+      env += 7;
+      level = 1;
+    }
+  else if (strncasecmp (env, "cores", 5) == 0)
+    {
+      env += 5;
+      level = 2;
+    }
+  else if (strncasecmp (env, "sockets", 7) == 0)
+    {
+      env += 7;
+      level = 3;
+    }
+  if (level)
+    {
+      count = ULONG_MAX;
+      while (isspace ((unsigned char) *env))
+	++env;
+      if (*env != '\0')
+	{
+	  if (*env++ != '(')
+	    goto invalid;
+	  while (isspace ((unsigned char) *env))
+	    ++env;
+
+	  errno = 0;
+	  count = strtoul (env, &end, 10);
+	  if (errno)
+	    goto invalid;
+	  env = end;
+	  while (isspace ((unsigned char) *env))
+	    ++env;
+	  if (*env != ')')
+	    goto invalid;
+	  ++env;
+	  while (isspace ((unsigned char) *env))
+	    ++env;
+	  if (*env != '\0')
+	    goto invalid;
+	}
+      return gomp_affinity_init_level (level, count, false);
+    }
+
+  count = 0;
+  end = env;
+  do
+    {
+      bool negate;
+      unsigned long len;
+      long stride;
+      if (!parse_one_place (&end, &negate, &len, &stride))
+	goto invalid;
+      if (negate)
+	{
+	  if (!any_negate)
+	    count++;
+	  any_negate = true;
+	}
+      else
+	count += len;
+      if (count > 65536)
+	goto invalid;
+      if (*end == '\0')
+	break;
+      if (*end != ',')
+	goto invalid;
+      end++;
+    }
+  while (1);
+
+  if (gomp_global_icv.bind_var == omp_proc_bind_false)
+    return false;
+
+  gomp_places_list_len = 0;
+  gomp_places_list = gomp_affinity_alloc (count, false);
+  if (gomp_places_list == NULL)
+    return false;
+
+  do
+    {
+      bool negate;
+      unsigned long len;
+      long stride;
+      gomp_affinity_init_place (gomp_places_list[gomp_places_list_len]);
+      if (!parse_one_place (&env, &negate, &len, &stride))
+	goto invalid;
+      if (negate)
+	{
+	  void *p;
+	  for (count = 0; count < gomp_places_list_len; count++)
+	    if (gomp_affinity_same_place
+			(gomp_places_list[count],
+			 gomp_places_list[gomp_places_list_len]))
+	      break;
+	  if (count == gomp_places_list_len)
+	    {
+	      gomp_error ("Trying to remove a non-existing place from list "
+			  "of places");
+	      goto invalid;
+	    }
+	  p = gomp_places_list[count];
+	  memmove (&gomp_places_list[count],
+		   &gomp_places_list[count + 1],
+		   (gomp_places_list_len - count - 1) * sizeof (void *));
+	  --gomp_places_list_len;
+	  gomp_places_list[gomp_places_list_len] = p;
+	}
+      else if (len == 1)
+	++gomp_places_list_len;
+      else
+	{
+	  for (count = 0; count < len - 1; count++)
+	    if (!gomp_affinity_copy_place
+			(gomp_places_list[gomp_places_list_len + count + 1],
+			 gomp_places_list[gomp_places_list_len + count],
+			 stride))
+	      goto invalid;
+	  gomp_places_list_len += len;
+	}
+      if (*env == '\0')
+	break;
+      env++;
+    }
+  while (1);
+
+  if (gomp_places_list_len == 0)
+    {
+      gomp_error ("All places have been removed");
+      goto invalid;
+    }
+  if (!gomp_affinity_finalize_place_list (false))
+    goto invalid;
+  return true;
+
+ invalid:
+  free (gomp_places_list);
+  gomp_places_list = NULL;
+  gomp_places_list_len = 0;
+  gomp_error ("Invalid value for environment variable %s", name);
+  return false;
+}
+
 /* Parse the OMP_STACKSIZE environment varible.  Return true if one was
    present and it was successfully parsed.  */
 
@@ -505,84 +914,89 @@ parse_wait_policy (void)
 static bool
 parse_affinity (void)
 {
-  char *env, *end;
+  char *env, *end, *start;
+  int pass;
   unsigned long cpu_beg, cpu_end, cpu_stride;
-  unsigned short *cpus = NULL;
-  size_t allocated = 0, used = 0, needed;
+  size_t count = 0, needed;
 
   env = getenv ("GOMP_CPU_AFFINITY");
   if (env == NULL)
     return false;
 
-  do
+  start = env;
+  for (pass = 0; pass < 2; pass++)
     {
-      while (*env == ' ' || *env == '\t')
-	env++;
-
-      cpu_beg = strtoul (env, &end, 0);
-      cpu_end = cpu_beg;
-      cpu_stride = 1;
-      if (env == end || cpu_beg >= 65536)
-	goto invalid;
-
-      env = end;
-      if (*env == '-')
+      env = start;
+      if (pass == 1)
 	{
-	  cpu_end = strtoul (++env, &end, 0);
-	  if (env == end || cpu_end >= 65536 || cpu_end < cpu_beg)
+	  gomp_places_list_len = 0;
+	  gomp_places_list = gomp_affinity_alloc (count, true);
+	  if (gomp_places_list == NULL)
+	    return false;
+	}
+      do
+	{
+	  while (isspace ((unsigned char) *env))
+	    ++env;
+
+	  errno = 0;
+	  cpu_beg = strtoul (env, &end, 0);
+	  if (errno || cpu_beg >= 65536)
 	    goto invalid;
+	  cpu_end = cpu_beg;
+	  cpu_stride = 1;
 
 	  env = end;
-	  if (*env == ':')
+	  if (*env == '-')
 	    {
-	      cpu_stride = strtoul (++env, &end, 0);
-	      if (env == end || cpu_stride == 0 || cpu_stride >= 65536)
+	      errno = 0;
+	      cpu_end = strtoul (++env, &end, 0);
+	      if (errno || cpu_end >= 65536 || cpu_end < cpu_beg)
 		goto invalid;
 
 	      env = end;
-	    }
-	}
+	      if (*env == ':')
+		{
+		  errno = 0;
+		  cpu_stride = strtoul (++env, &end, 0);
+		  if (errno || cpu_stride == 0 || cpu_stride >= 65536)
+		    goto invalid;
 
-      needed = (cpu_end - cpu_beg) / cpu_stride + 1;
-      if (used + needed >= allocated)
-	{
-	  unsigned short *new_cpus;
+		  env = end;
+		}
+	    }
 
-	  if (allocated < 64)
-	    allocated = 64;
-	  if (allocated > needed)
-	    allocated <<= 1;
+	  needed = (cpu_end - cpu_beg) / cpu_stride + 1;
+	  if (pass == 0)
+	    count += needed;
 	  else
-	    allocated += 2 * needed;
-	  new_cpus = realloc (cpus, allocated * sizeof (unsigned short));
-	  if (new_cpus == NULL)
 	    {
-	      free (cpus);
-	      gomp_error ("not enough memory to store GOMP_CPU_AFFINITY list");
-	      return false;
+	      while (needed--)
+		{
+		  void *p = gomp_places_list[gomp_places_list_len];
+		  gomp_affinity_init_place (p);
+		  if (gomp_affinity_add_cpus (p, cpu_beg, 1, 0, true))
+		    ++gomp_places_list_len;
+		  cpu_beg += cpu_stride;
+		}
 	    }
 
-	  cpus = new_cpus;
-	}
+	  while (isspace ((unsigned char) *env))
+	    ++env;
 
-      while (needed--)
-	{
-	  cpus[used++] = cpu_beg;
-	  cpu_beg += cpu_stride;
+	  if (*env == ',')
+	    env++;
+	  else if (*env == '\0')
+	    break;
 	}
-
-      while (*env == ' ' || *env == '\t')
-	env++;
-
-      if (*env == ',')
-	env++;
-      else if (*env == '\0')
-	break;
+      while (1);
     }
-  while (1);
 
-  gomp_cpu_affinity = cpus;
-  gomp_cpu_affinity_len = used;
+  if (gomp_places_list_len == 0)
+    {
+      free (gomp_places_list);
+      gomp_places_list = NULL;
+    }
   return true;
 
  invalid:
@@ -592,8 +1006,7 @@ parse_affinity (void)
 
 
 static void
-handle_omp_display_env (bool proc_bind, unsigned long stacksize,
-			int wait_policy)
+handle_omp_display_env (unsigned long stacksize, int wait_policy)
 {
   const char *env;
   bool display = false;
@@ -608,6 +1021,7 @@ handle_omp_display_env (bool proc_bind,
     ++env;
   if (strncasecmp (env, "true", 4) == 0)
     {
+      display = true;
       env += 4;
     }
   else if (strncasecmp (env, "false", 5) == 0)
@@ -633,7 +1047,7 @@ handle_omp_display_env (bool proc_bind,
 
   fputs ("\nOPENMP DISPLAY ENVIRONMENT BEGIN\n", stderr);
 
-  fputs ("  _OPENMP = '201107'\n", stderr);
+  fputs ("  _OPENMP = '201307'\n", stderr);
   fprintf (stderr, "  OMP_DYNAMIC = '%s'\n",
 	   gomp_global_icv.dyn_var ? "TRUE" : "FALSE");
   fprintf (stderr, "  OMP_NESTED = '%s'\n",
@@ -665,8 +1079,48 @@ handle_omp_display_env (bool proc_bind,
     }
   fputs ("'\n", stderr);
 
-  fprintf (stderr, "  OMP_PROC_BIND = '%s'\n",
-	   proc_bind ? "TRUE" : "FALSE");
+  fputs ("  OMP_PROC_BIND = '", stderr);
+  switch (gomp_global_icv.bind_var)
+    {
+    case omp_proc_bind_false:
+      fputs ("FALSE", stderr);
+      break;
+    case omp_proc_bind_true:
+      fputs ("TRUE", stderr);
+      break;
+    case omp_proc_bind_master:
+      fputs ("MASTER", stderr);
+      break;
+    case omp_proc_bind_close:
+      fputs ("CLOSE", stderr);
+      break;
+    case omp_proc_bind_spread:
+      fputs ("SPREAD", stderr);
+      break;
+    }
+  for (i = 1; i < gomp_bind_var_list_len; i++)
+    switch (gomp_bind_var_list[i])
+      {
+      case omp_proc_bind_master:
+	fputs (",MASTER", stderr);
+	break;
+      case omp_proc_bind_close:
+	fputs (",CLOSE", stderr);
+	break;
+      case omp_proc_bind_spread:
+	fputs (",SPREAD", stderr);
+	break;
+      }
+  fputs ("'\n", stderr);
+  fputs ("  OMP_PLACES = '", stderr);
+  for (i = 0; i < gomp_places_list_len; i++)
+    {
+      fputs ("{", stderr);
+      gomp_affinity_print_place (gomp_places_list[i]);
+      fputs (i + 1 == gomp_places_list_len ? "}" : "},", stderr);
+    }
+  fputs ("'\n", stderr);
+
   fprintf (stderr, "  OMP_STACKSIZE = '%lu'\n", stacksize);
 
   /* GOMP's default value is actually neither active nor passive.  */
@@ -677,8 +1131,6 @@ handle_omp_display_env (bool proc_bind,
   fprintf (stderr, "  OMP_MAX_ACTIVE_LEVELS = '%lu'\n",
 	   gomp_max_active_levels_var);
 
-/* FIXME: Unimplemented OpenMP 4.0 environment variables.
-  fprintf (stderr, "  OMP_PLACES = ''\n"); */
   fprintf (stderr, "  OMP_CANCELLATION = '%s'\n",
 	   gomp_cancel_var ? "TRUE" : "FALSE");
   fprintf (stderr, "  OMP_DEFAULT_DEVICE = '%d'\n",
@@ -686,14 +1138,7 @@ handle_omp_display_env (bool proc_bind,
 
   if (verbose)
     {
-      fputs ("  GOMP_CPU_AFFINITY = '", stderr);
-      if (gomp_cpu_affinity_len)
-	{
-	  fprintf (stderr, "%d", gomp_cpu_affinity[0]);
-	  for (i = 1; i < gomp_cpu_affinity_len; i++)
-	    fprintf (stderr, " %d", gomp_cpu_affinity[i]);
-	}
-      fputs ("'\n", stderr);
+      fputs ("  GOMP_CPU_AFFINITY = ''\n", stderr);
       fprintf (stderr, "  GOMP_STACKSIZE = '%lu'\n", stacksize);
 #ifdef HAVE_INTTYPES_H
       fprintf (stderr, "  GOMP_SPINCOUNT = '%"PRIu64"'\n",
@@ -713,7 +1158,6 @@ initialize_env (void)
 {
   unsigned long stacksize;
   int wait_policy;
-  bool bind_var = false;
 
   /* Do a compile time check that mkomp_h.pl did good job.  */
   omp_check_defines ();
@@ -721,7 +1165,6 @@ initialize_env (void)
   parse_schedule ();
   parse_boolean ("OMP_DYNAMIC", &gomp_global_icv.dyn_var);
   parse_boolean ("OMP_NESTED", &gomp_global_icv.nest_var);
-  parse_boolean ("OMP_PROC_BIND", &bind_var);
   parse_boolean ("OMP_CANCELLATION", &gomp_cancel_var);
   parse_int ("OMP_DEFAULT_DEVICE", &gomp_global_icv.default_device_var, true);
   parse_unsigned_long ("OMP_MAX_ACTIVE_LEVELS", &gomp_max_active_levels_var,
@@ -739,7 +1182,14 @@ initialize_env (void)
 				 &gomp_nthreads_var_list,
 				 &gomp_nthreads_var_list_len))
     gomp_global_icv.nthreads_var = gomp_available_cpus;
-  if (parse_affinity () || bind_var)
+  if (!parse_bind_var ("OMP_PROC_BIND",
+		       &gomp_global_icv.bind_var,
+		       &gomp_bind_var_list,
+		       &gomp_bind_var_list_len))
+    gomp_global_icv.bind_var = omp_proc_bind_false;
+  if (parse_places_var ("OMP_PLACES")
+      || parse_affinity ()
+      || gomp_global_icv.bind_var)
     gomp_init_affinity ();
   wait_policy = parse_wait_policy ();
   if (!parse_spincount ("GOMP_SPINCOUNT", &gomp_spin_count_var))
@@ -791,7 +1241,7 @@ initialize_env (void)
 	gomp_error ("Stack size change failed: %s", strerror (err));
     }
 
-  handle_omp_display_env (bind_var, stacksize, wait_policy);
+  handle_omp_display_env (stacksize, wait_policy);
 }
 
 
@@ -900,7 +1350,8 @@ omp_get_cancellation (void)
 omp_proc_bind_t
 omp_get_proc_bind (void)
 {
-  return omp_proc_bind_false;
+  struct gomp_task_icv *icv = gomp_icv (false);
+  return icv->bind_var;
 }
 
 void
--- libgomp/config/posix/affinity.c.jj	2013-03-20 10:02:06.000000000 +0100
+++ libgomp/config/posix/affinity.c	2013-10-02 19:37:03.539151309 +0200
@@ -32,7 +32,84 @@ gomp_init_affinity (void)
 }
 
 void
-gomp_init_thread_affinity (pthread_attr_t *attr)
+gomp_init_thread_affinity (pthread_attr_t *attr, unsigned int place)
 {
   (void) attr;
+  (void) place;
+}
+
+void **
+gomp_affinity_alloc (unsigned long count, bool quiet)
+{
+  (void) count;
+  if (!quiet)
+    gomp_error ("Affinity not supported on this configuration");
+  return NULL;
+}
+
+void
+gomp_affinity_init_place (void *p)
+{
+  (void) p;
+}
+
+bool
+gomp_affinity_add_cpus (void *p, unsigned long num,
+			unsigned long len, long stride, bool quiet)
+{
+  (void) p;
+  (void) num;
+  (void) len;
+  (void) stride;
+  (void) quiet;
+  return false;
+}
+
+bool
+gomp_affinity_remove_cpu (void *p, unsigned long num)
+{
+  (void) p;
+  (void) num;
+  return false;
+}
+
+bool
+gomp_affinity_copy_place (void *p, void *q, long stride)
+{
+  (void) p;
+  (void) q;
+  (void) stride;
+  return false;
+}
+
+bool
+gomp_affinity_same_place (void *p, void *q)
+{
+  (void) p;
+  (void) q;
+  return false;
+}
+
+bool
+gomp_affinity_finalize_place_list (bool quiet)
+{
+  (void) quiet;
+  return false;
+}
+
+bool
+gomp_affinity_init_level (int level, unsigned long count, bool quiet)
+{
+  (void) level;
+  (void) count;
+  (void) quiet;
+  if (!quiet)
+    gomp_error ("Affinity not supported on this configuration");
+  return false;
+}
+
+void
+gomp_affinity_print_place (void *p)
+{
+  (void) p;
 }
--- libgomp/config/linux/proc.c.jj	2013-10-01 14:09:00.000000000 +0200
+++ libgomp/config/linux/proc.c	2013-10-04 09:03:01.908352129 +0200
@@ -56,9 +56,9 @@ gomp_cpuset_popcount (cpu_set_t *cpusetp
 #endif
   size_t i;
   unsigned long ret = 0;
-  extern int check[sizeof (cpusetp->__bits[0]) == sizeof (unsigned long int)];
+  extern int check[sizeof (cpusetp->__bits[0]) == sizeof (unsigned long int)
+		   ? 1 : -1];
 
-  (void) check;
   for (i = 0; i < gomp_cpuset_size / sizeof (cpusetp->__bits[0]); i++)
     {
       unsigned long int mask = cpusetp->__bits[i];
@@ -82,7 +82,7 @@ gomp_init_num_threads (void)
   gomp_cpuset_size = sysconf (_SC_NPROCESSORS_CONF);
   gomp_cpuset_size = CPU_ALLOC_SIZE (gomp_cpuset_size);
 #else
-  gomp_cpuset_size = sizeof (cpuset);
+  gomp_cpuset_size = sizeof (cpu_set_t);
 #endif
 
   gomp_cpusetp = (cpu_set_t *) gomp_malloc (gomp_cpuset_size);
@@ -92,7 +92,11 @@ gomp_init_num_threads (void)
       /* Count only the CPUs this process can use.  */
       gomp_global_icv.nthreads_var = gomp_cpuset_popcount (gomp_cpusetp);
       if (gomp_global_icv.nthreads_var == 0)
-	gomp_global_icv.nthreads_var = 1;
+	{
+	  gomp_global_icv.nthreads_var = 1;
+	  free (gomp_cpusetp);
+	  gomp_cpusetp = NULL;
+	}
       return;
     }
   else
@@ -110,7 +114,7 @@ static int
 get_num_procs (void)
 {
 #ifdef HAVE_PTHREAD_AFFINITY_NP
-  if (gomp_cpu_affinity == NULL)
+  if (gomp_places_list == NULL)
     {
       /* Count only the CPUs this process can use.  */
       if (gomp_cpusetp
--- libgomp/config/linux/affinity.c.jj	2013-10-01 15:52:33.000000000 +0200
+++ libgomp/config/linux/affinity.c	2013-10-04 09:27:56.220320153 +0200
@@ -29,115 +29,326 @@
 #endif
 #include "libgomp.h"
 #include "proc.h"
+#include <errno.h>
 #include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
 #include <unistd.h>
 
 #ifdef HAVE_PTHREAD_AFFINITY_NP
-static unsigned int affinity_counter;
 
 #ifndef CPU_ALLOC_SIZE
 #define CPU_ISSET_S(idx, size, set) CPU_ISSET(idx, set)
 #define CPU_ZERO_S(size, set) CPU_ZERO(set)
 #define CPU_SET_S(idx, size, set) CPU_SET(idx, set)
+#define CPU_CLR_S(idx, size, set) CPU_CLR(idx, set)
 #endif
 
 void
 gomp_init_affinity (void)
 {
-  size_t idx, widx;
-  unsigned long cpus = 0;
-  cpu_set_t *cpusetnewp;
+  if (gomp_places_list == NULL)
+    {
+      if (!gomp_affinity_init_level (1, ULONG_MAX, true))
+	return;
+    }
+
+  struct gomp_thread *thr = gomp_thread ();
+  pthread_setaffinity_np (pthread_self (), gomp_cpuset_size,
+			  (cpu_set_t *) gomp_places_list[0]);
+  thr->place = 1;
+  thr->ts.place_partition_off = 0;
+  thr->ts.place_partition_len = gomp_places_list_len;
+}
+
+void
+gomp_init_thread_affinity (pthread_attr_t *attr, unsigned int place)
+{
+  pthread_attr_setaffinity_np (attr, gomp_cpuset_size,
+			       (cpu_set_t *) gomp_places_list[place]);
+}
+
+void **
+gomp_affinity_alloc (unsigned long count, bool quiet)
+{
+  unsigned long i;
+  void **ret;
+  char *p;
 
   if (gomp_cpusetp == NULL)
     {
-      gomp_error ("could not get CPU affinity set");
-      free (gomp_cpu_affinity);
-      gomp_cpu_affinity = NULL;
-      gomp_cpu_affinity_len = 0;
-      return;
+      if (!quiet)
+	gomp_error ("Could not get CPU affinity set");
+      return NULL;
     }
 
-#ifdef CPU_ALLOC_SIZE
-  cpusetnewp = (cpu_set_t *) gomp_alloca (gomp_cpuset_size);
-#else
-  cpu_set_t cpusetnew;
-  cpusetnewp = &cpusetnew;
-#endif
+  ret = malloc (count * sizeof (void *) + count * gomp_cpuset_size);
+  if (ret == NULL)
+    {
+      if (!quiet)
+	gomp_error ("Out of memory trying to allocate places list");
+      return NULL;
+    }
 
-  if (gomp_cpu_affinity_len == 0)
+  p = (char *) (ret + count);
+  for (i = 0; i < count; i++, p += gomp_cpuset_size)
+    ret[i] = p;
+  return ret;
+}
+
+void
+gomp_affinity_init_place (void *p)
+{
+  cpu_set_t *cpusetp = (cpu_set_t *) p;
+  CPU_ZERO_S (gomp_cpuset_size, cpusetp);
+}
+
+bool
+gomp_affinity_add_cpus (void *p, unsigned long num,
+			unsigned long len, long stride, bool quiet)
+{
+  cpu_set_t *cpusetp = (cpu_set_t *) p;
+  unsigned long max = 8 * gomp_cpuset_size;
+  for (;;)
     {
-      unsigned long count = gomp_cpuset_popcount (gomp_cpusetp);
-      if (count >= 65536)
-	count = 65536;
-      gomp_cpu_affinity = malloc (count * sizeof (unsigned short));
-      if (gomp_cpu_affinity == NULL)
+      if (num >= max)
 	{
-	  gomp_error ("not enough memory to store CPU affinity list");
-	  return;
+	  if (!quiet)
+	    gomp_error ("Logical CPU number %lu out of range", num);
+	  return false;
 	}
-      for (widx = idx = 0; widx < count && idx < 65536; idx++)
-	if (CPU_ISSET_S (idx, gomp_cpuset_size, gomp_cpusetp))
-	  {
-	    cpus++;
-	    gomp_cpu_affinity[widx++] = idx;
-	  }
-    }
-  else
-    {
-      CPU_ZERO_S (gomp_cpuset_size, cpusetnewp);
-      for (widx = idx = 0; idx < gomp_cpu_affinity_len; idx++)
-	if (gomp_cpu_affinity[idx] < 8 * gomp_cpuset_size
-	    && CPU_ISSET_S (gomp_cpu_affinity[idx], gomp_cpuset_size,
-			    gomp_cpusetp))
-	  {
-	    if (! CPU_ISSET_S (gomp_cpu_affinity[idx], gomp_cpuset_size,
-			       cpusetnewp))
-	      {
-		cpus++;
-		CPU_SET_S (gomp_cpu_affinity[idx], gomp_cpuset_size,
-			   cpusetnewp);
-	    }
-	  gomp_cpu_affinity[widx++] = gomp_cpu_affinity[idx];
+      CPU_SET_S (num, gomp_cpuset_size, cpusetp);
+      if (--len == 0)
+	return true;
+      if ((stride < 0 && num + stride > num)
+	  || (stride > 0 && num + stride < num))
+	{
+	  if (!quiet)
+	    gomp_error ("Logical CPU number %lu+%ld out of range",
+			num, stride);
+	  return false;
 	}
+      num += stride;
     }
+}
 
-  if (widx == 0)
+bool
+gomp_affinity_remove_cpu (void *p, unsigned long num)
+{
+  cpu_set_t *cpusetp = (cpu_set_t *) p;
+  if (num >= 8 * gomp_cpuset_size)
+    {
+      gomp_error ("Logical CPU number %lu out of range", num);
+      return false;
+    }
+  if (!CPU_ISSET_S (num, gomp_cpuset_size, cpusetp))
     {
-      gomp_error ("no CPUs left for affinity setting");
-      free (gomp_cpu_affinity);
-      gomp_cpu_affinity = NULL;
-      gomp_cpu_affinity_len = 0;
-      return;
+      gomp_error ("Logical CPU %lu to be removed is not in the set", num);
+      return false;
     }
+  CPU_CLR_S (num, gomp_cpuset_size, cpusetp);
+  return true;
+}
 
-  gomp_cpu_affinity_len = widx;
-  if (cpus < gomp_available_cpus)
-    gomp_available_cpus = cpus;
-  CPU_ZERO_S (gomp_cpuset_size, cpusetnewp);
-  CPU_SET_S (gomp_cpu_affinity[0], gomp_cpuset_size, cpusetnewp);
-  pthread_setaffinity_np (pthread_self (), gomp_cpuset_size,
-			  cpusetnewp);
-  affinity_counter = 1;
+bool
+gomp_affinity_copy_place (void *p, void *q, long stride)
+{
+  unsigned long i, max = 8 * gomp_cpuset_size;
+  cpu_set_t *destp = (cpu_set_t *) p;
+  cpu_set_t *srcp = (cpu_set_t *) q;
+
+  CPU_ZERO_S (gomp_cpuset_size, destp);
+  for (i = 0; i < max; i++)
+    if (CPU_ISSET_S (i, gomp_cpuset_size, srcp))
+      {
+	if ((stride < 0 && i + stride > i)
+	    || (stride > 0 && (i + stride < i || i + stride >= max)))
+	  {
+	    gomp_error ("Logical CPU number %lu+%ld out of range", i, stride);
+	    return false;
+	  }
+	CPU_SET_S (i + stride, gomp_cpuset_size, destp);
+      }
+  return true;
 }
 
-void
-gomp_init_thread_affinity (pthread_attr_t *attr)
+bool
+gomp_affinity_same_place (void *p, void *q)
+{
+#ifdef CPU_EQUAL_S
+  return CPU_EQUAL_S (gomp_cpuset_size, (cpu_set_t *) p, (cpu_set_t *) q);
+#else
+  return memcmp (p, q, gomp_cpuset_size) == 0;
+#endif
+}
+
+bool
+gomp_affinity_finalize_place_list (bool quiet)
 {
-  unsigned int cpu;
-  cpu_set_t *cpusetp;
+  unsigned long i, j;
 
-#ifdef CPU_ALLOC_SIZE
-  cpusetp = (cpu_set_t *) gomp_alloca (gomp_cpuset_size);
+  for (i = 0, j = 0; i < gomp_places_list_len; i++)
+    {
+      cpu_set_t *cpusetp = (cpu_set_t *) gomp_places_list[i];
+      bool nonempty = false;
+#ifdef CPU_AND_S
+      CPU_AND_S (gomp_cpuset_size, cpusetp, cpusetp, gomp_cpusetp);
+      nonempty = gomp_cpuset_popcount (cpusetp) != 0;
 #else
-  cpu_set_t cpuset;
-  cpusetp = &cpuset;
+      unsigned long k, max = gomp_cpuset_size / sizeof (cpusetp->__bits[0]);
+      for (k = 0; k < max; k++)
+	if ((cpusetp->__bits[k] &= gomp_cpusetp->__bits[k]) != 0)
+	  nonempty = true;
 #endif
+      if (nonempty)
+	gomp_places_list[j++] = gomp_places_list[i];
+    }
 
-  cpu = __atomic_fetch_add (&affinity_counter, 1, MEMMODEL_RELAXED);
-  cpu %= gomp_cpu_affinity_len;
-  CPU_ZERO_S (gomp_cpuset_size, cpusetp);
-  CPU_SET_S (gomp_cpu_affinity[cpu], gomp_cpuset_size, cpusetp);
-  pthread_attr_setaffinity_np (attr, gomp_cpuset_size, cpusetp);
+  if (j == 0)
+    {
+      if (!quiet)
+	gomp_error ("None of the places contain usable logical CPUs");
+      return false;
+    }
+  else if (j < gomp_places_list_len)
+    {
+      if (!quiet)
+	gomp_error ("Number of places reduced from %ld to %ld because some "
+		    "places didn't contain any usable logical CPUs",
+		    gomp_places_list_len, j);
+      gomp_places_list_len = j;
+    }
+  return true;
+}
+
+bool
+gomp_affinity_init_level (int level, unsigned long count, bool quiet)
+{
+  unsigned long i, max = 8 * gomp_cpuset_size;
+
+  if (gomp_cpusetp)
+    {
+      unsigned long maxcount = gomp_cpuset_popcount (gomp_cpusetp);
+      if (count > maxcount)
+	count = maxcount;
+    }
+  gomp_places_list = gomp_affinity_alloc (count, quiet);
+  gomp_places_list_len = 0;
+  if (gomp_places_list == NULL)
+    return false;
+  /* SMT (threads).  */
+  if (level == 1)
+    {
+      for (i = 0; i < max && gomp_places_list_len < count; i++)
+	if (CPU_ISSET_S (i, gomp_cpuset_size, gomp_cpusetp))
+	  {
+	    gomp_affinity_init_place (gomp_places_list[gomp_places_list_len]);
+	    gomp_affinity_add_cpus (gomp_places_list[gomp_places_list_len],
+				    i, 1, 0, true);
+	    ++gomp_places_list_len;
+	  }
+      return true;
+    }
+  else
+    {
+      char name[sizeof ("/sys/devices/system/cpu/cpu/topology/"
+			"thread_siblings_list") + 3 * sizeof (unsigned long)];
+      size_t prefix_len = sizeof ("/sys/devices/system/cpu/cpu") - 1;
+      cpu_set_t *copy = gomp_alloca (gomp_cpuset_size);
+      FILE *f;
+      char *line = NULL;
+      size_t linelen = 0;
+
+      memcpy (name, "/sys/devices/system/cpu/cpu", prefix_len);
+      memcpy (copy, gomp_cpusetp, gomp_cpuset_size);
+      for (i = 0; i < max && gomp_places_list_len < count; i++)
+	if (CPU_ISSET_S (i, gomp_cpuset_size, copy))
+	  {
+	    sprintf (name + prefix_len, "%lu/topology/%s_siblings_list",
+		     i, level == 2 ? "thread" : "core");
+	    f = fopen (name, "r");
+	    if (f != NULL)
+	      {
+		if (getline (&line, &linelen, f) > 0)
+		  {
+		    char *p = line;
+		    bool seen_i = false;
+		    void *pl = gomp_places_list[gomp_places_list_len];
+		    gomp_affinity_init_place (pl);
+		    while (*p && *p != '\n')
+		      {
+			unsigned long first, last;
+			errno = 0;
+			first = strtoul (p, &p, 10);
+			if (errno)
+			  break;
+			last = first;
+			if (*p == '-')
+			  {
+			    errno = 0;
+			    last = strtoul (p + 1, &p, 10);
+			    if (errno || last < first)
+			      break;
+			  }
+			for (; first <= last; first++)
+			  if (CPU_ISSET_S (first, gomp_cpuset_size, copy)
+			      && gomp_affinity_add_cpus (pl, first, 1, 0,
+							 true))
+			    {
+			      CPU_CLR_S (first, gomp_cpuset_size, copy);
+			      if (first == i)
+				seen_i = true;
+			    }
+			if (*p == ',')
+			  ++p;
+		      }
+		    if (seen_i)
+		      gomp_places_list_len++;
+		  }
+		fclose (f);
+	      }
+	  }
+      if (gomp_places_list_len == 0)
+	{
+	  if (!quiet)
+	    gomp_error ("Error reading %s topology",
+			level == 2 ? "core" : "socket");
+	  free (gomp_places_list);
+	  gomp_places_list = NULL;
+	  return false;
+	}
+      return true;
+    }
+  return false;
+}
+
+void
+gomp_affinity_print_place (void *p)
+{
+  unsigned long i, max = 8 * gomp_cpuset_size, len;
+  cpu_set_t *cpusetp = (cpu_set_t *) p;
+  bool notfirst = false;
+
+  for (i = 0, len = 0; i < max; i++)
+    if (CPU_ISSET_S (i, gomp_cpuset_size, cpusetp))
+      {
+	if (len == 0)
+	  {
+	    if (notfirst)
+	      fputc (',', stderr);
+	    notfirst = true;
+	    fprintf (stderr, "%lu", i);
+	  }
+	++len;
+      }
+    else
+      {
+	if (len > 1)
+	  fprintf (stderr, ":%lu", len);
+	len = 0;
+      }
+  if (len > 1)
+    fprintf (stderr, ":%lu", len);
 }
 
 #else
--- libgomp/parallel.c.jj	2013-09-25 09:58:13.000000000 +0200
+++ libgomp/parallel.c	2013-10-02 16:01:09.708768043 +0200
@@ -105,7 +105,7 @@ void
 GOMP_parallel_start (void (*fn) (void *), void *data, unsigned num_threads)
 {
   num_threads = gomp_resolve_num_threads (num_threads, 0);
-  gomp_team_start (fn, data, num_threads, gomp_new_team (num_threads));
+  gomp_team_start (fn, data, num_threads, 0, gomp_new_team (num_threads));
 }
 
 void
@@ -134,9 +134,8 @@ ialias (GOMP_parallel_end)
 void
 GOMP_parallel (void (*fn) (void *), void *data, unsigned num_threads, unsigned int flags)
 {
-  (void) flags;
   num_threads = gomp_resolve_num_threads (num_threads, 0);
-  gomp_team_start (fn, data, num_threads, gomp_new_team (num_threads));
+  gomp_team_start (fn, data, num_threads, flags, gomp_new_team (num_threads));
   fn (data);
   ialias_call (GOMP_parallel_end) ();
 }
--- libgomp/sections.c.jj	2013-09-24 12:52:53.000000000 +0200
+++ libgomp/sections.c	2013-10-02 16:01:01.666812852 +0200
@@ -139,7 +139,7 @@ GOMP_parallel_sections_start (void (*fn)
   num_threads = gomp_resolve_num_threads (num_threads, count);
   team = gomp_new_team (num_threads);
   gomp_sections_init (&team->work_shares[0], count);
-  gomp_team_start (fn, data, num_threads, team);
+  gomp_team_start (fn, data, num_threads, 0, team);
 }
 
 ialias_redirect (GOMP_parallel_end)
@@ -150,11 +150,10 @@ GOMP_parallel_sections (void (*fn) (void
 {
   struct gomp_team *team;
 
-  (void) flags;
   num_threads = gomp_resolve_num_threads (num_threads, count);
   team = gomp_new_team (num_threads);
   gomp_sections_init (&team->work_shares[0], count);
-  gomp_team_start (fn, data, num_threads, team);
+  gomp_team_start (fn, data, num_threads, flags, team);
   fn (data);
   GOMP_parallel_end ();
 }
--- libgomp/loop.c.jj	2013-09-24 12:52:53.000000000 +0200
+++ libgomp/loop.c	2013-10-02 16:01:51.174580453 +0200
@@ -439,14 +439,14 @@ static void
 gomp_parallel_loop_start (void (*fn) (void *), void *data,
 			  unsigned num_threads, long start, long end,
 			  long incr, enum gomp_schedule_type sched,
-			  long chunk_size)
+			  long chunk_size, unsigned int flags)
 {
   struct gomp_team *team;
 
   num_threads = gomp_resolve_num_threads (num_threads, 0);
   team = gomp_new_team (num_threads);
   gomp_loop_init (&team->work_shares[0], start, end, incr, sched, chunk_size);
-  gomp_team_start (fn, data, num_threads, team);
+  gomp_team_start (fn, data, num_threads, flags, team);
 }
 
 void
@@ -455,7 +455,7 @@ GOMP_parallel_loop_static_start (void (*
 				 long incr, long chunk_size)
 {
   gomp_parallel_loop_start (fn, data, num_threads, start, end, incr,
-			    GFS_STATIC, chunk_size);
+			    GFS_STATIC, chunk_size, 0);
 }
 
 void
@@ -464,7 +464,7 @@ GOMP_parallel_loop_dynamic_start (void (
 				  long incr, long chunk_size)
 {
   gomp_parallel_loop_start (fn, data, num_threads, start, end, incr,
-			    GFS_DYNAMIC, chunk_size);
+			    GFS_DYNAMIC, chunk_size, 0);
 }
 
 void
@@ -473,7 +473,7 @@ GOMP_parallel_loop_guided_start (void (*
 				 long incr, long chunk_size)
 {
   gomp_parallel_loop_start (fn, data, num_threads, start, end, incr,
-			    GFS_GUIDED, chunk_size);
+			    GFS_GUIDED, chunk_size, 0);
 }
 
 void
@@ -483,7 +483,7 @@ GOMP_parallel_loop_runtime_start (void (
 {
   struct gomp_task_icv *icv = gomp_icv (false);
   gomp_parallel_loop_start (fn, data, num_threads, start, end, incr,
-			    icv->run_sched_var, icv->run_sched_modifier);
+			    icv->run_sched_var, icv->run_sched_modifier, 0);
 }
 
 ialias_redirect (GOMP_parallel_end)
@@ -493,9 +493,8 @@ GOMP_parallel_loop_static (void (*fn) (v
 			   unsigned num_threads, long start, long end,
 			   long incr, long chunk_size, unsigned flags)
 {
-  (void) flags;
   gomp_parallel_loop_start (fn, data, num_threads, start, end, incr,
-			    GFS_STATIC, chunk_size);
+			    GFS_STATIC, chunk_size, flags);
   fn (data);
   GOMP_parallel_end ();
 }
@@ -505,9 +504,8 @@ GOMP_parallel_loop_dynamic (void (*fn) (
 			    unsigned num_threads, long start, long end,
 			    long incr, long chunk_size, unsigned flags)
 {
-  (void) flags;
   gomp_parallel_loop_start (fn, data, num_threads, start, end, incr,
-			    GFS_DYNAMIC, chunk_size);
+			    GFS_DYNAMIC, chunk_size, flags);
   fn (data);
   GOMP_parallel_end ();
 }
@@ -517,9 +515,8 @@ GOMP_parallel_loop_guided (void (*fn) (v
 			  unsigned num_threads, long start, long end,
 			  long incr, long chunk_size, unsigned flags)
 {
-  (void) flags;
   gomp_parallel_loop_start (fn, data, num_threads, start, end, incr,
-			    GFS_GUIDED, chunk_size);
+			    GFS_GUIDED, chunk_size, flags);
   fn (data);
   GOMP_parallel_end ();
 }
@@ -529,10 +526,10 @@ GOMP_parallel_loop_runtime (void (*fn) (
 			    unsigned num_threads, long start, long end,
 			    long incr, unsigned flags)
 {
-  (void) flags;
   struct gomp_task_icv *icv = gomp_icv (false);
   gomp_parallel_loop_start (fn, data, num_threads, start, end, incr,
-			    icv->run_sched_var, icv->run_sched_modifier);
+			    icv->run_sched_var, icv->run_sched_modifier,
+			    flags);
   fn (data);
   GOMP_parallel_end ();
 }
--- libgomp/testsuite/libgomp.c/affinity-1.c.jj	2013-10-03 12:10:14.672443228 +0200
+++ libgomp/testsuite/libgomp.c/affinity-1.c	2013-10-04 09:51:49.561581476 +0200
@@ -0,0 +1,1133 @@
+/* Affinity tests.
+   Copyright (C) 2013 Free Software Foundation, Inc.
+
+   GCC is free software; you can redistribute it and/or modify it under
+   the terms of the GNU General Public License as published by the Free
+   Software Foundation; either version 3, or (at your option) any later
+   version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or
+   FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+   for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+/* { dg-do run } */
+/* { dg-set-target-env-var OMP_PROC_BIND "false" } */
+/* { dg-additional-options "-DINTERPOSE_GETAFFINITY -DDO_FORK -ldl" { target *-*-linux* } } */
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+#include "config.h"
+#include <alloca.h>
+#include <omp.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#ifdef DO_FORK
+#include <signal.h>
+#include <sys/wait.h>
+#endif
+#ifdef HAVE_PTHREAD_AFFINITY_NP
+#include <sched.h>
+#include <pthread.h>
+#ifdef INTERPOSE_GETAFFINITY
+#include <dlfcn.h>
+#endif
+#endif
+
+struct place
+{
+  int start, len;
+};
+struct places
+{
+  char name[40];
+  int count;
+  struct place places[8];
+} places_array[] = {
+  { "", 1, { { -1, -1 } } },
+  { "{0}:8", 8,
+    { { 0, 1 }, { 1, 1 }, { 2, 1 }, { 3, 1 },
+      { 4, 1 }, { 5, 1 }, { 6, 1 }, { 7, 1 } } },
+  { "{7,6}:2:-3", 2, { { 6, 2 }, { 3, 2 } } },
+  { "{6,7}:4:-2,!{2,3}", 3, { { 6, 2 }, { 4, 2 }, { 0, 2 } } },
+  { "{1}:7:1", 7,
+    { { 1, 1 }, { 2, 1 }, { 3, 1 },
+      { 4, 1 }, { 5, 1 }, { 6, 1 }, { 7, 1 } } },
+  { "{0,1},{3,2,4},{6,5,!6},{6},{7:2:-1,!6}", 5,
+    { { 0, 2 }, { 2, 3 }, { 5, 1 }, { 6, 1 }, { 7, 1 } } }
+};
+
+unsigned long contig_cpucount;
+
+#if defined (HAVE_PTHREAD_AFFINITY_NP) && defined (_SC_NPROCESSORS_CONF) \
+    && defined (CPU_ALLOC_SIZE)
+
+#if defined (RTLD_NEXT) && defined (INTERPOSE_GETAFFINITY)
+int (*orig_getaffinity_np) (pthread_t, size_t, cpu_set_t *);
+
+int
+pthread_getaffinity_np (pthread_t thread, size_t cpusetsize, cpu_set_t *cpuset)
+{
+  int ret;
+  if (orig_getaffinity_np == NULL)
+    {
+      unsigned long i, max;
+      orig_getaffinity_np = (int (*) (pthread_t, size_t, cpu_set_t *))
+			    dlsym (RTLD_NEXT, "pthread_getaffinity_np");
+      if (orig_getaffinity_np == NULL)
+	exit (0);
+      ret = orig_getaffinity_np (thread, cpusetsize, cpuset);
+      if (ret != 0)
+	return ret;
+      max = 8 * cpusetsize;
+      for (i = 0; i < max; i++)
+	if (!CPU_ISSET_S (i, cpusetsize, cpuset))
+	  break;
+      contig_cpucount = i;
+    }
+  return orig_getaffinity_np (thread, cpusetsize, cpuset);
+}
+#endif
+
+void
+print_affinity (struct place p)
+{
+  static unsigned long size;
+  if (size == 0)
+    {
+      size = sysconf (_SC_NPROCESSORS_CONF);
+      size = CPU_ALLOC_SIZE (size);
+    }
+  cpu_set_t *cpusetp = (cpu_set_t *) alloca (size);
+  if (pthread_getaffinity_np (pthread_self (), size, cpusetp) == 0)
+    {
+      unsigned long i, len, max = 8 * size;
+      int notfirst = 0, unexpected = 1;
+
+      printf (" bound to {");
+      for (i = 0, len = 0; i < max; i++)
+	if (CPU_ISSET_S (i, size, cpusetp))
+	  {
+	    if (len == 0)
+	      {
+		if (notfirst)
+		  {
+		    unexpected = 1;
+		    printf (",");
+		  }
+		else if (i == (unsigned long) p.start)
+		  unexpected = 0;
+		notfirst = 1;
+		printf ("%lu", i);
+	      }
+	    ++len;
+	  }
+	else
+	  {
+	    if (len && len != (unsigned long) p.len)
+	      unexpected = 1;
+	    if (len > 1)
+	      printf (":%lu", len);
+	    len = 0;
+	  }
+      if (len && len != (unsigned long) p.len)
+	unexpected = 1;
+      if (len > 1)
+	printf (":%lu", len);
+      printf ("}");
+      if (p.start != -1 && unexpected)
+	{
+	  printf (", expected {%d", p.start);
+	  if (p.len != 1)
+	    printf (":%d", p.len);
+	  printf ("} instead");
+	}
+      else if (p.start != -1)
+	printf (", verified");
+    }
+}
+#else
+void
+print_affinity (struct place p)
+{
+  (void) p.start;
+  (void) p.len;
+}
+#endif
+
+
+int
+main ()
+{
+  char *env_proc_bind = getenv ("OMP_PROC_BIND");
+  int test_false = env_proc_bind && strcmp (env_proc_bind, "false") == 0;
+  int test_true = env_proc_bind && strcmp (env_proc_bind, "true") == 0;
+  int test_spread_master_close
+    = env_proc_bind && strcmp (env_proc_bind, "spread,master,close") == 0;
+  char *env_places = getenv ("OMP_PLACES");
+  int test_places = 0;
+
+#ifdef DO_FORK
+  if (env_places == NULL && contig_cpucount == 8 && test_false
+      && getenv ("GOMP_CPU_AFFINITY") == NULL)
+    {
+      int i, j, status;
+      pid_t pid;
+      for (j = 0; j < 2; j++)
+	{
+	  if (setenv ("OMP_PROC_BIND", j ? "spread,master,close" : "true", 1)
+	      < 0)
+	    break;
+	  for (i = sizeof (places_array) / sizeof (places_array[0]) - 1;
+	       i; --i)
+	    {
+	      if (setenv ("OMP_PLACES", places_array[i].name, 1) < 0)
+		break;
+	      pid = fork ();
+	      if (pid == -1)
+		break;
+	      if (pid == 0)
+		{
+		  execl ("/proc/self/exe", "affinity-1.exe", NULL);
+		  _exit (1);
+		}
+	      if (waitpid (pid, &status, 0) < 0)
+		break;
+	      if (WIFSIGNALED (status) && WTERMSIG (status) == SIGABRT)
+		abort ();
+	      else if (!WIFEXITED (status) || WEXITSTATUS (status) != 0)
+		break;
+	    }
+	  if (i)
+	    break;
+	}
+    }
+#endif
+
+  int first = 1;
+  if (env_proc_bind)
+    {
+      printf ("OMP_PROC_BIND='%s'", env_proc_bind);
+      first = 0;
+    }
+  if (env_places)
+    printf ("%sOMP_PLACES='%s'", first ? "" : " ", env_places);
+  printf ("\n");
+
+  if (env_places && contig_cpucount >= 8
+      && (test_true || test_spread_master_close))
+    {
+      for (test_places = sizeof (places_array) / sizeof (places_array[0]) - 1;
+	   test_places; --test_places)
+	if (strcmp (env_places, places_array[test_places].name) == 0)
+	  break;
+    }
+
+#define verify(if_true, if_s_m_c) \
+  if (test_false && omp_get_proc_bind () != omp_proc_bind_false)	\
+    abort ();								\
+  if (test_true && omp_get_proc_bind () != if_true)			\
+    abort ();								\
+  if (test_spread_master_close && omp_get_proc_bind () != if_s_m_c)	\
+    abort ();
+
+  verify (omp_proc_bind_true, omp_proc_bind_spread);
+
+  printf ("Initial thread");
+  print_affinity (places_array[test_places].places[0]);
+  printf ("\n");
+  omp_set_nested (1);
+
+  #pragma omp parallel if (0)
+  {
+    verify (omp_proc_bind_true, omp_proc_bind_master);
+    #pragma omp parallel if (0)
+    {
+      verify (omp_proc_bind_true, omp_proc_bind_close);
+      #pragma omp parallel if (0)
+      {
+	verify (omp_proc_bind_true, omp_proc_bind_close);
+      }
+      #pragma omp parallel if (0) proc_bind (spread)
+      {
+	verify (omp_proc_bind_spread, omp_proc_bind_spread);
+      }
+    }
+    #pragma omp parallel if (0) proc_bind (master)
+    {
+      verify (omp_proc_bind_master, omp_proc_bind_close);
+      #pragma omp parallel if (0)
+      {
+	verify (omp_proc_bind_master, omp_proc_bind_close);
+      }
+      #pragma omp parallel if (0) proc_bind (spread)
+      {
+	verify (omp_proc_bind_spread, omp_proc_bind_spread);
+      }
+    }
+  }
+
+  /* True/spread */
+  #pragma omp parallel num_threads (4)
+  {
+    verify (omp_proc_bind_true, omp_proc_bind_master);
+    #pragma omp critical
+    {
+      struct place p = places_array[0].places[0];
+      int thr = omp_get_thread_num ();
+      printf ("#1 thread %d", thr);
+      if (omp_get_num_threads () == 4 && test_spread_master_close)
+	switch (places_array[test_places].count)
+	  {
+	  case 8:
+	    /* T = 4, P = 8, each subpartition has 2 places.  */
+	  case 7:
+	    /* T = 4, P = 7, each subpartition has 2 places, except
+	       the last one, which has just one place.  */
+	    p = places_array[test_places].places[2 * thr];
+	    break;
+	  case 5:
+	    /* T = 4, P = 5, first subpartition has 2 places, the
+	       rest just one.  */
+	    p = places_array[test_places].places[thr ? 1 + thr : 0];
+	    break;
+	  case 3:
+	    /* T = 4, P = 3, unit sized subpartitions, first gets
+	       thr0 and thr3, second thr1, third thr2.  */
+	    p = places_array[test_places].places[thr == 3 ? 0 : thr];
+	    break;
+	  case 2:
+	    /* T = 4, P = 2, unit sized subpartitions, each with
+	       2 threads.  */
+	    p = places_array[test_places].places[thr / 2];
+	    break;
+	  }
+      print_affinity (p);
+      printf ("\n");
+    }
+    #pragma omp barrier
+    if (omp_get_thread_num () == 3)
+      {
+	/* True/spread, true/master.  */
+	#pragma omp parallel num_threads (3)
+	{
+	  verify (omp_proc_bind_true, omp_proc_bind_close);
+	  #pragma omp critical
+	  {
+	    struct place p = places_array[0].places[0];
+	    int thr = omp_get_thread_num ();
+	    printf ("#1,#1 thread 3,%d", thr);
+	    if (omp_get_num_threads () == 3 && test_spread_master_close)
+	      /* Outer is spread, inner master, so just bind to the
+		 place of the master thread, which is thr 3 above.  */
+	      switch (places_array[test_places].count)
+		{
+		case 8:
+		case 7:
+		  p = places_array[test_places].places[6];
+		  break;
+		case 5:
+		  p = places_array[test_places].places[4];
+		  break;
+		case 3:
+		  p = places_array[test_places].places[0];
+		  break;
+		case 2:
+		  p = places_array[test_places].places[1];
+		  break;
+		}
+	    print_affinity (p);
+	    printf ("\n");
+	  }
+	}
+	/* True/spread, spread.  */
+	#pragma omp parallel num_threads (5) proc_bind (spread)
+	{
+	  verify (omp_proc_bind_spread, omp_proc_bind_close);
+	  #pragma omp critical
+	  {
+	    struct place p = places_array[0].places[0];
+	    int thr = omp_get_thread_num ();
+	    printf ("#1,#2 thread 3,%d", thr);
+	    if (omp_get_num_threads () == 5 && test_spread_master_close)
+	      /* Outer is spread, inner spread.  */
+	      switch (places_array[test_places].count)
+		{
+		case 8:
+		  /* T = 5, P = 2, unit sized subpartitions.  */
+		  p = places_array[test_places].places[thr == 4 ? 6
+						       : 6 + thr / 2];
+		  break;
+		/* The rest are T = 5, P = 1.  */
+		case 7:
+		  p = places_array[test_places].places[6];
+		  break;
+		case 5:
+		  p = places_array[test_places].places[4];
+		  break;
+		case 3:
+		  p = places_array[test_places].places[0];
+		  break;
+		case 2:
+		  p = places_array[test_places].places[1];
+		  break;
+		}
+	    print_affinity (p);
+	    printf ("\n");
+	  }
+	  #pragma omp barrier
+	  if (omp_get_thread_num () == 3)
+	    {
+	      /* True/spread, spread, close.  */
+	      #pragma omp parallel num_threads (5) proc_bind (close)
+	      {
+		verify (omp_proc_bind_close, omp_proc_bind_close);
+		#pragma omp critical
+		{
+		  struct place p = places_array[0].places[0];
+		  int thr = omp_get_thread_num ();
+		  printf ("#1,#2,#1 thread 3,3,%d", thr);
+		  if (omp_get_num_threads () == 5 && test_spread_master_close)
+		    /* Outer is spread, inner spread, innermost close.  */
+		    switch (places_array[test_places].count)
+		      {
+		      /* All are T = 5, P = 1.  */
+		      case 8:
+			p = places_array[test_places].places[7];
+			break;
+		      case 7:
+			p = places_array[test_places].places[6];
+			break;
+		      case 5:
+			p = places_array[test_places].places[4];
+			break;
+		      case 3:
+			p = places_array[test_places].places[0];
+			break;
+		      case 2:
+			p = places_array[test_places].places[1];
+			break;
+		      }
+		  print_affinity (p);
+		  printf ("\n");
+		}
+	      }
+	    }
+	}
+	/* True/spread, master.  */
+	#pragma omp parallel num_threads (4) proc_bind(master)
+	{
+	  verify (omp_proc_bind_master, omp_proc_bind_close);
+	  #pragma omp critical
+	  {
+	    struct place p = places_array[0].places[0];
+	    int thr = omp_get_thread_num ();
+	    printf ("#1,#3 thread 3,%d", thr);
+	    if (omp_get_num_threads () == 4 && test_spread_master_close)
+	      /* Outer is spread, inner master, so just bind to the
+		 place of the master thread, which is thr 3 above.  */
+	      switch (places_array[test_places].count)
+		{
+		case 8:
+		case 7:
+		  p = places_array[test_places].places[6];
+		  break;
+		case 5:
+		  p = places_array[test_places].places[4];
+		  break;
+		case 3:
+		  p = places_array[test_places].places[0];
+		  break;
+		case 2:
+		  p = places_array[test_places].places[1];
+		  break;
+		}
+	    print_affinity (p);
+	    printf ("\n");
+	  }
+	}
+	/* True/spread, close.  */
+	#pragma omp parallel num_threads (6) proc_bind (close)
+	{
+	  verify (omp_proc_bind_close, omp_proc_bind_close);
+	  #pragma omp critical
+	  {
+	    struct place p = places_array[0].places[0];
+	    int thr = omp_get_thread_num ();
+	    printf ("#1,#4 thread 3,%d", thr);
+	    if (omp_get_num_threads () == 6 && test_spread_master_close)
+	      /* Outer is spread, inner close.  */
+	      switch (places_array[test_places].count)
+		{
+		case 8:
+		  /* T = 6, P = 2, unit sized subpartitions.  */
+		  p = places_array[test_places].places[6 + thr / 3];
+		  break;
+		/* The rest are T = 6, P = 1.  */
+		case 7:
+		  p = places_array[test_places].places[6];
+		  break;
+		case 5:
+		  p = places_array[test_places].places[4];
+		  break;
+		case 3:
+		  p = places_array[test_places].places[0];
+		  break;
+		case 2:
+		  p = places_array[test_places].places[1];
+		  break;
+		}
+	    print_affinity (p);
+	    printf ("\n");
+	  }
+	}
+      }
+  }
+
+  /* Spread.  */
+  #pragma omp parallel num_threads (5) proc_bind(spread)
+  {
+    verify (omp_proc_bind_spread, omp_proc_bind_master);
+    #pragma omp critical
+    {
+      struct place p = places_array[0].places[0];
+      int thr = omp_get_thread_num ();
+      printf ("#2 thread %d", thr);
+      if (omp_get_num_threads () == 5
+	  && (test_spread_master_close || test_true))
+	switch (places_array[test_places].count)
+	  {
+	  case 8:
+	    /* T = 5, P = 8, first 3 subpartitions have 2 places, last
+	       2 one place.  */
+	    p = places_array[test_places].places[thr < 3 ? 2 * thr : 3 + thr];
+	    break;
+	  case 7:
+	    /* T = 5, P = 7, first 2 subpartitions have 2 places, last
+	       3 one place.  */
+	    p = places_array[test_places].places[thr < 2 ? 2 * thr : 2 + thr];
+	    break;
+	  case 5:
+	    /* T = 5, P = 5, unit sized subpartitions, each one with one
+	       thread.  */
+	    p = places_array[test_places].places[thr];
+	    break;
+	  case 3:
+	    /* T = 5, P = 3, unit sized subpartitions, first gets
+	       thr0 and thr3, second thr1 and thr4, third thr2.  */
+	    p = places_array[test_places].places[thr >= 3 ? thr - 3 : thr];
+	    break;
+	  case 2:
+	    /* T = 5, P = 2, unit sized subpartitions, first with
+	       thr{0,1,4} and second with thr{2,3}.  */
+	    p = places_array[test_places].places[thr == 4 ? 0 : thr / 2];
+	    break;
+	  }
+      print_affinity (p);
+      printf ("\n");
+    }
+    #pragma omp barrier
+    if (omp_get_thread_num () == 3)
+      {
+	int pp = 0;
+	switch (places_array[test_places].count)
+	  {
+	  case 8: pp = 6; break;
+	  case 7: pp = 5; break;
+	  case 5: pp = 3; break;
+	  case 2: pp = 1; break;
+	  }
+	/* Spread, spread/master.  */
+	#pragma omp parallel num_threads (3) firstprivate (pp)
+	{
+	  verify (omp_proc_bind_spread, omp_proc_bind_close);
+	  #pragma omp critical
+	  {
+	    struct place p = places_array[0].places[0];
+	    int thr = omp_get_thread_num ();
+	    printf ("#2,#1 thread 3,%d", thr);
+	    if (test_spread_master_close || test_true)
+	      /* Outer is spread, inner spread resp. master, but we have
+		 just unit sized partitions.  */
+	      p = places_array[test_places].places[pp];
+	    print_affinity (p);
+	    printf ("\n");
+	  }
+	}
+	/* Spread, spread.  */
+	#pragma omp parallel num_threads (5) proc_bind (spread) \
+			     firstprivate (pp)
+	{
+	  verify (omp_proc_bind_spread, omp_proc_bind_close);
+	  #pragma omp critical
+	  {
+	    struct place p = places_array[0].places[0];
+	    int thr = omp_get_thread_num ();
+	    printf ("#2,#2 thread 3,%d", thr);
+	    if (test_spread_master_close || test_true)
+	      /* Outer is spread, inner spread, but we have
+		 just unit sized partitions.  */
+	      p = places_array[test_places].places[pp];
+	    print_affinity (p);
+	    printf ("\n");
+	  }
+	}
+	/* Spread, master.  */
+	#pragma omp parallel num_threads (4) proc_bind(master) \
+			     firstprivate(pp)
+	{
+	  verify (omp_proc_bind_master, omp_proc_bind_close);
+	  #pragma omp critical
+	  {
+	    struct place p = places_array[0].places[0];
+	    int thr = omp_get_thread_num ();
+	    printf ("#2,#3 thread 3,%d", thr);
+	    if (test_spread_master_close || test_true)
+	      /* Outer is spread, inner master, but we have
+		 just unit sized partitions.  */
+	      p = places_array[test_places].places[pp];
+	    print_affinity (p);
+	    printf ("\n");
+	  }
+	}
+	/* Spread, close.  */
+	#pragma omp parallel num_threads (6) proc_bind (close) \
+			     firstprivate (pp)
+	{
+	  verify (omp_proc_bind_close, omp_proc_bind_close);
+	  #pragma omp critical
+	  {
+	    struct place p = places_array[0].places[0];
+	    int thr = omp_get_thread_num ();
+	    printf ("#2,#4 thread 3,%d", thr);
+	    if (test_spread_master_close || test_true)
+	      /* Outer is spread, inner close, but we have
+		 just unit sized partitions.  */
+	      p = places_array[test_places].places[pp];
+	    print_affinity (p);
+	    printf ("\n");
+	  }
+	}
+      }
+  }
+
+  /* Master.  */
+  #pragma omp parallel num_threads (3) proc_bind(master)
+  {
+    verify (omp_proc_bind_master, omp_proc_bind_master);
+    #pragma omp critical
+    {
+      struct place p = places_array[0].places[0];
+      int thr = omp_get_thread_num ();
+      printf ("#3 thread %d", thr);
+      if (test_spread_master_close || test_true)
+	p = places_array[test_places].places[0];
+      print_affinity (p);
+      printf ("\n");
+    }
+    #pragma omp barrier
+    if (omp_get_thread_num () == 2)
+      {
+	/* Master, master.  */
+	#pragma omp parallel num_threads (4)
+	{
+	  verify (omp_proc_bind_master, omp_proc_bind_close);
+	  #pragma omp critical
+	  {
+	    struct place p = places_array[0].places[0];
+	    int thr = omp_get_thread_num ();
+	    printf ("#3,#1 thread 2,%d", thr);
+	    if (test_spread_master_close || test_true)
+	      /* Outer is master, inner is master.  */
+	      p = places_array[test_places].places[0];
+	    print_affinity (p);
+	    printf ("\n");
+	  }
+	}
+	/* Master, spread.  */
+	#pragma omp parallel num_threads (4) proc_bind (spread)
+	{
+	  verify (omp_proc_bind_spread, omp_proc_bind_close);
+	  #pragma omp critical
+	  {
+	    struct place p = places_array[0].places[0];
+	    int thr = omp_get_thread_num ();
+	    printf ("#3,#2 thread 2,%d", thr);
+	    if (omp_get_num_threads () == 4
+		&& (test_spread_master_close || test_true))
+	      /* Outer is master, inner is spread.  */
+	      switch (places_array[test_places].count)
+		{
+		case 8:
+		  /* T = 4, P = 8, each subpartition has 2 places.  */
+		case 7:
+		  /* T = 4, P = 7, each subpartition has 2 places, except
+		     the last one, which has just one place.  */
+		  p = places_array[test_places].places[2 * thr];
+		  break;
+		case 5:
+		  /* T = 4, P = 5, first subpartition has 2 places, the
+		     rest just one.  */
+		  p = places_array[test_places].places[thr ? 1 + thr : 0];
+		  break;
+		case 3:
+		  /* T = 4, P = 3, unit sized subpartitions, first gets
+		     thr0 and thr3, second thr1, third thr2.  */
+		  p = places_array[test_places].places[thr == 3 ? 0 : thr];
+		  break;
+		case 2:
+		  /* T = 4, P = 2, unit sized subpartitions, each with
+		     2 threads.  */
+		  p = places_array[test_places].places[thr / 2];
+		  break;
+		}
+	    print_affinity (p);
+	    printf ("\n");
+	  }
+	  #pragma omp barrier
+	  if (omp_get_thread_num () == 0)
+	    {
+	      /* Master, spread, close.  */
+	      #pragma omp parallel num_threads (5) proc_bind (close)
+	      {
+		verify (omp_proc_bind_close, omp_proc_bind_close);
+		#pragma omp critical
+		{
+		  struct place p = places_array[0].places[0];
+		  int thr = omp_get_thread_num ();
+		  printf ("#3,#2,#1 thread 2,0,%d", thr);
+		  if (omp_get_num_threads () == 5
+		      && (test_spread_master_close || test_true))
+		    /* Outer is master, inner spread, innermost close.  */
+		    switch (places_array[test_places].count)
+		      {
+		      /* First 3 are T = 5, P = 2.  */
+		      case 8:
+		      case 7:
+		      case 5:
+			p = places_array[test_places].places[(thr & 2) / 2];
+			break;
+		      /* All the rest are T = 5, P = 1.  */
+		      case 3:
+		      case 2:
+			p = places_array[test_places].places[0];
+			break;
+		      }
+		  print_affinity (p);
+		  printf ("\n");
+		}
+	      }
+	    }
+	  #pragma omp barrier
+	  if (omp_get_thread_num () == 3)
+	    {
+	      /* Master, spread, close.  */
+	      #pragma omp parallel num_threads (5) proc_bind (close)
+	      {
+		verify (omp_proc_bind_close, omp_proc_bind_close);
+		#pragma omp critical
+		{
+		  struct place p = places_array[0].places[0];
+		  int thr = omp_get_thread_num ();
+		  printf ("#3,#2,#2 thread 2,3,%d", thr);
+		  if (omp_get_num_threads () == 5
+		      && (test_spread_master_close || test_true))
+		    /* Outer is master, inner spread, innermost close.  */
+		    switch (places_array[test_places].count)
+		      {
+		      case 8:
+			/* T = 5, P = 2.  */
+			p = places_array[test_places].places[6
+							     + (thr & 2) / 2];
+			break;
+		      /* All the rest are T = 5, P = 1.  */
+		      case 7:
+			p = places_array[test_places].places[6];
+			break;
+		      case 5:
+			p = places_array[test_places].places[4];
+			break;
+		      case 3:
+			p = places_array[test_places].places[0];
+			break;
+		      case 2:
+			p = places_array[test_places].places[1];
+			break;
+		      }
+		  print_affinity (p);
+		  printf ("\n");
+		}
+	      }
+	    }
+	}
+	/* Master, master.  */
+	#pragma omp parallel num_threads (4) proc_bind(master)
+	{
+	  verify (omp_proc_bind_master, omp_proc_bind_close);
+	  #pragma omp critical
+	  {
+	    struct place p = places_array[0].places[0];
+	    int thr = omp_get_thread_num ();
+	    printf ("#3,#3 thread 2,%d", thr);
+	    if (test_spread_master_close || test_true)
+	      /* Outer is master, inner master.  */
+	      p = places_array[test_places].places[0];
+	    print_affinity (p);
+	    printf ("\n");
+	  }
+	}
+	/* Master, close.  */
+	#pragma omp parallel num_threads (6) proc_bind (close)
+	{
+	  verify (omp_proc_bind_close, omp_proc_bind_close);
+	  #pragma omp critical
+	  {
+	    struct place p = places_array[0].places[0];
+	    int thr = omp_get_thread_num ();
+	    printf ("#3,#4 thread 2,%d", thr);
+	    if (omp_get_num_threads () == 6
+		&& (test_spread_master_close || test_true))
+	      switch (places_array[test_places].count)
+		{
+		case 8:
+		  /* T = 6, P = 8.  */
+		case 7:
+		  /* T = 6, P = 7.  */
+		  p = places_array[test_places].places[thr];
+		  break;
+		case 5:
+		  /* T = 6, P = 5.  thr{0,5} go into the first place.  */
+		  p = places_array[test_places].places[thr == 5 ? 0 : thr];
+		  break;
+		case 3:
+		  /* T = 6, P = 3, two threads into each place.  */
+		  p = places_array[test_places].places[thr / 2];
+		  break;
+		case 2:
+		  /* T = 6, P = 2, 3 threads into each place.  */
+		  p = places_array[test_places].places[thr / 3];
+		  break;
+		}
+	    print_affinity (p);
+	    printf ("\n");
+	  }
+	}
+      }
+  }
+
+  #pragma omp parallel num_threads (5) proc_bind(close)
+  {
+    verify (omp_proc_bind_close, omp_proc_bind_master);
+    #pragma omp critical
+    {
+      struct place p = places_array[0].places[0];
+      int thr = omp_get_thread_num ();
+      printf ("#4 thread %d", thr);
+      if (omp_get_num_threads () == 5
+	  && (test_spread_master_close || test_true))
+	switch (places_array[test_places].count)
+	  {
+	  case 8:
+	    /* T = 5, P = 8.  */
+	  case 7:
+	    /* T = 5, P = 7.  */
+	  case 5:
+	    /* T = 5, P = 5.  */
+	    p = places_array[test_places].places[thr];
+	    break;
+	  case 3:
+	    /* T = 5, P = 3, thr{0,3} in first place, thr{1,4} in second,
+	       thr2 in third.  */
+	    p = places_array[test_places].places[thr >= 3 ? thr - 3 : thr];
+	    break;
+	  case 2:
+	    /* T = 5, P = 2, thr{0,1,4} in first place, thr{2,3} in second.  */
+	    p = places_array[test_places].places[thr == 4 ? 0 : thr / 2];
+	    break;
+	  }
+      print_affinity (p);
+      printf ("\n");
+    }
+    #pragma omp barrier
+    if (omp_get_thread_num () == 2)
+      {
+	int pp = 0;
+	switch (places_array[test_places].count)
+	  {
+	  case 8:
+	  case 7:
+	  case 5:
+	  case 3:
+	    pp = 2;
+	    break;
+	  case 2:
+	    pp = 1;
+	    break;
+	  }
+	/* Close, close/master.  */
+	#pragma omp parallel num_threads (4) firstprivate (pp)
+	{
+	  verify (omp_proc_bind_close, omp_proc_bind_close);
+	  #pragma omp critical
+	  {
+	    struct place p = places_array[0].places[0];
+	    int thr = omp_get_thread_num ();
+	    printf ("#4,#1 thread 2,%d", thr);
+	    if (test_spread_master_close)
+	      /* Outer is close, inner is master.  */
+	      p = places_array[test_places].places[pp];
+	    else if (omp_get_num_threads () == 4 && test_true)
+	      /* Outer is close, inner is close.  */
+	      switch (places_array[test_places].count)
+		{
+		case 8:
+		  /* T = 4, P = 8.  */
+		case 7:
+		  /* T = 4, P = 7.  */
+		  p = places_array[test_places].places[2 + thr];
+		  break;
+		case 5:
+		  /* T = 4, P = 5.  There is wrap-around for thr3.  */
+		  p = places_array[test_places].places[thr == 3 ? 0 : 2 + thr];
+		  break;
+		case 3:
+		  /* T = 4, P = 3, thr{0,3} go into p2, thr1 into p0, thr2
+		     into p1.  */
+		  p = places_array[test_places].places[(2 + thr) % 3];
+		  break;
+		case 2:
+		  /* T = 4, P = 2, 2 threads into each place.  */
+		  p = places_array[test_places].places[1 - thr / 2];
+		  break;
+		}
+
+	    print_affinity (p);
+	    printf ("\n");
+	  }
+	}
+	/* Close, spread.  */
+	#pragma omp parallel num_threads (4) proc_bind (spread)
+	{
+	  verify (omp_proc_bind_spread, omp_proc_bind_close);
+	  #pragma omp critical
+	  {
+	    struct place p = places_array[0].places[0];
+	    int thr = omp_get_thread_num ();
+	    printf ("#4,#2 thread 2,%d", thr);
+	    if (omp_get_num_threads () == 4
+		&& (test_spread_master_close || test_true))
+	      /* Outer is close, inner is spread.  */
+	      switch (places_array[test_places].count)
+		{
+		case 8:
+		  /* T = 4, P = 8, each subpartition has 2 places.  */
+		case 7:
+		  /* T = 4, P = 7, each subpartition has 2 places, except
+		     the last one, which has just one place.  */
+		  p = places_array[test_places].places[thr == 3 ? 0
+						       : 2 + 2 * thr];
+		  break;
+		case 5:
+		  /* T = 4, P = 5, first subpartition has 2 places, the
+		     rest just one.  */
+		  p = places_array[test_places].places[thr == 3 ? 0
+						       : 2 + thr];
+		  break;
+		case 3:
+		  /* T = 4, P = 3, unit sized subpartitions, third gets
+		     thr0 and thr3, first thr1, second thr2.  */
+		  p = places_array[test_places].places[thr == 0 ? 2 : thr - 1];
+		  break;
+		case 2:
+		  /* T = 4, P = 2, unit sized subpartitions, each with
+		     2 threads.  */
+		  p = places_array[test_places].places[1 - thr / 2];
+		  break;
+		}
+	    print_affinity (p);
+	    printf ("\n");
+	  }
+	  #pragma omp barrier
+	  if (omp_get_thread_num () == 0)
+	    {
+	      /* Close, spread, close.  */
+	      #pragma omp parallel num_threads (5) proc_bind (close)
+	      {
+		verify (omp_proc_bind_close, omp_proc_bind_close);
+		#pragma omp critical
+		{
+		  struct place p = places_array[0].places[0];
+		  int thr = omp_get_thread_num ();
+		  printf ("#4,#2,#1 thread 2,0,%d", thr);
+		  if (omp_get_num_threads () == 5
+		      && (test_spread_master_close || test_true))
+		    /* Outer is close, inner spread, innermost close.  */
+		    switch (places_array[test_places].count)
+		      {
+		      case 8:
+		      case 7:
+			/* T = 5, P = 2.  */
+			p = places_array[test_places].places[2
+							     + (thr & 2) / 2];
+			break;
+		      /* All the rest are T = 5, P = 1.  */
+		      case 5:
+		      case 3:
+			p = places_array[test_places].places[2];
+			break;
+		      case 2:
+			p = places_array[test_places].places[1];
+			break;
+		      }
+		  print_affinity (p);
+		  printf ("\n");
+		}
+	      }
+	    }
+	  #pragma omp barrier
+	  if (omp_get_thread_num () == 2)
+	    {
+	      /* Close, spread, close.  */
+	      #pragma omp parallel num_threads (5) proc_bind (close)
+	      {
+		verify (omp_proc_bind_close, omp_proc_bind_close);
+		#pragma omp critical
+		{
+		  struct place p = places_array[0].places[0];
+		  int thr = omp_get_thread_num ();
+		  printf ("#4,#2,#2 thread 2,2,%d", thr);
+		  if (omp_get_num_threads () == 5
+		      && (test_spread_master_close || test_true))
+		    /* Outer is close, inner spread, innermost close.  */
+		    switch (places_array[test_places].count)
+		      {
+		      case 8:
+			/* T = 5, P = 2.  */
+			p = places_array[test_places].places[6
+							     + (thr & 2) / 2];
+			break;
+		      /* All the rest are T = 5, P = 1.  */
+		      case 7:
+			p = places_array[test_places].places[6];
+			break;
+		      case 5:
+			p = places_array[test_places].places[4];
+			break;
+		      case 3:
+			p = places_array[test_places].places[1];
+			break;
+		      case 2:
+			p = places_array[test_places].places[0];
+			break;
+		      }
+		  print_affinity (p);
+		  printf ("\n");
+		}
+	      }
+	    }
+	  #pragma omp barrier
+	  if (omp_get_thread_num () == 3)
+	    {
+	      /* Close, spread, close.  */
+	      #pragma omp parallel num_threads (5) proc_bind (close)
+	      {
+		verify (omp_proc_bind_close, omp_proc_bind_close);
+		#pragma omp critical
+		{
+		  struct place p = places_array[0].places[0];
+		  int thr = omp_get_thread_num ();
+		  printf ("#4,#2,#3 thread 2,3,%d", thr);
+		  if (omp_get_num_threads () == 5
+		      && (test_spread_master_close || test_true))
+		    /* Outer is close, inner spread, innermost close.  */
+		    switch (places_array[test_places].count)
+		      {
+		      case 8:
+		      case 7:
+		      case 5:
+			/* T = 5, P = 2.  */
+			p = places_array[test_places].places[(thr & 2) / 2];
+			break;
+		      /* All the rest are T = 5, P = 1.  */
+		      case 3:
+			p = places_array[test_places].places[2];
+			break;
+		      case 2:
+			p = places_array[test_places].places[0];
+			break;
+		      }
+		  print_affinity (p);
+		  printf ("\n");
+		}
+	      }
+	    }
+	}
+	/* Close, master.  */
+	#pragma omp parallel num_threads (4) proc_bind(master) \
+			     firstprivate (pp)
+	{
+	  verify (omp_proc_bind_master, omp_proc_bind_close);
+	  #pragma omp critical
+	  {
+	    struct place p = places_array[0].places[0];
+	    int thr = omp_get_thread_num ();
+	    printf ("#4,#3 thread 2,%d", thr);
+	    if (test_spread_master_close || test_true)
+	      /* Outer is close, inner master.  */
+	      p = places_array[test_places].places[pp];
+	    print_affinity (p);
+	    printf ("\n");
+	  }
+	}
+	/* Close, close.  */
+	#pragma omp parallel num_threads (6) proc_bind (close)
+	{
+	  verify (omp_proc_bind_close, omp_proc_bind_close);
+	  #pragma omp critical
+	  {
+	    struct place p = places_array[0].places[0];
+	    int thr = omp_get_thread_num ();
+	    printf ("#4,#4 thread 2,%d", thr);
+	    if (omp_get_num_threads () == 6
+		&& (test_spread_master_close || test_true))
+	      switch (places_array[test_places].count)
+		{
+		case 8:
+		  /* T = 6, P = 8.  */
+		  p = places_array[test_places].places[2 + thr];
+		  break;
+		case 7:
+		  /* T = 6, P = 7.  */
+		  p = places_array[test_places].places[thr == 5 ? 0 : 2 + thr];
+		  break;
+		case 5:
+		  /* T = 6, P = 5.  thr{0,5} go into the third place.  */
+		  p = places_array[test_places].places[thr >= 3 ? thr - 3
+						       : 2 + thr];
+		  break;
+		case 3:
+		  /* T = 6, P = 3, two threads into each place.  */
+		  p = places_array[test_places].places[thr < 2 ? 2
+						       : thr / 2 - 1];
+		  break;
+		case 2:
+		  /* T = 6, P = 2, 3 threads into each place.  */
+		  p = places_array[test_places].places[1 - thr / 3];
+		  break;
+		}
+	    print_affinity (p);
+	    printf ("\n");
+	  }
+	}
+      }
+  }
+
+  return 0;
+}
--- libgomp/testsuite/libgomp.c++/affinity-1.C.jj	2013-10-03 19:23:10.108263078 +0200
+++ libgomp/testsuite/libgomp.c++/affinity-1.C	2013-10-03 19:23:06.169285928 +0200
@@ -0,0 +1,4 @@
+// { dg-do run }
+// { dg-set-target-env-var OMP_PROC_BIND "true" }
+
+#include "../libgomp.c/affinity-1.c"

	Jakub

