[gomp4] Target fallback ICV handling, ICV fixes
Jakub Jelinek
jakub@redhat.com
Fri Oct 4 20:15:00 GMT 2013
Hi!
I've committed the following patch to gomp-4.0-branch.
The omp-low.c changes are to fix some bugs with if clause on
#pragma omp target{, data, update}.
The c-cppbuiltin.c is to finally announce OpenMP 4.0 support for C/C++.
The libgomp changes are:
1) as required by OpenMP 4.0, thread_limit_var is now a per-data-environment
ICV, rather than global var
2) gomp_remaining_threads_count has been removed, instead as required by the
spec ThreadsBusy from the spec is tracked per contention group inside
of thread_pool; if there is just one contention group, then the new
thr->thread_pool->threads_busy should be the difference between
icv->thread_limit_var and the old gomp_remaining_threads_count;
so, in gomp_resolve_num_threads now we add nthreads - 1 to it
rather than subtracting it
3) apparently the old OMP_THREAD_LIMIT code was buggy, because
GOMP_parallel_end was also subtracting from gomp_remaining_threads_count
rather than adding to it (with the new code it is correct to subtract;
when I get spare time I'll write a small alternative patch for the
release branches together with thread-limit-1.c testcase)
4) as the threads_busy count is now per-contention group, if a parallel
isn't nested, we actually don't need to atomically update the counter,
because there is just one thread in the contention group
5) gomp_managed_threads counter remains to be a global var, that is used
to decide about spinning length, that is desirable to be global and
is not user observable thing covered by the standard; I've just
renamed the mutex guarding it
6) for GOMP_target host fallback, the function will create a new initial
thread by making a copy of the old TLS *gomp_thread () and clearing it
(except for affinity place and reinitializing it's place var to the
whole place list), then restoring back
7) I've noticed that &thr->release semaphore is never used for the master
threads, so there is no point initializing it; we were initializing
it just for the first initial thread, e.g. not in subsequent user
pthread_create created threads that encounter #pragma omp constructs;
and the semaphore wasn't ever destroyed
8) GOMP_teams is now implemented for the host fallback just by adjusting
icv->thread_limit_var
9) on the target-7.c testcase I found several issues in the var remapping
code (some fields could be uninitialized in certain cases)
Tested on x86_64-linux, committed.
2013-10-04 Jakub Jelinek <jakub@redhat.com>
* omp-low.c (expand_omp_target): When handling IF clause on
#pragma omp target, split new_bb rather than entry_bb. If
not GF_OMP_TARGET_KIND_REGION, split new_bb right before
the GOMP_TARGET stmt, rather than after labels.
gcc/c-family/
* c-cppbuiltin.c (c_cpp_builtins): Predefine _OPENMP to
201307 instead of 201107.
libgomp/
* libgomp.h (struct gomp_task_icv): Add thread_limit_var.
(gomp_thread_limit_var, gomp_remaining_threads_count,
gomp_remaining_threads_lock): Remove.
(gomp_managed_threads_lock): New variable.
(struct gomp_thread_pool): Add threads_busy field.
(gomp_free_thread): New prototype.
* parallel.c (gomp_resolve_num_threads): Adjust for
thread_limit now being in icv->thread_limit_var. Use
UINT_MAX instead of ULONG_MAX as infinity. If not nested,
just return minimum of max_num_threads and icv->thread_limit_var
and if thr->thread_pool, set threads_busy to the returned value.
Otherwise, don't update atomically gomp_remaining_threads_count,
but instead thr->thread_pool->threads_busy.
(GOMP_parallel_end): Adjust for thread_limit now being in
icv->thread_limit_var. Use UINT_MAX instead of ULONG_MAX as
infinity. Adjust threads_busy in the pool rather than
gomp_remaining_threads_count. Remember team->nthreads and call
gomp_team_end before adjusting threads_busy, if not nested
afterwards, just set it to 1 non-atomically.
* team.c (gomp_thread_start): Clear thr->thread_pool and
thr->task before returning.
(gomp_free_pool_helper): Clear thr->thread_pool and thr->task
before calling pthread_exit.
(gomp_free_thread): No longer static. Use
gomp_managed_threads_lock instead of gomp_remaining_threads_lock.
(gomp_team_start): Set thr->thread_pool->threads_busy to
nthreads immediately after creating new pool. Use
gomp_managed_threads_lock instead of gomp_remaining_threads_lock.
(gomp_team_end): Use gomp_managed_threads_lock instead of
gomp_remaining_threads_lock.
(initialize_team): Don't call gomp_sem_init here.
* env.c (gomp_global_icv): Initialize thread_limit_var field.
(gomp_thread_limit_var, gomp_remaining_threads_count,
gomp_remaining_threads_lock): Remove.
(gomp_managed_threads_locks): New variable.
(handle_omp_display_env): Adjust for thread_limit var being
in gomp_global_icv.
(initialize_env): Likewise. If user provided value is > INT_MAX,
set gomp_global_icv.thread_limit_var to UINT_MAX (infinity).
Initialize gomp_managed_threads_lock instead of
gomp_remaining_threads_lock.
(omp_get_thread_limit): Adjust for thread_limit var in
icv->thread_limit_var.
* target.c: Include limits.h.
(resolve_device): If device_id is < -1, return NULL.
(gomp_map_vars): Make sure tgt->array, tgt->to_free, tgt->tgt_start,
tgt->tgt_end and k->copy_from aren't left uninitialized.
(GOMP_target): Arrange for host callback to be performed in a
separate initial thread and contention group, inheriting ICVs from
gomp_global_icv etc.
(GOMP_teams): Adjust thread_limit_var ICV.
* testsuite/libgomp.c/affinity-1.c (main): Call omp_set_dynamic (0).
* testsuite/libgomp.c/target-3.c: New test.
* testsuite/libgomp.c/target-4.c: New test.
* testsuite/libgomp.c/target-5.c: New test.
* testsuite/libgomp.c/target-6.c: New test.
* testsuite/libgomp.c/target-7.c: New test.
* testsuite/libgomp.c/thread-limit-1.c: New test.
* testsuite/libgomp.c/thread-limit-2.c: New test.
* testsuite/libgomp.c/thread-limit-3.c: New test.
--- gcc/omp-low.c.jj 2013-09-26 09:53:03.000000000 +0200
+++ gcc/omp-low.c 2013-10-04 21:39:19.927947940 +0200
@@ -7881,9 +7881,16 @@ expand_omp_target (struct omp_region *re
tree tmp_var;
tmp_var = create_tmp_var (TREE_TYPE (device), NULL);
- e = split_block (entry_bb, NULL);
+ if (kind != GF_OMP_TARGET_KIND_REGION)
+ {
+ gsi = gsi_last_bb (new_bb);
+ gsi_prev (&gsi);
+ e = split_block (new_bb, gsi_stmt (gsi));
+ }
+ else
+ e = split_block (new_bb, NULL);
cond_bb = e->src;
- entry_bb = e->dest;
+ new_bb = e->dest;
remove_edge (e);
then_bb = create_empty_bb (cond_bb);
@@ -7892,7 +7899,7 @@ expand_omp_target (struct omp_region *re
set_immediate_dominator (CDI_DOMINATORS, else_bb, cond_bb);
stmt = gimple_build_cond_empty (cond);
- gsi = gsi_start_bb (cond_bb);
+ gsi = gsi_last_bb (cond_bb);
gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
gsi = gsi_start_bb (then_bb);
@@ -7911,8 +7918,8 @@ expand_omp_target (struct omp_region *re
add_bb_to_loop (then_bb, cond_bb->loop_father);
add_bb_to_loop (else_bb, cond_bb->loop_father);
}
- make_edge (then_bb, entry_bb, EDGE_FALLTHRU);
- make_edge (else_bb, entry_bb, EDGE_FALLTHRU);
+ make_edge (then_bb, new_bb, EDGE_FALLTHRU);
+ make_edge (else_bb, new_bb, EDGE_FALLTHRU);
device = tmp_var;
}
--- gcc/c-family/c-cppbuiltin.c.jj 2013-06-26 12:13:47.000000000 +0200
+++ gcc/c-family/c-cppbuiltin.c 2013-10-04 21:10:43.884067723 +0200
@@ -896,7 +896,7 @@ c_cpp_builtins (cpp_reader *pfile)
cpp_define (pfile, "__SSP__=1");
if (flag_openmp)
- cpp_define (pfile, "_OPENMP=201107");
+ cpp_define (pfile, "_OPENMP=201307");
if (int128_integer_type_node != NULL_TREE)
builtin_define_type_sizeof ("__SIZEOF_INT128__",
--- libgomp/libgomp.h.jj 2013-10-04 10:28:53.000000000 +0200
+++ libgomp/libgomp.h 2013-10-04 13:48:39.460550732 +0200
@@ -232,6 +232,7 @@ struct gomp_task_icv
enum gomp_schedule_type run_sched_var;
int run_sched_modifier;
int default_device_var;
+ unsigned int thread_limit_var;
bool dyn_var;
bool nest_var;
char bind_var;
@@ -240,10 +241,8 @@ struct gomp_task_icv
};
extern struct gomp_task_icv gomp_global_icv;
-extern unsigned long gomp_thread_limit_var;
-extern unsigned long gomp_remaining_threads_count;
#ifndef HAVE_SYNC_BUILTINS
-extern gomp_mutex_t gomp_remaining_threads_lock;
+extern gomp_mutex_t gomp_managed_threads_lock;
#endif
extern unsigned long gomp_max_active_levels_var;
extern bool gomp_cancel_var;
@@ -431,6 +430,8 @@ struct gomp_thread_pool
unsigned threads_size;
unsigned threads_used;
struct gomp_team *last_team;
+ /* Number of threads running in this contention group. */
+ unsigned long threads_busy;
/* This barrier holds and releases threads waiting in threads. */
gomp_barrier_t threads_dock;
@@ -580,6 +581,7 @@ extern struct gomp_team *gomp_new_team (
extern void gomp_team_start (void (*) (void *), void *, unsigned,
unsigned, struct gomp_team *);
extern void gomp_team_end (void);
+extern void gomp_free_thread (void *);
/* target.c */
--- libgomp/parallel.c.jj 2013-10-02 16:01:09.000000000 +0200
+++ libgomp/parallel.c 2013-10-04 14:54:47.843337279 +0200
@@ -37,18 +37,19 @@
unsigned
gomp_resolve_num_threads (unsigned specified, unsigned count)
{
- struct gomp_thread *thread = gomp_thread();
+ struct gomp_thread *thr = gomp_thread ();
struct gomp_task_icv *icv;
unsigned threads_requested, max_num_threads, num_threads;
- unsigned long remaining;
+ unsigned long busy;
+ struct gomp_thread_pool *pool;
icv = gomp_icv (false);
if (specified == 1)
return 1;
- else if (thread->ts.active_level >= 1 && !icv->nest_var)
+ else if (thr->ts.active_level >= 1 && !icv->nest_var)
return 1;
- else if (thread->ts.active_level >= gomp_max_active_levels_var)
+ else if (thr->ts.active_level >= gomp_max_active_levels_var)
return 1;
/* If NUM_THREADS not specified, use nthreads_var. */
@@ -72,30 +73,46 @@ gomp_resolve_num_threads (unsigned speci
max_num_threads = count;
}
- /* ULONG_MAX stands for infinity. */
- if (__builtin_expect (gomp_thread_limit_var == ULONG_MAX, 1)
+ /* UINT_MAX stands for infinity. */
+ if (__builtin_expect (icv->thread_limit_var == UINT_MAX, 1)
|| max_num_threads == 1)
return max_num_threads;
+ /* The threads_busy counter lives in thread_pool, if there
+ isn't a thread_pool yet, there must be just one thread
+ in the contention group. If thr->team is NULL, this isn't
+ nested parallel, so there is just one thread in the
+ contention group as well, no need to handle it atomically. */
+ pool = thr->thread_pool;
+ if (thr->ts.team == NULL)
+ {
+ num_threads = max_num_threads;
+ if (num_threads > icv->thread_limit_var)
+ num_threads = icv->thread_limit_var;
+ if (pool)
+ pool->threads_busy = num_threads;
+ return num_threads;
+ }
+
#ifdef HAVE_SYNC_BUILTINS
do
{
- remaining = gomp_remaining_threads_count;
+ busy = pool->threads_busy;
num_threads = max_num_threads;
- if (num_threads > remaining)
- num_threads = remaining + 1;
+ if (icv->thread_limit_var - busy + 1 < num_threads)
+ num_threads = icv->thread_limit_var - busy + 1;
}
- while (__sync_val_compare_and_swap (&gomp_remaining_threads_count,
- remaining, remaining - num_threads + 1)
- != remaining);
+ while (__sync_val_compare_and_swap (&pool->threads_busy,
+ busy, busy + num_threads - 1)
+ != busy);
#else
- gomp_mutex_lock (&gomp_remaining_threads_lock);
+ gomp_mutex_lock (&gomp_managed_threads_lock);
num_threads = max_num_threads;
- remaining = gomp_remaining_threads_count;
- if (num_threads > remaining)
- num_threads = remaining + 1;
- gomp_remaining_threads_count -= num_threads - 1;
- gomp_mutex_unlock (&gomp_remaining_threads_lock);
+ busy = pool->threads_busy;
+ if (icv->thread_limit_var - busy + 1 < num_threads)
+ num_threads = icv->thread_limit_var - busy + 1;
+ pool->threads_busy += num_threads - 1;
+ gomp_mutex_unlock (&gomp_managed_threads_lock);
#endif
return num_threads;
@@ -111,23 +128,34 @@ GOMP_parallel_start (void (*fn) (void *)
void
GOMP_parallel_end (void)
{
- if (__builtin_expect (gomp_thread_limit_var != ULONG_MAX, 0))
+ struct gomp_task_icv *icv = gomp_icv (false);
+ if (__builtin_expect (icv->thread_limit_var != UINT_MAX, 0))
{
struct gomp_thread *thr = gomp_thread ();
struct gomp_team *team = thr->ts.team;
- if (team && team->nthreads > 1)
+ unsigned int nthreads = team ? team->nthreads : 1;
+ gomp_team_end ();
+ if (nthreads > 1)
{
+ /* If not nested, there is just one thread in the
+ contention group left, no need for atomicity. */
+ if (thr->ts.team == NULL)
+ thr->thread_pool->threads_busy = 1;
+ else
+ {
#ifdef HAVE_SYNC_BUILTINS
- __sync_fetch_and_add (&gomp_remaining_threads_count,
- 1UL - team->nthreads);
+ __sync_fetch_and_add (&thr->thread_pool->threads_busy,
+ 1UL - nthreads);
#else
- gomp_mutex_lock (&gomp_remaining_threads_lock);
- gomp_remaining_threads_count -= team->nthreads - 1;
- gomp_mutex_unlock (&gomp_remaining_threads_lock);
+ gomp_mutex_lock (&gomp_managed_threads_lock);
+ thr->thread_pool->threads_busy -= nthreads - 1;
+ gomp_mutex_unlock (&gomp_managed_threads_lock);
#endif
+ }
}
}
- gomp_team_end ();
+ else
+ gomp_team_end ();
}
ialias (GOMP_parallel_end)
--- libgomp/team.c.jj 2013-10-03 16:27:34.000000000 +0200
+++ libgomp/team.c 2013-10-04 18:44:31.100494492 +0200
@@ -128,6 +128,8 @@ gomp_thread_start (void *xdata)
}
gomp_sem_destroy (&thr->release);
+ thr->thread_pool = NULL;
+ thr->task = NULL;
return NULL;
}
@@ -204,16 +206,19 @@ static struct gomp_thread_pool *gomp_new
static void
gomp_free_pool_helper (void *thread_pool)
{
+ struct gomp_thread *thr = gomp_thread ();
struct gomp_thread_pool *pool
= (struct gomp_thread_pool *) thread_pool;
gomp_barrier_wait_last (&pool->threads_dock);
- gomp_sem_destroy (&gomp_thread ()->release);
+ gomp_sem_destroy (&thr->release);
+ thr->thread_pool = NULL;
+ thr->task = NULL;
pthread_exit (NULL);
}
/* Free a thread pool and release its threads. */
-static void
+void
gomp_free_thread (void *arg __attribute__((unused)))
{
struct gomp_thread *thr = gomp_thread ();
@@ -241,9 +246,9 @@ gomp_free_thread (void *arg __attribute_
__sync_fetch_and_add (&gomp_managed_threads,
1L - pool->threads_used);
#else
- gomp_mutex_lock (&gomp_remaining_threads_lock);
+ gomp_mutex_lock (&gomp_managed_threads_lock);
gomp_managed_threads -= pool->threads_used - 1L;
- gomp_mutex_unlock (&gomp_remaining_threads_lock);
+ gomp_mutex_unlock (&gomp_managed_threads_lock);
#endif
}
free (pool->threads);
@@ -285,6 +290,7 @@ gomp_team_start (void (*fn) (void *), vo
if (__builtin_expect (thr->thread_pool == NULL, 0))
{
thr->thread_pool = gomp_new_thread_pool ();
+ thr->thread_pool->threads_busy = nthreads;
pthread_setspecific (gomp_thread_destructor, thr);
}
pool = thr->thread_pool;
@@ -678,9 +684,9 @@ gomp_team_start (void (*fn) (void *), vo
#ifdef HAVE_SYNC_BUILTINS
__sync_fetch_and_add (&gomp_managed_threads, diff);
#else
- gomp_mutex_lock (&gomp_remaining_threads_lock);
+ gomp_mutex_lock (&gomp_managed_threads_lock);
gomp_managed_threads += diff;
- gomp_mutex_unlock (&gomp_remaining_threads_lock);
+ gomp_mutex_unlock (&gomp_managed_threads_lock);
#endif
}
@@ -822,9 +828,9 @@ gomp_team_start (void (*fn) (void *), vo
#ifdef HAVE_SYNC_BUILTINS
__sync_fetch_and_add (&gomp_managed_threads, diff);
#else
- gomp_mutex_lock (&gomp_remaining_threads_lock);
+ gomp_mutex_lock (&gomp_managed_threads_lock);
gomp_managed_threads += diff;
- gomp_mutex_unlock (&gomp_remaining_threads_lock);
+ gomp_mutex_unlock (&gomp_managed_threads_lock);
#endif
}
if (__builtin_expect (affinity_thr != NULL, 0)
@@ -871,9 +877,9 @@ gomp_team_end (void)
#ifdef HAVE_SYNC_BUILTINS
__sync_fetch_and_add (&gomp_managed_threads, 1L - team->nthreads);
#else
- gomp_mutex_lock (&gomp_remaining_threads_lock);
+ gomp_mutex_lock (&gomp_managed_threads_lock);
gomp_managed_threads -= team->nthreads - 1L;
- gomp_mutex_unlock (&gomp_remaining_threads_lock);
+ gomp_mutex_unlock (&gomp_managed_threads_lock);
#endif
/* This barrier has gomp_barrier_wait_last counterparts
and ensures the team can be safely destroyed. */
@@ -914,8 +920,6 @@ gomp_team_end (void)
static void __attribute__((constructor))
initialize_team (void)
{
- struct gomp_thread *thr;
-
#ifndef HAVE_TLS
static struct gomp_thread initial_thread_tls_data;
@@ -925,13 +929,6 @@ initialize_team (void)
if (pthread_key_create (&gomp_thread_destructor, gomp_free_thread) != 0)
gomp_fatal ("could not create thread pool destructor.");
-
-#ifdef HAVE_TLS
- thr = &gomp_tls_data;
-#else
- thr = &initial_thread_tls_data;
-#endif
- gomp_sem_init (&thr->release, 0);
}
static void __attribute__((destructor))
--- libgomp/env.c.jj 2013-10-04 10:28:53.000000000 +0200
+++ libgomp/env.c 2013-10-04 14:54:47.843337279 +0200
@@ -54,6 +54,7 @@
struct gomp_task_icv gomp_global_icv = {
.nthreads_var = 1,
+ .thread_limit_var = UINT_MAX,
.run_sched_var = GFS_DYNAMIC,
.run_sched_modifier = 1,
.default_device_var = 0,
@@ -64,11 +65,9 @@ struct gomp_task_icv gomp_global_icv = {
};
unsigned long gomp_max_active_levels_var = INT_MAX;
-unsigned long gomp_thread_limit_var = ULONG_MAX;
bool gomp_cancel_var = false;
-unsigned long gomp_remaining_threads_count;
#ifndef HAVE_SYNC_BUILTINS
-gomp_mutex_t gomp_remaining_threads_lock;
+gomp_mutex_t gomp_managed_threads_lock;
#endif
unsigned long gomp_available_cpus = 1, gomp_managed_threads = 1;
unsigned long long gomp_spin_count_var, gomp_throttled_spin_count_var;
@@ -1126,8 +1125,8 @@ handle_omp_display_env (unsigned long st
/* GOMP's default value is actually neither active nor passive. */
fprintf (stderr, " OMP_WAIT_POLICY = '%s'\n",
wait_policy > 0 ? "ACTIVE" : "PASSIVE");
- fprintf (stderr, " OMP_THREAD_LIMIT = '%lu'\n",
- gomp_thread_limit_var);
+ fprintf (stderr, " OMP_THREAD_LIMIT = '%u'\n",
+ gomp_global_icv.thread_limit_var);
fprintf (stderr, " OMP_MAX_ACTIVE_LEVELS = '%lu'\n",
gomp_max_active_levels_var);
@@ -1156,7 +1155,7 @@ handle_omp_display_env (unsigned long st
static void __attribute__((constructor))
initialize_env (void)
{
- unsigned long stacksize;
+ unsigned long thread_limit_var, stacksize;
int wait_policy;
/* Do a compile time check that mkomp_h.pl did good job. */
@@ -1169,11 +1168,13 @@ initialize_env (void)
parse_int ("OMP_DEFAULT_DEVICE", &gomp_global_icv.default_device_var, true);
parse_unsigned_long ("OMP_MAX_ACTIVE_LEVELS", &gomp_max_active_levels_var,
true);
- parse_unsigned_long ("OMP_THREAD_LIMIT", &gomp_thread_limit_var, false);
- if (gomp_thread_limit_var != ULONG_MAX)
- gomp_remaining_threads_count = gomp_thread_limit_var - 1;
+ if (parse_unsigned_long ("OMP_THREAD_LIMIT", &thread_limit_var, false))
+ {
+ gomp_global_icv.thread_limit_var
+ = thread_limit_var > INT_MAX ? UINT_MAX : thread_limit_var;
+ }
#ifndef HAVE_SYNC_BUILTINS
- gomp_mutex_init (&gomp_remaining_threads_lock);
+ gomp_mutex_init (&gomp_managed_threads_lock);
#endif
gomp_init_num_threads ();
gomp_available_cpus = gomp_global_icv.nthreads_var;
@@ -1325,7 +1326,8 @@ omp_get_max_threads (void)
int
omp_get_thread_limit (void)
{
- return gomp_thread_limit_var > INT_MAX ? INT_MAX : gomp_thread_limit_var;
+ struct gomp_task_icv *icv = gomp_icv (false);
+ return icv->thread_limit_var > INT_MAX ? INT_MAX : icv->thread_limit_var;
}
void
--- libgomp/target.c.jj 2013-09-23 16:07:25.000000000 +0200
+++ libgomp/target.c 2013-10-04 21:20:22.013998048 +0200
@@ -26,6 +26,7 @@
creation and termination. */
#include "libgomp.h"
+#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
@@ -144,8 +145,9 @@ resolve_device (int device_id)
struct gomp_task_icv *icv = gomp_icv (false);
device_id = icv->default_device_var;
}
- if (device_id >= gomp_get_num_devices ()
- && device_id != 257)
+ if (device_id < 0
+ || (device_id >= gomp_get_num_devices ()
+ && device_id != 257))
return NULL;
/* FIXME: Temporary hack for testing non-shared address spaces on host. */
@@ -239,11 +241,18 @@ gomp_map_vars (struct gomp_device_descr
tgt->tgt_start = (tgt->tgt_start + tgt_align - 1) & ~(tgt_align - 1);
tgt->tgt_end = tgt->tgt_start + tgt_size;
}
+ else
+ {
+ tgt->to_free = NULL;
+ tgt->tgt_start = 0;
+ tgt->tgt_end = 0;
+ }
tgt_size = 0;
if (is_target)
tgt_size = mapnum * sizeof (void *);
+ tgt->array = NULL;
if (not_found_cnt)
{
tgt->array = gomp_malloc (not_found_cnt * sizeof (*tgt->array));
@@ -273,6 +282,7 @@ gomp_map_vars (struct gomp_device_descr
k->tgt = tgt;
k->tgt_offset = tgt_size;
tgt_size += k->host_end - k->host_start;
+ k->copy_from = false;
if ((kinds[i] & 7) == 2 || (kinds[i] & 7) == 3)
k->copy_from = true;
k->refcount = 1;
@@ -475,13 +485,33 @@ GOMP_target (int device, void (*fn) (voi
if (devicep == NULL)
{
/* Host fallback. */
+ struct gomp_thread old_thr, *thr = gomp_thread ();
+ old_thr = *thr;
+ memset (thr, '\0', sizeof (*thr));
+ if (gomp_places_list)
+ {
+ thr->place = old_thr.place;
+ thr->ts.place_partition_len = gomp_places_list_len;
+ }
fn (hostaddrs);
+ gomp_free_thread (thr);
+ *thr = old_thr;
return;
}
struct target_mem_desc *tgt
= gomp_map_vars (devicep, mapnum, hostaddrs, sizes, kinds, true);
+ struct gomp_thread old_thr, *thr = gomp_thread ();
+ old_thr = *thr;
+ memset (thr, '\0', sizeof (*thr));
+ if (gomp_places_list)
+ {
+ thr->place = old_thr.place;
+ thr->ts.place_partition_len = gomp_places_list_len;
+ }
fn ((void *) tgt->tgt_start);
+ gomp_free_thread (thr);
+ *thr = old_thr;
gomp_unmap_vars (tgt);
}
@@ -541,6 +571,13 @@ GOMP_target_update (int device, const vo
void
GOMP_teams (unsigned int num_teams, unsigned int thread_limit)
{
+ if (thread_limit)
+ {
+ struct gomp_task_icv *icv = gomp_icv (true);
+ icv->thread_limit_var
+ = thread_limit > INT_MAX ? UINT_MAX : thread_limit;
+ }
+ (void) num_teams;
}
#ifdef PLUGIN_SUPPORT
--- libgomp/testsuite/libgomp.c/affinity-1.c.jj 2013-10-04 09:51:49.000000000 +0200
+++ libgomp/testsuite/libgomp.c/affinity-1.c 2013-10-04 15:58:13.639326486 +0200
@@ -244,6 +244,7 @@ main ()
print_affinity (places_array[test_places].places[0]);
printf ("\n");
omp_set_nested (1);
+ omp_set_dynamic (0);
#pragma omp parallel if (0)
{
--- libgomp/testsuite/libgomp.c/target-3.c.jj 2013-10-04 15:55:43.360132627 +0200
+++ libgomp/testsuite/libgomp.c/target-3.c 2013-10-04 17:14:07.560142600 +0200
@@ -0,0 +1,17 @@
+#include <omp.h>
+#include <stdlib.h>
+
+int
+main ()
+{
+ if (omp_get_level ())
+ abort ();
+ #pragma omp target if (0)
+ if (omp_get_level ())
+ abort ();
+ #pragma omp target if (0)
+ #pragma omp teams
+ if (omp_get_level ())
+ abort ();
+ return 0;
+}
--- libgomp/testsuite/libgomp.c/target-4.c.jj 2013-10-04 15:55:43.360132627 +0200
+++ libgomp/testsuite/libgomp.c/target-4.c 2013-10-04 15:57:32.422553332 +0200
@@ -0,0 +1,14 @@
+#include <omp.h>
+#include <stdlib.h>
+
+int
+main ()
+{
+ omp_set_dynamic (0);
+ #pragma omp parallel num_threads (4)
+ #pragma omp target if (0)
+ #pragma omp single
+ if (omp_get_num_threads () != 1)
+ abort ();
+ return 0;
+}
--- libgomp/testsuite/libgomp.c/target-5.c.jj 2013-10-04 15:55:43.360132627 +0200
+++ libgomp/testsuite/libgomp.c/target-5.c 2013-10-04 17:05:23.632931518 +0200
@@ -0,0 +1,83 @@
+#include <omp.h>
+#include <stdlib.h>
+
+int
+main ()
+{
+ int d_o = omp_get_dynamic ();
+ int n_o = omp_get_nested ();
+ omp_sched_t s_o;
+ int c_o;
+ omp_get_schedule (&s_o, &c_o);
+ int m_o = omp_get_max_threads ();
+ omp_set_dynamic (1);
+ omp_set_nested (1);
+ omp_set_schedule (omp_sched_static, 2);
+ omp_set_num_threads (4);
+ int d = omp_get_dynamic ();
+ int n = omp_get_nested ();
+ omp_sched_t s;
+ int c;
+ omp_get_schedule (&s, &c);
+ int m = omp_get_max_threads ();
+ if (!omp_is_initial_device ())
+ abort ();
+ #pragma omp target if (0)
+ {
+ omp_sched_t s_c;
+ int c_c;
+ omp_get_schedule (&s_c, &c_c);
+ if (d_o != omp_get_dynamic ()
+ || n_o != omp_get_nested ()
+ || s_o != s_c
+ || c_o != c_c
+ || m_o != omp_get_max_threads ())
+ abort ();
+ omp_set_dynamic (0);
+ omp_set_nested (0);
+ omp_set_schedule (omp_sched_dynamic, 4);
+ omp_set_num_threads (2);
+ if (!omp_is_initial_device ())
+ abort ();
+ }
+ if (!omp_is_initial_device ())
+ abort ();
+ omp_sched_t s_c;
+ int c_c;
+ omp_get_schedule (&s_c, &c_c);
+ if (d != omp_get_dynamic ()
+ || n != omp_get_nested ()
+ || s != s_c
+ || c != c_c
+ || m != omp_get_max_threads ())
+ abort ();
+ #pragma omp target if (0)
+ #pragma omp teams
+ {
+ omp_sched_t s_c;
+ int c_c;
+ omp_get_schedule (&s_c, &c_c);
+ if (d_o != omp_get_dynamic ()
+ || n_o != omp_get_nested ()
+ || s_o != s_c
+ || c_o != c_c
+ || m_o != omp_get_max_threads ())
+ abort ();
+ omp_set_dynamic (0);
+ omp_set_nested (0);
+ omp_set_schedule (omp_sched_dynamic, 4);
+ omp_set_num_threads (2);
+ if (!omp_is_initial_device ())
+ abort ();
+ }
+ if (!omp_is_initial_device ())
+ abort ();
+ omp_get_schedule (&s_c, &c_c);
+ if (d != omp_get_dynamic ()
+ || n != omp_get_nested ()
+ || s != s_c
+ || c != c_c
+ || m != omp_get_max_threads ())
+ abort ();
+ return 0;
+}
--- libgomp/testsuite/libgomp.c/target-6.c.jj 2013-10-04 15:55:43.360132627 +0200
+++ libgomp/testsuite/libgomp.c/target-6.c 2013-10-04 18:12:54.097493969 +0200
@@ -0,0 +1,68 @@
+#include <omp.h>
+#include <stdlib.h>
+
+int
+main ()
+{
+ omp_set_dynamic (0);
+ omp_set_nested (1);
+ if (omp_in_parallel ())
+ abort ();
+ #pragma omp parallel num_threads (3)
+ if (omp_get_thread_num () == 2)
+ {
+ if (!omp_in_parallel ())
+ abort ();
+ #pragma omp parallel num_threads (3)
+ if (omp_get_thread_num () == 1)
+ {
+ if (!omp_in_parallel ()
+ || omp_get_level () != 2
+ || omp_get_ancestor_thread_num (0) != 0
+ || omp_get_ancestor_thread_num (1) != 2
+ || omp_get_ancestor_thread_num (2) != 1
+ || omp_get_ancestor_thread_num (3) != -1)
+ abort ();
+ #pragma omp target if (0)
+ {
+ if (omp_in_parallel ()
+ || omp_get_level () != 0
+ || omp_get_ancestor_thread_num (0) != 0
+ || omp_get_ancestor_thread_num (1) != -1)
+ abort ();
+ #pragma omp parallel num_threads (2)
+ {
+ if (!omp_in_parallel ()
+ || omp_get_level () != 1
+ || omp_get_ancestor_thread_num (0) != 0
+ || omp_get_ancestor_thread_num (1)
+ != omp_get_thread_num ()
+ || omp_get_ancestor_thread_num (2) != -1)
+ abort ();
+ }
+ }
+ #pragma omp target if (0)
+ {
+ #pragma omp teams thread_limit (2)
+ {
+ if (omp_in_parallel ()
+ || omp_get_level () != 0
+ || omp_get_ancestor_thread_num (0) != 0
+ || omp_get_ancestor_thread_num (1) != -1)
+ abort ();
+ #pragma omp parallel num_threads (2)
+ {
+ if (!omp_in_parallel ()
+ || omp_get_level () != 1
+ || omp_get_ancestor_thread_num (0) != 0
+ || omp_get_ancestor_thread_num (1)
+ != omp_get_thread_num ()
+ || omp_get_ancestor_thread_num (2) != -1)
+ abort ();
+ }
+ }
+ }
+ }
+ }
+ return 0;
+}
--- libgomp/testsuite/libgomp.c/target-7.c.jj 2013-10-04 19:05:13.452909674 +0200
+++ libgomp/testsuite/libgomp.c/target-7.c 2013-10-04 21:47:21.814387228 +0200
@@ -0,0 +1,111 @@
+#include <omp.h>
+#include <stdlib.h>
+
+volatile int v;
+
+void
+foo (int f)
+{
+ int d = f ? omp_get_num_devices () : omp_get_default_device ();
+ int h = 5;
+ #pragma omp target device (d)
+ if (omp_get_level () != 0)
+ abort ();
+ #pragma omp target if (v > 1)
+ if (omp_get_level () != 0 || !omp_is_initial_device ())
+ abort ();
+ #pragma omp target device (d) if (v > 1)
+ if (omp_get_level () != 0 || !omp_is_initial_device ())
+ abort ();
+ #pragma omp target if (v <= 1)
+ if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
+ abort ();
+ #pragma omp target device (d) if (v <= 1)
+ if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
+ abort ();
+ #pragma omp target if (0)
+ if (omp_get_level () != 0 || !omp_is_initial_device ())
+ abort ();
+ #pragma omp target device (d) if (0)
+ if (omp_get_level () != 0 || !omp_is_initial_device ())
+ abort ();
+ #pragma omp target if (1)
+ if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
+ abort ();
+ #pragma omp target device (d) if (1)
+ if (omp_get_level () != 0 || (f && !omp_is_initial_device ()))
+ abort ();
+ #pragma omp target data device (d) map (to: h)
+ {
+ #pragma omp target device (d)
+ if (omp_get_level () != 0 || (f && !omp_is_initial_device ()) || h++ != 5)
+ abort ();
+ #pragma omp target update device (d) from (h)
+ }
+ #pragma omp target data if (v > 1) map (to: h)
+ {
+ #pragma omp target if (v > 1)
+ if (omp_get_level () != 0 || !omp_is_initial_device () || h++ != 6)
+ abort ();
+ #pragma omp target update if (v > 1) from (h)
+ }
+ #pragma omp target data device (d) if (v > 1) map (to: h)
+ {
+ #pragma omp target device (d) if (v > 1)
+ if (omp_get_level () != 0 || !omp_is_initial_device () || h++ != 7)
+ abort ();
+ #pragma omp target update device (d) if (v > 1) from (h)
+ }
+ #pragma omp target data if (v <= 1) map (to: h)
+ {
+ #pragma omp target if (v <= 1)
+ if (omp_get_level () != 0 || (f && !omp_is_initial_device ()) || h++ != 8)
+ abort ();
+ #pragma omp target update if (v <= 1) from (h)
+ }
+ #pragma omp target data device (d) if (v <= 1) map (to: h)
+ {
+ #pragma omp target device (d) if (v <= 1)
+ if (omp_get_level () != 0 || (f && !omp_is_initial_device ()) || h++ != 9)
+ abort ();
+ #pragma omp target update device (d) if (v <= 1) from (h)
+ }
+ #pragma omp target data if (0) map (to: h)
+ {
+ #pragma omp target if (0)
+ if (omp_get_level () != 0 || !omp_is_initial_device () || h++ != 10)
+ abort ();
+ #pragma omp target update if (0) from (h)
+ }
+ #pragma omp target data device (d) if (0) map (to: h)
+ {
+ #pragma omp target device (d) if (0)
+ if (omp_get_level () != 0 || !omp_is_initial_device () || h++ != 11)
+ abort ();
+ #pragma omp target update device (d) if (0) from (h)
+ }
+ #pragma omp target data if (1) map (to: h)
+ {
+ #pragma omp target if (1)
+ if (omp_get_level () != 0 || (f && !omp_is_initial_device ()) || h++ != 12)
+ abort ();
+ #pragma omp target update if (1) from (h)
+ }
+ #pragma omp target data device (d) if (1) map (to: h)
+ {
+ #pragma omp target device (d) if (1)
+ if (omp_get_level () != 0 || (f && !omp_is_initial_device ()) || h++ != 13)
+ abort ();
+ #pragma omp target update device (d) if (1) from (h)
+ }
+ if (h != 14)
+ abort ();
+}
+
+int
+main ()
+{
+ foo (0);
+ foo (1);
+ return 0;
+}
--- libgomp/testsuite/libgomp.c/thread-limit-1.c.jj 2013-10-04 15:36:59.363996164 +0200
+++ libgomp/testsuite/libgomp.c/thread-limit-1.c 2013-10-04 15:41:48.479547631 +0200
@@ -0,0 +1,35 @@
+/* { dg-do run } */
+/* { dg-set-target-env-var OMP_THREAD_LIMIT "6" } */
+
+#include <stdlib.h>
+#include <unistd.h>
+
+int
+main ()
+{
+ if (omp_get_thread_limit () != 6)
+ return 0;
+ omp_set_dynamic (0);
+ omp_set_nested (1);
+ #pragma omp parallel num_threads (3)
+ if (omp_get_num_threads () != 3)
+ abort ();
+ #pragma omp parallel num_threads (3)
+ if (omp_get_num_threads () != 3)
+ abort ();
+ #pragma omp parallel num_threads (8)
+ if (omp_get_num_threads () > 6)
+ abort ();
+ #pragma omp parallel num_threads (6)
+ if (omp_get_num_threads () != 6)
+ abort ();
+ int cnt = 0;
+ #pragma omp parallel num_threads (5)
+ #pragma omp parallel num_threads (5)
+ #pragma omp parallel num_threads (2)
+ #pragma omp atomic
+ cnt++;
+ if (cnt > 6)
+ abort ();
+ return 0;
+}
--- libgomp/testsuite/libgomp.c/thread-limit-2.c.jj 2013-10-04 15:42:52.229212860 +0200
+++ libgomp/testsuite/libgomp.c/thread-limit-2.c 2013-10-04 15:48:28.230486490 +0200
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-set-target-env-var OMP_THREAD_LIMIT "9" } */
+
+#include <stdlib.h>
+#include <unistd.h>
+
+int
+main ()
+{
+ if (omp_get_thread_limit () != 9)
+ return 0;
+ omp_set_dynamic (0);
+ #pragma omp parallel num_threads (8)
+ if (omp_get_num_threads () != 8)
+ abort ();
+ #pragma omp parallel num_threads (16)
+ if (omp_get_num_threads () > 9)
+ abort ();
+ #pragma omp target if (0)
+ #pragma omp teams thread_limit (6)
+ {
+ if (omp_get_thread_limit () > 6)
+ abort ();
+ if (omp_get_thread_limit () == 6)
+ {
+ omp_set_dynamic (0);
+ omp_set_nested (1);
+ #pragma omp parallel num_threads (3)
+ if (omp_get_num_threads () != 3)
+ abort ();
+ #pragma omp parallel num_threads (3)
+ if (omp_get_num_threads () != 3)
+ abort ();
+ #pragma omp parallel num_threads (8)
+ if (omp_get_num_threads () > 6)
+ abort ();
+ #pragma omp parallel num_threads (6)
+ if (omp_get_num_threads () != 6)
+ abort ();
+ int cnt = 0;
+ #pragma omp parallel num_threads (5)
+ #pragma omp parallel num_threads (5)
+ #pragma omp parallel num_threads (2)
+ #pragma omp atomic
+ cnt++;
+ if (cnt > 6)
+ abort ();
+ }
+ }
+ return 0;
+}
--- libgomp/testsuite/libgomp.c/thread-limit-3.c.jj 2013-10-04 15:54:02.289657212 +0200
+++ libgomp/testsuite/libgomp.c/thread-limit-3.c 2013-10-04 15:55:04.677326917 +0200
@@ -0,0 +1,12 @@
+#include <stdlib.h>
+#include <omp.h>
+
+int
+main ()
+{
+ #pragma omp target if (0)
+ #pragma omp teams thread_limit (1)
+ if (omp_get_thread_limit () != 1)
+ abort ();
+ return 0;
+}
Jakub
More information about the Gcc-patches
mailing list