libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
Jakub Jelinek
jakub@redhat.com
Mon Mar 30 16:42:00 GMT 2015
On Thu, Mar 26, 2015 at 11:41:30PM +0300, Ilya Verbin wrote:
> Here is the latest patch for libgomp and mic plugin.
> make check-target-libgomp using intelmic emul passed.
> Also I used a testcase from the attachment.
This applies cleanly.
> Latest ptx part is here, I guess:
> https://gcc.gnu.org/ml/gcc-patches/2015-02/msg01407.html
But the one Julian posted doesn't apply on top of your patch.
If there is any interdiff needed on top of your patch, can it be
posted against trunk + your patch?
> +/* Insert mapping of host -> target address pairs to splay tree. */
> +
> +static void
> +gomp_splay_tree_insert_mapping (struct gomp_device_descr *devicep,
> + struct addr_pair *host_addr,
> + struct addr_pair *tgt_addr)
> +{
> + struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
> + tgt->refcount = 1;
> + tgt->array = gomp_malloc (sizeof (*tgt->array));
> + tgt->tgt_start = tgt_addr->start;
> + tgt->tgt_end = tgt_addr->end;
> + tgt->to_free = NULL;
> + tgt->list_count = 0;
> + tgt->device_descr = devicep;
> + splay_tree_node node = tgt->array;
> + splay_tree_key k = &node->key;
> + k->host_start = host_addr->start;
> + k->host_end = host_addr->end;
> + k->tgt_offset = 0;
> + k->refcount = 1;
> + k->copy_from = false;
> + k->tgt = tgt;
> + node->left = NULL;
> + node->right = NULL;
> + splay_tree_insert (&devicep->mem_map, node);
> +}
What is the reason to register and allocate these one at a time, rather than
using one struct target_mem_desc with one tgt->array for all splay tree
nodes registered from one image?
Perhaps you would just use tgt_start of 0 and tgt_end of 0 too (to make it
clear it is special) and just use tgt_offset relative to that (i.e.
absolute), but having to malloc each node individually and having to malloc
a target_mem_desc for each one sounds expensive.
Everything is freed just once anyway, isn't it?
> @@ -654,6 +727,18 @@ void
> GOMP_offload_register (void *host_table, enum offload_target_type target_type,
> void *target_data)
> {
> + int i;
> + gomp_mutex_lock (®ister_lock);
> +
> + /* Load image to all initialized devices. */
> + for (i = 0; i < num_devices; i++)
> + {
> + struct gomp_device_descr *devicep = &devices[i];
> + if (devicep->type == target_type && devicep->is_initialized)
> + gomp_offload_image_to_device (devicep, host_table, target_data);
Shouldn't either this function, or gomp_offload_image_to_device lock
also devicep->lock mutex and unlock at the end?
Where exactly I guess depends on if the devicep->* hook calls should be
guarded with the mutex or not. If yes, it should be this function and
gomp_init_device.
> + if (devicep->type != target_type || !devicep->is_initialized)
> + continue;
> +
Similarly.
> + devicep->unload_image_func (devicep->target_id, target_data);
> +
> + /* Remove mapping from splay tree. */
> + for (j = 0; j < num_funcs; j++)
> + {
> + struct splay_tree_key_s k;
> + k.host_start = (uintptr_t) host_func_table[j];
> + k.host_end = k.host_start + 1;
> + splay_tree_remove (&devicep->mem_map, &k);
> + }
> +
> + for (j = 0; j < num_vars; j++)
> + {
> + struct splay_tree_key_s k;
> + k.host_start = (uintptr_t) host_var_table[j*2];
> + k.host_end = k.host_start + (uintptr_t) host_var_table[j*2+1];
> + splay_tree_remove (&devicep->mem_map, &k);
> + }
> + }
Aren't you leaking here all the tgt->array and tgt allocations here?
Though, if you change it to just two allocations (one tgt, one array),
you'd need to free just once.
Jakub
More information about the Gcc-patches
mailing list