libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

Jakub Jelinek jakub@redhat.com
Mon Mar 30 16:42:00 GMT 2015


On Thu, Mar 26, 2015 at 11:41:30PM +0300, Ilya Verbin wrote:
> Here is the latest patch for libgomp and mic plugin.
> make check-target-libgomp using intelmic emul passed.
> Also I used a testcase from the attachment.

This applies cleanly.

> Latest ptx part is here, I guess:
> https://gcc.gnu.org/ml/gcc-patches/2015-02/msg01407.html

But the one Julian posted doesn't apply on top of your patch.
If there is any interdiff needed on top of your patch, can it be
posted against trunk + your patch?

> +/* Insert mapping of host -> target address pairs to splay tree.  */
> +
> +static void
> +gomp_splay_tree_insert_mapping (struct gomp_device_descr *devicep,
> +				struct addr_pair *host_addr,
> +				struct addr_pair *tgt_addr)
> +{
> +  struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
> +  tgt->refcount = 1;
> +  tgt->array = gomp_malloc (sizeof (*tgt->array));
> +  tgt->tgt_start = tgt_addr->start;
> +  tgt->tgt_end = tgt_addr->end;
> +  tgt->to_free = NULL;
> +  tgt->list_count = 0;
> +  tgt->device_descr = devicep;
> +  splay_tree_node node = tgt->array;
> +  splay_tree_key k = &node->key;
> +  k->host_start = host_addr->start;
> +  k->host_end = host_addr->end;
> +  k->tgt_offset = 0;
> +  k->refcount = 1;
> +  k->copy_from = false;
> +  k->tgt = tgt;
> +  node->left = NULL;
> +  node->right = NULL;
> +  splay_tree_insert (&devicep->mem_map, node);
> +}

What is the reason to register and allocate these one at a time, rather than
using one struct target_mem_desc with one tgt->array for all splay tree
nodes registered from one image?
Perhaps you would just use tgt_start of 0 and tgt_end of 0 too (to make it
clear it is special) and just use tgt_offset relative to that (i.e.
absolute), but having to malloc each node individually and having to malloc
a target_mem_desc for each one sounds expensive.
Everything is freed just once anyway, isn't it?

> @@ -654,6 +727,18 @@ void
>  GOMP_offload_register (void *host_table, enum offload_target_type target_type,
>  		       void *target_data)
>  {
> +  int i;
> +  gomp_mutex_lock (&register_lock);
> +
> +  /* Load image to all initialized devices.  */
> +  for (i = 0; i < num_devices; i++)
> +    {
> +      struct gomp_device_descr *devicep = &devices[i];
> +      if (devicep->type == target_type && devicep->is_initialized)
> +	gomp_offload_image_to_device (devicep, host_table, target_data);

Shouldn't either this function, or gomp_offload_image_to_device lock
also devicep->lock mutex and unlock at the end?
Where exactly I guess depends on if the devicep->* hook calls should be
guarded with the mutex or not.  If yes, it should be this function and
gomp_init_device.

> +      if (devicep->type != target_type || !devicep->is_initialized)
> +	continue;
> +

Similarly.

> +      devicep->unload_image_func (devicep->target_id, target_data);
> +
> +      /* Remove mapping from splay tree.  */
> +      for (j = 0; j < num_funcs; j++)
> +	{
> +	  struct splay_tree_key_s k;
> +	  k.host_start = (uintptr_t) host_func_table[j];
> +	  k.host_end = k.host_start + 1;
> +	  splay_tree_remove (&devicep->mem_map, &k);
> +	}
> +
> +      for (j = 0; j < num_vars; j++)
> +	{
> +	  struct splay_tree_key_s k;
> +	  k.host_start = (uintptr_t) host_var_table[j*2];
> +	  k.host_end = k.host_start + (uintptr_t) host_var_table[j*2+1];
> +	  splay_tree_remove (&devicep->mem_map, &k);
> +	}
> +    }

Aren't you leaking here all the tgt->array and tgt allocations here?
Though, if you change it to just two allocations (one tgt, one array),
you'd need to free just once.

	Jakub



More information about the Gcc-patches mailing list