[patch] adjust default nvptx launch geometry for OpenACC offloaded regions

Cesar Philippidis cesar@codesourcery.com
Thu Jul 26 14:27:00 GMT 2018


Hi Tom,

I see that you're reviewing the libgomp changes. Please disregard the
following hunk:

On 07/11/2018 12:13 PM, Cesar Philippidis wrote:
> @@ -1199,12 +1202,59 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
>  			     default_dims[GOMP_DIM_VECTOR]);
>  	}
>        pthread_mutex_unlock (&ptx_dev_lock);
> +      int vectors = default_dims[GOMP_DIM_VECTOR];
> +      int workers = default_dims[GOMP_DIM_WORKER];
> +      int gangs = default_dims[GOMP_DIM_GANG];
> +
> +      if (nvptx_thread()->ptx_dev->driver_version > 6050)
> +	{
> +	  int grids, blocks;
> +	  CUDA_CALL_ASSERT (cuOccupancyMaxPotentialBlockSize, &grids,
> +			    &blocks, function, NULL, 0,
> +			    dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR]);
> +	  GOMP_PLUGIN_debug (0, "cuOccupancyMaxPotentialBlockSize: "
> +			     "grid = %d, block = %d\n", grids, blocks);
> +
> +	  gangs = grids * dev_size;
> +	  workers = blocks / vectors;
> +	}

I revisited this change yesterday and noticed that it was setting gangs
incorrectly. Gangs should be set as follows:

  gangs = grids * (blocks / warp_size);

or, to match og8 more closely, as:

  gangs = 2 * grids * (blocks / warp_size);

The magic constant 2 is there to prevent thread starvation; it's the
same idea as running make -j<2*#threads>.
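
Concretely, the corrected hunk would look something like this. This is
just a sketch: the warp_size field on ptx_dev is an assumption on my
part, and the other names are taken from the hunk quoted above.

  if (nvptx_thread ()->ptx_dev->driver_version > 6050)
    {
      int grids, blocks;
      int warp_size = nvptx_thread ()->ptx_dev->warp_size;

      /* Ask the driver for the block size that maximizes occupancy,
         capped at the requested workers * vectors.  */
      CUDA_CALL_ASSERT (cuOccupancyMaxPotentialBlockSize, &grids,
                        &blocks, function, NULL, 0,
                        dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR]);

      /* Oversubscribe gangs by a factor of 2 to avoid starvation.  */
      gangs = 2 * grids * (blocks / warp_size);
      workers = blocks / vectors;
    }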

Anyway, I'm still experimenting with that change. There are still some
discrepancies between the way I select num_workers and the way the
driver does. The driver appears to be a little more conservative, but
according to the thread occupancy calculator, the less conservative
selection should yield greater performance on GPUs.
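
For what it's worth, one way to cross-check a candidate num_workers
against the driver is to ask the occupancy API directly. A rough
sketch (the debug output is illustrative, not from the patch):

  int active_blocks;

  /* How many blocks of workers * vectors threads can be resident on
     each multiprocessor?  */
  CUDA_CALL_ASSERT (cuOccupancyMaxActiveBlocksPerMultiprocessor,
                    &active_blocks, function, workers * vectors, 0);
  GOMP_PLUGIN_debug (0, "occupancy: %d active blocks per SM at "
                     "block size %d\n", active_blocks,
                     workers * vectors);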

I just wanted to give you a heads-up, since you seem to be working on this.

Thanks for all of your reviews!

By the way, are you now maintainer of the libgomp nvptx plugin?

Cesar


