This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: gomp-nvptx branch - middle-end changes


On Thu, Nov 10, 2016 at 08:12:27PM +0300, Alexander Monakov wrote:
> gcc/
> 	* internal-fn.c (expand_GOMP_SIMT_LANE): New.
> 	(expand_GOMP_SIMT_VF): New.
> 	(expand_GOMP_SIMT_LAST_LANE): New.
> 	(expand_GOMP_SIMT_ORDERED_PRED): New.
> 	(expand_GOMP_SIMT_VOTE_ANY): New.
> 	(expand_GOMP_SIMT_XCHG_BFLY): New.
> 	(expand_GOMP_SIMT_XCHG_IDX): New.
> 	* internal-fn.def (GOMP_SIMT_LANE): New.
> 	(GOMP_SIMT_VF): New.
> 	(GOMP_SIMT_LAST_LANE): New.
> 	(GOMP_SIMT_ORDERED_PRED): New.
> 	(GOMP_SIMT_VOTE_ANY): New.
> 	(GOMP_SIMT_XCHG_BFLY): New.
> 	(GOMP_SIMT_XCHG_IDX): New.
> 	* omp-low.c (omp_maybe_offloaded_ctx): New, outlined from...
> 	(create_omp_child_function): ...here.  Set "omp target entrypoint"
> 	or "omp declare target" attribute based on is_gimple_omp_offloaded.
> 	(omp_max_simt_vf): New.  Use it...
> 	(omp_max_vf): ...here.
> 	(lower_rec_input_clauses): Add reduction lowering for SIMT execution.
> 	(lower_lastprivate_clauses): Likewise, for "lastprivate" lowering.
> 	(lower_omp_ordered): Likewise, for "ordered" lowering.
> 	(expand_omp_simd): Add SIMT transforms.
> 	(pass_data_lower_omp): Add PROP_gimple_lomp_dev.
> 	(execute_omp_device_lower): New.
> 	(pass_data_omp_device_lower): New.
> 	(pass_omp_device_lower): New pass.
> 	(make_pass_omp_device_lower): New.
> 	* passes.def (pass_omp_device_lower): Position new pass.
> 	* tree-pass.h (PROP_gimple_lomp_dev): Define.
> 	(make_pass_omp_device_lower): Declare.

OK for trunk once the corresponding config/nvptx bits are committed, with
one nit below that needs immediate action; the rest can be resolved
incrementally.  Afterwards I'd like to check in the attached patch, at least
for now, so that non-offloaded SIMD code is less affected.  Once you have
the intended outlining of SIMT regions for PTX offloading done (IMHO the
best place to do that is in omp expansion, not gimplification), you can
either base it on that, or revert it and do it earlier.

> +
> +/* Return maximum SIMT width if offloading may target SIMT hardware.  */
> +
> +static int
> +omp_max_simt_vf (void)
> +{
> +  if (!optimize)
> +    return 0;
> +  if (ENABLE_OFFLOADING)
> +    for (const char *c = getenv ("OFFLOAD_TARGET_NAMES"); c; )
> +      {
> +	if (!strncmp (c, "nvptx", strlen ("nvptx")))
> +	  return 32;
> +	else if ((c = strchr (c, ',')))
> +	  c++;
> +      }
> +  return 0;
> +}

As discussed privately, this means one has to manually set OFFLOAD_TARGET_NAMES
in the environment when invoking ./cc1 or ./cc1plus in order to match the
behavior of ./gcc -B ./ etc.  I think it would be better to change the driver
so that it sets OFFLOAD_TARGET_NAMES= (empty) in the environment when
ENABLE_OFFLOADING is set but the -foffload option is used to disable all
offloading, and then have this function use the configured-in offloading
targets if ENABLE_OFFLOADING is set and OFFLOAD_TARGET_NAMES is not in the
environment.  Can be done incrementally.
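
To make the suggested fallback concrete, here is a minimal standalone sketch,
not GCC code.  CONFIGURED_OFFLOAD_TARGETS and max_simt_vf_for are hypothetical
names invented for illustration, and the target strings are only example
values; the matching loop mirrors the one in the quoted hunk.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the offload targets chosen at configure time;
   this is not a real GCC macro.  */
#define CONFIGURED_OFFLOAD_TARGETS "nvptx-none"

/* Return 32 if any comma-separated entry of NAMES starts with "nvptx",
   otherwise 0.  A null NAMES models the variable being absent from the
   environment, in which case the configured list is used.  An empty
   string models the driver having disabled all offloading.  */
static int
max_simt_vf_for (const char *names)
{
  if (names == NULL)
    names = CONFIGURED_OFFLOAD_TARGETS;
  for (const char *c = names; c; )
    {
      if (!strncmp (c, "nvptx", strlen ("nvptx")))
        return 32;
      else if ((c = strchr (c, ',')))
        c++;
    }
  return 0;
}
```

A caller would pass getenv ("OFFLOAD_TARGET_NAMES"), so that a bare ./cc1
invocation with nothing in the environment behaves like the configured
driver default.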

> +
>  /* Return maximum possible vectorization factor for the target.  */
>  
>  static int
> @@ -4277,16 +4306,18 @@ omp_max_vf (void)
>                || global_options_set.x_flag_tree_vectorize)))
>      return 1;
>  
> +  int vf = 1;
>    int vs = targetm.vectorize.autovectorize_vector_sizes ();
>    if (vs)
> +    vf = 1 << floor_log2 (vs);
> +  else
>      {
> -      vs = 1 << floor_log2 (vs);
> -      return vs;
> +      machine_mode vqimode = targetm.vectorize.preferred_simd_mode (QImode);
> +      if (GET_MODE_CLASS (vqimode) == MODE_VECTOR_INT)
> +	vf = GET_MODE_NUNITS (vqimode);
>      }
> -  machine_mode vqimode = targetm.vectorize.preferred_simd_mode (QImode);
> -  if (GET_MODE_CLASS (vqimode) == MODE_VECTOR_INT)
> -    return GET_MODE_NUNITS (vqimode);
> -  return 1;
> +  int svf = omp_max_simt_vf ();
> +  return MAX (vf, svf);

Increasing the vf even for the host in non-offloaded regions is undesirable.
This can be partly solved by the attached patch, which I'm planning to apply
incrementally.  The other part is the simd modifier of the schedule clause;
there I think what we want is to use a conditional expression
(GOMP_USE_SIMT () ? omp_max_simt_vf () : omp_max_vf ()).  I'll try to handle
the schedule clause later.
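
The difference between the MAX in the quoted hunk and the conditional
expression can be illustrated with a small standalone sketch.  It is not GCC
code: vf_as_patched and vf_conditional are hypothetical names, and the
GOMP_USE_SIMT () condition is modelled as a plain int parameter, which is an
assumption of this sketch.

```c
#include <assert.h>

static int
max_of (int a, int b)
{
  return a > b ? a : b;
}

/* Models the quoted hunk: the result is the larger of the host and SIMT
   factors, so the host sees the SIMT width whenever it is larger.  */
static int
vf_as_patched (int host_vf, int simt_vf)
{
  return max_of (host_vf, simt_vf);
}

/* Models the conditional expression: the SIMT factor is used only when
   USE_SIMT is nonzero, otherwise the host factor is left alone.  */
static int
vf_conditional (int use_simt, int host_vf, int simt_vf)
{
  assert (host_vf >= 1 && simt_vf >= 0);
  return use_simt ? simt_vf : host_vf;
}
```

With an example host factor of 16 and a SIMT factor of 32, vf_as_patched
yields 32 everywhere, while vf_conditional keeps 16 for host code and yields
32 only when SIMT is in use.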

> +class pass_omp_device_lower : public gimple_opt_pass
> +{
> +public:
> +  pass_omp_device_lower (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_omp_device_lower, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *fun)
> +    {
> +      /* FIXME: inlining does not propagate the lomp_dev property.  */
> +      return 1 || !(fun->curr_properties & PROP_gimple_lomp_dev);

Please change this into
(ENABLE_OFFLOADING && (flag_openmp || in_lto))
for now, so that we don't waste compile time when the pass clearly
isn't needed, and incrementally change the inliner to propagate
the property.

	Jakub

Attachment: gcc7-gomp-use-simt.patch
Description: Text document

