This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH,openacc] check for compatible loop parallelism with acc routine calls


Hi!

Cesar, I have not yet fully digested this, but do I understand right that
you're really fixing two issues here, that are related (OpenACC routines)
but still can be addressed independently of each other?  Do I understand
right that the first one, the "problems with acc routines [...]
incorrectly permitting 'acc seq' loops to call gang, worker and vector
routines" is just a Fortran front end patch?  If yes, please split that
one out, so as to reduce the volume of remaining changes that remain to
be discussed.

On Thu, 23 Jun 2016 09:05:38 -0700, Cesar Philippidis <cesar@codesourcery.com> wrote:
> On 06/17/2016 07:42 AM, Jakub Jelinek wrote:
> > On Wed, Jun 15, 2016 at 08:12:15PM -0700, Cesar Philippidis wrote:
> >> The second set of changes involves teaching the gimplifier to error when
> >> it detects a function call to an non-acc routines inside an OpenACC
> >> offloaded region.

As I understand, that's the same problem as has been discussed before
(Ilya CCed), and has recently again been filed in
<https://gcc.gnu.org/PR71499> "ICE in LTO1 when attempting NVPTX
offloading (-fopenacc)", and <https://gcc.gnu.org/PR71535> "ICE in LTO1
with -fopenmp offloading" (Alexander CCed).  Some earlier discussion
threads include:
<http://news.gmane.org/find-root.php?message_id=%3C20150109145702.GA45210%40msticlxl57.ims.intel.com%3E>,
<http://news.gmane.org/find-root.php?message_id=%3C20150724152119.GA41292%40msticlxl57.ims.intel.com%3E>,
<http://news.gmane.org/find-root.php?message_id=%3C56269C05.6030502%40acm.org%3E>.

> >> Actually, I relaxed non-acc routines by excluding
> >> calls to builtin functions, including those prefixed with _gfortran_.
> >> Nvptx does have a newlib c library, and it also has a subset of
> >> libgfortran. Still, this solution is probably not optimal.
> > 
> > I don't really like that, hardcoding prefixes or whatever is available
> > (you have quite some subset of libc, libm etc. available too) in the
> > compiler looks very hackish.  What is wrong with complaining during
> > linking of the offloaded code?

ACK.  Jakub, do I understand you correctly, that you basically say that
every function declaration that is in scope inside offloaded regions (for
example, GCC builtin functions, or standard library functions declared in
target compiler's header files) is permitted to be called in offloaded
regions, and the offloading compiler will then either be able to resolve
these (nvptx back end knows about trigonometric functions, for example,
and a lot of functions are available in the nvptx libc), or otherwise
error out during the offloading compilation (during linking), gracefully
without terminating the target compilation (that "gracefully" bit is
currently missing -- that's for another day).  That is, all such
functions are implicitly callable as OpenACC "seq" functions (which means
that they don't internally use gang/worker/vector parallelism).  In
particular, all these functions do *not* need to be marked with an
explicit "#pragma acc routine seq" directive.  (Functions internally
using gang/worker/vector parallelism will need to be marked
appropriately, using a "#pragma acc routine gang/worker/vector"
directive.)  That's how I understand your comment above, and your earlier
comments on this topic, and also is what I think should be done.

> Wouldn't the error get reported multiple times then, i.e. once per
> target? Then again, maybe this error could have been restrained to the
> host compiler.

That's not something I would care about right now.  :-)

> Anyway, this patch now reduces that error to a warning. Furthermore,
> that warning is being thrown in lower_omp_1 instead of
> gimplify_call_expr because the latter is called multiple times and that
> causes duplicate warnings. The only bit of fallout I had with this
> change was with the fortran FE's usage of BUILT_IN_EXPECT in
> gfc_{un}likely. Since these are generated implicitly by the FE, I just
> added an oacc_function attribute to those calls when flag_openacc is set.
> 
> >> Next, I had to modify the openacc header files in libgomp to mark
> >> acc_on_device as an acc routine. Unfortunately, this meant that I had to
> >> build the opeancc.mod module for gfortran with -fopenacc. But doing
> >> that, caused caused gcc to stream offloaded code to the openacc.o object
> >> file. So, I've updated the behavior of flag_generate_offload such that
> >> minus one indicates that the user specified -foffload=disable, and that
> >> will prevent gcc from streaming offloaded lto code. The alternative was
> >> to hack libtool to build libgomp with -foffload=disable.
> > 
> > This also looks wrong.  I'd say the right thing is when loading modules
> > that have OpenACC bits set in it (and also OpenMP bits, I admit I haven't
> > handled this well) into CU with the corresponding flags unset (-fopenacc,
> > -fopenmp, -fopenmp-simd here, depending on which bit it is), then
> > IMHO the module loading code should just ignore it, pretend it wasn't there.
> > Similarly e.g. to how lto1 with -g0 should ignore debug statements that
> > could be in the LTO inputs.

(Also a task for another day, in my opinion.)

> This required two changes. First, I had to teach lto-cgraph.c how to
> report an error rather then fail an assert when partitions are missing
> decls.

Something like that may make sense (conceptually).

> Second, I taught the lto wrapper how to stream offloaded code on
> the absence of -fopen*. The only kink with this approach is that I had
> to build libgomp/openacc.f90 with -frandom-seed=1 to prevent lto related
> bootstrap failures.

Uh.  Hopefully we're not going to need something like that.

> By the way, Thomas, I've added
> 
>  #pragma acc routine(__builtin_acc_on_device) seq
> 
> to openacc.h. Is this OK, or should I just modify the various
> libgomp.oacc-c-c++-common/loop* tests to use that pragma directly? Or
> another option is to have the compiler add that attribute directly. I
> don't think we're really expecting the end user to use
> __builtin_acc_on_device directly since this is a gcc-ism.

As per my reasoning above, all that should not be needed.


A few random comments on the patch:

> --- a/gcc/fortran/gfortran.h
> +++ b/gcc/fortran/gfortran.h
> @@ -303,6 +303,15 @@ enum save_state
>  { SAVE_NONE = 0, SAVE_EXPLICIT, SAVE_IMPLICIT
>  };
>  
> +/* Flags to keep track of ACC routine states.  */
> +enum oacc_function
> +{ OACC_FUNCTION_NONE = 0,
> +  OACC_FUNCTION_SEQ,
> +  OACC_FUNCTION_GANG,
> +  OACC_FUNCTION_WORKER,
> +  OACC_FUNCTION_VECTOR
> +};

What's the purpose of OACC_FUNCTION_NONE?  It's not used anywhere, as far
as I can tell?

> --- a/gcc/fortran/openmp.c
> +++ b/gcc/fortran/openmp.c
> @@ -1664,21 +1664,31 @@ gfc_match_oacc_cache (void)
>  
>  /* Determine the loop level for a routine.   */
>  
> -static int
> +static oacc_function
>  gfc_oacc_routine_dims (gfc_omp_clauses *clauses)
>  {
>    int level = -1;
> +  oacc_function ret = OACC_FUNCTION_SEQ;
>  
>    if (clauses)
>      {
>        unsigned mask = 0;
>  
>        if (clauses->gang)
> -	level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level);
> +	{
> +	  level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level);
> +	  ret = OACC_FUNCTION_GANG;
> +	}
>        if (clauses->worker)
> -	level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level);
> +	{
> +	  level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level);
> +	  ret = OACC_FUNCTION_WORKER;
> +	}
>        if (clauses->vector)
> -	level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level);
> +	{
> +	  level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level);
> +	  ret = OACC_FUNCTION_VECTOR;
> +	}
>        if (clauses->seq)
>  	level = GOMP_DIM_MAX, mask |= GOMP_DIM_MASK (level);
>  

I have not looked in detail, so maybe I'm misunderstanding what is being
done here -- but how do "clauses->seq" and "level = GOMP_DIM_MAX" fit
together?

> @@ -1689,7 +1699,7 @@ gfc_oacc_routine_dims (gfc_omp_clauses *clauses)
>    if (level < 0)
>      level = GOMP_DIM_MAX;
>  
> -  return level;
> +  return ret;
>  }

Just from that last hunk, it seems that the assignment to "level" is a
dead store?

> --- a/gcc/fortran/trans-decl.c
> +++ b/gcc/fortran/trans-decl.c
> @@ -1308,30 +1308,34 @@ gfc_add_assign_aux_vars (gfc_symbol * sym)
>  }
>  
>  
> -static tree
> -add_attributes_to_decl (symbol_attribute sym_attr, tree list)
> +tree
> +add_omp_offloading_attributes (unsigned omp_declare_target,
> +			       enum oacc_function acc_routine, tree list)
>  {
> -  unsigned id;
> -  tree attr;
> -
> -  for (id = 0; id < EXT_ATTR_NUM; id++)
> -    if (sym_attr.ext_attr & (1 << id))
> -      {
> -	attr = build_tree_list (
> -		 get_identifier (ext_attr_list[id].middle_end_name),
> -				 NULL_TREE);
> -	list = chainon (list, attr);
> -      }
> -
> -  if (sym_attr.omp_declare_target)
> +  if (omp_declare_target)
>      list = tree_cons (get_identifier ("omp declare target"),
>  		      NULL_TREE, list);
>  
> -  if (sym_attr.oacc_function)
> +  if (acc_routine)
>      {
>        tree dims = NULL_TREE;
>        int ix;
> -      int level = sym_attr.oacc_function - 1;
> +      int level = GOMP_DIM_MAX;
> +
> +      switch (acc_routine)
> +	{
> +	case OACC_FUNCTION_GANG:
> +	  level = GOMP_DIM_GANG;
> +	  break;
> +	case OACC_FUNCTION_WORKER:
> +	  level = GOMP_DIM_WORKER;
> +	  break;
> +	case OACC_FUNCTION_VECTOR:
> +	  level = GOMP_DIM_VECTOR;
> +	  break;
> +	case OACC_FUNCTION_SEQ:
> +	default:;
> +	}
>  
>        for (ix = GOMP_DIM_MAX; ix--;)
>  	dims = tree_cons (build_int_cst (boolean_type_node, ix >= level),
> @@ -1344,6 +1348,27 @@ add_attributes_to_decl (symbol_attribute sym_attr, tree list)
>    return list;
>  }
>  
> +static tree
> +add_attributes_to_decl (symbol_attribute sym_attr, tree list)
> +{
> +  unsigned id;
> +  tree attr;
> +
> +  for (id = 0; id < EXT_ATTR_NUM; id++)
> +    if (sym_attr.ext_attr & (1 << id))
> +      {
> +	attr = build_tree_list (
> +		 get_identifier (ext_attr_list[id].middle_end_name),
> +				 NULL_TREE);
> +	list = chainon (list, attr);
> +      }
> +
> +  list = add_omp_offloading_attributes (sym_attr.omp_declare_target,
> +					sym_attr.oacc_function, list);
> +
> +  return list;
> +}

Something that I had noticed before, possibly related here: code in
gcc/fortran/ does never call replace_oacc_fn_attrib, but the C and C++
front ends do.  Is that function what you've re-implemented here?

> --- a/gcc/lto-cgraph.c
> +++ b/gcc/lto-cgraph.c
> @@ -1201,9 +1201,11 @@ input_overwrite_node (struct lto_file_decl_data *file_data,
>  				     LDPR_NUM_KNOWN);
>    node->instrumentation_clone = bp_unpack_value (bp, 1);
>    node->split_part = bp_unpack_value (bp, 1);
> -  gcc_assert (flag_ltrans
> -	      || (!node->in_other_partition
> -		  && !node->used_from_other_partition));
> +
> +  int success = flag_ltrans || (!node->in_other_partition
> +				&& !node->used_from_other_partition);
> +  if (!success)
> +    error ("Missing %<%s%>", node->name ());
>  }
>  
>  /* Return string alias is alias of.  */
> @@ -1416,9 +1418,11 @@ input_varpool_node (struct lto_file_decl_data *file_data,
>      node->set_section_for_node (section);
>    node->resolution = streamer_read_enum (ib, ld_plugin_symbol_resolution,
>  					        LDPR_NUM_KNOWN);
> -  gcc_assert (flag_ltrans
> -	      || (!node->in_other_partition
> -		  && !node->used_from_other_partition));
> +
> +  int success = flag_ltrans || (!node->in_other_partition
> +				&& !node->used_from_other_partition);
> +  if (!success)
> +    error ("Missing %<%s%>", node->name ());
>  
>    return node;
>  }

That looks similar to what I remember from earlier, simiar patches, as
referenced above.

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -17114,6 +17114,28 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
>  	  default:
>  	    break;
>  	  }
> +      /* Warn if a non-'acc routine' function is called from an OpenACC
> +	 offloaded region.  */
> +      if (fndecl)
> +	{
> +	  omp_context *octx = ctx;
> +	  bool is_oacc_offloaded = false;
> +
> +	  /* Check if the current function is an 'acc routine'.  */
> +	  if (get_oacc_fn_attrib (current_function_decl) != NULL_TREE)
> +	    is_oacc_offloaded = true;
> +
> +	  while (!is_oacc_offloaded && octx)
> +	    {
> +	      if (is_oacc_parallel (octx) || is_oacc_kernels (octx))
> +		is_oacc_offloaded = true;
> +	      octx = octx->outer;
> +	    }
> +
> +	  if (is_oacc_offloaded && get_oacc_fn_attrib (fndecl) == NULL_TREE)
> +	    warning_at (gimple_location (call_stmt), 0,
> +			"%qE is not an %<acc routine%>", fndecl);
> +	}
>        /* FALLTHRU */
>      default:
>        if ((ctx || task_shared_vars)

Per my reasoning above, we should either get a undeclared symbol error
(if the target compiler doesn't know about the routine), or should get a
offloading compiler link-time error, if the -- implicit "seq" -- routine
is missing there.

> @@ -19420,7 +19442,8 @@ oacc_loop_fixed_partitions (oacc_loop *loop, unsigned outer_mask)
>      {
>        unsigned outermost = this_mask & -this_mask;
>  
> -      if (outermost && outermost <= outer_mask)
> +      if ((outermost && outermost <= outer_mask)
> +	  || (this_mask && (loop->parent->flags & OLF_SEQ)))
>  	{
>  	  if (noisy)
>  	    {

> --- a/gcc/testsuite/c-c++-common/goacc/routine-3.c
> +++ b/gcc/testsuite/c-c++-common/goacc/routine-3.c
> @@ -49,7 +49,7 @@ main ()
>    int red = 0;
>  #pragma acc parallel copy (red)
>    {
> -    /* Independent/seq loop tests.  */
> +    /* Independent loop tests.  */
>  #pragma acc loop reduction (+:red) // { dg-warning "insufficient partitioning" }
>      for (int i = 0; i < 10; i++)
>        red += gang ();
> @@ -62,6 +62,19 @@ main ()
>      for (int i = 0; i < 10; i++)
>        red += vector ();
>  
> +    /* Seq loop tests.  */
> +#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } */
> +    for (int i = 0; i < 10; i++)
> +      red += gang (); /* { dg-error "incorrectly nested" } */
> +
> +#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } */
> +    for (int i = 0; i < 10; i++)
> +      red += worker (); /* { dg-error "incorrectly nested" } */
> +
> +#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } */
> +    for (int i = 0; i < 10; i++)
> +      red += vector (); /* { dg-error "incorrectly nested" } */
> +    
>      /* Gang routine tests.  */
>  #pragma acc loop gang reduction (+:red)  /* { dg-message "containing loop" } */
>      for (int i = 0; i < 10; i++)

Do these test case changes actually relate to any of the compiler changes
discussed above?  Maybe to the oacc_loop_fixed_partitions cited just
above?  Is that a separate issue to fix?  Eh, or is that actually the fix
for your first issue, the "problems with acc routines [...] incorrectly
permitting 'acc seq' loops to call gang, worker and vector routines"?

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c
> @@ -1,4 +1,4 @@
>  /* { dg-do run { target lto } } */
> -/* { dg-additional-options "-fipa-pta -flto -flto-partition=max" } */
> +/* { dg-additional-options "-fipa-pta -flto -flto-partition=max -fno-exceptions" } */
>  
>  #include "data-clauses-kernels.c"

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c
> @@ -1,2 +1,4 @@
> +/* { dg-additional-options "-fno-exceptions" }  */
> +
>  #define CONSTRUCT kernels
>  #include "data-clauses.h"

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c
> @@ -1,4 +1,4 @@
>  /* { dg-do run { target lto } } */
> -/* { dg-additional-options "-fipa-pta -flto -flto-partition=max" } */
> +/* { dg-additional-options "-fipa-pta -flto -flto-partition=max -fno-exceptions" } */
>  
>  #include "data-clauses-parallel.c"

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c
> @@ -1,2 +1,4 @@
> +/* { dg-additional-options "-fno-exceptions" }  */
> +
>  #define CONSTRUCT parallel
>  #include "data-clauses.h"

Hmm?


GrÃÃe
 Thomas

Attachment: signature.asc
Description: PGP signature


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]