[PATCH] [gomp4] Initial support of OpenACC loop directive in C front-end.

Thomas Schwinge thomas@codesourcery.com
Thu Jul 24 16:41:00 GMT 2014


Hi!

On Thu, 20 Mar 2014 15:42:48 +0100, I wrote:
> On Tue, 18 Mar 2014 14:50:44 +0100, I wrote:
> > On Tue, 18 Mar 2014 16:37:24 +0400, Ilmir Usmanov <i.usmanov@samsung.com> wrote:
> > > This patch introduces support of OpenACC loop directive (and combined 
> > > directives) in C front-end up to GENERIC. Currently no clause is allowed.
> > 
> > Thanks!  I had worked on a simpler patch, not yet dealing with combined
> > clauses.  Also, I have some work for the GIMPLE level, namely building on
> > GIMPLE_OMP_FOR, adding a new GF_OMP_FOR_KIND_OACC_LOOP.  I'll post this
> > soon.
> 
> Here are the patches, committed in r208702..4 to gomp-4_0-branch.

> commit f1d39706db8dccbc988e2c66552511cd54632257
> Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Thu Mar 20 14:40:01 2014 +0000
> 
>     Continue implementation of OpenACC loop construct.

For loop scheduling, this is currently using
expand_omp_for_static_nochunk.  For a loop iterating through [0; 100) on
32 threads, this gives us the following schedule:

    0       0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9
    32      9 9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15 16 16 16 17 17 17 18 18 18 19 19 19
    64      20 20 20 21 21 21 22 22 22 23 23 23 24 24 24 25 25 25 26 26 26 27 27 27 28 28 28 29 29 29 30 30
    96      30 31 31 31

..., that is, several consecutive loop iterations are executed on the
same thread.  This isn't ideal for GPUs.  There, we'd like a group of
"threads" executing in parallel to jointly execute one "bucket" of
consecutive loop iterations, and then have the whole group move on to the
next "bucket", which gives a schedule as follows:

    0       0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
    32      0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
    64      0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
    96      0 1 2 3

Here, "buckets" of 32 iterations are being executed by 32 threads, then
the next 32 iterations, and so on.  (This is actually one of the OpenACC
parallelism concepts, vector parallelism, mapped to the "warp size" of a
Nvidia GPU.)

In r213006, I committed the following hack to use
expand_omp_for_static_chunk instead of expand_omp_for_static_nochunk, by
specifying a chunk_size of one to implement the desired scheduling.

commit 9a545f89fbb1b361286005ceb68e154d0afc84bd
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Thu Jul 24 15:55:49 2014 +0000

    Force OpenACC loop to use a chunk size of one.
    
    	gcc/
    	* omp-low.c (extract_omp_for_data): Force OpenACC loop to use a
    	chunk size of one.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@213006 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp |  3 +++
 gcc/omp-low.c      | 10 ++++++++++
 2 files changed, 13 insertions(+)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index f8a9d74..cc9b06c 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,8 @@
 2014-07-24  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* omp-low.c (extract_omp_for_data): Force OpenACC loop to use a
+	chunk size of one.
+
 	* omp-low.c (expand_omp_for_static_chunk): Merge changes
 	previously applied to expand_omp_for_static_nochunk.
 
diff --git gcc/omp-low.c gcc/omp-low.c
index 2799638..b188e2d 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -619,6 +619,16 @@ extract_omp_for_data (gimple for_stmt, struct omp_for_data *fd,
       fd->loop.step = build_int_cst (TREE_TYPE (fd->loop.v), 1);
       fd->loop.cond_code = LT_EXPR;
     }
+
+  //TODO
+  /* For OpenACC loops, force a chunk size of one, as this avoids the default
+    scheduling where several subsequent iterations are being executed by the
+    same thread.  */
+  if (gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_OACC_LOOP)
+    {
+      gcc_assert (fd->chunk_size == NULL_TREE);
+      fd->chunk_size = build_int_cst (TREE_TYPE (fd->loop.v), 1);
+    }
 }
 
 

In r213005, I committed changes to expand_omp_for_static_chunk that are
just what has previously been applied to expand_omp_for_static_nochunk.
(Internally, we have builtins to query the real nthreads and threadid,
instead of the dummy values of one and zero that I'm using here.)

commit 6c07d1bd13f6ceef80beb3c62cd25c3aaa397f1b
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Thu Jul 24 15:55:39 2014 +0000

    Make expand_omp_for_static_chunk usable for OpenACC.
    
    	gcc/
    	* omp-low.c (expand_omp_for_static_chunk): Merge changes
    	previously applied to expand_omp_for_static_nochunk.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@213005 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp |  5 +++++
 gcc/omp-low.c      | 19 +++++++++++++++++--
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index adfae10..f8a9d74 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2014-07-24  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* omp-low.c (expand_omp_for_static_chunk): Merge changes
+	previously applied to expand_omp_for_static_nochunk.
+
 2014-07-14  Cesar Philippidis  <cesar@codesourcery.com>
 
 	* omp-low.c (extract_omp_for_data): Likewise.
diff --git gcc/omp-low.c gcc/omp-low.c
index 6345e14..2799638 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -7040,8 +7040,6 @@ static void
 expand_omp_for_static_chunk (struct omp_region *region,
 			     struct omp_for_data *fd, gimple inner_stmt)
 {
-  gcc_assert (gimple_omp_for_kind (fd->for_stmt) != GF_OMP_FOR_KIND_OACC_LOOP);
-
   tree n, s0, e0, e, t;
   tree trip_var, trip_init, trip_main, trip_back, nthreads, threadid;
   tree type, itype, vmain, vback, vextra;
@@ -7054,6 +7052,10 @@ expand_omp_for_static_chunk (struct omp_region *region,
   tree *counts = NULL;
   tree n1, n2, step;
 
+  gcc_assert ((gimple_omp_for_kind (fd->for_stmt)
+	       != GF_OMP_FOR_KIND_OACC_LOOP)
+	      || !inner_stmt);
+
   itype = type = TREE_TYPE (fd->loop.v);
   if (POINTER_TYPE_P (type))
     itype = signed_type_for (type);
@@ -7153,6 +7155,10 @@ expand_omp_for_static_chunk (struct omp_region *region,
       threadid = builtin_decl_explicit (BUILT_IN_OMP_GET_TEAM_NUM);
       threadid = build_call_expr (threadid, 0);
       break;
+    case GF_OMP_FOR_KIND_OACC_LOOP:
+      nthreads = integer_one_node;
+      threadid = integer_zero_node;
+      break;
     default:
       gcc_unreachable ();
     }
@@ -7168,6 +7174,9 @@ expand_omp_for_static_chunk (struct omp_region *region,
   step = fd->loop.step;
   if (gimple_omp_for_combined_into_p (fd->for_stmt))
     {
+      gcc_assert (gimple_omp_for_kind (fd->for_stmt)
+		  != GF_OMP_FOR_KIND_OACC_LOOP);
+
       tree innerc = find_omp_clause (gimple_omp_for_clauses (fd->for_stmt),
 				     OMP_CLAUSE__LOOPTEMP_);
       gcc_assert (innerc);
@@ -7351,6 +7360,9 @@ expand_omp_for_static_chunk (struct omp_region *region,
   gsi = gsi_last_bb (exit_bb);
   if (!gimple_omp_return_nowait_p (gsi_stmt (gsi)))
     {
+      gcc_assert (gimple_omp_for_kind (fd->for_stmt)
+		  != GF_OMP_FOR_KIND_OACC_LOOP);
+
       t = gimple_omp_return_lhs (gsi_stmt (gsi));
       gsi_insert_after (&gsi, build_omp_barrier (t), GSI_SAME_STMT);
     }
@@ -7365,6 +7377,9 @@ expand_omp_for_static_chunk (struct omp_region *region,
       se = find_edge (cont_bb, body_bb);
       if (gimple_omp_for_combined_p (fd->for_stmt))
 	{
+	  gcc_assert (gimple_omp_for_kind (fd->for_stmt)
+		      != GF_OMP_FOR_KIND_OACC_LOOP);
+
 	  remove_edge (se);
 	  se = NULL;
 	}


Grüße,
 Thomas