This is the mail archive of the mailing list for the GCC project.

RFC: Merge the GUPC branch into the GCC 4.8 trunk (patch 0 of 16)

We have maintained the gupc (GNU Unified Parallel C) branch for
a couple of years now, and would like to merge these changes into
the GCC trunk.

It is our goal to integrate the GUPC changes into the GCC 4.8
trunk, in order to provide a UPC (Unified Parallel C) capability
in the subsequent GCC 4.8 release.

The purpose of this note is to introduce the GUPC project,
provide an overview of the UPC-related changes, and introduce
the subsequent sets of patches which merge the GUPC branch into
GCC 4.8.

For reference,

The GUPC project page is here:

The current GUPC release is distributed here:

Roughly a year ago, we described the front-end related
changes at the time:

We merge the GCC trunk into the gupc branch on approximately
a weekly basis.  The current GUPC branch is based upon a recent
version of the GCC trunk (192449 dated 2012-10-15), and has
been bootstrapped on x86_64/i686 Linux, PPC/POWER7/Linux and
IA64/Altix Linux. In earlier versions, GUPC was successfully
ported to SGI/MIPS (big endian) and SiCortex/MIPS (little endian).

The UPC-related source code differences
can be viewed here in various formats:

In the discussion below, the changes are
excerpted in order to highlight important
aspects of the UPC-related changes.  The version used in
this presentation is 190707.

UPC's Shared Qualifier and Layout Qualifier

The UPC language specification describes
the language syntax and semantics:

UPC introduces a new qualifier, "shared",
which indicates that the qualified object
is located in a global shared address space
that is accessible by all UPC threads.
Additional qualifiers ("strict" and "relaxed")
further specify the semantics of accesses to
UPC shared objects.

In UPC, a shared qualified array can further
specify a "layout qualifier" that indicates
how the shared data is blocked and distributed.

There are two language pre-defined identifiers
that indicate the number of threads that
will be created when the program starts (THREADS)
and the current (zero-based) thread number
(MYTHREAD).  Typically, a UPC thread is implemented
as an operating system process.  Access to UPC
shared memory may be implemented locally via
OS provided facilities (for example, mmap),
or across nodes via a high-speed network
interconnect (for example, InfiniBand).

GUPC provides a runtime (libgupc) that targets
an SMP-based system and uses mmap() to implement
global shared memory.  

Optionally, GUPC can use the more general and
more capable Berkeley UPCR runtime:
The UPCR runtime supports a number of network
topologies, and has been ported to most of the
current High Performance Computing (HPC) systems.

The following example illustrates
the use of the UPC "shared" qualifier
combined with a layout qualifier.

    #define BLKSIZE 5
    #define N_PER_THREAD (4 * BLKSIZE)
    shared [BLKSIZE] double A[N_PER_THREAD*THREADS];

Above, the "[BLKSIZE]" construct is the UPC
layout qualifier; it specifies that the elements
of the shared array, A, are distributed across
the threads in blocks of 5 elements.  If the
program is run with two threads, then A is
distributed as shown below:

    Thread 0	Thread 1
    --------	---------
    A[ 0.. 4]	A[ 5.. 9]
    A[10..14]	A[15..19]
    A[20..24]	A[25..29]
    A[30..34]	A[35..39]

Above, the elements shown for thread 0
are defined as having "affinity" to thread 0.
Similarly, those elements shown for thread 1
have affinity to thread 1.  In UPC, a pointer
to a shared object can be cast to a thread
local pointer (a "C" pointer), when the
designated shared object has affinity
to the referencing thread.

A UPC "pointer-to-shared" (PTS) is a pointer
that references a UPC shared object.
A UPC pointer-to-shared is a "fat" pointer
with the following logical fields:
   (virt_addr, thread, offset)

The virtual address (virt_addr) field is combined with
the thread number (thread) and offset within the
block (offset), to derive the location of the
referenced object within the UPC shared address space.

GUPC implements pointer-to-shared objects using
either a "packed" representation or a "struct"
representation.  The user can select the
pointer-to-shared representation with a "configure"
parameter.  The packed representation is the default.

The "packed" pointer-to-shared representation
limits the range of the various fields within
the pointer-to-shared in order to gain efficiency.
Packed pointer-to-shared values encode the
three-part shared address (described above) as a 64-bit
value (on both 64-bit and 32-bit platforms).

The "struct" representation provides a wider
addressing range at the expense of requiring
twice the number of bits (128) needed to encode
the pointer-to-shared value.

UPC-Related Front-End Changes

GCC's internal tree representation is
extended to record the UPC "shared",
"strict", "relaxed" qualifiers,
and the layout qualifier.

Index: gcc/tree.h
--- gcc/tree.h  (.../trunk)     (revision 190707)
+++ gcc/tree.h  (.../branches/gupc)     (revision 190736)
@@ -458,7 +458,10 @@ struct GTY(()) tree_base {
       unsigned packed_flag : 1;
       unsigned user_align : 1;
       unsigned nameless_flag : 1;
-      unsigned spare0 : 4;
+      unsigned upc_shared_flag : 1;
+      unsigned upc_strict_flag : 1;
+      unsigned upc_relaxed_flag : 1;
+      unsigned spare0 : 1;

       unsigned spare1 : 8;

UPC defines a few additional tree node types:

+++ gcc/upc/upc-tree.def	(.../branches/gupc)	(revision 190736)
+/* UPC statements */
+/* Used to represent a `upc_forall' statement. The operands are
+   UPC_FORALL_BODY, and UPC_FORALL_AFFINITY respectively. */
+DEFTREECODE (UPC_FORALL_STMT, "upc_forall_stmt", tcc_statement, 5)
+/* Used to represent a UPC synchronization statement. The first
+   operand is the synchronization operation, UPC_SYNC_OP:
+   UPC_SYNC_NOTIFY_OP	1	Notify operation
+   UPC_SYNC_WAIT_OP	2	Wait operation
+   UPC_SYNC_BARRIER_OP	3	Barrier operation
+   The second operand, UPC_SYNC_ID is the (optional) expression
+   whose value specifies the barrier identifier which is checked
+   by the various synchronization operations. */
+DEFTREECODE (UPC_SYNC_STMT, "upc_sync_stmt", tcc_statement, 2)

The "C" parser is extended to recognize UPC's syntactic extensions.

--- gcc/c-family/c-common.c	(.../trunk)	(revision 190707)
+++ gcc/c-family/c-common.c	(.../branches/gupc)	(revision 190736)
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  
 #include "ggc.h"
 #include "c-common.h"
 #include "c-objc.h"
+#include "c-upc.h"
 #include "tm_p.h"
 #include "obstack.h"
 #include "cpplib.h"
@@ -193,6 +194,24 @@ const char *pch_file;
    user's namespace.  */
 int flag_iso;
+/* Nonzero whenever UPC -fupc-threads-N is asserted.
+   The value N gives the number of UPC threads to be
+   defined at compile-time. */
+int flag_upc_threads;
+/* Nonzero whenever UPC -fupc-pthreads-model-* is asserted. */
+int flag_upc_pthreads;
+/* The -fupc-pthreads-per-process-N switch tells the UPC compiler
+   and runtime to map N UPC threads per process onto
+   N POSIX threads running inside the process. */
+int flag_upc_pthreads_per_process;
+/* The implementation model for UPC threads that
+   are mapped to POSIX threads, specified at compilation
+   time by the -fupc-pthreads-model-* switch. */
+upc_pthreads_model_kind upc_pthreads_model;
 /* Warn about #pragma directives that are not recognized.  */
 int warn_unknown_pragmas; /* Tri state variable.  */
@@ -389,8 +408,9 @@ static int resort_field_decl_cmp (const 
    C --std=c89: D_C99 | D_CXXONLY | D_OBJC | D_CXX_OBJC
    C --std=c99: D_CXXONLY | D_OBJC
    ObjC is like C except that D_OBJC and D_CXX_OBJC are not set
-   C++ --std=c98: D_CONLY | D_CXXOX | D_OBJC
-   C++ --std=c0x: D_CONLY | D_OBJC
+   UPC is like C except that D_UPC is not set
+   C++ --std=c98: D_CONLY | D_CXXOX | D_OBJC | D_UPC
+   C++ --std=c0x: D_CONLY | D_OBJC | D_UPC
    ObjC++ is like C++ except that D_OBJC is not set
    If -fno-asm is used, D_ASM is added to the mask.  If
@@ -583,6 +603,19 @@ const struct c_common_resword c_common_r
   { "inout",		RID_INOUT,		D_OBJC },
   { "oneway",		RID_ONEWAY,		D_OBJC },
   { "out",		RID_OUT,		D_OBJC },
+  /* UPC keywords */
+  { "shared",		RID_SHARED,		D_UPC },
+  { "relaxed",		RID_RELAXED,		D_UPC },
+  { "strict",		RID_STRICT,		D_UPC },
+  { "upc_barrier",	RID_UPC_BARRIER,	D_UPC },
+  { "upc_blocksizeof",	RID_UPC_BLOCKSIZEOF,	D_UPC },
+  { "upc_elemsizeof",	RID_UPC_ELEMSIZEOF,	D_UPC },
+  { "upc_forall",	RID_UPC_FORALL,		D_UPC },
+  { "upc_localsizeof",	RID_UPC_LOCALSIZEOF,	D_UPC },
+  { "upc_notify",	RID_UPC_NOTIFY,		D_UPC },
+  { "upc_wait",		RID_UPC_WAIT,		D_UPC },

--- gcc/c/c-parser.c	(.../trunk)	(revision 190707)
+++ gcc/c/c-parser.c	(.../branches/gupc)	(revision 190736)
@@ -498,6 +504,11 @@ c_token_starts_typename (c_token *token)
 	case RID_ACCUM:
 	case RID_SAT:
 	  return true;
+        /* UPC qualifiers */
+	case RID_SHARED:
+	case RID_STRICT:
+	  return true;
@@ -1224,6 +1245,14 @@ static void c_parser_objc_at_dynamic_dec
 static bool c_parser_objc_diagnose_bad_element_prefix
   (c_parser *, struct c_declspecs *);
+/* These UPC parser functions are only ever called when
+   compiling UPC.  */
+static void c_parser_upc_forall_statement (c_parser *);
+static void c_parser_upc_sync_statement (c_parser *, int);
+static void c_parser_upc_shared_qual (source_location,
+                                      c_parser *,
+				      struct c_declspecs *);
+        /* UPC qualifiers */
+	case RID_SHARED:
+	  attrs_ok = true;
+          c_parser_upc_shared_qual (loc, parser, specs);
+	  break;
+	case RID_STRICT:
+	  attrs_ok = true;
+	  declspecs_add_qual (loc, specs, c_parser_peek_token (parser)->value);
+	  c_parser_consume_token (parser);
+	  break;
 	  if (!attrs_ok)
 	    goto out;
@@ -4558,6 +4612,22 @@ c_parser_statement_after_labels (c_parse
 	  gcc_assert (c_dialect_objc ());
 	  c_parser_objc_synchronized_statement (parser);
+          gcc_assert (c_dialect_upc ());
+	  c_parser_upc_forall_statement (parser);
+	  break;
+        case RID_UPC_NOTIFY:
+          gcc_assert (c_dialect_upc ());
+	  c_parser_upc_sync_statement (parser, UPC_SYNC_NOTIFY_OP);
+	  goto expect_semicolon;
+        case RID_UPC_WAIT:
+          gcc_assert (c_dialect_upc ());
+	  c_parser_upc_sync_statement (parser, UPC_SYNC_WAIT_OP);
+	  goto expect_semicolon;
+        case RID_UPC_BARRIER:
+          gcc_assert (c_dialect_upc ());
+	  c_parser_upc_sync_statement (parser, UPC_SYNC_BARRIER_OP);
+	  goto expect_semicolon;
 	  goto expr_stmt;

--- gcc/c-family/c-pragma.c	(.../trunk)	(revision 190707)
+++ gcc/c-family/c-pragma.c	(.../branches/gupc)	(revision 190736)
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  
 #include "c-pragma.h"
 #include "flags.h"
 #include "c-common.h"
+#include "c-upc.h"
 #include "tm_p.h"		/* For REGISTER_TARGET_PRAGMAS (why is
 				   this not a target hook?).  */
 #include "vec.h"
@@ -507,6 +508,242 @@ add_to_renaming_pragma_list (tree oldnam
 /* The current prefix set by #pragma extern_prefix.  */
 GTY(()) tree pragma_extern_prefix;
+/* variables used to implement #pragma upc semantics */
+static int pragma_upc_permitted;
+static int upc_cmode;
+static int *upc_cmode_stack;
+static int upc_cmode_stack_in_use;
+static int upc_cmode_stack_allocated;
+static void init_pragma_upc (void);
+static void handle_pragma_upc (cpp_reader * ARG_UNUSED (dummy));

c-decl.c handles the additional UPC qualifiers
and declspecs.  The layout qualifier is handled here:

--- gcc/c/c-decl.c	(.../trunk)	(revision 190707)
+++ gcc/c/c-decl.c	(.../branches/gupc)	(revision 190736)
@@ -8857,6 +9046,23 @@ declspecs_add_qual (source_location loc,
   bool dupe = false;
   specs->non_sc_seen_p = true;
   specs->declspecs_seen_p = true;
+  /* A UPC layout qualifier is encoded as an ARRAY_REF,
+     further, it implies the presence of the 'shared' keyword. */
+  if (TREE_CODE (qual) == ARRAY_REF)
+    {
+      if (specs->upc_layout_qualifier)
+        {
+          error ("two or more layout qualifiers specified");
+	  return specs;
+        }
+      else
+        {
+          specs->upc_layout_qualifier = qual;
+          qual = ridpointers[RID_SHARED];
+        }
+    }

In UPC, a qualifier includes both the traditional
"C" qualifier flags and the UPC "layout qualifier".
Thus, the pointer_quals field of a declarator node
is defined as a struct including both the qualifier
flags and the UPC layout qualifier, as shown below.

@@ -5702,7 +5835,9 @@ grokdeclarator (const struct c_declarato
 	    /* Process type qualifiers (such as const or volatile)
 	       that were given inside the `*'.  */
-	    type_quals = declarator->u.pointer_quals;
+	    type_quals = declarator->u.pointer.quals;
+	    upc_layout_qualifier = declarator->u.pointer.upc_layout_qual;
+	    sharedp = ((type_quals & TYPE_QUAL_SHARED) != 0);

UPC shared variables are allocated at runtime in the global
memory that is allocated and managed by the UPC runtime.
A separate link section is used as a method of assigning
virtual addresses to UPC shared variables.  The UPC
shared variable section is designated as a "no load"
section on systems that support that facility; in that
case, the linkage section begins at virtual address zero.
The logic below assigns UPC shared variables to
their own linkage section.

@@ -6235,6 +6409,13 @@ grokdeclarator (const struct c_declarato
+    /* Shared variables are given their own link section on
+       most target platforms, and if compiling in pthreads mode
+       regular local file scope variables are made thread local. */
+    if ((TREE_CODE(decl) == VAR_DECL)
+        && !threadp && (TREE_SHARED (decl) || flag_upc_pthreads))
+      upc_set_decl_section (decl);

Various UPC language-related checks and operations
are called in the "C" front-end and middle-end.
To ensure that these operations are defined
when linked with the other language front-ends
and compilers, these functions are stubbed,
in a fashion similar to Objective-C:

--- gcc/c-family/c-upc.h	(.../trunk)	(revision 0)
+++ gcc/c-family/c-upc.h	(.../branches/gupc)	(revision 190736)
+/* UPC entry points.  */
+/* The following UPC functions are called by the C front-end;
+ * they all must have corresponding stubs in stub-upc.c.  */
+extern int count_upc_threads_refs (tree);
+extern void deny_pragma_upc (void);
+extern int get_upc_consistency_mode (void);
+extern tree upc_rts_forall_depth_var (void);
+extern void upc_set_decl_section (tree);
+extern void upc_write_global_declarations (void);

A few command line option flags must also be
stubbed out in order to link the other
language front-ends.

--- gcc/c-family/stub-upc.c	(.../trunk)	(revision 0)
+++ gcc/c-family/stub-upc.c	(.../branches/gupc)	(revision 190736)
+int compiling_upc;
+int flag_upc;
+int use_upc_dwarf2_extensions;

The complete set of GUPC-related patches will be provided for
review in a collection of 16 patch sets.  A listing of those
patch sets is attached.

Each patch set will be sent in a separate email following
this one for the purposes of review.

		      -- end --

Attachment: gupc-patch-groups.txt
Description: Text document
