Coarrays via GNU Fortran's Coarray Communication Library
Contents
- Coarrays via GNU Fortran's Coarray Communication Library
- WARNING
- Obtaining and Compiling the Coarray Communication Libraries
- Compiling Coarray Programs
- Running Coarray Programs
- Combining Coarray Parallelization with OpenMP, MPI, pthreads, etc.
- Coarray Fortran with a Non-Fortran Main Program (for -fcoarray=lib)
- Implementation Details
WARNING
The MPI and ARMCI versions of the Coarray Communication Library are very incomplete. Most programs will not work; typically the requested communication is ignored and the operation is applied to the local image instead.
GNU Fortran currently supports three coarray modes, which can be selected via the -fcoarray= flag:
none: The default, which prints an error when a coarray construct is encountered
single: Optimized version for a single image, which allows for fast serial programs
lib: A communication-library-based coarray version, described on this page
A shared-memory version is also planned.
Currently, three GNU Fortran coarray communication libraries exist:
A single-image library consisting of stubs, which makes it possible to run with a single image without recompiling and is also useful for debugging. (Using -fcoarray=single will produce faster code, however.)
An MPI version, which is a wrapper around calls to a library implementing the Message Passing Interface (MPI). A purely MPI 1.x version is planned; the current rough version also uses a few MPI 2.x features.
An ARMCI version, which is a wrapper around calls to the ARMCI library, which is part of the Global Arrays Toolkit.
The whole implementation is in an embryonic state; most features do not yet work! Both the implementation in the compiler itself (front end) and the MPI and ARMCI libraries still require considerable work.
Obtaining and Compiling the Coarray Communication Libraries
The single library version is automatically installed with GCC, so it can be linked directly with -lcaf_single. For the MPI or ARMCI version, or if you want a special version of libcaf_single, follow the instructions below.
The libraries are located in the GCC trunk (4.7) source code in the libgfortran/caf directory. (CAF = Co-Array Fortran, the initial name of the feature.) Note that the MPI and ARMCI versions are currently not compiled and installed automatically.
- Obtain the files by checking out the Subversion or Git repository, by using the snapshot tarballs, or directly by clicking on the links below. The following files are needed:
libcaf.h - common C header file
single.c - C source code for the single-image stub version
mpi.c - C source code for the MPI version
armci.c - C source code of the ARMCI version (not yet in the trunk, use: https://userpage.physik.fu-berlin.de/~tburnus/coarray/armci.c)
- Compiling the files
gcc -c -O2 single.c
mpicc -c -O2 mpi.c
mpicc -c -O2 armci.c
Note 1: The files might offer different versions, depending on some pre-processor flag. For instance, via GFC_CAF_CHECK (-DGFC_CAF_CHECK) some run-time checks can be enabled.
Note 2: This assumes that both versions, single and MPI, are to be compiled and that an MPI compiler is installed, with mpicc invoking the proper compiler. The created object files (.o) can be linked directly into the coarray programs; alternatively, proceed to step 3 (creating a library).
Note 3: This assumes that the ARMCI version uses MPI as its backend and that the header files are in the standard include path.
- Creating a library. Run now:
ar rcv libcaf_single.a single.o
ranlib libcaf_single.a
ar rcv libcaf_mpi.a mpi.o
ranlib libcaf_mpi.a
ar rcv libcaf_armci.a armci.o
ranlib libcaf_armci.a
- Creating a shared library
mpicc -O2 -fPIC -shared -Wl,-soname,libcaf_mpi.so.1 -o libcaf_mpi.so.1.0.0 mpi.c
mpicc -O2 -fPIC -shared -Wl,-soname,libcaf_armci.so.1 -o libcaf_armci.so.1.0.0 armci.c
Usually a static library will produce faster code. However, if you want to use your program on several systems without recompiling (e.g. because it is closed source), or if you regularly test several MPI versions or ARMCI with different backends, a shared library might be the better choice. If you want to switch between different CAF communication libraries, you can also generate a "libcaf.so" and then use LD_LIBRARY_PATH to switch between, e.g., the single.c and the mpi.c version. However, be prepared to get confused ...
Compiling Coarray Programs
Simply compile the Fortran files as usual, adding the -fcoarray=lib flag.
For the single version, simply add -lcaf_single when linking the program. If you want to use your self-compiled version, you can also simply link single.o.
For the MPI version, you need to also link the MPI library; the easiest is to run mpif90 to link the files. Example:
mpif90 *.o -lcaf_mpi
You may need to use an environment variable or a command-line option to make sure MPI uses the current gfortran version. For Open MPI use, e.g., OMPI_FC=gfortran and for MPICH2, e.g., -f90=gfortran. Note, however, that this still requires that the MPI library is compatible with the current gfortran version; if in doubt, recompile the MPI library.
For the ARMCI version it depends on the backend; with the MPI backend, use
mpif90 *.o -lcaf_armci -larmci
Running Coarray Programs
For the single version, simply run the program as usual.
For the MPI version and ARMCI with MPI backend: Start the program as you would start any other MPI program; for instance use
mpiexec -n 10 ./myCoarrayProgram
Combining Coarray Parallelization with OpenMP, MPI, pthreads, etc.
In principle, combining coarrays with other means of parallelization should work, but it is the user's responsibility to avoid race conditions and other issues.
For the MPI version of the coarray library: Make sure that the user code does not call MPI_Init if the library is already initialized; e.g.
integer :: ierror
logical :: init
call MPI_Initialized (init, ierror)
if (.not. init) call MPI_Init (ierror)
Coarray Fortran with a Non-Fortran Main Program (for -fcoarray=lib)
If the main program is not written in Fortran, the initialization and finalization functions of the coarray communication library need to be called manually. The other prototypes are listed for reference and should usually not be needed. The values of the named constants and the function prototypes can be found in the C header file libcaf.h.
Note that this documentation is still in flux and incomplete. The source code of the compiler and the libraries is the authoritative source; however, the API is not yet finished and will keep changing for a while.
Initializing the Coarray Communication Library
The initialization happens automatically if the main program is written in Fortran and its file is compiled with -fcoarray=lib. The initialization function has the following prototype:
void _gfortran_caf_init (int *argc, char ***argv, int *this_image, int *num_images);
argc (intent(inout)): NULL or a pointer to the number of command-line arguments
argv (intent(inout)): NULL or a pointer to the array of command-line argument strings
this_image (intent(out)): Returns the number of the invoking image, starting from 1
num_images (intent(out)): Returns the total number of available images
The image numbers returned by _gfortran_caf_init need to be stored in global variables with the following types and names:
const int _gfortran_caf_this_image;
const int _gfortran_caf_num_images;
Note: As static coarrays are registered at startup (via constructor functions, i.e. __attribute__((constructor))), the _gfortran_caf_init call comes after the actual library initialization. Nevertheless, the call to _gfortran_caf_init is required and should be passed the command-line arguments. It should happen early and should be followed by a SYNC ALL call to make sure that the initialization and the coarray registration have happened on all images.
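The call order described above (initialize early, SYNC ALL, finalize at the end) might look roughly as follows from a C main program. This is only a sketch: the three prototypes are copied from this page, but the function bodies are hypothetical single-image stand-ins so that the example compiles without a CAF library, and caf_startup_sketch and demo_initialized are made-up names. The sketch also does not set up the two global variables mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Prototypes as documented on this page (normally provided by libcaf.h). */
void _gfortran_caf_init (int *argc, char ***argv,
                         int *this_image, int *num_images);
void _gfortran_caf_sync_all (int *stat, char *errmsg, int errmsg_len);
void _gfortran_caf_finalize (void);

/* ASSUMPTION: stand-in bodies that mimic a single-image library.  In a
   real program, delete them and link with -lcaf_single or -lcaf_mpi.  */
static int demo_initialized = 0;   /* hypothetical bookkeeping only */

void _gfortran_caf_init (int *argc, char ***argv,
                         int *this_image, int *num_images)
{
  (void) argc; (void) argv;
  demo_initialized = 1;
  *this_image = 1;                 /* image numbers start at 1 */
  *num_images = 1;
}

void _gfortran_caf_sync_all (int *stat, char *errmsg, int errmsg_len)
{
  (void) errmsg; (void) errmsg_len;
  if (stat)
    *stat = 0;                     /* 0 signals success */
}

void _gfortran_caf_finalize (void)
{
  demo_initialized = 0;
}

/* Hypothetical driver: the calls a non-Fortran main program would make.
   Returns the number of images, or -1 if the first SYNC ALL failed.  */
int caf_startup_sketch (int argc, char **argv)
{
  int this_image = 0, num_images = 0, stat = -1;

  _gfortran_caf_init (&argc, &argv, &this_image, &num_images);
  assert (this_image >= 1 && this_image <= num_images);

  /* Make sure all images have finished initialization and registration. */
  _gfortran_caf_sync_all (&stat, NULL, 0);
  if (stat != 0)
    {
      _gfortran_caf_finalize ();
      return -1;
    }

  /* ... call into the Fortran code compiled with -fcoarray=lib here ... */

  _gfortran_caf_finalize ();
  return num_images;
}
```

A C main function would simply return the result of caf_startup_sketch (argc, argv), or an error code derived from it.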
Closing down the Coarray Communication Library
The following function should be called at the end of the program. It is invoked automatically at the end of the program if the main program is written in Fortran and compiled with -fcoarray=lib. It is also invoked before a STOP statement if the file containing that statement is compiled with -fcoarray=lib. The prototype is:
void _gfortran_caf_finalize (void);
Calling SYNC ALL
The SYNC ALL statement has the following prototype:
void _gfortran_caf_sync_all (int *stat, char *errmsg, int errmsg_len);
stat (intent(out)): If not NULL: set to 0 on success; set to STAT_STOPPED_IMAGE or other nonzero value on error
errmsg (intent(inout)): If not NULL: On error, it gets assigned an error message
errmsg_len (intent(in)): The string length of errmsg - or 0.
Calling SYNC IMAGES
The SYNC IMAGES statement has the following prototype:
void _gfortran_caf_sync_images (int count, int images[], int *stat, char *errmsg, int errmsg_len);
count (intent(in)): Number of images passed in the next argument. -1 indicates SYNC IMAGES(*); 0 indicates a zero-sized array.
images (intent(in)): Array of image numbers to be used for synchronizing.
stat (intent(out)): If not NULL: Set to 0 on success, set to STAT_STOPPED_IMAGE or other nonzero value on error
errmsg (intent(inout)): If not NULL: On error, it gets assigned an error message - otherwise neither read nor set
errmsg_len (intent(in)): The string length of errmsg - or 0
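The meaning of the count argument can be illustrated with a small helper that works out which images a given (count, images) pair addresses. This is a sketch only: demo_select_images is a hypothetical function that is not part of libcaf, and the range check is an assumption about what a library might reasonably verify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a libcaf function): marks in
   selected[0 .. num_images-1] the images addressed by a SYNC IMAGES
   call that uses the (count, images) convention documented above.
   Returns the number of images marked, or -1 if an image number is
   outside 1 .. num_images.  */
int demo_select_images (int count, const int images[],
                        int num_images, bool selected[])
{
  int n = 0;

  assert (num_images >= 1 && count >= -1);

  /* count == -1 corresponds to SYNC IMAGES(*): every image takes part. */
  for (int i = 0; i < num_images; i++)
    selected[i] = (count == -1);
  if (count == -1)
    return num_images;

  /* count == 0 is a zero-sized array: the loop body never runs. */
  for (int i = 0; i < count; i++)
    {
      if (images[i] < 1 || images[i] > num_images)
        return -1;
      if (!selected[images[i] - 1])
        {
          selected[images[i] - 1] = true;   /* image numbers are 1-based */
          n++;
        }
    }
  return n;
}
```

With four images, a count of -1 marks all four, a count of 0 marks none, and the list {2, 4} with a count of 2 marks the second and fourth image.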
Calling SYNC MEMORY
This memory barrier is implemented using __sync_synchronize.
The STOP statement
The STOP statement is handled as in serial programs; if the file has been compiled with -fcoarray=lib, _gfortran_caf_finalize is invoked before the STOP statement is executed.
The ERROR STOP statement
For the ERROR STOP statement, the library is called and a graceful stop is attempted before the program is ended forcefully. The functions do not return. Two library functions are implemented; their prototypes are:
void _gfortran_caf_error_stop_str (const char *string, int32_t len);
void _gfortran_caf_error_stop (int32_t error);
string (intent(in)): a string with an error message; it is written to stderr as part of the ERROR STOP message
len (intent(in)): the length of string.
error (intent(in)): an error number used, if possible, as exit status code and in ERROR STOP string, written to stderr.
Registering coarrays
void *_gfortran_caf_register (ptrdiff_t size, caf_register_t type, caf_token_t ***token, int *stat, char *errmsg, int errmsg_len);
Return value: The address of the coarray, or NULL if an error occurred
size: Byte size of the coarray
type: CAF_REGTYPE_COARRAY_STATIC, CAF_REGTYPE_COARRAY_ALLOC, CAF_REGTYPE_LOCK, CAF_REGTYPE_LOCK_COMP
token: Coarray token
stat (intent(out)): If not NULL: Set to 0 on success, set to STAT_STOPPED_IMAGE or other nonzero value on error
errmsg (intent(inout)): If not NULL: On error, it gets assigned an error message - otherwise neither read nor set
errmsg_len (intent(in)): The string length of errmsg - or 0
Note 1: For nonallocatable coarrays, this function needs to be called at startup of the program - or at least before any of the other images uses it. The registering must happen in the same order on all images. With gfortran, the registering happens via constructor functions (__attribute__((constructor))) such that usually a _gfortran_caf_register call precedes the _gfortran_caf_init call.
Note 2: Allocatable coarrays are deregistered via an explicit call to _gfortran_caf_deregister; nonallocatable ones are deregistered in _gfortran_caf_finalize.
Deregistering coarrays
void _gfortran_caf_deregister (const caf_token_t ***token, int *stat, char *errmsg, int errmsg_len);
stat (intent(out)): If not NULL: Set to 0 on success, set to STAT_STOPPED_IMAGE or other nonzero value on error
token: Coarray token
errmsg: Is set to an error message if an error occurred; otherwise it is left unmodified. Set to NULL if not needed
errmsg_len: The length of the buffer errmsg; set to 0 if errmsg is NULL
Note: This function only applies to allocatable coarrays; the others are automatically deregistered in _gfortran_caf_finalize.
Sending data to a remote image
The following is still rather incomplete and not completely thought through. Expect changes and additions for missed cases. Recall that communication is possible from local to remote, from remote to local and remote to remote; there can be scalars and (simply) contiguous arrays but also arrays with strides and vectors. Asynchronous communication is possible for defining remote coarrays (be careful of references to the same coarray later in the same segment) and referencing multiple coarrays in the same statement.
Note that broadcasts in particular (as special support is lacking) should be optimized by using asynchronous communication, with a wait after the loop. This requires that the right-hand side is not modified during the loop and does not go out of scope (beware of temporary variables!):
do i = 1, num_images()
! if (i /= this_image()) ! (Optionally)
settings[i] = settings
end do
Sending to a scalar or to a contiguous array section
void _gfortran_caf_send (const caf_token_t ***token, size_t offset, int image_index, const void *data, size_t size, bool asynchronous);
token: The token of the array to be written to.
offset: Difference between the coarray base address and the actual data, used for caf(3)[2] = 8 or caf[4]%a(4)%b = 7.
image_index: Index of the coarray (typically remote, though it can also be on this_image).
data: Pointer to the to-be-transferred data.
size: The number of bytes to be transferred.
asynchronous: Return before the data transfer has completed
Remarks:
- Transferring data on this_image() is possible, including a[this_image()] = a, such that the library has to handle a full overlap of input and output.
- The compiler does not assume a forward or backward move in the assignment, hence, a temporary is generated if there could be a partial but not full overlap.
- If the image_index == this_image, the library function shall not return before the data is set to the variable
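The offset argument and the overlap remark can be made concrete with a small same-image sketch. The names demo_element_offset and demo_local_send are hypothetical and not part of libcaf; the sketch assumes a one-dimensional coarray with lower bound 1 and uses memmove as one possible way to tolerate overlapping source and destination.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte offset of element caf(index) relative to
   the coarray base address, assuming a lower bound of 1 as in the
   Fortran example caf(3)[2] = 8.  */
size_t demo_element_offset (size_t index, size_t elem_size)
{
  assert (index >= 1);
  return (index - 1) * elem_size;
}

/* Hypothetical stand-in for a send whose target is the local image.
   memmove is used because source and destination may be the very same
   memory, as in a[this_image()] = a.  The copy is complete on return,
   matching the requirement for image_index == this_image.  */
void demo_local_send (void *base, size_t offset,
                      const void *data, size_t size)
{
  memmove ((char *) base + offset, data, size);
}
```

For a coarray of four default integers, caf(3) lies two elements past the base, so the offset is 2 * sizeof (int); sending the whole array onto itself leaves it unchanged.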
Waiting for asynchronous transactions
void _gfortran_caf_send_wait (const caf_token_t ***token);
token: Token identifying a coarray.
Remarks:
- This function waits for all pending sending transactions related to the coarray identified by the token
- When this function returns, the data is not required to have arrived on the remote image, but accessing the data from the sending image shall give the new value, and the memory used for the transfer can be re-used.
- The main reason for this call is to ensure that the data is not modified while the transfer is still under way and that the temporary variable used for sending the data is not freed too early.
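Why the source buffer has to stay untouched until the wait can be seen in a toy model that defers every copy until the wait call. This is purely illustrative: the queue, demo_send_async and demo_send_wait are hypothetical, run in a single process, and say nothing about how the MPI or ARMCI library actually schedules transfers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_PENDING 16

/* Hypothetical record of one deferred transfer; not a libcaf type. */
struct demo_transfer
{
  void *dest;
  const void *src;
  size_t size;
};

static struct demo_transfer demo_pending[DEMO_MAX_PENDING];
static size_t demo_num_pending = 0;

/* Queue a transfer and return immediately, as an asynchronous send
   might.  Only the pointer to the source is stored, so the source must
   stay valid and unmodified until demo_send_wait is called.
   Returns 0 on success, -1 if the queue is full.  */
int demo_send_async (void *dest, const void *src, size_t size)
{
  assert (dest != NULL && src != NULL);
  if (demo_num_pending == DEMO_MAX_PENDING)
    return -1;
  demo_pending[demo_num_pending].dest = dest;
  demo_pending[demo_num_pending].src = src;
  demo_pending[demo_num_pending].size = size;
  demo_num_pending++;
  return 0;
}

/* Complete all queued transfers; afterwards the source buffers may be
   modified or freed.  */
void demo_send_wait (void)
{
  for (size_t i = 0; i < demo_num_pending; i++)
    memcpy (demo_pending[i].dest, demo_pending[i].src,
            demo_pending[i].size);
  demo_num_pending = 0;
}
```

In this model a broadcast loop queues one transfer per image and calls demo_send_wait once after the loop; changing the source value between the two would change what every image receives.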
Implementation Details
Argument handling
If a coarray is passed as actual argument to a coarray dummy, additional information needs to be transferred.
- For descriptor-free arrays and for assumed-shape coarray dummies: There are two hidden arguments, the "token" and the offset (of type ptrdiff_t). The latter contains the offset between the address of (the first element of) the object which was passed and the address which has been saved in the token. Think of "call sub(caf(2:8))" or "call sub(caf%comp2)" for examples where this can occur.
- For allocatable coarrays, which use a descriptor: Besides the rank elements of the array-bound dimension triplets, the descriptor also contains corank elements (dim[0]...dim[rank-1] and dim[rank]...dim[rank+corank-1]); additionally, the "token" is stored after the dimension triplets.
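The descriptor layout for allocatable coarrays can be pictured with a simplified C structure. This is a hypothetical illustration for rank 1 and corank 1: the field names are made up and the real gfortran descriptor contains further members, so only the ordering (bounds, then cobounds, then token) reflects the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified dimension triplet. */
struct demo_dim
{
  ptrdiff_t stride;
  ptrdiff_t lbound;
  ptrdiff_t ubound;
};

#define DEMO_RANK   1
#define DEMO_CORANK 1

/* Hypothetical, simplified descriptor of an allocatable coarray. */
struct demo_coarray_descriptor
{
  void *base_addr;
  /* dim[0 .. rank-1] hold the bounds,
     dim[rank .. rank+corank-1] hold the cobounds.  */
  struct demo_dim dim[DEMO_RANK + DEMO_CORANK];
  void *token;            /* stored after the dimension triplets */
};

/* Lower cobound of codimension codim (0-based). */
ptrdiff_t demo_lcobound (const struct demo_coarray_descriptor *d,
                         int codim)
{
  assert (codim >= 0 && codim < DEMO_CORANK);
  return d->dim[DEMO_RANK + codim].lbound;
}
```

For a coarray declared with bounds (1:10) and cobounds [0:3], dim[0] would describe the array dimension and dim[1] the codimension, so demo_lcobound (&d, 0) yields 0.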
Front-end internal representation
Descriptor-free coarrays are arrays of the type GFC_ARRAY_TYPE_P; scalar types are normal scalars with the language-specific node attached. In this language-specific node, the corank (GFC_TYPE_ARRAY_CORANK), the cobounds (GFC_TYPE_ARRAY_LBOUND and GFC_TYPE_ARRAY_UBOUND), the token (GFC_TYPE_ARRAY_CAF_TOKEN) and the offset (GFC_TYPE_ARRAY_CAF_OFFSET) are saved.
Allocatable coarrays use a descriptor; unlike normal allocatables, even scalar coarrays have one. The descriptor contains additional elements as outlined above.
The passed token and offset arguments for assumed-shape coarrays are stored in the language-specific declaration (DECL_LANG_SPECIFIC) as GFC_DECL_TOKEN and GFC_DECL_CAF_OFFSET.
The token is an opaque object (of type "void*"), which contains in some direct or indirect way the base address of the coarray. The details are left to the library implementation. Possible choices would be the base address itself (e.g. for libcaf_single.a) or the base address of the coarray on all images (e.g. for libcaf_mpi.a).
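The second choice, a token that gives access to the base address on every image, could be sketched as follows. Everything here is hypothetical: demo_token_t and demo_remote_address are made-up names, and the "images" are ordinary buffers within one process, whereas a real library would obtain the remote addresses through its communication layer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token representation: an array holding the coarray's
   base address on every image, indexed by image number minus one.  */
typedef void **demo_token_t;

/* Address of the byte at `offset` within the coarray on image
   `image_index` (1-based).  The token stays opaque to the caller; only
   this function knows what it contains.  */
void *demo_remote_address (demo_token_t token, int num_images,
                           int image_index, size_t offset)
{
  assert (image_index >= 1 && image_index <= num_images);
  return (char *) token[image_index - 1] + offset;
}
```

With two simulated images, resolving image 2 with an offset of 2 * sizeof (int) gives the address of the third element of the second buffer, which mirrors an assignment such as caf(3)[2] = 8.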