= Coarrays via GNU Fortran's Coarray Communication Library =

----

== WARNING ==

/!\ '''The MPI and ARMCI versions of the Coarray Communication Library are very incomplete. Nearly all programs will not work, typically because the requested communication is ignored and the operation is carried out on the local image instead.'''

 * See the '''[[Coarray#Supported_by_the_library_.28-fcoarray.3Dlib.29|list of features supported by the CAF library]]'''

----

GNU Fortran currently supports three '''[[Coarray|coarray]]''' modes, which are selected via the '''`-fcoarray=`''' flag:

 * '''`none`''': The default, which prints an error when a coarray construct is encountered
 * '''`single`''': Optimized version for a single image, which allows for fast serial programs
 * '''`lib`''': A communication-library-based coarray version, described on this page

''A shared-memory version is also planned.''

Currently, three GNU Fortran coarray communication libraries exist:

 * A '''single'''-image library consisting of stubs, which makes it possible to run with a single image without recompiling and is also useful for debugging. (Using `-fcoarray=single` will produce faster code, however.)
 * An '''MPI''' version, which wraps calls to a library implementing the [[http://www.mpi-forum.org/docs/|Message Passing Interface (MPI)]]. A purely MPI 1.x version is planned; the current rough version also uses a few MPI 2.x features.
 * An '''ARMCI''' version, which wraps calls to the ARMCI library, which is part of the [[http://www.emsl.pnl.gov/docs/global/|Global Array Toolkit]].

/!\ The whole implementation is in an embryonic state; most features do not yet work! Both the implementation in the compiler itself (front end) and the MPI and ARMCI libraries still require considerable work.

== Obtaining and Compiling the Coarray Communication Libraries ==

The ''single'' library version is automatically installed with GCC and can therefore be linked directly with `-lcaf_single`.
For MPI, or if you want a special version of `libcaf_single`, follow the instructions below. The libraries are located in the GCC trunk (4.7) source code in the `libgfortran/caf` directory. (CAF = ''Co-Array Fortran'', the initial name of the feature.) Note that the MPI and ARMCI versions are currently not compiled and installed automatically.

 1. Obtain the files by checking out the [[http://gcc.gnu.org/svn.html|Subversion]] or [[GitMirror|Git]] repository, by using the snapshot tarballs, or directly by clicking on the links below. The following files are needed:
  * [[http://gcc.gnu.org/viewcvs/trunk/libgfortran/caf/libcaf.h?content-type=text%2Fplain&view=co|libcaf.h]] - common C header file
  * [[http://gcc.gnu.org/viewcvs/trunk/libgfortran/caf/single.c?content-type=text%2Fplain&view=co|single.c]] - C source code for the single-image stub version
  * [[http://gcc.gnu.org/viewcvs/trunk/libgfortran/caf/mpi.c?content-type=text%2Fplain&view=co|mpi.c]] - C source code for the MPI version
  * armci.c - C source code for the ARMCI version (not yet in the trunk; use https://userpage.physik.fu-berlin.de/~tburnus/coarray/armci.c)
 2. Compiling the files
 {{{
 gcc -c -O2 single.c
 mpicc -c -O2 mpi.c
 mpicc -c -O2 armci.c
 }}}
 Note 1: The files may offer different variants, depending on pre-processor flags. For instance, `GFC_CAF_CHECK` (`-DGFC_CAF_CHECK`) enables some run-time checks.

 Note 2: This assumes that both the single and the MPI version are to be compiled and that an MPI compiler is installed, with `mpicc` invoking the proper compiler. The resulting object files (`.o`) can be linked directly into coarray programs, or you can proceed to step 3.

 Note 3: This assumes that the ARMCI version uses MPI as its backend and that the header files are in the standard include path.
 3. Creating a library.
 Run:
 {{{
 ar rcv libcaf_single.a single.o
 ranlib libcaf_single.a
 ar rcv libcaf_mpi.a mpi.o
 ranlib libcaf_mpi.a
 ar rcv libcaf_armci.a armci.o
 ranlib libcaf_armci.a
 }}}
 4. Creating a shared library
 {{{
 mpicc -O2 -fPIC -shared -Wl,-soname,libcaf_mpi.so.1 -o libcaf_mpi.so.1.0.0 mpi.c
 mpicc -O2 -fPIC -shared -Wl,-soname,libcaf_armci.so.1 -o libcaf_armci.so.1.0.0 armci.c
 }}}

Usually a static library will produce faster code. However, if you want to use your program on several systems without recompiling (e.g. because it is closed source), or if you regularly test several MPI versions or ARMCI with different backends, a shared library might be better. If you want to switch between different CAF communication libraries, you can also generate a "libcaf.so" and then use `LD_LIBRARY_PATH` to switch between, e.g., the `single.c` and the `mpi.c` version. However, be prepared to get confused ...

== Compiling Coarray Programs ==

Compile the Fortran files as usual, adding the ''`-fcoarray=lib`'' flag.

 * For the ''single'' version, add `-lcaf_single` when linking the program. If you want to use your self-compiled version, you can also link `single.o` directly.
 * For the ''MPI'' version, you also need to link the MPI library; the easiest way is to use `mpif90` for linking. Example:
 {{{
 mpif90 *.o -lcaf_mpi
 }}}
 /!\ You may need to use an environment variable or a command-line option to make sure MPI uses the current gfortran version. For [[http://open-mpi.org|Open MPI]] use, e.g., `OMPI_FC=gfortran` and for [[http://www.mcs.anl.gov/mpi/mpich/|MPICH2]], e.g., `-f90=gfortran`. Note, however, that this still requires the MPI library to be compatible with the current gfortran version; if in doubt, recompile :-)
 * For the '''ARMCI''' version, it depends on the backend; with the MPI backend, use
 {{{
 mpif90 *.o -lcaf_armci -larmci
 }}}

== Running Coarray Programs ==

 * For the ''single'' version, simply run the program as usual.
 * For the ''MPI'' version and the '''ARMCI''' version with MPI backend: Start the program as you would start any other MPI program; for instance, use
 {{{
 mpiexec -n 10 ./myCoarrayProgram
 }}}

== Combining Coarray Parallelization with OpenMP, MPI, pthreads, etc. ==

In principle, combining coarrays with other means of parallelization should work, but it is the user's responsibility to avoid race conditions and other issues.

For the MPI version of the coarray library: Make sure that the user code does not call `MPI_Init` if MPI has already been initialized; e.g.
{{{
integer :: ierror
logical :: init

! Initialize MPI only if this has not been done already
call MPI_Initialized (init, ierror)
if (.not. init) call MPI_Init (ierror)
}}}

== Coarray Fortran with a Non-Fortran Main Program (for -fcoarray=lib) ==

If the main program is not written in Fortran, the initialization and finalization functions of the coarray communication library need to be called manually. The other prototypes are listed for reference and should usually not be needed. The values of the named constants and the function prototypes can be found in the C header file [[http://gcc.gnu.org/viewcvs/trunk/libgfortran/caf/libcaf.h?content-type=text%2Fplain&view=co|libcaf.h]].

/!\ Note that this documentation is still in flux and incomplete. The source code of the compiler and the libraries is the authoritative source; however, the API is not yet finished and will keep changing for a while.

=== Initializing the Coarray Communication Library ===

The initialization happens automatically if the main program is written in Fortran and its file is compiled with `-fcoarray=lib`.
The initialization function has the following prototype:
{{{
void _gfortran_caf_init (int *argc, char ***argv, int *this_image, int *num_images);
}}}

 * '''argc''' (`intent(inout)`): `NULL` or a pointer to the number of command-line arguments
 * '''argv''' (`intent(inout)`): `NULL` or a pointer to the array of command-line argument strings
 * '''this_image''' (`intent(out)`): Returns the number of the invoking image, starting from 1
 * '''num_images''' (`intent(out)`): Returns the total number of available images

The image numbers returned by `_gfortran_caf_init` need to be stored in global variables with the following types and names:
{{{
const int _gfortran_caf_this_image;
const int _gfortran_caf_num_images;
}}}

Note: As static coarrays are registered at start-up time (via constructor functions [`__attribute__((constructor))`]), the `_gfortran_caf_init` call comes after the actual library initialization. Nevertheless, the `_gfortran_caf_init` call is required and should include the command-line arguments; it should happen early and should be followed by a SYNC ALL call to make sure that the initialization and coarray registration have happened on all images.

=== Closing down the Coarray Communication Library ===

The following function should be called at the end of the program. It is invoked automatically at the end of the program if the main program is written in Fortran and compiled with `-fcoarray=lib`. It is also invoked before the `STOP` statement if the file is compiled with `-fcoarray=lib`.
The prototype is
{{{
void _gfortran_caf_finalize (void);
}}}

=== Calling SYNC ALL ===

The library function implementing the `SYNC ALL` statement has the following prototype:
{{{
void _gfortran_caf_sync_all (int *stat, char *errmsg, int errmsg_len);
}}}

 * '''stat''' (intent(out)): If not NULL: Set to 0 on success, set to STAT_STOPPED_IMAGE or another nonzero value on error
 * '''errmsg''' (intent(inout)): If not NULL: On error, it is assigned an error message; otherwise it is neither read nor set
 * '''errmsg_len''' (intent(in)): The string length of `errmsg`, or 0

=== Calling SYNC IMAGES ===

The library function implementing the `SYNC IMAGES` statement has the following prototype:
{{{
void _gfortran_caf_sync_images (int count, int images[], int *stat, char *errmsg, int errmsg_len);
}}}

 * '''count''' (intent(in)): Number of images passed in the next argument. -1 indicates `SYNC IMAGES(*)`, 0 indicates a zero-sized array.
 * '''images''' (intent(in)): Array of the image numbers to synchronize with.
 * '''stat''' (intent(out)): If not NULL: Set to 0 on success, set to STAT_STOPPED_IMAGE or another nonzero value on error
 * '''errmsg''' (intent(inout)): If not NULL: On error, it is assigned an error message; otherwise it is neither read nor set
 * '''errmsg_len''' (intent(in)): The string length of `errmsg`, or 0

=== Calling SYNC MEMORY ===

This memory barrier is implemented using [[http://gcc.gnu.org/onlinedocs/gcc/Atomic-Builtins.html#index-g_t_005f_005fsync_005fsynchronize-2690|__sync_synchronize]].

=== The STOP statement ===

The `STOP` statement is handled as in serial programs; if the file has been compiled with `-fcoarray=lib`, `_gfortran_caf_finalize` is invoked before the stop.

=== The ERROR STOP statement ===

For the `ERROR STOP` statement, the library is called and attempts a graceful stop before ending the program forcefully. The functions do not return.
Two library functions are implemented; their prototypes are
{{{
void _gfortran_caf_error_stop_str (const char *string, int32_t len);
void _gfortran_caf_error_stop (int32_t error);
}}}

 * '''string''' (intent(in)): A string with an error message; it is written to stderr as part of the `ERROR STOP` message
 * '''len''' (intent(in)): The length of `string`
 * '''error''' (intent(in)): An error number, which is used as the exit status code if possible and is written to stderr as part of the `ERROR STOP` message

=== Registering coarrays ===

{{{
void * _gfortran_caf_register (ptrdiff_t size, caf_register_t type, caf_token_t ***token, int *stat, char *errmsg, int errmsg_len);
}}}

 * '''Return value:''' The address of the coarray, or NULL if an error occurred
 * '''size''': Byte size of the coarray
 * '''type''': CAF_REGTYPE_COARRAY_STATIC, CAF_REGTYPE_COARRAY_ALLOC, CAF_REGTYPE_LOCK, CAF_REGTYPE_LOCK_COMP
 * '''token''': Coarray token
 * '''stat''' (intent(out)): If not NULL: Set to 0 on success, set to STAT_STOPPED_IMAGE or another nonzero value on error
 * '''errmsg''' (intent(inout)): If not NULL: On error, it is assigned an error message; otherwise it is neither read nor set
 * '''errmsg_len''' (intent(in)): The string length of `errmsg`, or 0

Note 1: For nonallocatable coarrays, this function needs to be called at program start-up, or at least before any of the other images uses the coarray. The registering must happen in the same order on all images. With gfortran, the registering happens via constructor functions (`__attribute__((constructor))`), such that a `_gfortran_caf_register` call usually precedes the `_gfortran_caf_init` call.

Note 2: Allocatable coarrays are deregistered via an explicit call to `_gfortran_caf_deregister`; nonallocatable ones are deregistered in `_gfortran_caf_finalize`.
=== Deregistering coarrays ===

{{{
void _gfortran_caf_deregister (const caf_token_t ***token, int *stat, char *errmsg, int errmsg_len);
}}}

 * '''token''': Coarray token
 * '''stat''' (intent(out)): If not NULL: Set to 0 on success, set to STAT_STOPPED_IMAGE or another nonzero value on error
 * '''errmsg''': Is set to an error message if an error occurred; otherwise it is left unmodified. Set to NULL if not needed
 * '''errmsg_len''': The length of the buffer `errmsg`; set to 0 if `errmsg` is NULL

Note: This function only applies to allocatable coarrays; the others are automatically deregistered in `_gfortran_caf_finalize`.

=== Sending data to a remote image ===

/!\ The following is still rather incomplete and not completely thought through. Expect changes and additions for missed cases.

Recall that communication is possible from local to remote, from remote to local and from remote to remote; the data can be scalars and (simply) contiguous arrays, but also arrays with strides and vector subscripts.

Asynchronous communication is possible for defining remote coarrays (be careful with references to the same coarray later in the same segment) and for referencing multiple coarrays in the same statement.

Broadcasts in particular should be optimized by using asynchronous communication with a wait after the loop, as special support for them is lacking. This requires that the right-hand side is not modified during the loop and does not go out of scope (temporary variable!).
{{{
do i = 1, num_images()
  ! if (i /= this_image()) ! (Optionally)
  settings[i] = settings
end do
}}}

==== Sending to a scalar or to a contiguous array section ====

{{{
void _gfortran_caf_send (const caf_token_t ***token, size_t offset, int image_index, const void *data, size_t size, bool asynchronous);
}}}

 * '''token''': The token of the array to be written to.
 * '''offset''': Difference between the coarray base address and the actual data, used for `caf(3)[2] = 8` or `caf[4]%a(4)%b = 7`.
 * '''image_index''': Index of the image to be written to (typically remote, though it can also be `this_image()`).
 * '''data''': Pointer to the data to be transferred.
 * '''size''': The number of bytes to be transferred.
 * '''asynchronous''': Return before the data transfer has completed.

Remarks:

 * Transferring data on `this_image()` is possible, including `a[this_image()] = a`, such that the library has to handle a full overlap of input and output.
 * The compiler does not assume a forward or backward move in the assignment; hence, a temporary is generated if there could be a partial but not a full overlap.
 * If `image_index == this_image`, the library function shall not return before the data has been stored in the variable.

==== Waiting for asynchronous transactions ====

{{{
void _gfortran_caf_send_wait (const caf_token_t ***token);
}}}

 * '''token''': Token identifying a coarray.

Remarks:

 * This function waits for all pending send transactions related to the coarray identified by the token.
 * When this function returns, the data is not required to have arrived on the remote image, but accessing the data from the sending image shall give the new value, and the memory used for the transfer can be re-used.
 * The main purpose of this call is to ensure that the data is not modified, and that the temporary variable used for sending the data is not freed, while the data transfer is still under way.

== Implementation Details ==

=== Argument handling ===

If a coarray is passed as an actual argument to a coarray dummy, additional information needs to be transferred.

 * For descriptor-free arrays and for assumed-shape coarray dummies: There are two hidden arguments, the "token" and the offset (of type `ptrdiff_t`). The latter contains the offset between the address of (the first element of) the object which was passed and the address which has been saved in the token. Think of "call sub(caf(2:8))" or "call sub(caf%comp2)" for examples where this can occur.
 * For allocatable coarrays, which use a descriptor: Besides the ''rank'' dimension triplets for the array bounds (`dim[0]` ... `dim[rank-1]`), the descriptor also contains ''corank'' dimension triplets for the cobounds (`dim[rank]` ... `dim[rank+corank-1]`); additionally, the "token" is stored after the dimension triplets.

=== Front-end internal representation ===

Descriptor-free coarrays are arrays of the type GFC_ARRAY_TYPE_P; scalar types are normal scalars with the language-specific node attached. In this language-specific node, the corank (GFC_TYPE_ARRAY_CORANK), the cobounds (GFC_TYPE_ARRAY_LBOUND and GFC_TYPE_ARRAY_UBOUND), the token (GFC_TYPE_ARRAY_CAF_TOKEN) and the offset (GFC_TYPE_ARRAY_CAF_OFFSET) are saved.

Allocatable coarrays use a descriptor; contrary to normal allocatables, scalar coarrays also have one. The descriptor contains additional elements as outlined above.

The passed token and offset arguments for assumed-shape coarrays are stored in the language-specific declaration (DECL_LANG_SPECIFIC) as GFC_DECL_TOKEN and GFC_DECL_CAF_OFFSET.

The token is an opaque object (of type "void*"), which contains, directly or indirectly, the base address of the coarray. The details are left to the library implementation. Possible choices are the base address itself (e.g. for libcaf_single.a) or the base addresses of the coarray on all images (e.g. for libcaf_mpi.a).