This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



(Fortran) Coarrays in GCC - library/build questions, RFC & announcement


Hello all,

I have a library-usage question for the SC, a question about building and
shipping a (wrapper) library for gcc@ readers, and, of course, I also want
to announce the project more widely and ask for comments.

 * * *

Coarrays are a Fortran extension which dates back to the 1990s but has
now been integrated into the upcoming* Fortran 2008 standard
(ISO/IEC 1539-1:2010). Coarrays can be used to parallelize programs
using a partitioned global address space (PGAS), following the
single-program, multiple-data (SPMD) scheme. As coarrays are part of the
language, strong type checking is provided. Each process (called an
image) has its own private variables; only variables which have a
so-called codimension are addressable from other images.

The C (C99) analogue is called Unified Parallel C (UPC), which is,
however, not (yet) an international standard. A GCC UPC compiler has
existed for several years, and there are plans to merge it ("GUPC") into
the GCC 4.6 trunk, cf.
http://gcc.gnu.org/ml/gcc/2010-04/msg00117.html (There are also plans to
standardize UPC/Coarray-Fortran interoperability.)

A somewhat longer description of coarrays, with references to the
standard, to introductory texts, to talks (including Toon's GCC Summit
talk), the current status, and an unsorted collection of thoughts can be
found at http://users.physik.fu-berlin.de/~tburnus/coarray/README.txt

Currently, single-image support (i.e. compiling a coarray program as a
serial program for one image) is (nearly fully) implemented in the GCC
trunk (4.6, -fcoarray=single). The next step is to add support for
multiple images. The plan is to implement both a shared-memory,
thread-based version and a library version.

 * * *

Regarding the library version (which will be implemented first): There
are several suitable libraries available:

a) MPI (Message Passing Interface, http://www.mpi-forum.org/), which is
widely used; several open-source implementations exist, such as Open
MPI and MPICH(2). MPI is also well documented (the API, how to use a
given implementation, and MPI in general). MPIv1.x allows for two-sided
communication; MPIv2 additionally added single-sided communication.

b) GASNet (http://gasnet.cs.berkeley.edu/), a single-sided-communication,
BSD-licensed library from UC Berkeley. This library is also used by GUPC
via Berkeley's UPC runtime library.

c) ARMCI (Aggregate Remote Memory Copy Interface,
http://www.emsl.pnl.gov/docs/parsoft/armci/) + GA (Global Arrays), a
single-sided-communication library by DoE's EMSL; it has a rather
liberal licence and is redistributable, but requires registration for
download on the EMSL homepage.

d) Not really a library, but an in-between solution: one could implement
the threaded version as a library, which might be a faster way to
additionally obtain a threaded version than implementing the thread
version directly in the front end. (The latter is also planned.)

[Both (b) and (c) are used with PGAS languages on HPC systems and are
said to scale well. There are claims that the PGAS programming model
allows writing faster communication libraries than MPI, but as the
underlying task is the same, I expect only minute differences, which
depend more on the actual implementation than on the
interface/programming model. The plan is to eventually provide MPI,
GASNet, and ARMCI+GA wrappers, which will then allow such comparisons.]


Question to the GCC Steering Committee: Do you see any problems with
supporting those libraries? For Berkeley's GASNet, the question also
applies to GUPC. (GUPC uses shared memory via threads but can also use
Berkeley's UPC library, which is based on GASNet.)


Implementation: The current plan is to start with (a), i.e. MPI, and to
try hard to avoid race conditions, which possibly means avoiding
single-sided communication.** Next would probably come (d), (b), or a
version of (a) which fully relies on MPI's single-sided communication
[let's see]. There might also be two versions for each library - one
tuned for performance and one for debugging, possibly with different
APIs.

As - contrary to, e.g., UPC - one cannot read C header files in Fortran,
a wrapper library is always needed. For MPI, the wrapper also depends on
the MPI implementation. Thus, I was thinking of simply providing
gfortran_caf_<library>.c files to be used as:
  mpicc -c $(CFLAGS) gfortran_caf_mpi.c
  mpif90 $(FFLAGS) coarray_program.f90 gfortran_caf_mpi.o

That way, LTO also works nicely (even without gold); the only question
is how to best ship this library. The threaded version could simply be
compiled and shipped with gfortran, but the others ... Ideas?
Suggestions?


For the implementation, the current agenda is:

a) Finish the remaining to-do items for the single-image version
b) Design an MPIv1 version (maybe simultaneously start implementing the
more obvious parts, such as startup, barriers, shutdown, and error abort)
c) Implement the actual coarray initialization/communication part
d) Test it

I would be happy to have some more support for (b), (c) and (d); in
particular, I would like those with experience in GCC internals
(especially the back end and different targets) or in high-performance
computing to have a look at the (upcoming) design draft or the actual
implementation and suggest improvements for stability (such as avoiding
race conditions) and performance. Some people are already on board, but
the more expertise the better :-)


Testing: One problem with regard to testing (correctness, performance,
scaling): there does not seem to be any larger, publicly available
coarray program, and the smaller tests I know of do not cover things
like locks or atomic operations. I hope that this will improve - but if
you have a coarray program that you could make available, publicly or
privately, or that you could run as a test, that would be awesome!

The goal is to support parallelization on small 2-, 4-, or 8-core
workstations (via threads, out of the box), on smaller 8-, 16-, or
32-node clusters (Gigabit Ethernet/InfiniBand with MPI/GASNet/ARMCI),
but also on large HPC systems such as x86-64 clusters with 20,000+ cores
or Blue Gene systems with 200,000+ processors (on such systems, GCC is
usually installed as a backup/fallback compiler***).

Tobias

PS: To my knowledge, coarrays are currently supported by the Cray
compiler (for many years now; I think the latest version also accepts
the modified Fortran 2008 syntax), by the Rice meta-compiler (old syntax
only), and by g95 (for about two years). Additionally, one can expect
that the major commercial vendors will have coarray support relatively
soon, as there seems to be large demand and as Fortran 2008 is now
almost an ISO standard.* The GCC Fortran bug report PR 18918 dates back
to 2004, and there has also been considerable interest in this feature
on the gfortran list and on comp.lang.fortran.

* The Fortran 2008 standard should be at stage ~40.99, i.e. an FDIS
(final draft international standard) exists, which now needs to go
through a (last?) round of ISO member balloting before it can be
published.

** Cf. http://gcc.gnu.org/ml/fortran/2010-03/msg00201.html for a
starting point for the implementation of multi-image support via MPI.

*** That really happens. Two years ago, I was told that "gfortran saved
a Ph.D. thesis": the vendor compiler had a bug, and until it was fixed,
GCC was used - with the vendor's library and quite good performance.
(The cascade of creating a test case for the bug, convincing the HPC
centre, which then reported the bug via the vendor support to the actual
compiler developers, and the subsequent fixing, testing, and releasing
of the new version, and, finally, installing the new version took many,
many months.)

