This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.
Re: Coarray article for the upcoming GCC Summit.
- From: Bill Long <longb at cray dot com>
- To: Janne Blomqvist <blomqvist dot janne at gmail dot com>
- Cc: MOENE Toon <toon dot moene at cnrm dot meteo dot fr>, fortran at gcc dot gnu dot org, j dot k dot reid at rl dot ac dot uk, dannagle at verizon dot net
- Date: Wed, 23 Apr 2008 15:34:04 -0500
- Subject: Re: Coarray article for the upcoming GCC Summit.
- Organization: Cray Inc.
- References: <4809C31C.9080502@cnrm.meteo.fr> <480CC606.8090603@cray.com> <480F279E.9010908@cnrm.meteo.fr> <480F6834.50605@cray.com> <480F9597.1030302@gmail.com>
- Reply-to: longb at cray dot com
Janne Blomqvist wrote:
> Bill Long wrote:
>> MOENE Toon wrote:
>>> Is there some difficulty with coarrays here that I am overlooking?
>> The Rice University group has had two problems in this area, though
>> neither affects our (Cray's) implementation. As background, our
>> general implementation of coarrays on our vector systems works like
>> this: coarrays are placed in a separate "symmetric" heap that starts
>> at the same base address on each image and contains only coarrays.
>> Because of the restrictions on allocatable coarrays (they must be
>> allocated and deallocated on all images together, with the same
>> bounds), it is always possible to store coarrays such that the base
>> address of a particular coarray is the same on each image. This lets
>> you compute the address of a remote coarray reference using only the
>> local address of the same coarray. For ordinary and allocatable
>> coarrays this is pretty straightforward, and Rice seems to have no
>> problem addressing static coarrays.
>> The second problem seems to have multiple names, one of which is
>> "pinning of memory" on the images. Even if you handle the symmetric
>> heap in some special way, the targets of pointer components and the
>> actual memory for allocatable components can be anywhere in the local
>> memory of each node. Some hardware DMA protocols evidently require
>> that remotely accessed memory be "registered" or "pinned" somehow so
>> that the network hardware can access it. The Cray vector
>> implementation gets around this issue by doing two things: 1) we
>> disable demand paging on any node running a coarray (or UPC) image,
>> and 2) we (effectively) pin/register all of the physical memory on
>> the node by using large pages and remote address translation tables.
>> This results in very good performance, but is more restrictive than
>> a generic implementation. Considering that libraries like MPI need to
>> get around this same issue, I assume gfortran will have some solution
>> available. But I think it is important to be aware of it from the
>> start, and to think about the best solution when doing the basic
>> design work.
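A minimal sketch of the symmetric-heap addressing described above. It simulates several images inside one process, so each image's heap has a different base address and the translation is done by offset; on the real system the bases coincide and the offset is simply the address. The names (`sym_alloc`, `sym_remote`, the heap size) are hypothetical, and allocation is assumed to be collective: every image executes it in the same order with the same size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical simulation: NUM_IMAGES images live in one process, each with
   its own symmetric heap. On a real system every heap would start at the
   same virtual address; here the bases differ, so we translate by offset. */
#define NUM_IMAGES 4
#define HEAP_BYTES 4096

static unsigned char *heaps[NUM_IMAGES];
static size_t heap_top; /* identical on every image: allocation is collective */

void sym_init(void) {
    for (int i = 0; i < NUM_IMAGES; i++) {
        heaps[i] = calloc(1, HEAP_BYTES);
        assert(heaps[i] != NULL);
    }
    heap_top = 0;
}

/* Collective allocation: because all images request the same sizes in the
   same order, a given coarray gets the same offset on every image.
   Returns -1 when the heap is exhausted. */
ptrdiff_t sym_alloc(size_t bytes) {
    size_t rounded = (bytes + 15u) & ~(size_t)15u; /* keep 16-byte alignment */
    if (rounded > HEAP_BYTES - heap_top) return -1;
    ptrdiff_t off = (ptrdiff_t)heap_top;
    heap_top += rounded;
    return off;
}

/* Local address of the object at `off` on `image`. */
void *sym_local(int image, ptrdiff_t off) { return heaps[image] + off; }

/* Given the local address of a coarray on image `me`, return the address of
   the same coarray on image `target`. Addresses outside the symmetric heap
   (e.g. the target of a pointer component) cannot be translated: NULL. */
void *sym_remote(int me, int target, const void *local) {
    uintptr_t base = (uintptr_t)heaps[me];
    uintptr_t p = (uintptr_t)local;
    if (p < base || p >= base + HEAP_BYTES) return NULL;
    return heaps[target] + (p - base);
}
```

An address outside the heap, such as the memory behind a pointer or allocatable component, comes back as NULL: it cannot be located by offset, and it is also memory the network layer may not have registered, which is the second problem described above.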
> This could be tricky if we want something portable, performant and
> robust (pick one, ha ha). Here's one article ranting about RDMA that
> got quite a lot of press a few years ago:
> http://www.hpcwire.com/hpc/815242.html
> and the response:
> http://www.hpcwire.com/hpc/885757.html
> I think the only sensible solution here would be to use some
> appropriate abstraction layer like gasnet or armci.
Yes, I agree. I believe Rice is using armci, though my current bias is
for gasnet (see below).
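A sketch of the kind of abstraction layer under discussion: compiler-generated code calls a small table of put/get/barrier operations, and a GASNet- or ARMCI-based backend would supply the implementations. The interface (`caf_transport` and its members) is hypothetical and does not reproduce either library's API. The loopback backend keeps all images in one process so the sketch can run anywhere, and images are numbered from 0 here rather than from 1 as in Fortran.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical transport interface a coarray runtime could compile against.
   A real backend (e.g. built on GASNet or ARMCI) would fill in these
   function pointers; the signatures are illustrative only. */
typedef struct {
    const char *name;
    void (*put)(int image, size_t offset, const void *src, size_t bytes);
    void (*get)(void *dst, int image, size_t offset, size_t bytes);
    void (*barrier)(void);
} caf_transport;

/* Loopback backend: every image's symmetric heap is an array in this process. */
#define LB_IMAGES 2
#define LB_HEAP 256
static unsigned char lb_heap[LB_IMAGES][LB_HEAP];

static void lb_put(int image, size_t offset, const void *src, size_t bytes) {
    assert(image >= 0 && image < LB_IMAGES && offset + bytes <= LB_HEAP);
    memcpy(&lb_heap[image][offset], src, bytes);
}

static void lb_get(void *dst, int image, size_t offset, size_t bytes) {
    assert(image >= 0 && image < LB_IMAGES && offset + bytes <= LB_HEAP);
    memcpy(dst, &lb_heap[image][offset], bytes);
}

static void lb_barrier(void) { /* single process: nothing to wait for */ }

const caf_transport loopback_transport = {"loopback", lb_put, lb_get, lb_barrier};

/* Roughly what a compiler might emit for  a(1:n)[img] = b(1:n) , where `a`
   lives at a_offset in the symmetric heap: one call through the transport,
   independent of which network library sits underneath. */
void coarray_assign(const caf_transport *t, int img, size_t a_offset,
                    const double *b, size_t n) {
    t->put(img, a_offset, b, n * sizeof(double));
}
```

Keeping generated code behind such an interface is what would let one compiler target different networks by swapping the backend rather than changing code generation.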
> Do these also solve the first problem you mentioned?
The first problem is really internal to the compiler rather than a
network issue. I don't see that as a problem for gfortran as long as
allocatable and pointer components are represented as some sort of dope
vector in the parent structure. Rice's problem is that they had no way
to control the data layout of the structure or the dope vector.
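A sketch of why a dope vector in the parent structure makes remote references to allocatable components tractable: the parent object sits at a known place in the symmetric heap, so its descriptor can be fetched first, and the descriptor's base address then locates the data on the remote image. The `desc1d` layout is a simplified, hypothetical descriptor, not gfortran's actual array descriptor, and the "remote" images are simulated in one process.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified, hypothetical dope vector for a rank-1 allocatable component. */
typedef struct {
    double *base_addr;  /* element at lbound, in the owning image's memory */
    ptrdiff_t lbound;
    ptrdiff_t extent;   /* 0 when not allocated */
} desc1d;

/* Corresponds to:
 *   type :: t
 *     integer :: id
 *     real(8), allocatable :: v(:)
 *   end type
 *   type(t) :: obj[*]
 * The parent has a fixed layout, so it can live in the symmetric heap;
 * only the descriptor points outside it. */
typedef struct {
    int id;
    desc1d v;
} t_struct;

/* Each image allocates its component independently, possibly with a
   different size and at a different address (outside the symmetric heap). */
void alloc_component(t_struct *obj, ptrdiff_t lb, ptrdiff_t n) {
    obj->v.base_addr = malloc((size_t)n * sizeof(double));
    assert(obj->v.base_addr != NULL);
    obj->v.lbound = lb;
    obj->v.extent = n;
}

/* Reference obj[img]%v(i): read the remote descriptor, then follow its base
   address. In this simulation remote memory is directly addressable; a real
   runtime would issue two network gets. */
double get_component_elem(const t_struct *remote_obj, ptrdiff_t i) {
    desc1d d = remote_obj->v;                  /* get #1: the dope vector */
    assert(i >= d.lbound && i < d.lbound + d.extent);
    return d.base_addr[i - d.lbound];          /* get #2: the element */
}
```

Note that the element fetch targets malloc'd memory, which is exactly the memory that the pinning/registration problem is about.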
> Is Cray planning to help out with coarray gfortran on portals?
I can't commit to anything other than my free commentary. :) However, I
would point out that we have cooperated on the portals implementation of
gasnet (mainly for a UPC customer). So, if gfortran also targeted
gasnet, it should port to the XT systems pretty easily. (We currently
ship UPC for the XT systems using gasnet over portals.) The
implementation for the vector systems is embedded in the compiler code
generator, since the hardware memory load and store instructions can
specify addresses on other nodes directly by ORing the image number into
the upper bits of the address. I don't think it would be worthwhile for
gfortran to target that architecture, at least for now.
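A sketch of the address encoding described above, in which a remote reference is formed by ORing the image number into the upper bits of an ordinary address. The 48-bit split (`IMAGE_SHIFT`) is an assumption for illustration, not the actual Cray hardware format. Combined with the symmetric heap, the local address of a coarray plus an image number is enough to name that coarray on any image.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the low 48 bits hold the local address and the image
   number occupies the bits above them. */
#define IMAGE_SHIFT 48
#define LOCAL_MASK ((UINT64_C(1) << IMAGE_SHIFT) - 1)

/* Build a global address by ORing the image number into the upper bits. */
static inline uint64_t make_global(uint64_t local_addr, uint32_t image) {
    assert((local_addr & ~LOCAL_MASK) == 0);            /* must fit in low bits */
    assert(image < (UINT64_C(1) << (64 - IMAGE_SHIFT))); /* must fit in high bits */
    return local_addr | ((uint64_t)image << IMAGE_SHIFT);
}

/* Recover the two parts, as the network hardware would. */
static inline uint32_t global_image(uint64_t g) { return (uint32_t)(g >> IMAGE_SHIFT); }
static inline uint64_t global_local(uint64_t g) { return g & LOCAL_MASK; }
```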
> I suppose most gfortran contributors have experience with programming
> and using MPI applications, but MPP systems programming is somewhat
> outside our experience. So I think any help in this area would be very
> welcome.
I'm not really a systems programmer, but more often than not these days,
the nodes on an MPP system are running Linux (possibly stripped down).
So, I suspect you have more experience than you think. Most of the
work goes into "start.c": the user-mode code that does the initial setup
before calling the main program. In there you define the local image
number and the number of images, and initialize any data structures
needed for synchronization or for maintaining the symmetric heap. It is
probably not all that different from MPI_Init(), except that it is
executed automatically before the user code starts.
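A minimal sketch of the start-up work described above: record the local image number and the number of images, and set up the symmetric heap, before user code runs. How a launcher passes these values is an assumption here (the environment variable names `CAF_THIS_IMAGE` and `CAF_NUM_IMAGES` are hypothetical), and a real runtime would also have to place the heap at the same virtual address on every image, which `calloc` does not guarantee.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Runtime state filled in before the user's main program runs;
   0 means "not initialized". */
static int caf_this_image = 0;
static int caf_num_images = 0;
static void *caf_sym_heap = NULL;

int this_image(void) {
    assert(caf_this_image != 0 && "caf_startup has not run");
    return caf_this_image;
}
int num_images(void) { return caf_num_images; }

/* Parse a positive base-10 integer; -1 on malformed or out-of-range input. */
static long parse_positive(const char *s) {
    if (s == NULL || *s == '\0') return -1;
    char *end;
    long v = strtol(s, &end, 10);
    if (*end != '\0' || v < 1 || v > INT_MAX) return -1;
    return v;
}

/* Set up image numbering (1..n, as in Fortran) and the symmetric heap.
   Returns 0 on success, -1 on bad input or allocation failure; on failure
   the previous state is left unchanged. */
int caf_startup(const char *image_str, const char *count_str, size_t heap_bytes) {
    long n = parse_positive(count_str);
    long me = parse_positive(image_str);
    if (n < 0 || me < 0 || me > n || heap_bytes == 0) return -1;
    /* Assumption: a real runtime must map this at the same address on every
       image (e.g. a fixed-address mmap); calloc is only a stand-in. */
    void *heap = calloc(1, heap_bytes);
    if (heap == NULL) return -1;
    free(caf_sym_heap);          /* tolerate re-initialization */
    caf_sym_heap = heap;
    caf_this_image = (int)me;
    caf_num_images = (int)n;
    return 0;
}

/* Variant reading the hypothetical launcher-provided environment variables. */
int caf_startup_from_env(size_t heap_bytes) {
    return caf_startup(getenv("CAF_THIS_IMAGE"), getenv("CAF_NUM_IMAGES"),
                       heap_bytes);
}
```

With GCC, one way to run this "automatically before the user code starts" would be a function marked `__attribute__((constructor))` that calls `caf_startup_from_env`; alternatively, the compiler-generated `main` could call it explicitly.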
Cheers,
Bill
--
Bill Long longb@cray.com
Fortran Technical Support & voice: 651-605-9024
Bioinformatics Software Development fax: 651-605-9142
Cray Inc., 1340 Mendota Heights Rd., Mendota Heights, MN, 55120