This is the mail archive of the
fortran@gcc.gnu.org
mailing list for the GNU Fortran project.
Re: Coarray article for the upcoming GCC Summit.
- From: Bill Long <longb at cray dot com>
- To: MOENE Toon <toon dot moene at cnrm dot meteo dot fr>
- Cc: fortran at gcc dot gnu dot org, j dot k dot reid at rl dot ac dot uk, dannagle at verizon dot net
- Date: Wed, 23 Apr 2008 11:47:48 -0500
- Subject: Re: Coarray article for the upcoming GCC Summit.
- Organization: Cray Inc.
- References: <4809C31C.9080502@cnrm.meteo.fr> <480CC606.8090603@cray.com> <480F279E.9010908@cnrm.meteo.fr>
- Reply-to: longb at cray dot com
MOENE Toon wrote:
> Bill Long wrote:
>> 6) While this might have been omitted because of space limits, the
>> capability to have allocatable and pointer components of a coarray
>> structure is a very important feature in the language.
>
> Thanks for all your useful suggestions. I have thought a bit about
> this one. As the point of view of my paper is: How difficult is it
> to add coarrays to GNU Fortran, I try to shy away from too much detail
> that doesn't make the implementation difficult (after all, arrays of
> structures with components are already possible - and dealt with by
> the GNU Fortran front end).
>
> Is there some difficulty with coarrays here that I am overlooking?

The Rice University group has had two problems in this area, though
neither affects our (Cray's) implementation. As background, our general
implementation of coarrays on our vector systems works like this:

Coarrays are placed in a separate "symmetric" heap that starts at the
same base address on each image and contains only coarrays. Because of
the restrictions on allocatable coarrays, it is always possible to store
coarrays such that the base address for a particular coarray is the same
on each image. This allows you to know the address of a remote coarray
reference using only the local address information for the same
coarray. For ordinary and allocatable coarrays this is pretty
straightforward, and Rice seems to have no problem addressing static
coarrays.
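
The symmetric addressing idea above can be sketched as a small Python
simulation. This is only an illustration under stated assumptions, not
Cray's implementation: the names Image, SYMMETRIC_BASE and remote_read,
the base address, and the 64-byte heap size are all hypothetical, and
images are numbered from 0 here for simplicity.

```python
from dataclasses import dataclass, field

# Hypothetical base address of the symmetric heap, identical on every image.
SYMMETRIC_BASE = 0x10000000


@dataclass
class Image:
    """One image holding its own copy of the symmetric heap."""
    index: int
    heap: bytearray = field(default_factory=lambda: bytearray(64))

    def read(self, address):
        # Turn the absolute address into an offset within this image's heap.
        return self.heap[address - SYMMETRIC_BASE]

    def write(self, address, value):
        self.heap[address - SYMMETRIC_BASE] = value


def remote_read(images, target, local_address):
    """Read a coarray element on image `target` using only the local address."""
    # Same base and same layout on every image, so no address lookup is needed.
    return images[target].read(local_address)


images = [Image(i) for i in range(4)]
x_address = SYMMETRIC_BASE + 8      # where coarray x lives on every image
for img in images:
    img.write(x_address, 10 * img.index)

value_from_image_2 = remote_read(images, 2, x_address)
```

Here value_from_image_2 is the value that image 2 stored, obtained
without asking image 2 where x lives.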

In the case of allocatable components of a coarray structure, the
entity with the symmetric (i.e., the same across images) address is the
dope vector that represents the allocatable array in the structure. The
actual memory for the allocated object is in the local, ordinary heap.
This allows each image to have differently sized components, which is a
valuable feature. You do two lookups to get the address of a remote
object: first you fetch the dope vector (or part of it) using the usual
symmetric addressing scheme for coarrays, then you look in the dope
vector for the actual remote address of the allocatable component data.
Paying the network latency twice to get the address seems like a
problem, except that most of the time the ultimate object is an array
and you pay the double latency only once for all the elements: for the
remaining elements, you can reuse the address information already
fetched for the first element.

The Rice group has a fundamental design problem here
because they translate to Fortran 95 plus library calls, and then target
any f95 compiler. As a result, they have no control over the format of
the dope vector, or how allocatable components are represented
internally. I assume the gfortran environment is more controlled and
you will not have any problems with this issue. At least for our
implementation, the case of pointer components is essentially the same.
The pointer can be associated with any local memory, and is accessed
indirectly through the component dope vector.
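
The two-step lookup for allocatable components can be sketched in the
same style. Again this is a simulation under assumptions, not any real
compiler's dope vector layout: DopeVector, Image, remote_component, the
offset 16, and the local addresses 5000 and 9000 are hypothetical.
Each simulated network access increments a round-trip counter.

```python
from dataclasses import dataclass


@dataclass
class DopeVector:
    base_address: int   # where the data lives in the owning image's local heap
    extent: int         # number of elements currently allocated


class Image:
    def __init__(self):
        self.symmetric = {}    # symmetric offset -> DopeVector
        self.local_heap = {}   # local address -> element value
        self.round_trips = 0   # simulated network round trips served

    def allocate_component(self, offset, local_address, values):
        # The dope vector sits at a symmetric offset; the data does not.
        self.symmetric[offset] = DopeVector(local_address, len(values))
        for i, value in enumerate(values):
            self.local_heap[local_address + i] = value

    def fetch_dope_vector(self, offset):
        self.round_trips += 1
        return self.symmetric[offset]

    def fetch_elements(self, address, count):
        self.round_trips += 1
        return [self.local_heap[address + i] for i in range(count)]


def remote_component(image, offset):
    """Fetch a whole allocatable component from a remote image."""
    dv = image.fetch_dope_vector(offset)                      # lookup 1
    return image.fetch_elements(dv.base_address, dv.extent)   # lookup 2


COMPONENT_OFFSET = 16   # same symmetric offset on every image
a, b = Image(), Image()
a.allocate_component(COMPONENT_OFFSET, 5000, [1, 2, 3])
b.allocate_component(COMPONENT_OFFSET, 9000, [4, 5, 6, 7, 8])

data = remote_component(b, COMPONENT_OFFSET)
```

The two images hold components of different sizes at different local
addresses, yet the whole array on b is fetched with two round trips,
because the address read from the dope vector serves every element.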

The second problem goes by several names, one of which is "pinning of
memory" on the images. Even if you handle the symmetric heap in some
special way, the targets of pointer components and the actual memory
for allocatable components can be anywhere in the local memory of each
node. Some hardware DMA protocols evidently require that remotely
accessed memory be "registered" or "pinned" so that the network
hardware can access it. The Cray vector implementation
gets around this issue by doing two things: 1) we disable demand paging
on any node running a coarray (or UPC) image, and 2) we (effectively)
pin/register all of the physical memory on the node by using large pages
and remote address translation tables. This results in very good
performance, but is more restrictive than a generic implementation.
Considering that libraries like MPI need to get around this same issue,
I assume gfortran will have some solution available. But I think it is
important to be aware of it from the start, and to think about the best
solution when doing the basic design work.
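
To make the registration issue concrete, here is a toy Python model of
a table of registered regions. It is not a real network API: the
RegistrationTable class, its methods, and all addresses and sizes are
hypothetical. It only contrasts registering individual allocations
with registering all of memory up front, as described above.

```python
class RegistrationTable:
    """Toy model of the regions a network adapter is allowed to access."""

    def __init__(self):
        self.regions = []   # list of (start, length) pairs

    def register(self, start, length):
        self.regions.append((start, length))

    def is_registered(self, start, length):
        # An access is allowed only if it lies entirely inside one region.
        return any(s <= start and start + length <= s + n
                   for s, n in self.regions)


# Strategy 1: register each allocation as it is made. An allocatable
# component placed anywhere in local memory must be registered before
# a remote image can touch it.
per_allocation = RegistrationTable()
per_allocation.register(5000, 3)
known_ok = per_allocation.is_registered(5000, 3)
unknown_ok = per_allocation.is_registered(9000, 5)

# Strategy 2: register the whole (hypothetical 1 MiB) memory once, so
# any later allocation or pointer target is already covered.
whole_node = RegistrationTable()
whole_node.register(0, 1 << 20)
any_ok = whole_node.is_registered(9000, 5)
```

In this model the second strategy never refuses an access, at the cost
of committing all memory up front; that is the trade-off noted above
between performance and generality.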

Cheers,
Bill

> You can use this e-mail address to reply: toon.moene@cnrm.meteo.fr
> until the end of the week (Friday, 25th of April).
> After that date, please use toon@moene.indiv.nluug.nl
> Thanks in advance,
> Toon Moene.

--
Bill Long longb@cray.com
Fortran Technical Support & voice: 651-605-9024
Bioinformatics Software Development fax: 651-605-9142
Cray Inc., 1340 Mendota Heights Rd., Mendota Heights, MN, 55120