This is the mail archive of the
mailing list for the GNU Fortran project.
Re: RFC: Telling the middle end about asynchronous/single-sided memory access (Fortran related)
- From: Janne Blomqvist <blomqvist dot janne at gmail dot com>
- To: Tobias Burnus <burnus at net-b dot de>
- Cc: GCC Mailing List <gcc at gcc dot gnu dot org>, gfortran <fortran at gcc dot gnu dot org>
- Date: Fri, 15 Apr 2011 12:52:16 +0300
- Subject: Re: RFC: Telling the middle end about asynchronous/single-sided memory access (Fortran related)
- References: <4DA809AC.firstname.lastname@example.org>
On Fri, Apr 15, 2011 at 12:02, Tobias Burnus <email@example.com> wrote:
> Dear all,
> I have a question about how one can and should tell the middle end about
> asynchronous/single-sided memory access; the goal is to produce fast but
> race-free code. All the following is about Fortran 2003 (asynchronous) and
> Fortran 2008 (coarrays), but the problem itself should occur with all(?)
> supported languages. It definitely occurs when using C with MPI or using C
> with the asynchronous I/O functions, though I do not know how far one
> currently relies on luck.
> There are two issues, which need to be solved by informing the middle end -
> either by variable attributes or by inserted function calls:
> - Prohibiting some code movements
> - Making assumptions about the memory content
> a) ASYNCHRONOUS attribute and asynchronous I/O
> Fortran allows asynchronous I/O, which means for the programmer that between
> initiating the asynchronous reading/writing and the finishing read/write,
> the variable may not be accessed (for READ) or not be changed (for WRITE).
> The compiler needs to make sure that it does not move code such that this
> constraint is violated. All variables involved in asynchronous operations
> are marked as ASYNCHRONOUS.
> Thus, for asynchronous operations, code movements involving opaque function
> calls should not happen - but contrary to VOLATILE, there is no need to load
> the value from memory every time if it is still in a register.
>   integer, ASYNCHRONOUS :: async_int
>   WRITE (unit, ASYNCHRONOUS='yes') async_int
>   ! ...
>   WAIT (unit)
>   a = async_int
>   do i = 1, 10
>     b(i) = async_int + 1
>   end do
> Here, "a = async_int" may not be moved before the WAIT line. However,
> contrary to VOLATILE, one can move the "async_int + 1" before the loop and
> use the value from the register in the loop. Note additionally that the
> initiation of an asynchronous operation (WRITE statement above) is known at
> compile time; however, it is not known when it ends - the
> WAIT can be in a different translation unit. See also PR 25829.
> The Fortran 2008 standard is not very explicit about the ASYNCHRONOUS
> attribute itself; it simply states that it is for asynchronous I/O.
> (However, it describes then how async I/O works,
> including WAIT, INQUIRE, and what a programmer may do until the async I/O is
> finished.) The closest thing to an ASYNCHRONOUS definition is the non-normative
> note 5.4 of Fortran 2008:
> "The ASYNCHRONOUS attribute specifies the variables that might be associated
> with a pending input/output storage sequence (the actual memory locations on
> which asynchronous input/output is being performed) while the scoping unit
> is in execution. This information could be used by the compiler to disable
> certain code motion optimizations."
> Seemingly intended, but not that clear in the F2003/F2008 standard, is to
> allow for asynchronous user operations; this will presumably be refined in TR
> 29113, which is currently being drafted - and/or in an interpretation
> request. The main requester of this feature is the MPI Forum, which works
> on MPI3. In any case the following should work analogously and "buf" should
> not be moved before the "MPI_Wait" line:
>   CALL MPI_Irecv(buf, rq)
>   CALL MPI_Wait(rq)
> Hereby, "buf" and (maybe?) the first dummy argument of MPI_Irecv have the
> ASYNCHRONOUS attribute.
> My question is now: How do I properly tell this to the middle end?
> VOLATILE seems to be wrong as it prevents way too many optimizations and I
> think it does not completely prevent code moving. Using a call to some
> built-in function does not work as in principle the end of an asynchronous
> operation is not known. It could end with a WAIT - possibly also wrapped in
> a function, which is in a different translation unit - or also with an
> INQUIRE(..., PENDING=aio_pending) if "aio_pending" gets assigned a .false.
> (Frankly, I am not 100% sure about the exact semantics of ASYNCHRONOUS; I
> think it might be implemented by preventing all code movements which involve
> swapping an ASYNCHRONOUS variable with a call to a function that is not pure.
> Otherwise, in terms of the variable value, it acts like a normal variable,
> i.e. if one does: "a = 7" and does not set "a" afterwards (assignment or via
> function calls), it remains 7. The changing of the variable is explicit -
> even if it only becomes effective with some delay.)
A compiler memory barrier in GCC is
asm volatile("" ::: "memory");
That being said, does the middle end really move loads and stores past
function calls? If not, a call is effectively also a compiler memory
barrier.
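
As a rough illustration of the barrier above, here is a minimal C sketch. It assumes GCC or Clang (the `__asm__ __volatile__` spelling is the same construct as `asm volatile`, just usable under strict `-std=c11`). The names `COMPILER_BARRIER`, `wait_for_transfer` and `sum_after_wait` are made up for this example, and `async_int` is only a stand-in for the ASYNCHRONOUS variable; nothing here is what gfortran actually emits.

```c
/* GCC/Clang extension: emits no machine instruction, but the "memory"
   clobber makes the compiler assume that any memory may have changed,
   so it must not move loads or stores across this point. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for a variable with the ASYNCHRONOUS attribute. */
int async_int;

/* Hypothetical stand-in for WAIT: the point after which the pending
   transfer is known to be complete. */
static void wait_for_transfer(void)
{
    COMPILER_BARRIER();
}

int sum_after_wait(int *restrict b)
{
    wait_for_transfer();
    int a = async_int;          /* must not be hoisted above the barrier */
    for (int i = 0; i < 10; i++)
        b[i] = async_int + 1;   /* no barrier in the loop, so this load
                                   may be done once and kept in a register */
    return a;
}
```

Unlike VOLATILE, this only pins the order around the barrier; between barriers the compiler is still free to keep the value in a register.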
> B) COARRAYS
> The memory model of coarrays is that all memory is private to the image -
> except for coarrays. Coarrays exist on all images. For "integer ::
> coarray(:)[*]", local accesses are "coarray = ..." or "coarray(4) = ..."
> while remote accesses are "coarray(:)[7] = ..." or "a = coarray(3)[2]",
> where the data is set on image 7 or pulled from image 2.
> Let's start directly with an example:
>   module m
>     integer, save :: caf_int[*]  ! Global variable
>   end module m
>   subroutine foo()
>     use m
>     caf_int = 7  ! Set local variable to 7 (effectively: on image 1 only)
>     SYNC ALL ! Memory barrier/fence
>     SYNC ALL
>     ! caf_int should now be 8, cf. below; thus the following if shall
>     ! neither be optimized away nor be executed at run time.
>     if (caf_int == 7) call abort()
>   end subroutine foo
>   subroutine bar()
>     use m
>     SYNC ALL
>     caf_int[1] = 8 ! Set variable on image 1 to 8
>     SYNC ALL
>   end subroutine bar
>   program caf_example
>     if (this_image() == 1) CALL foo()
>     if (this_image() == 2) CALL bar()
>   end program caf_example
> - The coarray "caf_int" will be registered in the communication library at
> startup of the main program.
> - For image 1 one always accesses "caf_int" in local memory. The variable
> also does not alias with anything - except that the value might change via
> single-sided communication.
> Thus: SYNC ALL acts as a memory fence - for coarrays only. In principle, all
> other variables might be moved across the fence. Besides preventing code
> moves, the value of the variable cannot be assumed to be the same as before
> the fence. I think a simple call to "__sync_synchronize()" (alias
> BUILT_IN_SYNCHRONIZE) should take care of this, but I want to
> confirm that it indeed does so. I assume I can still keep all coarrays as
> restricted pointers - even though there is single-sided communication.
> Q1: Is __sync_synchronize() sufficient?
I don't think this is correct. __sync_synchronize() just issues a
hardware memory fence instruction. That is, it prevents loads and
stores from moving past the fence *on the processor that executes the
fence instruction*. There is no synchronization with other
processors. SYNC ALL, OTOH, is a full barrier; the implementation must
be something like a counter that every co-image increments atomically,
and then waits (blocking or spinning) until the counter equals the
number of co-images.
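
To make the counter idea concrete, here is a minimal one-shot sketch in C11, with POSIX threads standing in for images in shared memory. The names `NUM_IMAGES`, `sync_all_once` and `run_images` are invented for illustration. This is not how any coarray library implements SYNC ALL, and a reusable barrier would additionally need a generation or sense flag.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define NUM_IMAGES 4                 /* hypothetical number of images */

static atomic_int arrived;           /* barrier counter, starts at 0 */
static int slot[NUM_IMAGES];         /* plain data, one entry per image */
static int seen_sum[NUM_IMAGES];     /* what each image read after the barrier */

/* One-shot barrier: bump the counter atomically, then spin until every
   image has arrived. The sequentially consistent atomics also order the
   plain stores made before the barrier against the loads made after it. */
static void sync_all_once(void)
{
    atomic_fetch_add(&arrived, 1);
    while (atomic_load(&arrived) < NUM_IMAGES)
        sched_yield();               /* be polite while spinning */
}

static void *image_main(void *arg)
{
    int me = (int)(intptr_t)arg;
    slot[me] = me + 1;               /* write own data before the barrier */
    sync_all_once();
    int sum = 0;
    for (int i = 0; i < NUM_IMAGES; i++)
        sum += slot[i];              /* read everyone's data after it */
    seen_sum[me] = sum;
    return NULL;
}

/* Returns 1 if every image saw all writes made before the barrier. */
int run_images(void)
{
    pthread_t t[NUM_IMAGES];
    atomic_store(&arrived, 0);
    for (int i = 0; i < NUM_IMAGES; i++)
        pthread_create(&t[i], NULL, image_main, (void *)(intptr_t)i);
    for (int i = 0; i < NUM_IMAGES; i++)
        pthread_join(t[i], NULL);
    for (int i = 0; i < NUM_IMAGES; i++)
        if (seen_sum[i] != NUM_IMAGES * (NUM_IMAGES + 1) / 2)
            return 0;
    return 1;
}
```

A bare `__sync_synchronize()` in place of `sync_all_once()` would order each thread's own accesses but would not make any thread wait for the others, which is the difference described above.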
> Q2: Can this be optimized in some way?
Probably not. For general issues with the shared-memory model, perhaps
shared memory Co-arrays can piggyback on the work being done for the
C++0x memory model, see
Hans Boehm maintains a collection of links to various papers on the topic at