This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RFC: Telling the middle end about asynchronous/single-sided memory access (Fortran related)


Dear all,

I have question how one can and should tell the middle end about asynchonous/single-sided memory access; the goal is to produce fast but race-free code. All the following is about Fortran 2003 (asynchronous) and Fortran 2008 (coarrays), but the problem itself should occur with all(?) supported languages. It definitely occurs when using C with MPI or using C with the asynchronous I/O functions, though I do not know in how far one currently relys on luck.

There are two issues, which need to be solved by informing the middle end - either by variable attributes or by inserted function calls:
- Prohibiting some code movements
- Making assumptions about the memory content


a) ASYNCHRONOUS attribute and asynchronous I/O

Fortran allows asynchronous I/O, which means for the programmer that between initiating the asynchronous reading/writing and the finishing read/write, the variable may not be accessed (for READ) or not be changed (for WRITE). The compiler needs to make sure that it does not move code such that this constraint is violated. All variables involved in asynchronous operations are marked as ASYNCHRONOUS.

Thus, for asynchronous operations, code movements involving opaque function calls should not happen - but contrary to VOLATILE, there is no need to take the value all time from the memory if it is still in the register.

Example:

integer, ASYNCHRONOUS :: async_int

   WRITE (unit, ASYNCHRONOUS='yes') async_int
   ! ...
   WAIT (unit)
   a = async_int
   do i = 1, 10
     b(i) = async_int + 1
   end do

Here, "a = async_int" may not be moved before the WAIT line. However, contrary to VOLATILE, one can move the "async_int + 1" before the loop and use the value from the registry in the loop. Note additionally that the initiation of an asynchronous operation (WRITE statement above) is known at compile time; however, it is not known when it ends - the
WAIT can be in a different translation unit. See also PR 25829.


The Fortran 2008 standard is not very explicit about the ASYNCHRONOUS attribute itself; it simply states that it is for asynchronous I/O. (However, it describes then how async I/O works,
including WAIT, INQUIRE, and what a programmer may do until the async I/O is finished.) The closed to an ASYNCHRONOUS definition is the non-normative note 5.4 of Fortran 2008:


"The ASYNCHRONOUS attribute specifies the variables that might be associated with a pending input/output storage sequence (the actual memory locations on which asynchronous input/output is being performed) while the scoping unit is in execution. This information could be used by the compiler to disable certain code motion optimizations."

Seemingly intended, but not that clear in the F2003/F2008 standard, is to allow for asynchronous user operations; this will presumbly refined in TR 29113 which is currently being drafted - and/or in an interpretation request. The main requestee for this feature is the MPI Forum, which works on MPI3. In any case the following should work analogously and "buf" should not be moved before the "MPI_Wait" line:

  CALL MPI_Irecv(buf, rq)
  CALL MPI_Wait(rq)
  xnew=buf

Hereby, "buf" and (maybe?) the first dummy argument of MPI_Irecv have the ASYNCHRONOUS attribute.

My question is now: How to properly tell this the middle end?

VOLATILE seems to be wrong as it prevents way too many optimizations and I think it does not completely prevent code moving. Using a call to some built-in function does not work as in principle the end of an asynchronous operation is not known. It could end with a WAIT - possibly also wrapped in a function, which is in a different translation unit - or also with an INQUIRE(..., PENDING=aio_pending) if "aio_pending" gets assigned a .false.

(Frankly, I am not 100% sure about the exact semantics of ASYNCHRONOUS; I think might be implemented by preventing all code movements which involve swapping an ASYNCHRONOUS variable with a function call, which is not pure. Otherwise, in terms of the variable value, it acts like a normal variable, i.e. if one does: "a = 7" and does not set "a" afterwards (assignment or via function calls), it remains 7. The changing of the variable is explicit - even if it only becomes effective with some delay.)


B) COARRAYS


The memory model of coarrays is that all memory is private to the image - except for coarrays. Coarrays exists on all images. For "integer :: coarray(:)[*]", local accesses are "coarray = ..." or "coarray(4) = ..." while remote accesses are "coarray(:)[7] = ..." or "a = coarray(3)[2]", where the data is set on image 7 or pulled from image 2.

Let's start directly with an example:

   module m
     integer, save :: caf_int[*]  ! Global variable
   end module m

   subroutine foo()
     use m
     caf_int = 7  ! Set local variable to 7 (effectively: on image 1 only)
     SYNC ALL ! Memory barrier/fence
     SYNC ALL
     ! caf_int should now be 8, cf. below; thus the following if shall
     ! neither optimized way not be executed at run time.
     if (caf_int == 7) call abort()
   end subroutine foo

   subroutine bar()
     use m
     SYNC ALL
     caf_int[1] = 8 ! Set variable on image 1 to 8
     SYNC ALL
   end subroutine bar

   program caf_example
     if (this_image() == 1) CALL foo()
     if (this_image() == 2) CALL bar()
   end program caf_example

Notes:
- The coarray "caf_int" will be registered in the communication library at startup of the main program.
- For image 1 one always accesses "caf_int" in local memory. The variable also does not alias with anything - except that the value might change via single-sided communication.


Thus: SYNC ALL acts as memory fence - for coarrays only. In principle, all other variables might be moved across the fence. Besides preventing code moves, the value of the variable cannot be assumed to be the same as before the fence. I think a simple call to "__sync_synchronize()" (alias BUILT_IN_SYNCHRONIZE) should take care of this, but I want to
confirm that it indeed does so. I assume I can still keep all coarrays as restricted pointers - even though there is single-sided communication.


Q1: Is __sync_synchronize() sufficient?
Q2: Can this be optimized in some way?

Tobias


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]