This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug fortran/48419] New: Reduce gfortran stack usage for procedures doing IO


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48419

           Summary: Reduce gfortran stack usage for procedures doing IO
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: fortran
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: jb@gcc.gnu.org


Consider

program iostack
  integer, save :: diff = 0

  print *, "Stack usage when executing print"
  call x(1)
  print *, "Stack usage in equivalent non-IO procedure"
  call y(1)

contains
  recursive subroutine x(i)
    print *, diff - loc(i)
    diff = loc(i)
    if (i < 10) call x(i+1)
    return
  end subroutine x

  recursive subroutine y(i)   
    call printy(diff - loc(i))
    diff = loc(i)
    if (i < 10) call y(i+1)
    return
  end subroutine y

  recursive subroutine printy(i)
    print *, i
  end subroutine printy

end program iostack

On i686 I get

 Stack usage when executing print
  -134515072
  1210538920
         400
         400
         400
         400
         400
         400
         400
         400
 Stack usage in equivalent non-IO procedure
 -1210542120
  1210538904
          32
          32
          32
          32
          32
          32
          32
          32

Which means that for the procedure with an IO call, we need an extra 400-32 =
368 bytes stack space. On 64-bit targets, more. This a) will limit recursion
depth as most systems are by default configured with quite small stack sizes,
and b) perhaps reduce performance as we put a largish amount of sparsely
accessed data right in the cache-hottest memory.

This large stack usage is due to allocating a large struct on the stack prior
to making any IO calls. There are two main reasons for this.

1. Most of the struct is for library private use. Apart from storing the
pointer to the gfc_unit, so that we don't need to lock the unit tree and
traverse it for every transfer_*() call, I see little reason why the rest of
this data cannot be part of the gfc_unit itself (which is stored on the heap),
or in heap memory with only a pointer to it kept. Depending on how wants to
structure it, one could maybe argue that it's useful to retain the pointer to
the transfer function and another pointer to the transfer data if one wants to
keep it separate from gfc_unit. But still, 1-3 pointers vs. the current 16
pointers and 32 ints.

2. Another part of the struct is fields for every possible IO specifier. As
most IO calls use just a few specifiers, most of this data is never accessed
(one field in the struct is a bitmap specifying which of the other fields are
valid).

Case #1 should be relatively easy to fix. For #2 there are a few options. Say,

a) A char array containing all the data. Walk over the flags variable, and for
each set bit, read the appropriate number of bytes from the char array and bump
a pointer to the current position. This is probably the most space efficient,
but might run into alignment issues on some targets? So maybe one needs to
memcpy() the data to some properly aligned data before actually using it.

b) Create a union of all the specifier data structs. Then pass an array of this
union type, with the flags variable specifying the type of each element (that
is, the length of the array is the number of set bits in the flag variable).
This might waste a bit of space compared to the previous approach, but should
ensure that everything is properly aligned. From a brief scan of current
sources, the biggest element is probably for character variables, which is a
pointer to the string and the length, so in no case will the union type be
particularly large.

c) Get rid of the flags variable, and instead pass a variable specifying the
array length, the array elements being a struct of an enum specifying the
element type, and the union type described in the previous approach.

Personally, I think option c) looks the cleanest.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]