This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug fortran/48419] New: Reduce gfortran stack usage for procedures doing IO
- From: "jb at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Sat, 2 Apr 2011 21:08:27 +0000
- Subject: [Bug fortran/48419] New: Reduce gfortran stack usage for procedures doing IO
- Auto-submitted: auto-generated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48419
Summary: Reduce gfortran stack usage for procedures doing IO
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: fortran
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: jb@gcc.gnu.org
Consider
program iostack
integer, save :: diff = 0
print *, "Stack usage when executing print"
call x(1)
print *, "Stack usage in equivalent non-IO procedure"
call y(1)
contains
recursive subroutine x(i)
print *, diff - loc(i)
diff = loc(i)
if (i < 10) call x(i+1)
return
end subroutine x
recursive subroutine y(i)
call printy(diff - loc(i))
diff = loc(i)
if (i < 10) call y(i+1)
return
end subroutine y
recursive subroutine printy(i)
print *, i
end subroutine printy
end program iostack
On i686 I get
Stack usage when executing print
-134515072
1210538920
400
400
400
400
400
400
400
400
Stack usage in equivalent non-IO procedure
-1210542120
1210538904
32
32
32
32
32
32
32
32
Which means that for the procedure with an IO call, we need an extra 400-32 =
368 bytes stack space. On 64-bit targets, more. This a) will limit recursion
depth as most systems are by default configured with quite small stack sizes,
and b) perhaps reduce performance as we put a largish amount of sparsely
accessed data right in the cache-hottest memory.
This large stack usage is due to allocating a large struct on the stack prior
to making any IO calls. There are two main reasons for this.
1. Most of the struct is for library private use. Apart from storing the
pointer to the gfc_unit, so that we don't need to lock the unit tree and
traverse it for every transfer_*() call, I see little reason why the rest of
this data cannot be part of the gfc_unit itself (which is stored on the heap),
or in heap memory with only a pointer to it kept. Depending on how wants to
structure it, one could maybe argue that it's useful to retain the pointer to
the transfer function and another pointer to the transfer data if one wants to
keep it separate from gfc_unit. But still, 1-3 pointers vs. the current 16
pointers and 32 ints.
2. Another part of the struct is fields for every possible IO specifier. As
most IO calls use just a few specifiers, most of this data is never accessed
(one field in the struct is a bitmap specifying which of the other fields are
valid).
Case #1 should be relatively easy to fix. For #2 there are a few options. Say,
a) A char array containing all the data. Walk over the flags variable, and for
each set bit, read the appropriate number of bytes from the char array and bump
a pointer to the current position. This is probably the most space efficient,
but might run into alignment issues on some targets? So maybe one needs to
memcpy() the data to some properly aligned data before actually using it.
b) Create a union of all the specifier data structs. Then pass an array of this
union type, with the flags variable specifying the type of each element (that
is, the length of the array is the number of set bits in the flag variable).
This might waste a bit of space compared to the previous approach, but should
ensure that everything is properly aligned. From a brief scan of current
sources, the biggest element is probably for character variables, which is a
pointer to the string and the length, so in no case will the union type be
particularly large.
c) Get rid of the flags variable, and instead pass a variable specifying the
array length, the array elements being a struct of an enum specifying the
element type, and the union type described in the previous approach.
Personally, I think option c) looks the cleanest.