Bug 41911 - Unnecessary copying of array descriptors
Summary: Unnecessary copying of array descriptors
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: fortran (show other bugs)
Version: 4.5.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Keywords: missed-optimization
Depends on:
Reported: 2009-11-02 18:49 UTC by Tobias Burnus
Modified: 2009-11-02 23:32 UTC (History)
1 user (show)

See Also:
Known to work:
Known to fail:
Last reconfirmed: 2009-11-02 19:29:47

Example patch for trans-array.c; needs checking. Missing: Patch for trans-decl.c (865 bytes, patch)
2009-11-02 22:00 UTC, Tobias Burnus
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Tobias Burnus 2009-11-02 18:49:15 UTC
Follow up to PR 41907.

module m
  subroutine one(a)
    real, dimension(:,:), intent(inout) :: a
    call two(a)
  end subroutine one
  subroutine two(a)
    real, dimension(:,:), intent(inout) :: a
  end subroutine two
end module

As both "one" and "two" have a assumed-shape argument (deferred shape would be the same), there is absolutely no reason to copy the array descriptor. But gfortran does so:

  a.0 = (real(kind=4)[0:D.1405] * restrict) a->data;
  parm.6.data = (void *) &(*a.0)[0];
  two (&parm.6);

For assume-size/explicit-size arrays, gfortran does do it correctly. Also for deferred-size arrays. It also works with allocatable components, e.g
  call two(t%a)
where "a" is "allocatable, dimension(:)".
Comment 1 Tobias Burnus 2009-11-02 21:23:28 UTC
Actually, it is not as simple as it seems. Currently, one gets for

  subroutine one(a)
    integer, dimension(:) :: a
    call two(a)
  end subroutine one
  subroutine two(a)
    integer, dimension(:) :: a
  end subroutine two

the following dump:

one (struct array1_integer(kind=4) & restrict a)
  integer(kind=8) ubound.0;
  integer(kind=8) stride.1;
  integer(kind=8) offset.2;
  integer(kind=8) size.3;
  integer(kind=4)[0:D.1401] * restrict a.0;
  integer(kind=8) D.1401;
  bit_size_type D.1402;
  <unnamed-unsigned:64> D.1403;

    integer(kind=8) D.1400;

    D.1400 = a->dim[0].stride;
    stride.1 = D.1400 != 0 ? D.1400 : 1;
    a.0 = (integer(kind=4)[0:D.1401] * restrict) a->data;
    ubound.0 = (a->dim[0].ubound - a->dim[0].lbound) + 1;
    size.3 = stride.1 * NON_LVALUE_EXPR <ubound.0>;
    offset.2 = -stride.1;
    D.1401 = size.3 + -1;
    D.1402 = (bit_size_type) size.3 * 32;
    D.1403 = (<unnamed-unsigned:64>) size.3 * 4;
    integer(kind=8) D.1399;
    integer(kind=8) D.1398;
    struct array1_integer(kind=4) parm.4;
    integer(kind=8) D.1394;

    D.1394 = ubound.0;
    parm.4.dtype = 265;
    D.1398 = offset.2;
    D.1399 = stride.1;
    parm.4.dim[0].lbound = 1;
    parm.4.dim[0].ubound = D.1394;
    parm.4.dim[0].stride = NON_LVALUE_EXPR <D.1399>;
    parm.4.data = (void *) &(*a.0)[0];
    parm.4.offset = NON_LVALUE_EXPR <D.1398>;
    two (&parm.4);

In principle. passing "two(&a)" would be sufficient. However:

The "a -> a.0" conversion happens at the very beginning of the procedure (trans-decl.c I presume). Passing "a.0" (== parmse.expr in gfc_conv_procedure_call) is wrong as "a.0" is not a descriptor but just a pointer to the (noncontiguous) array.

Passing "a" as in "two (&a)" is presumably also wrong as this presumably collides with "restricted".

 * * *

One way out would be -- assuming ABI breakage -- to do the same conversion for the caller but then do not do conversion in the callee. Thus:

one (struct array1_integer(kind=4) & restrict a)
   a.data[0] = 5
rather than
   /* generate a whole stuff of plain variables used like a descriptor */
   a.0[0] = 5
   two (&a)

That way one uses the restricted variable directly, and one saves the generation of a extra variables and arithmetic.

The question is whether one needs to take care of things like:
    stride.1 = D.1400 != a->dim[0].stride ? D.1400 : 1;

Actually, it might be no ABI breakage - assuming all descriptor variables are properly initialized in current 4.(3-5). One needs to check this.
Comment 2 Tobias Burnus 2009-11-02 21:37:02 UTC
There is one potential issue with the bounds. In the callee the lower bounds start at 1, unless something special is specified. However, one could think of using an array descriptor for trans-decl.c which replaces the collection of variables. Then one can simply pass the descriptor on:
  (descriptor) a -> (descriptor) a.0
  call two (&a.0)
instead of
  (descriptor) a.data -> a.0; a.* -> *.0
  *.0 -> parm.*
  call two(&parm)
Comment 3 Tobias Burnus 2009-11-02 22:00:14 UTC
Created attachment 18951 [details]
Example patch for trans-array.c; needs checking. Missing: Patch for trans-decl.c

Proposed solution:

Create a descriptor right from the beginning in trans-decl - then there is no need to do a conversion later on for the procedure call.

Something like the attached trans-array.c is then needed to make sure that one does not create a temporary. However, one somehow should distinguish between having a simple data pointer (like the current "a.0") versus having a pointer to a array descriptor. -- The issue is that one gfc_symbol, those seem to be indistinguishable -- on TREE level they are clearly different.
Comment 4 Tobias Burnus 2009-11-02 23:32:54 UTC
When implemented, updated optional check (cf. PR 41907) and check why
  rank > 0 && fsym == NULL && sym->attr.intrinsic
does not work. (Or in other words, when is an array descriptor passed when "fsym == NULL".)