Bug 31009 - Generate conditional code to assign arrays, depending on their stride
Summary: Generate conditional code to assign arrays, depending on their stride
Status: WAITING
Alias: None
Product: gcc
Classification: Unclassified
Component: fortran (show other bugs)
Version: 4.3.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2007-03-01 14:38 UTC by Daniel Franke
Modified: 2019-01-20 11:30 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2007-08-12 10:23:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Daniel Franke 2007-03-01 14:38:38 UTC
In January, there were two patches from Roger Sayle [1,2] which were quite an improvement for me. I'd like to suggest to do the same for derived type components. Example:

TYPE :: summed_amplitude
  COMPLEX, DIMENSION(:,:), POINTER :: alm
END TYPE

SUBROUTINE summed_amplitude_init_copy(this, other)
  TYPE(summed_amplitude), INTENT(out) :: this
  TYPE(summed_amplitude), INTENT(in)  :: other
  ALLOCATE(this%alm(size(other%alm,1), size(other%alm,2)))
  this%alm = other%alm
END SUBROUTINE

Here, gfortran copies the arrays element-wise. In my code, this accounts for about 20% of the runtime (as shown by gprof).

[1] http://gcc.gnu.org/ml/fortran/2007-01/msg00113.html (implement a(:) = b(:) using memcpy when possible)
[2] http://gcc.gnu.org/ml/fortran/2007-01/msg00419.html (Use memcpy for array constructor assignments)
Comment 1 Tobias Burnus 2007-03-01 16:33:13 UTC
> I'd like to suggest to do the same for derived type components.

The point is not components or not, the point is: Known size at compile time or not. (A different thing are arrays of derived types.)
The same tree without memcopy is produced for e.g.

SUBROUTINE summed_amplitude_init_copy(this, other)
  COMPLEX, DIMENSION(:,:), INTENT(out) :: this
  COMPLEX, DIMENSION(:,:), INTENT(in)  :: other
  this = other
END SUBROUTINE

And if one replaces (:,:) by e.g. (5,5), __builtin_memcpy is used.

As the bounds are not constant, __builtin_memcpy cannot be used for (:,:) and __memcpy has to be used.

The complication is that the memory might be not contiguous, example:

 real :: r(5,5)
 interface
  subroutine foo(a)
    real :: a(:,:)
  end subroutine foo
 end interface
 call foo(r(1:3,1:3))
 call foo(r(1:5:2,1:5:3))
end

(If no interface for "foo" is given, gfortran creates via _gfortran_internal_pack a temporary array which is then contiguous.)

Thus you want to generate (schematically):
  if(contiguous)
     __memcopy
  else
     for(i = ...) {this[i] = that[i]}

For small arrays and for noncontiguous arrays, the contiguous check even slows down the assignment, whereas I would expect a gain for big, contiguous arrays.

An implementation of the contiguous test can be found in libgfortran/*/in_pack*, which is quite complicated and not necessarily fast for small, noncontiguous arrays.
Maybe implementing it for one dimensional arrays is worthwhile as one has only two simple extra if statments if the array is non contiguos.

  if (stride * (ubound-lbound) <= 0)
    return;
  if(stride == 1)
     memcpy(this[lbound],that[lbound], ubound-lbound);
  else
    for(i = lbound; i <= ubound; i++) that[i] = this[i]

For two-dimensional arrays, it is already more complicated, but it might still be worthwhile.
Comment 2 Daniel Franke 2007-03-01 16:58:11 UTC
Tobias, I wouldn't expect gfortran to use memcpy if the array is not continuous, as in your example. 

OTOH, my naive assumption is, that given "this = other", "this(:) = other(:)" or even "this(a:b) = other(c:d)", it should, in general, be possible to handle with memcopy, or memmove if this==other and a:b, c:d intersect, as long as the array shapes are compatible. 

Since the finer details of fortran still elude me, is it possible at all that in a statement as "this = other" were both shall be arrays of compatible shape, either stride may not equal '1'?

> if(stride == 1)
> else
>   for(i = lbound; i <= ubound; i++) that[i] = this[i]
for(i = lbound; i <= ubound; i += stride) that[i] = this[i], probably?
Comment 3 Thomas Koenig 2007-03-01 19:41:51 UTC
(In reply to comment #2)

> Since the finer details of fortran still elude me, is it possible at all that
> in a statement as "this = other" were both shall be arrays of compatible shape,
> either stride may not equal '1'?

Yes.

The following is legal:

program main
  real, dimension(4) :: a
  a = (/ 1., 2., 3., 4. /)
  call foo(a(1:3:2), a(2:4:2))
  print *,a
contains
  subroutine foo(x,y)
    real, dimension(:), intent(in) :: x
    real, dimension(:), intent(out) :: y
    y = x
  end subroutine foo
end program main
Comment 4 Daniel Franke 2007-03-02 09:57:46 UTC
Tobias, do the cases given in PR31016 include the one above? 
If yes, this PR could be closed as dupe?!
Comment 5 Tobias Burnus 2007-03-02 10:43:20 UTC
> Tobias, do the cases given in PR31016 include the one above? 
> If yes, this PR could be closed as dupe?!

Actually not. PR 31016 (and related PR 31014) are about cases where one actually knows that the memory is contiguous (with the size known either at compile time or only at run time). There using memcpy or memset should always be a win (except, maybe, for a one-element array).

This PR is about cases were the memory might not be contiguous; thus one needs create code for both the contiguous and non-contiguous case and a check whether either case is present. I believe this needs some thinking (and testing) to make sure that the size of the generated code does not increase too much (at least not for -Os) and that it is an overall gain without loosing to much speed for real-world code with on non-contiguous arrays (strides).
Comment 6 Dominique d'Humieres 2018-03-04 22:06:29 UTC
Eleven years later does it make sense to keep this PR opened? Should not the CONTIGUOUS attribute do the trick now?
Comment 7 Jürgen Reuter 2019-01-20 11:30:58 UTC
Seems also to me that this should be reconsidered whether there is still need for optimization for the case of arrays not declared as contiguous.