[Bug fortran/92698] Unnecessary copy in overlapping array assignment

tkoenig at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Fri Nov 29 20:35:00 GMT 2019


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92698

Thomas Koenig <tkoenig at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tkoenig at gcc dot gnu.org

--- Comment #1 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
(In reply to mjr19 from comment #0)
> subroutine cpy(a,src,dest,len)
>   integer, intent(in) :: src,dest,len
>   real(kind(1d0)), intent(inout) :: a(:)
> 
>   a(dest:dest+len-1)=a(src:src+len-1)
> 
> end subroutine cpy
> 
> 
> seems to compile to malloc tmp array, inline copy to tmp, inline copy from
> tmp, free tmp in gfortran 7.4 and 8.3. Gfortran 9.2 modifies this by
> replacing the inline copies with memcpy at -O3.
> 
> Fortran permits the source and destination to overlap, so a single call to
> memcpy would be wrong.

It would also be wrong for another reason: a is not known to be contiguous
at compile-time.  The subroutine has to account for the fact that the
caller could pass a non-contiguous array slice, for example via

call cpy (a(1:10:2),1,2,2)
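
To illustrate the point, here is a minimal sketch (not part of the
original test case) using the Fortran 2008 intrinsic is_contiguous.
The module name stride_demo and the function dummy_is_contiguous are
hypothetical names chosen for this example only.

```fortran
module stride_demo
  implicit none
contains
  ! Hypothetical helper: reports whether the actual argument
  ! associated with the assumed-shape dummy a is contiguous.
  logical function dummy_is_contiguous(a)
    real(kind(1d0)), intent(in) :: a(:)
    dummy_is_contiguous = is_contiguous(a)
  end function dummy_is_contiguous
end module stride_demo
```

Called with a whole array, the function returns .true.; called with a
strided section such as a(1:10:2), it returns .false., which is the
case the subroutine in comment #0 has to be prepared for.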

If the test case said

subroutine cpy(a,src,dest,len)
  integer, intent(in) :: src,dest,len
  real(kind(1d0)), intent(inout), contiguous :: a(:)

or

subroutine cpy(a,src,dest,len,n)
  integer, intent(in) :: src,dest,len,n
  real(kind(1d0)), intent(inout), contiguous :: a(n)

then putting in a memmove could indeed help, but the caller would then
have to repack a non-contiguous array on call and unpack it on return
(which would defeat the purpose of the optimization).

However, I am not convinced that this is something worth pursuing.
Instead of calling a subroutine, the user might as well write an
assignment statement directly into the program.  This also has the
advantage that, if the relationship between src and dest is known,
for example via

   a(n:n+len-1) = a(n+1:n+len)

the compiler will actually optimize this into a memmove (provided
it knows the array is contiguous).
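
As a sketch of the semantics involved (the module name overlap_demo
and the subroutine names shift_down and shift_up are hypothetical,
chosen for illustration): Fortran evaluates the right-hand side
completely before storing, so both directions of an overlapping copy
must behave as if a temporary were used.  Whether the compiler emits
a memmove for either form depends on the compiler version and
options; the code below only shows the required result.

```fortran
module overlap_demo
  implicit none
contains
  ! Source lies one element above the destination (the form from
  ! the example above).
  subroutine shift_down(a, n, len)
    real(kind(1d0)), intent(inout), contiguous :: a(:)
    integer, intent(in) :: n, len
    a(n:n+len-1) = a(n+1:n+len)
  end subroutine shift_down

  ! Source lies one element below the destination.  A naive
  ! element-by-element forward copy would smear a(n) over the
  ! whole destination; the assignment must not do that.
  subroutine shift_up(a, n, len)
    real(kind(1d0)), intent(inout), contiguous :: a(:)
    integer, intent(in) :: n, len
    a(n+1:n+len) = a(n:n+len-1)
  end subroutine shift_up
end module overlap_demo
```

With a = [1,2,3,4,5,6], n = 2 and len = 3, shift_down gives
[1,3,4,5,5,6] and shift_up gives [1,2,2,3,4,6]; a forward loop
without a temporary would instead produce [1,2,2,2,2,6] for the
second case.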

