Using today's trunk with http://gcc.gnu.org/ml/fortran/2010-08/msg00003.html applied. The following example were extracted from FLEUR (www.flapw.de), where -Warray-temporaries showed that a temporary has been generated. !-- ONE ---------------------- subroutine t1(n1,n2, gfft, ufft) implicit none integer :: n1, n2, i real :: gfft(n1,n2), ufft(n2) DO i=1, n1 gfft(i,:)=gfft(i,:)*ufft(i) ! No temporary needed END DO end subroutine !-- TWO ---------------------- we(1:noccbd) = we(n_start:n_end) If "1" is the (known) lower bound, no temporary is needed (if one copies from the lower to the upper bound). Using memmove also no temporary is needed (as it allows for this) - using memcpy this is not possible (memory needs to be disjunct). For this case of having on the RHS only a variable, one should probably do a middle-end assignment with ARRAY_RANGE_REF. Cf. PR 40598. !-- THREE ---------------------- wannint(pw1,pw2,pw3,:)=wannint(pw1,pw2,pw3,:)+wannz2(j,:) gets a temporary generated (no pointer - and no target with argument aliasing involved.)
! --- FOUR ---- (also occurs for Octopus) subroutine t2(b,c) implicit none REAL :: b(3,3),c(3) c = matmul(b, c) end subroutine That's a pattern I see quite often: LHS variable on the RHS as second (or first) argument to MATMUL. However, no temporary is needed - but one probably needs to handle this carefully and it depends how one generates the MATMUL internally. While most implementation ways should work, one probably can program it such that avoiding a temporary will lead to wrong results ... Additionally, one should confirm that it works with-fexternal-blas As a variant: The same but with one or two items TRANSPOSEd - or CONJugated. ! --- FIVE --- (Found in octopus, www.tddft.org [GPL]) if(any(shape(a).ne.shape(b))) then That generates two temporaries - one for each SHAPE.
(In reply to comment #1) > ! --- FOUR ---- (also occurs for Octopus) > c = matmul(b, c) As pointed out by Dominique, one needs to be careful. I think one can optimize: c = matmul(b, transpose(c)) c = matmul(c,b) where c is a rank-2 matrix. I think one cannot optimize it if c is just a vector. (Add "conj" as you wish.) And check whether I have not confused columns and rows Side remark: sum(array*scalar) = sum(array)*scalar
Confirmed for the test cases in comment #0. You were right that my patch at http://gcc.gnu.org/ml/fortran/2010-08/msg00003.html doesn't fix this. I'll have a look...
This piece of code /* If no intention of reversing or reversing is explicitly inhibited, convert backward dependence to overlap. */ if ((reverse == NULL && this_dep == GFC_DEP_BACKWARD) || (reverse && reverse[n] == GFC_CANNOT_REVERSE)) this_dep = GFC_DEP_OVERLAP; looks more fishy. It sets this_dep to GFC_DEP_OVERLAP even if we don't want to reverse at all. Shouldn't this all be conditional on if (this_dep == GFC_DEP_BACKWARD) Paul, do you have any ideas?
> As pointed out by Dominique, one needs to be careful. I think one can optimize: > > c = matmul(b, transpose(c)) > c = matmul(c,b) > where c is a rank-2 matrix. I think one cannot optimize it if c is just a > vector. (Add "conj" as you wish.) And check whether I have not confused columns > and rows So far I have never seen a product matrix*matrix done "in place" and I seriously doubt it can be done. The best I can see is to have a temporary of rank 1 only instead of the full matrix. In addition roughly speaking the matrix product is O(n**3) while the cost of the temporary is O(n**2), probably negligible for large matrices (for small matrices calling the present matmul is much slower than any other possible algorithm). Note also that the polyhedron test nf.f90 is compiled with unnecessary temporaries: [macbook] lin/test% gfc -Warray-temporaries nf.f90 nf.f90:293.30: if ( i>i1 ) x(i:i+nxy-1) = x(i:i+nxy-1) - au3(i-nxy:i-1)*x(i-nxy:i-1) 1 Warning: Creating array temporary at (1) nf.f90:280.21: g(i:i+nxy-1) = g(i:i+nxy-1) - au3(i-nxy:i-1)*g(i-nxy:i-1) 1 Warning: Creating array temporary at (1) nf.f90:261.29: if ( i>i1 ) x(i:i+nx-1) = x(i:i+nx-1) - au2(i-nx:i-1)*x(i-nx:i-1) 1 Warning: Creating array temporary at (1) nf.f90:248.20: g(i:i+nx-1) = g(i:i+nx-1) - au2(i-nx:i-1)*g(i-nx:i-1) 1 Warning: Creating array temporary at (1) i.e., disjoint sections are not spotted.
(In reply to comment #5) > > As pointed out by Dominique, one needs to be careful. I think one can optimize: > > c = matmul(b, transpose(c)) > > c = matmul(c,b) > > where c is a rank-2 matrix. I think one cannot optimize it if c is just a > > vector. (Add "conj" as you wish.) And check whether I have not confused columns > > and rows > > So far I have never seen a product matrix*matrix done "in place" and I > seriously doubt it can be done. > The best I can see is to have a temporary of rank 1 I fail to see why a scalar temporary is not enough: do j = 1, N do i = 1, N tmp = 0.0 do k = 1, N tmp = a(i,k)*b(k,j) end do a(i,j) = tmp end do end do Note: it won't work with a scalar if one reverses the i and the j loop. [It would work with a rank-1 "B" argument, but then "A = matmul(B,A)" won't work as the result is a rank-1 array ...] > Note also that the polyhedron test nf.f90 is compiled with unnecessary > temporaries: My feeling is that ONE, THREE (of comment 0), FIVE (of comment 1), and your example (comment 5) point all to the same dependency-analysis problem.
> I fail to see why a scalar temporary is not enough: > > do j = 1, N > do i = 1, N > tmp = 0.0 > do k = 1, N > tmp = a(i,k)*b(k,j) > end do > a(i,j) = tmp > end do > end do > > Note: it won't work with a scalar if one reverses the i and the j loop. Well, try: real :: a(3,3), b(3,3), c(3,3), tmp integer :: i, j, k, N=3 a = reshape([(i,i=1,9)],[3,3]) b = reshape([(10-i,i=1,9)],[3,3]) c = matmul(a, b) do j = 1, N do i = 1, N tmp = 0.0 do k = 1, N tmp = tmp + a(i,k)*b(k,j) end do a(i,j) = tmp end do end do print *, a print *, c if(any(a/=c)) call abort() print *, 'OK' end The problem is that in order to compute a(1,2) you need a(1,1)*b(1,2), so you cannot have already written a(1,1). Note that the effective way to compute the matrix product on modern CPUs is to exchange the i and k loops and use a rank 1 tmp: real :: a(3,3), b(3,3), c(3,3), tmp(3) integer :: i, j, k, N=3 a = reshape([(i,i=1,9)],[3,3]) b = reshape([(10-i,i=1,9)],[3,3]) c = matmul(a, b) do j = 1, N tmp = 0.0 do k = 1, N do i = 1, N tmp(i) = tmp(i) + a(i,k)*b(k,j) end do end do b(:,j) = tmp end do if(any(b/=c)) call abort() print *, 'OK' end and this work with a rank 1 temporary. If one want to enter this kind of reduced temporaries, another candidate is a=transpose(a) which requires a scalar temporary only.
(In reply to comment #7) > > I fail to see why a scalar temporary is not enough: > Well, try: I concede defeat. > If one want to enter this kind of reduced temporaries, another candidate is > a=transpose(a) which requires a scalar temporary only. Well, with an array descriptor it one only changes the strides/bounds ...
> Well, with an array descriptor it one only changes the strides/bounds ... From polyhedron test induct.f90 [macbook] lin/test% gfc -Warray-temporaries induct.f90 induct.f90:1629.20: rotate_quad = transpose(rotate_quad) 1 Warning: Creating array temporary at (1) induct.f90:1741.20: rotate_quad = transpose(rotate_quad) 1 Warning: Creating array temporary at (1) induct.f90:2016.20: rotate_quad = transpose(rotate_quad) 1 Warning: Creating array temporary at (1) induct.f90:2159.20: rotate_quad = transpose(rotate_quad) 1 Warning: Creating array temporary at (1) induct.f90:2509.24: rotate1 = transpose(rotate1) 1 Warning: Creating array temporary at (1) induct.f90:2560.24: rotate2 = transpose(rotate2) 1 Warning: Creating array temporary at (1) induct.f90:2702.24: rotate2 = transpose(rotate2) 1 Warning: Creating array temporary at (1) induct.f90:2845.24: rotate1 = transpose(rotate1) 1 Warning: Creating array temporary at (1) induct.f90:2896.24: rotate2 = transpose(rotate2) 1 Warning: Creating array temporary at (1) induct.f90:3038.24: rotate2 = transpose(rotate2) 1 Warning: Creating array temporary at (1) The same trick can probably used for RESHAPE, from capacita.f90 ... Warning: Creating array temporary at (1) capacita.f90:137.30: maxval(abs( reshape(Ar1,(/Ng1,Ng2/)) - Y )) 1 Warning: Creating array temporary at (1) capacita.f90:137.30: maxval(abs( reshape(Ar1,(/Ng1,Ng2/)) - Y )) 1 Warning: Creating array temporary at (1) capacita.f90:137.18: maxval(abs( reshape(Ar1,(/Ng1,Ng2/)) - Y )) 1 Warning: Creating array temporary at (1)
This fixes the test case from comment #0, and looks much more sane. Let's see if this survives regression-testing... Index: dependency.c =================================================================== --- dependency.c (Revision 162824) +++ dependency.c (Arbeitskopie) @@ -1716,8 +1716,8 @@ gfc_dep_resolver (gfc_ref *lref, gfc_ref *rref, gf /* If no intention of reversing or reversing is explicitly inhibited, convert backward dependence to overlap. */ - if ((reverse == NULL && this_dep == GFC_DEP_BACKWARD) - || (reverse && reverse[n] == GFC_CANNOT_REVERSE)) + if (this_dep == GFC_DEP_BACKWARD + && (reverse == NULL || reverse[n] == GFC_CANNOT_REVERSE)) this_dep = GFC_DEP_OVERLAP; }
Subject: Bug 45159 Author: tkoenig Date: Mon Aug 2 22:04:36 2010 New Revision: 162829 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162829 Log: 2010-08-02 Thomas Koenig <tkoenig@gcc.gnu.org> PR fortran/45159 * depencency.c (gfc_dep_resolver): Fix logic for when a loop can be reversed. 2010-08-02 Thomas Koenig <tkoenig@gcc.gnu.org> PR fortran/45159 * gfortran.dg/dependency_29.f90: New test. Added: trunk/gcc/testsuite/gfortran.dg/dependency_29.f90 Modified: trunk/gcc/fortran/ChangeLog trunk/gcc/fortran/dependency.c trunk/gcc/testsuite/ChangeLog
(In reply to comment #5) > [macbook] lin/test% gfc -Warray-temporaries nf.f90 > nf.f90:293.30: > > if ( i>i1 ) x(i:i+nxy-1) = x(i:i+nxy-1) - au3(i-nxy:i-1)*x(i-nxy:i-1) > 1 > Warning: Creating array temporary at (1) > nf.f90:280.21: > > g(i:i+nxy-1) = g(i:i+nxy-1) - au3(i-nxy:i-1)*g(i-nxy:i-1) > 1 > Warning: Creating array temporary at (1) > nf.f90:261.29: > > if ( i>i1 ) x(i:i+nx-1) = x(i:i+nx-1) - au2(i-nx:i-1)*x(i-nx:i-1) > 1 > Warning: Creating array temporary at (1) > nf.f90:248.20: > > g(i:i+nx-1) = g(i:i+nx-1) - au2(i-nx:i-1)*g(i-nx:i-1) > 1 > Warning: Creating array temporary at (1) > > i.e., disjoint sections are not spotted. This vanishes with -m32. The reason is that we, for some strange reason, convert the upper bound to the pointer width on a particular platform. For 64-bit, this results in subroutine foo(a,b,i,j,k,n) implicit none integer, intent(in) :: i, j, k, n real, dimension(n) :: a,b a(k:i-1) = a(i:j) end subroutine foo ending up with -fdump-parse-treee as ASSIGN foo:a(foo:k:__convert_i4_i8[[(((- foo:i 1)))]]) foo:a(foo:i:__convert_i4_i8[[((foo:j))]]) which confuses gfc_dep_compare_expr, which is too stupid to see through the type conversion. I know where to start, but it is late in the evening ;-)
Subject: Bug 45159 Author: tkoenig Date: Tue Aug 3 22:02:30 2010 New Revision: 162848 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162848 Log: 2010-08-03 Thomas Koenig <tkoenig@gcc.gnu.org> PR fortran/45159 * dependency.c (gfc_deb_compare_expr): Remove any integer conversion functions to larger types from both arguments. Remove handling these functions futher down. 2010-08-03 Thomas Koenig <tkoenig@gcc.gnu.org> PR fortran/45159 * gfortran.dg/dependency_30.f90: New test. Added: trunk/gcc/testsuite/gfortran.dg/dependency_30.f90 Modified: trunk/gcc/fortran/ChangeLog trunk/gcc/fortran/dependency.c trunk/gcc/testsuite/ChangeLog
Subject: Bug 45159 Author: tkoenig Date: Fri Aug 6 22:33:37 2010 New Revision: 162966 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162966 Log: 2010-08-06 Thomas Koenig <tkoenig@gcc.gnu.org> PR fortran/45159 * dependency.c (check_section_vs_section): Handle cases where the start expression coincides with the lower or upper bound of the array. 2010-08-06 Thomas Koenig <tkoenig@gcc.gnu.org> PR fortran/45159 * gfortran.dg/dependency_31.f90: New test. Added: trunk/gcc/testsuite/gfortran.dg/dependency_31.f90 Modified: trunk/gcc/fortran/ChangeLog trunk/gcc/fortran/dependency.c trunk/gcc/testsuite/ChangeLog
Here's another case where we generate a temporary: program main integer a(100) a(10:16) = a(11:17:1) end program main ig25@linux-fd1f:~/Krempel/Dep-9> gfortran -Warray-temporaries d1.f90 d1.f90:3.13: a(10:16) = a(11:17:1) 1 Warnung: Creating array temporary at (1)
(In reply to comment #15) > Here's another case where we generate a temporary: > > program main > integer a(100) > a(10:16) = a(11:17:1) > end program main Here's a tentative patch: Index: dependency.c =================================================================== --- dependency.c (Revision 163040) +++ dependency.c (Arbeitskopie) @@ -1023,6 +1023,7 @@ check_section_vs_section (gfc_array_ref *l_ar, gfc gfc_expr *r_lower; gfc_expr *r_upper; int r_dir; + bool identical_strides; /* If they are the same range, return without more ado. */ if (gfc_is_same_range (l_ar, r_ar, n, 0)) @@ -1076,6 +1077,23 @@ check_section_vs_section (gfc_array_ref *l_ar, gfc if (l_dir == 0 || r_dir == 0) return GFC_DEP_OVERLAP; + /* Determine if the strides are equal. */ + + if (l_stride) + { + if (r_stride) + identical_strides = gfc_dep_compare_expr (l_stride, r_stride) == 1; + else + identical_strides = gfc_expr_is_one (l_stride, 0) == 0; + } + else + { + if (r_stride) + identical_strides = gfc_expr_is_one (r_stride, 0) == 1; + else + identical_strides = true; + } + /* Determine LHS upper and lower bounds. */ if (l_dir == 1) { @@ -1175,12 +1193,8 @@ check_section_vs_section (gfc_array_ref *l_ar, gfc && l_start && r_start && gfc_dep_compare_expr (l_start, r_start) == -1 && l_end && r_end && gfc_dep_compare_expr (l_end, r_end) == -1) { - /* Check that the strides are the same. */ - if (!l_stride && !r_stride) + if (identical_strides) return GFC_DEP_FORWARD; - if (l_stride && r_stride - && gfc_dep_compare_expr (l_stride, r_stride) == 0) - return GFC_DEP_FORWARD; } /* Check for forward dependencies x:y:-1 vs. x-1:z:-1. */ @@ -1188,20 +1202,12 @@ check_section_vs_section (gfc_array_ref *l_ar, gfc && l_start && r_start && gfc_dep_compare_expr (l_start, r_start) == 1 && l_end && r_end && gfc_dep_compare_expr (l_end, r_end) == 1) { - /* Check that the strides are the same. */ - if (!l_stride && !r_stride) + if (identical_strides) return GFC_DEP_FORWARD; - if (l_stride && r_stride - && gfc_dep_compare_expr (l_stride, r_stride) == 0) - return GFC_DEP_FORWARD; } - /* Are the strides the same? */ - if ((!l_stride && !r_stride) - || - (l_stride && r_stride - && gfc_dep_compare_expr (l_stride, r_stride) == 0)) + if (identical_strides) { if (l_start && IS_ARRAY_EXPLICIT (l_ar->as))
With the patch in comment#16, there is no temporary created for the code in comment #15, but one is created for a(10:16:1) = a(11:17) This seems to be fixed if I replace + if (r_stride) + identical_strides = gfc_dep_compare_expr (l_stride, r_stride) == 1; + else + identical_strides = gfc_expr_is_one (l_stride, 0) == 0; with + if (r_stride) + identical_strides = gfc_dep_compare_expr (l_stride, r_stride) == 1; + else + identical_strides = gfc_expr_is_one (l_stride, 0) == 1; I think that + identical_strides = gfc_dep_compare_expr (l_stride, r_stride) == 1; should also be replaced with + identical_strides = gfc_dep_compare_expr (l_stride, r_stride) == 0;
Subject: Re: Unnecessary temporaries Am Dienstag, den 10.08.2010, 08:45 +0000 schrieb dominiq at lps dot ens dot fr: > I think that > > + identical_strides = gfc_dep_compare_expr (l_stride, r_stride) > == 1; > > should also be replaced with > > + identical_strides = gfc_dep_compare_expr (l_stride, r_stride) > == 0; Correct. Although it is probably better set the stride during resolution. Here's a tentative patch for that. There are probably very many places that NULL checks can be elided with this. Index: resolve.c =================================================================== --- resolve.c (Revision 163041) +++ resolve.c (Arbeitskopie) @@ -4378,12 +4378,19 @@ return FAILURE; } + /* Always fill in a stride of one. */ + + if (ar->dimen_type[i] == DIMEN_RANGE + && ar->stride[i] == NULL) + ar->stride[i] = gfc_get_int_expr (gfc_index_integer_kind, + &ar->where, 1); + /* Fill in the upper bound, which may be lower than the specified one for something like a(2:10:5), which is identical to a(2:7:5). Only relevant for strides not equal to one. */ if (ar->dimen_type[i] == DIMEN_RANGE - && ar->stride[i] != NULL && ar->stride[i]->expr_type == EXPR_CONSTANT + && ar->stride[i]->expr_type == EXPR_CONSTANT && mpz_cmp_si (ar->stride[i]->value.integer, 1L) != 0) { mpz_t size, end;
(In reply to comment #18) > Although it is probably better set the stride during resolution. Probably. A few comments before leaving for ten days without access to the net. (1) I think all the changes done recently to avoid unneeded temporaries can be extended by reversing the loops. (2) The following code generates an unneeded temporary: integer, parameter :: n = 10 integer :: i integer :: a(0:n+1) = (/(i, i =0, n), 0/) a(:6:2) = a(::3) print *, a end It is in the class abs(r_stride)>max(0,abs(l_stride)) for which there is no need for temporary if l_start<r_start+r_stride for r_stride>max(0,l_stride) or l_start>r_start+r_stride for r_stride<min(0,l_stride) If I am not mistaken, this is the last case for which one can conclude for any size of the sections. (I have done the proof on the back of an envelope that I cannot find right now.) For all the remaining cases it exists some set of triplets for which there is a dependency even if for some others there is none (depending upon the dependent indices are inside or not of the actual range). Although this can be implemented, I seriously doubt it is worth the pain. (3) I think the coverage for missing temporaries is rather limited right now and I am planning to write some torture test(s) to improve this situation.
Subject: Bug 45159 Author: tkoenig Date: Fri Aug 27 12:08:47 2010 New Revision: 163584 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=163584 Log: 2010-08-27 Thomas Koenig <tkoenig@gcc.gnu.org> PR fortran/45159 * dependency.c (check_section_vs_section): Single test for identical strides which takes into account that only one of the strides may be NULL. 2010-08-27 Thomas Koenig <tkoenig@gcc.gnu.org> PR fortran/45159 * gfortran.dg/dependency_33.f90: New test. Added: trunk/gcc/testsuite/gfortran.dg/dependency_33.f90 Modified: trunk/gcc/fortran/ChangeLog trunk/gcc/fortran/dependency.c trunk/gcc/testsuite/ChangeLog
Subject: Bug 45159 Author: tkoenig Date: Fri Sep 3 16:16:34 2010 New Revision: 163834 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=163834 Log: 2010-09-03 Thomas Koenig <tkoenig@gcc.gnu.org> PR fortran/45159 * dependency.c (gfc_deb_compare_expr): Compare equal for equal arglists for pure user functions, or for those intrinsic functions which are also pure. * intrinsics.c (add_conv): Mark conversion functions as pure. (add_char_conversions): Likewise. 2010-09-03 Thomas Koenig <tkoenig@gcc.gnu.org> PR fortran/45159 * gfortran.dg/dependency_34.f90: New test. Added: trunk/gcc/testsuite/gfortran.dg/dependency_34.f90 Modified: trunk/gcc/fortran/ChangeLog trunk/gcc/fortran/dependency.c trunk/gcc/fortran/intrinsic.c trunk/gcc/testsuite/ChangeLog
Created attachment 22546 [details] Patch for some more improvements Here's a patch for some more improvements.
Author: tkoenig Date: Fri Dec 3 10:35:12 2010 New Revision: 167413 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=167413 Log: 2010-12-02 Thomas Koenig <tkoenig@gcc.gnu.org> PR fortran/45159 * dependency.c (check_section_vs_section): Pre-calculate the relationship between the strides and the relationship between the start values. Use an integer constant one for that purpose. Forward dependencies for positive strides apply for where the lhs start <= rhs start and lhs stride <= rhs stride and vice versa for negative stride. No need to compare end expressions in either case (assume no bounds violation). 2010-12-02 Thomas Koenig <tkoenig@gcc.gnu.org> PR fortran/45159 * gfortran.dg/dependency_38.f90: New test. Added: trunk/gcc/testsuite/gfortran.dg/dependency_38.f90 Modified: trunk/gcc/fortran/ChangeLog trunk/gcc/fortran/dependency.c trunk/gcc/testsuite/ChangeLog
Anything left to be done?
> Anything left to be done? I see [macbook] f90/bug% gfc -Warray-temporaries pr45159_4_red.f90 pr45159_4_red.f90:7.15: a(-3:9:3) = a(-6:18:6) 1 Warning: Creating array temporary at (1) which is an instance of l_start<r_start+r_stride for r_stride>max(0,l_stride) (see comment #19) for which there is no dependency.
(In reply to comment #19) Hi Dominique, > (I have done the proof on the back of an envelope that I cannot find right > now.) Can you find that particular envelope again? I tried to do the logic for this myself, but couldn't find a proof.
To allow expressions like a(n:2*n:2) = a(n+1:2*n+1:2) to be optimized, I will try to write a function which calculates the difference between two gfc_expr() for easy cases. This could also help in other cases, such as determining string lenghts for expressions like c(n:n+2).
Created attachment 29340 [details] patch which implements comment #27 Still have to verify that this one is correct in all cases.
I think we can close this bug with http://gcc.gnu.org/ml/gcc-cvs/2013-03/msg00846.html It's been around for some time, if somebody finds anything else, let us open a fresh one :-)