At revision 161670, gfortran creates unneeded temporaries (not created up to r161462), for instance [macbook] lin/test% gfc -O3 -ffast-math -Warray-temporaries channel.f90 channel.f90:148.11: dudx = ddx(u(:,:,mid)) 1 Warning: Creating array temporary at (1) channel.f90:149.11: dvdy = ddy(v(:,:,mid)) 1 Warning: Creating array temporary at (1) channel.f90:150.11: dhdx = ddx(h(:,:,mid)) 1 Warning: Creating array temporary at (1) channel.f90:151.11: dhdy = ddy(h(:,:,mid)) 1 Warning: Creating array temporary at (1) rsulting in a ~70% increase in the execution time [macbook] lin/test% time a.out > /dev/null 5.099u 0.032s 0:05.13 99.8% 0+0k 0+0io 0pf+0w compared to [macbook] lin/test% gfcf -O3 -ffast-math -Warray-temporaries channel.f90 [macbook] lin/test% time a.out > /dev/null 2.964u 0.006s 0:02.99 98.9% 0+0k 0+0io 0pf+0w I suspect that revision 161550 is the cause.
> At revision 161670, gfortran creates unneeded temporaries (not created up > to r161462). (r161462 = PR 44582) channel.f90 has: program sw double precision,dimension(M,N):: f,dudx,dvdy,dhdx,dhdy dudx = ddx(u(:,:,mid)) contains function ddx(array) And the problem is: How should the compiler know that "ddx" does not use the (host associated) "dudx"? Note: The function "ddx" is not declared as PURE - and also cannot simply marked as pure as it host-associates the variables "I" and "J" - thus a simple check for no host association would not work. Hence, I fail to see how the compiler can handle it. Unless you find an example where the compiler could know it, I fear one has to close the PR as WONTFIX.
> And the problem is: How should the compiler know that "ddx" does not use the > (host associated) "dudx"? Note: The function "ddx" is not declared as PURE - > and also cannot simply marked as pure as it host-associates the variables "I" > and "J" - thus a simple check for no host association would not work. > > Hence, I fail to see how the compiler can handle it. Unless you find an example > where the compiler could know it, I fear one has to close the PR as WONTFIX. (1) Please refrain to close this pr. (2) I'll have a closer look to what's happening at the inlining level (i.e., before revision 161550 the functions ddx and ddy were inlined. I don't know if it is still the case). (3) The body of ddx does not use dudx or dhdx (ddy does not use dydy or dhdy) and u,v, and h don't allias with dudx, ..., so it is not difficult for the user to check that there is no need for temporaries. Why should it be impossible for the compiler?
It could be useful at several places to have a function traversing the code of a procedure and telling whether an array is read or written to by that procedure. It should be quite doable for the write case (only assignments and procedure calls to care about). For the read case, however, it could take some time to get it right, as every statement has its own parameters with its own restrictions and assumptions. This is pure speculation anyway. By the way, what is the complete (or reduced) test case ?
> By the way, what is the complete (or reduced) test case ? The polyhedron test channel.f90: http://www.polyhedron.co.uk/MFL6VW74649 . If there is an interest, I can try to reduce the test during the week-end. I have checked that the timings do not depends of the inlining of ddx and ddy.
Yes, please reduce and lets see if we can discover something more specific wrong here. Then also consider Mikael's idea.
> Yes, please reduce and lets see if we can discover something more specific > wrong here. Then also consider Mikael's idea. I don't think there is anything "specific" to discover. The fix for PR44582 is too conservative and creates unneeded temporaries (in channel.f90, test_fpu.f90 and probably others that I have missed). I'll be extremely sad that to fix (ab)uses of the standard (codes you should never write), legitimate "real life" codes get badly penalized. You can also see the effect at http://gcc.opensuse.org/c++bench/polyhedron/polyhedron-summary.txt-2-0.html where the time went from ~16.5s to ~18.7s. The relative effect is less dramatic because channel.f90 is memory bound for caches smaller than ~4Mb: I don't know the cache size of the AMD proc for the above test (~512kb?) compared to the 3Mb on my macbook. So the effect of the new temporaries is to increase the memory access by ~2s, i.e., ~10% on the AMD tests, but ~70% on my machine. So I doubt that a reduced test wil give any new insight in the problem, nevertheless I'll try!-).
Confirmed. Wrong-code takes precedence over missed-optimization, but this is still something that should be fixed.
> Note: The function "ddx" is not declared as PURE - > and also cannot simply marked as pure as it host-associates the variables "I" > and "J" - thus a simple check for no host association would not work. There is no temporaries if ddx and ddy are declared pure (after adding intent(in) for array, and replacing I and J by say nI and nJ). So at least this works, require to change the code.
*** Bug 44867 has been marked as a duplicate of this bug. ***
Created attachment 21142 [details] A first step to fix this bug This does the right thing but has not been regtested because my tree is so broken that even "hello world" does not run. However, I am confident that it can be persuaded to regtest and will do so tonight. Cheers Paul
+ /* A temporary is not needed if the lhs has never been host + associated and the procedure is contained. */ + if (!sym->attr.host_assoc_in_contained + && expr2->value.function.esym->attr.contained) + return false; + /* A temporary is not needed if the variable is local and not a pointer, a target or a result. */ if (sym->ns->parent I have not read the patch in context, but I fear that you might miss a TARGET/POINTER check. Otherwise, I like your patch.
Subject: Re: [4.6 Regression] Unnecessary temporaries increase the runtime for channel.f90 by ~70% Tobias, That is the context - apply it and see. Paul On Thu, Jul 8, 2010 at 3:33 PM, burnus at gcc dot gnu dot org <gcc-bugzilla@gcc.gnu.org> wrote: > > > ------- Comment #11 from burnus at gcc dot gnu dot org 2010-07-08 13:33 ------- > + /* A temporary is not needed if the lhs has never been host > + associated and the procedure is contained. */ > + if (!sym->attr.host_assoc_in_contained > + && expr2->value.function.esym->attr.contained) > + return false; > + > /* A temporary is not needed if the variable is local and not > a pointer, a target or a result. */ > if (sym->ns->parent > > I have not read the patch in context, but I fear that you might miss a > TARGET/POINTER check. Otherwise, I like your patch. > > > -- > > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44773 > > ------- You are receiving this mail because: ------- > You are on the CC list for the bug, or are watching someone who is. >
Subject: Re: [4.6 Regression] Unnecessary temporaries increase the runtime for channel.f90 by ~70% Hello Paul, > That is the context - apply it and see. What about module baz implicit none contains subroutine bar(x) real, dimension(:,:,:) :: x x = 3.0 end subroutine bar end module baz program sw use baz implicit none integer, parameter :: n = 4, m = 7, k = 5 real ,dimension(M,N):: f,dudx,dvdy,dhdx,dhdy real, dimension(m,n,k) :: u u = 1.0 dudx = ddx(u(:,:,2)) print *,dudx contains real function ddx(array) real, dimension(:,:) :: array call bar(u) ddx = array(1,1) end function ddx end program sw AFAICS, this is legal and would require a temporary.
(In reply to comment #12) > That is the context - apply it and see. I saw it now (the comment was a bit misleading - the comment combines the outer check with the succeeding check). On the way home, I was wondering how one can fix it for the case of host-associating the variable in some but not in all procedures. My feeling was that a linked list in the function's gfc_symbol would be best as this also will work in case of submodules (if one initializes the variable with "UNKOWN") whereas saving this information as linked list in the host variable makes it more difficult for submodules. Actually, your patch also does not work with submodules - maybe you should add a TODO/FIXME pointing to submodules - to make sure we don't forget this, when we implement submodules.
The patch in comment #10 avoids the extra temporaries and recovers the original timings. Regstrapped without regression and passed my tests. Note also that it "fixes" pr44744. Concerning the test in comment #13 it gives 3.0 with/without the patch (as well with g95). I don't know if it should give 1.0 (the value before "call bar(u)") or not, but the result does not depend on the patch. Thanks for the quick fix.
(In reply to comment #13) > What about > dudx = ddx(u(:,:,2)) > real function ddx(array) > real, dimension(:,:) :: array > call bar(u) > ddx = array(1,1) > AFAICS, this is legal and would require a temporary. I don't see why: You never have the same variable on the LHS and on the RHS. On the LHS is "dudx" == "ddx" while on the RHS is "u" == "array". As I cannot see how "u" and "dudx" can alias, I don't think one needs any temporary. Not even in any function call as the dummies are non-CONTIGUOUS assumed-shape arrays. (Otherwise, a temporary is needed as the actual argument is not simple contiguous [actually: it is known not to be contiguous].) The result 3.0 looks also fine and all my compiles produce it. > Note also that it "fixes" pr44744. Nevertheless, one should leave the PR open as the bounds checking is obviously missing. (Though, one should change the title after committal.)
Subject: Bug 44773 Author: pault Date: Sat Jul 10 14:57:25 2010 New Revision: 162038 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162038 Log: 2010-07-10 Paul Thomas <pault@gcc.gnu.org> PR fortran/44773 * trans-expr.c (arrayfunc_assign_needs_temporary): No temporary if the lhs has never been host associated, as well as not being use associated, a pointer or a target. * resolve.c (resolve_variable): Mark variables that are host associated. * gfortran.h: Add the host_assoc bit to the symbol_attribute structure. Modified: trunk/gcc/fortran/ChangeLog trunk/gcc/fortran/gfortran.h trunk/gcc/fortran/resolve.c trunk/gcc/fortran/trans-expr.c
Subject: Bug 44773 Author: pault Date: Sat Jul 10 17:08:48 2010 New Revision: 162041 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162041 Log: 2010-07-10 Paul Thomas <pault@gcc.gnu.org> PR fortran/44582 * trans-expr.c (arrayfunc_assign_needs_temporary): New function to determine if a function assignment can be made without a temporary. (gfc_trans_arrayfunc_assign): Move all the conditions that suppress the direct function call to the above new functon and call it. PR fortran/44773 * trans-expr.c (arrayfunc_assign_needs_temporary): No temporary if the lhs has never been host associated, as well as not being use associated, a pointer or a target. * resolve.c (resolve_variable): Mark variables that are host associated. * gfortran.h: Add the host_assoc bit to the symbol_attribute structure. 2010-07-10 Paul Thomas <pault@gcc.gnu.org> PR fortran/44582 * gfortran.dg/aliasing_array_result_1.f90 : New test. Added: branches/gcc-4_4-branch/gcc/testsuite/gfortran.dg/aliasing_array_result_1.f90 Modified: branches/gcc-4_4-branch/gcc/fortran/ChangeLog branches/gcc-4_4-branch/gcc/fortran/gfortran.h branches/gcc-4_4-branch/gcc/fortran/resolve.c branches/gcc-4_4-branch/gcc/fortran/trans-expr.c branches/gcc-4_4-branch/gcc/testsuite/ChangeLog
Subject: Bug 44773 Author: pault Date: Sun Jul 11 16:06:53 2010 New Revision: 162059 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162059 Log: 2010-07-11 Paul Thomas <pault@gcc.gnu.org> PR fortran/44773 * trans-expr.c (arrayfunc_assign_needs_temporary): No temporary if the lhs has never been host associated, as well as not being use associated, a pointer or a target. * resolve.c (resolve_variable): Mark variables that are host associated. * gfortran.h: Add the host_assoc bit to the symbol_attribute structure. Modified: branches/gcc-4_5-branch/gcc/fortran/ChangeLog branches/gcc-4_5-branch/gcc/fortran/gfortran.h branches/gcc-4_5-branch/gcc/fortran/resolve.c branches/gcc-4_5-branch/gcc/fortran/trans-expr.c
Subject: Re: [4.6 Regression] Unnecessary temporaries increase the runtime for channel.f90 by ~70% 4.3 is not so easy - it's throwing a load of regressions. I'll spend some time tonight to try to understand why. If I don't see it, I will close this PR as FIXED; after all this bug goes gack to gfortran-3.5, so it has taken 10years for it to come up :-) Paul On Sun, Jul 11, 2010 at 6:07 PM, pault at gcc dot gnu dot org <gcc-bugzilla@gcc.gnu.org> wrote: > > > ------- Comment #19 from pault at gcc dot gnu dot org 2010-07-11 16:07 ------- > Subject: Bug 44773 > > Author: pault > Date: Sun Jul 11 16:06:53 2010 > New Revision: 162059 > > URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162059 > Log: > 2010-07-11 Paul Thomas <pault@gcc.gnu.org> > > PR fortran/44773 > * trans-expr.c (arrayfunc_assign_needs_temporary): No temporary > if the lhs has never been host associated, as well as not being > use associated, a pointer or a target. > * resolve.c (resolve_variable): Mark variables that are host > associated. > * gfortran.h: Add the host_assoc bit to the symbol_attribute > structure. > > > Modified: > branches/gcc-4_5-branch/gcc/fortran/ChangeLog > branches/gcc-4_5-branch/gcc/fortran/gfortran.h > branches/gcc-4_5-branch/gcc/fortran/resolve.c > branches/gcc-4_5-branch/gcc/fortran/trans-expr.c > > > -- > > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44773 > > ------- You are receiving this mail because: ------- > You are the assignee for the bug, or are watching the assignee. >
I think this should not go into GCC 4.3 anyway. The problem is not a regression, and the patch is non-obvious, so it's just not appropriate for a stable release branch.