Bug 44773 - [4.6 Regression] Unnecessary temporaries increase the runtime for channel.f90 by ~70%
Summary: [4.6 Regression] Unnecessary temporaries increase the runtime for channel.f90...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: fortran (show other bugs)
Version: 4.6.0
: P3 normal
Target Milestone: 4.6.0
Assignee: Paul Thomas
URL:
Keywords: missed-optimization
: 44867 (view as bug list)
Depends on:
Blocks: 36854
  Show dependency treegraph
 
Reported: 2010-07-01 21:21 UTC by Dominique d'Humieres
Modified: 2010-07-12 07:26 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.5.0
Known to fail:
Last reconfirmed: 2010-07-03 17:03:45


Attachments
A first step to fix this bug (699 bytes, patch)
2010-07-08 12:29 UTC, Paul Thomas
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dominique d'Humieres 2010-07-01 21:21:58 UTC
At revision 161670, gfortran creates unneeded temporaries (not created up to r161462), for instance

[macbook] lin/test% gfc -O3 -ffast-math -Warray-temporaries channel.f90
channel.f90:148.11:

    dudx = ddx(u(:,:,mid))
           1
Warning: Creating array temporary at (1)
channel.f90:149.11:

    dvdy = ddy(v(:,:,mid))
           1
Warning: Creating array temporary at (1)
channel.f90:150.11:

    dhdx = ddx(h(:,:,mid))
           1
Warning: Creating array temporary at (1)
channel.f90:151.11:

    dhdy = ddy(h(:,:,mid))
           1
Warning: Creating array temporary at (1)

rsulting in a ~70% increase in the execution time

[macbook] lin/test% time a.out > /dev/null
5.099u 0.032s 0:05.13 99.8%	0+0k 0+0io 0pf+0w

compared to

[macbook] lin/test% gfcf -O3 -ffast-math -Warray-temporaries channel.f90                                                                              [macbook] lin/test% time a.out > /dev/null
2.964u 0.006s 0:02.99 98.9%	0+0k 0+0io 0pf+0w

I suspect that revision 161550 is the cause.
Comment 1 Tobias Burnus 2010-07-02 07:31:03 UTC
> At revision 161670, gfortran creates unneeded temporaries (not created up
> to r161462).
(r161462 = PR 44582)

channel.f90 has:

program sw
  double precision,dimension(M,N)::               f,dudx,dvdy,dhdx,dhdy
  dudx = ddx(u(:,:,mid))
contains
  function ddx(array)

And the problem is: How should the compiler know that "ddx" does not use the (host associated) "dudx"?  Note: The function "ddx" is not declared as PURE - and also cannot simply marked as pure as it host-associates the variables "I" and "J" - thus a simple check for no host association would not work.

Hence, I fail to see how the compiler can handle it. Unless you find an example where the compiler could know it, I fear one has to close the PR as WONTFIX.
Comment 2 Dominique d'Humieres 2010-07-02 08:19:27 UTC
> And the problem is: How should the compiler know that "ddx" does not use the
> (host associated) "dudx"?  Note: The function "ddx" is not declared as PURE -
> and also cannot simply marked as pure as it host-associates the variables "I"
> and "J" - thus a simple check for no host association would not work.
>
> Hence, I fail to see how the compiler can handle it. Unless you find an example
> where the compiler could know it, I fear one has to close the PR as WONTFIX.

(1) Please refrain to close this pr.

(2) I'll have a closer look to what's happening at the inlining level (i.e., before revision 161550 the functions ddx and ddy were inlined. I don't know if it is still the case).

(3) The body of ddx does not use dudx or dhdx (ddy does not use dydy or dhdy) and u,v, and h don't allias with dudx, ..., so it is not difficult for the user to check that there is no need for temporaries. Why should it be impossible for the compiler?
 
Comment 3 Mikael Morin 2010-07-02 08:45:41 UTC
It could be useful at several places to have a function traversing the code of a procedure and telling whether an array is read or written to by that procedure. 

It should be quite doable for the write case (only assignments and procedure calls to care about). 
For the read case, however, it could take some time to get it right, as every statement has its own parameters with its own restrictions and assumptions. 

This is pure speculation anyway. 

By the way, what is the complete (or reduced) test case ?
Comment 4 Dominique d'Humieres 2010-07-02 08:51:40 UTC
> By the way, what is the complete (or reduced) test case ?

The polyhedron test channel.f90: http://www.polyhedron.co.uk/MFL6VW74649 . If there is an interest, I can try to reduce the test during the week-end.

I have checked that the timings do not depends of the inlining of ddx and ddy.
Comment 5 Jerry DeLisle 2010-07-02 15:25:40 UTC
Yes, please reduce and lets see if we can discover something more specific wrong here. Then also consider Mikael's idea.
Comment 6 Dominique d'Humieres 2010-07-02 16:53:41 UTC
> Yes, please reduce and lets see if we can discover something more specific
> wrong here. Then also consider Mikael's idea.

I don't think there is anything "specific" to discover. The fix for PR44582 is too conservative and creates unneeded temporaries (in channel.f90, test_fpu.f90 and probably others that I have missed). I'll be extremely sad that to fix (ab)uses of the standard (codes you should never write), legitimate "real life" codes get badly penalized.

You can also see the effect at http://gcc.opensuse.org/c++bench/polyhedron/polyhedron-summary.txt-2-0.html where the time went from ~16.5s to ~18.7s. The relative effect is less dramatic because channel.f90 is memory bound for caches smaller than ~4Mb: I don't know the cache size of the AMD proc for the above test (~512kb?) compared to the 3Mb on my macbook. So the effect of the new temporaries is to increase the memory access by ~2s, i.e., ~10% on the AMD tests, but ~70% on my machine. So I doubt that a reduced test wil give any new insight in the problem, nevertheless I'll try!-).
Comment 7 Thomas Koenig 2010-07-03 17:03:45 UTC
Confirmed.

Wrong-code takes precedence over missed-optimization, but this is
still something that should be fixed.
Comment 8 Dominique d'Humieres 2010-07-04 10:48:54 UTC
> Note: The function "ddx" is not declared as PURE -
> and also cannot simply marked as pure as it host-associates the variables "I"
> and "J" - thus a simple check for no host association would not work.

There is no temporaries if ddx and ddy are declared pure (after adding intent(in) for array, and replacing I and J by say nI and nJ). So at least this works, require to change the code.
Comment 9 Uroš Bizjak 2010-07-08 11:40:56 UTC
*** Bug 44867 has been marked as a duplicate of this bug. ***
Comment 10 Paul Thomas 2010-07-08 12:29:44 UTC
Created attachment 21142 [details]
A first step to fix this bug

This does the right thing but has not been regtested because my tree is so broken that even "hello world" does not run.

However, I am confident that it can be persuaded to regtest and will do so tonight.

Cheers

Paul
Comment 11 Tobias Burnus 2010-07-08 13:33:32 UTC
+      /* A temporary is not needed if the lhs has never been host
+	 associated and the procedure is contained.  */
+      if (!sym->attr.host_assoc_in_contained
+	    && expr2->value.function.esym->attr.contained)
+	return false;
+
       /* A temporary is not needed if the variable is local and not
 	 a pointer, a target or a result.  */
       if (sym->ns->parent

I have not read the patch in context, but I fear that you might miss a TARGET/POINTER check. Otherwise, I like your patch.
Comment 12 paul.richard.thomas@gmail.com 2010-07-08 15:32:32 UTC
Subject: Re:  [4.6 Regression] Unnecessary temporaries 
	increase the runtime for channel.f90 by ~70%

Tobias,

That is the context - apply it and see.

Paul

On Thu, Jul 8, 2010 at 3:33 PM, burnus at gcc dot gnu dot org
<gcc-bugzilla@gcc.gnu.org> wrote:
>
>
> ------- Comment #11 from burnus at gcc dot gnu dot org  2010-07-08 13:33 -------
> +      /* A temporary is not needed if the lhs has never been host
> +        associated and the procedure is contained.  */
> +      if (!sym->attr.host_assoc_in_contained
> +           && expr2->value.function.esym->attr.contained)
> +       return false;
> +
>       /* A temporary is not needed if the variable is local and not
>         a pointer, a target or a result.  */
>       if (sym->ns->parent
>
> I have not read the patch in context, but I fear that you might miss a
> TARGET/POINTER check. Otherwise, I like your patch.
>
>
> --
>
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44773
>
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug, or are watching someone who is.
>



Comment 13 tkoenig@netcologne.de 2010-07-08 19:55:49 UTC
Subject: Re:  [4.6 Regression] Unnecessary temporaries
 increase the runtime for channel.f90 by ~70%

Hello Paul,

> That is the context - apply it and see.

What about

module baz
  implicit none
contains
  subroutine bar(x)
    real, dimension(:,:,:) :: x
    x = 3.0
  end subroutine bar
end module baz

program sw
  use baz
  implicit none
  integer, parameter :: n = 4, m = 7,  k = 5
  real ,dimension(M,N)::               f,dudx,dvdy,dhdx,dhdy
  real, dimension(m,n,k) :: u
  u = 1.0
  dudx = ddx(u(:,:,2))
  print *,dudx
contains
  real function ddx(array)
  real, dimension(:,:) :: array
  call bar(u)
  ddx = array(1,1)
  end function ddx
end program sw

AFAICS, this is legal and would require a temporary.

Comment 14 Tobias Burnus 2010-07-08 19:57:42 UTC
(In reply to comment #12)
> That is the context - apply it and see.

I saw it now (the comment was a bit misleading - the comment combines the outer check with the succeeding check).

On the way home, I was wondering how one can fix it for the case of host-associating the variable in some but not in all procedures. My feeling was that a linked list in the function's gfc_symbol would be best as this also will work in case of submodules (if one initializes the variable with "UNKOWN") whereas saving this information as linked list in the host variable makes it more difficult for submodules.

Actually, your patch also does not work with submodules - maybe you should add a TODO/FIXME pointing to submodules - to make sure we don't forget this, when we implement submodules.
Comment 15 Dominique d'Humieres 2010-07-09 12:02:03 UTC
The patch in comment #10 avoids the extra temporaries and recovers the original timings. Regstrapped without regression and passed my tests. Note also that it "fixes" pr44744.

Concerning the test in comment #13 it gives 3.0 with/without the patch (as well with g95). I don't know if it should give 1.0 (the value before "call bar(u)") or not, but the result does not depend on the patch.

Thanks for the quick fix.
Comment 16 Tobias Burnus 2010-07-09 12:26:54 UTC
(In reply to comment #13)
> What about

>   dudx = ddx(u(:,:,2))

>   real function ddx(array)
>     real, dimension(:,:) :: array
>     call bar(u)
>     ddx = array(1,1)

> AFAICS, this is legal and would require a temporary.

I don't see why: You never have the same variable on the LHS and on the RHS. On the LHS is "dudx" == "ddx" while on the RHS is "u" == "array". As I cannot see how "u" and "dudx" can alias, I don't think one needs any temporary. 

Not even in any function call as the dummies are non-CONTIGUOUS assumed-shape arrays. (Otherwise, a temporary is needed as the actual argument is not simple contiguous [actually: it is known not to be contiguous].)

The result 3.0 looks also fine and all my compiles produce it.


> Note also that it "fixes" pr44744.

Nevertheless, one should leave the PR open as the bounds checking is obviously missing. (Though, one should change the title after committal.)
Comment 17 Paul Thomas 2010-07-10 14:57:41 UTC
Subject: Bug 44773

Author: pault
Date: Sat Jul 10 14:57:25 2010
New Revision: 162038

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162038
Log:
2010-07-10  Paul Thomas  <pault@gcc.gnu.org>

	PR fortran/44773
	* trans-expr.c (arrayfunc_assign_needs_temporary): No temporary
	if the lhs has never been host associated, as well as not being
	use associated, a pointer or a target.
	* resolve.c (resolve_variable): Mark variables that are host
	associated.
	* gfortran.h: Add the host_assoc bit to the symbol_attribute
	structure.


Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/gfortran.h
    trunk/gcc/fortran/resolve.c
    trunk/gcc/fortran/trans-expr.c

Comment 18 Paul Thomas 2010-07-10 17:09:03 UTC
Subject: Bug 44773

Author: pault
Date: Sat Jul 10 17:08:48 2010
New Revision: 162041

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162041
Log:
2010-07-10  Paul Thomas  <pault@gcc.gnu.org>

	PR fortran/44582
	* trans-expr.c (arrayfunc_assign_needs_temporary): New function
	to determine if a function assignment can be made without a
	temporary.
	(gfc_trans_arrayfunc_assign): Move all the conditions that
	suppress the direct function call to the above new functon and
	call it.

	PR fortran/44773
	* trans-expr.c (arrayfunc_assign_needs_temporary): No temporary
	if the lhs has never been host associated, as well as not being
	use associated, a pointer or a target.
	* resolve.c (resolve_variable): Mark variables that are host
	associated.
	* gfortran.h: Add the host_assoc bit to the symbol_attribute
	structure.

2010-07-10  Paul Thomas  <pault@gcc.gnu.org>

	PR fortran/44582
	* gfortran.dg/aliasing_array_result_1.f90 : New test.


Added:
    branches/gcc-4_4-branch/gcc/testsuite/gfortran.dg/aliasing_array_result_1.f90
Modified:
    branches/gcc-4_4-branch/gcc/fortran/ChangeLog
    branches/gcc-4_4-branch/gcc/fortran/gfortran.h
    branches/gcc-4_4-branch/gcc/fortran/resolve.c
    branches/gcc-4_4-branch/gcc/fortran/trans-expr.c
    branches/gcc-4_4-branch/gcc/testsuite/ChangeLog

Comment 19 Paul Thomas 2010-07-11 16:07:14 UTC
Subject: Bug 44773

Author: pault
Date: Sun Jul 11 16:06:53 2010
New Revision: 162059

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162059
Log:
2010-07-11  Paul Thomas  <pault@gcc.gnu.org>

	PR fortran/44773
	* trans-expr.c (arrayfunc_assign_needs_temporary): No temporary
	if the lhs has never been host associated, as well as not being
	use associated, a pointer or a target.
	* resolve.c (resolve_variable): Mark variables that are host
	associated.
	* gfortran.h: Add the host_assoc bit to the symbol_attribute
	structure.


Modified:
    branches/gcc-4_5-branch/gcc/fortran/ChangeLog
    branches/gcc-4_5-branch/gcc/fortran/gfortran.h
    branches/gcc-4_5-branch/gcc/fortran/resolve.c
    branches/gcc-4_5-branch/gcc/fortran/trans-expr.c

Comment 20 paul.richard.thomas@gmail.com 2010-07-12 06:31:40 UTC
Subject: Re:  [4.6 Regression] Unnecessary temporaries 
	increase the runtime for channel.f90 by ~70%

4.3 is not so easy - it's throwing a load of regressions.  I'll spend
some time tonight to try to understand why.  If I don't see it, I will
close this PR as FIXED; after all this bug goes gack to gfortran-3.5,
so it has taken 10years for it to come up :-)

Paul

On Sun, Jul 11, 2010 at 6:07 PM, pault at gcc dot gnu dot org
<gcc-bugzilla@gcc.gnu.org> wrote:
>
>
> ------- Comment #19 from pault at gcc dot gnu dot org  2010-07-11 16:07 -------
> Subject: Bug 44773
>
> Author: pault
> Date: Sun Jul 11 16:06:53 2010
> New Revision: 162059
>
> URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162059
> Log:
> 2010-07-11  Paul Thomas  <pault@gcc.gnu.org>
>
>        PR fortran/44773
>        * trans-expr.c (arrayfunc_assign_needs_temporary): No temporary
>        if the lhs has never been host associated, as well as not being
>        use associated, a pointer or a target.
>        * resolve.c (resolve_variable): Mark variables that are host
>        associated.
>        * gfortran.h: Add the host_assoc bit to the symbol_attribute
>        structure.
>
>
> Modified:
>    branches/gcc-4_5-branch/gcc/fortran/ChangeLog
>    branches/gcc-4_5-branch/gcc/fortran/gfortran.h
>    branches/gcc-4_5-branch/gcc/fortran/resolve.c
>    branches/gcc-4_5-branch/gcc/fortran/trans-expr.c
>
>
> --
>
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44773
>
> ------- You are receiving this mail because: -------
> You are the assignee for the bug, or are watching the assignee.
>



Comment 21 Steven Bosscher 2010-07-12 07:26:32 UTC
I think this should not go into GCC 4.3 anyway. The problem is not a regression, and the patch is non-obvious, so it's just not appropriate for a stable release branch.