Bug 35339

Summary: Improve translation of implied do loop in transfer
Product: gcc Reporter: Jerry DeLisle <jvdelisle2>
Component: fortranAssignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED    
Severity: normal CC: gcc-bugs, manfred99, tkoenig
Priority: P3 Keywords: missed-optimization
Version: 4.4.0   
Target Milestone: ---   
Host: Target:
Build: Known to work:
Known to fail: Last reconfirmed: 2008-03-13 05:19:18
Bug Depends on: 80945    
Bug Blocks: 25829    
Attachments: Early patch for simplifying impl do loops
Early patch for simplifying impl do loops - 2
Proposed patch for simplifying impl do loops

Description Jerry DeLisle 2008-02-24 01:41:05 UTC
Currently we create a looping structure which is executed to traverse an array and transfer it one element at a time.  This works, but is not too efficient.  We could improve this by converting the implied do loops into the appropriate array descriptor and call transfer_array.

The two cases are:

real, dimension(10) :: a
write(10,'(10f8.3)') a
write(10,'(10f8.3)') (a(i), i=1,10)

Gives:

    _gfortran_st_write (&dt_parm.3);
    {
      struct array1_real(kind=4) parm.4;

      parm.4.dtype = 281;
      parm.4.dim[0].lbound = 1;
      parm.4.dim[0].ubound = 10;
      parm.4.dim[0].stride = 1;
      parm.4.data = (void *) &a[0];
      parm.4.offset = -1;
      _gfortran_transfer_array (&dt_parm.3, &parm.4, 4, 0);
    }
    _gfortran_st_write_done (&dt_parm.3);

and

    _gfortran_st_write (&dt_parm.3);
    i = 1;
    if (i <= 10)
      {
        while (1)
          {
            {
              logical(kind=4) D.958;

              _gfortran_transfer_real (&dt_parm.3, &a[(integer(kind=8)) i + -1], 4);
              L.2:;
              D.958 = i == 10;
              i = i + 1;
              if (D.958) goto L.3;
            }
          }
      }
    L.3:;
    _gfortran_st_write_done (&dt_parm.3);

The former is needed to simplify asynchronous I/O where we need to be able to convey to the I/O thread the task to be done and not the actual code to do the looping.

Putting it another way, if the implied loop has 10000 count.  We need to pass that count to the I/O routines, so they can take it and run with it.
Comment 1 Francois-Xavier Coudert 2008-02-25 10:00:53 UTC
(In reply to comment #0)
> real, dimension(10) :: a
> write(10,'(10f8.3)') a
> write(10,'(10f8.3)') (a(i), i=1,10)

The problem is, this isn't possible in the most generic case:

  write(10,'(10f8.3)') (a(i**-2*i+13), i=1,10)
Comment 2 Jerry DeLisle 2008-03-01 16:22:56 UTC
OK, I see your point.  Would it be possible to create a hidden iterator function that could be called internally by the I/O library to return the index into the array?

In the meantime, I am thinking through a different approach for aio that avoids the issue here.

Comment 3 Paul Thomas 2008-03-02 07:59:58 UTC
> In the meantime, I am thinking through a different approach for aio that avoids
> the issue here.
> 

Yes it would - use gfc_conv_expr_descriptor to convert the expression and pass the resulting array descriptor.  For the cases that you are concerned with, this would be a temporary.  However, gfortran_transfer array would be used and the io itself taken out of the loop. Obviously, this would only work for a WRITE operation and gfc_conv_subref_array_arg would have to be used for a READ - as in the existing code in gfc_trans_transfer.

In fact, I believe that the exsting code in gfc_trans_transfer would be able to handle iterator expressions, were they passed to it.  However, the frontend does this scalarization; see io.c(match_io_element):2396 onwards. This builds up a DO loop and a call to EXEC_TRANSFER for each element.  Where there is only one element, you will get what you want by turning the iterator expression into an EXPR_ARRAY and writing the gfc_code to pass that to EXEC_TRANSFER.  I think that it should be a very straightforward job.

Best of luck!

Paul




Comment 4 Jerry DeLisle 2008-03-13 05:19:18 UTC
I have a method figured out for async I/O that will handle things as they are now, However, it would greatly improve the situation if we fix this implied do loop business.
Comment 5 Thomas Koenig 2013-08-31 18:27:22 UTC
Depending how much ground we want to cover, this
can be tricky.

Off my head, I can think of the following examples:

(a(i), i=1,10) --> a(1:10)
(a(i), i=1,10,2) --> a(1:10:2)
(a(2*i), i=1,10) --> a(2:20:2)
((a(i,j), i=1,10), j=1,2) --> a(1:10,1:20)
((a(i,j), j=1,2), i=1,10) --> no equivalent
((a(i,j), j=1,2) --> a(i,1:2)
(a(i,2*i), i=1,10) --> no equivalent
(a(i**2), i=1,10) --> no equivalent
Comment 6 Nicolas Koenig 2017-05-25 17:50:02 UTC
Created attachment 41419 [details]
Early patch for simplifying impl do loops

Attached is a patch that simplifies implied do loops in the way suggested by Thomas. There are, however, still a few problems with this patch that I have a hard time understanding. For example:

gcc@dcm-linux:~/pr/35339> cat z1.f90
program main
    implicit none
    integer:: i
    integer, dimension(2,2):: a = reshape([1, 2, 3, 4], shape(a))
    write (*,*) (a(i, 1), i=1,2)
end program
gcc@dcm-linux:~/pr/35339> gfortran -O z1.f90
z1.f90:5:0:

     write (*,*) (a(i, 1), i=1,4)
 
internal compiler error: Segmentation fault
0xbe58af crash_signal
        ../../trunk/gcc/toplev.c:337
0x6bea8d gfc_conv_scalarized_array_ref
        ../../trunk/gcc/fortran/trans-array.c:3228
0x6bfb1c gfc_conv_array_ref(gfc_se*, gfc_array_ref*, gfc_expr*, locus*)
        ../../trunk/gcc/fortran/trans-array.c:3382
0x6f95cd gfc_conv_variable
        ../../trunk/gcc/fortran/trans-expr.c:2680
0x6f529a gfc_conv_expr(gfc_se*, gfc_expr*)
        ../../trunk/gcc/fortran/trans-expr.c:7815
0x6fd41e gfc_conv_expr_reference(gfc_se*, gfc_expr*)
        ../../trunk/gcc/fortran/trans-expr.c:7915
0x724862 gfc_trans_transfer(gfc_code*)
        ../../trunk/gcc/fortran/trans-io.c:2539
0x6b3d37 trans_code
        ../../trunk/gcc/fortran/trans.c:2017
0x7214f2 build_dt
        ../../trunk/gcc/fortran/trans-io.c:2017
0x6b3d57 trans_code
        ../../trunk/gcc/fortran/trans.c:1989
0x6e5498 gfc_generate_function_code(gfc_namespace*)
        ../../trunk/gcc/fortran/trans-decl.c:6332
0x66c576 translate_all_program_units
        ../../trunk/gcc/fortran/parse.c:6074
0x66c576 gfc_parse_file()
        ../../trunk/gcc/fortran/parse.c:6274
0x6afe9f gfc_be_parse_file
        ../../trunk/gcc/fortran/f95-lang.c:204
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.


probably because the new 'transfer'-statement isn't properly inserted in the code tree.
Comment 7 Nicolas Koenig 2017-05-25 18:36:26 UTC
Created attachment 41420 [details]
Early patch for simplifying impl do loops - 2

Sorry, wrong patch _and_ wrong testcase... Still fails for mysterious reasons, though.

gcc@dcm-linux:~/pr/35339> cat z3.f90
program main
    implicit none
    integer:: i
    integer, parameter:: k = 2
    integer, dimension(4):: a = [(i, i=1,4)]
    write (*,*) (a(k), i=1,4) 
end program
gcc@dcm-linux:~/pr/35339> gfortran -O z3.f90
z3.f90:6:0:

     write (*,*) (a(k), i=1,4)
 
internal compiler error: in gfc_conv_expr_descriptor, at fortran/trans-array.c:6792
0x73175d gfc_conv_expr_descriptor(gfc_se*, gfc_expr*)
        ../../trunk/gcc/fortran/trans-array.c:6792
0x7a811b gfc_trans_transfer(gfc_code*)
        ../../trunk/gcc/fortran/trans-io.c:2581
0x71dffb trans_code
        ../../trunk/gcc/fortran/trans.c:2017
0x71e164 gfc_trans_code_cond(gfc_code*, tree_node*)
        ../../trunk/gcc/fortran/trans.c:2120
0x7c4fa2 gfc_trans_simple_do
        ../../trunk/gcc/fortran/trans-stmt.c:1942
0x7c5473 gfc_trans_do(gfc_code*, tree_node*)
        ../../trunk/gcc/fortran/trans-stmt.c:2075
0x71ddf4 trans_code
        ../../trunk/gcc/fortran/trans.c:1917
0x71e164 gfc_trans_code_cond(gfc_code*, tree_node*)
        ../../trunk/gcc/fortran/trans.c:2120
0x7a6bad build_dt
        ../../trunk/gcc/fortran/trans-io.c:2017
0x7a6c88 gfc_trans_write(gfc_code*)
        ../../trunk/gcc/fortran/trans-io.c:2056
0x71df71 trans_code
        ../../trunk/gcc/fortran/trans.c:1989
0x71e183 gfc_trans_code(gfc_code*)
        ../../trunk/gcc/fortran/trans.c:2128
0x7590ed gfc_generate_function_code(gfc_namespace*)
        ../../trunk/gcc/fortran/trans-decl.c:6332
0x71e1c7 gfc_generate_code(gfc_namespace*)
        ../../trunk/gcc/fortran/trans.c:2145
0x6b0646 translate_all_program_units
        ../../trunk/gcc/fortran/parse.c:6074
0x6b0c56 gfc_parse_file()
        ../../trunk/gcc/fortran/parse.c:6274
0x706d3f gfc_be_parse_file
        ../../trunk/gcc/fortran/f95-lang.c:204
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
Comment 8 Jerry DeLisle 2017-05-26 15:33:52 UTC
(In reply to Nicolas Koenig from comment #7)
> Created attachment 41420 [details]
> Early patch for simplifying impl do loops - 2
> 
> Sorry, wrong patch _and_ wrong testcase... Still fails for mysterious
> reasons, though.

I was thinking the whole change would be in trans-io.c. This was before frontend-passes.c existed.
Comment 9 Nicolas Koenig 2017-05-27 19:22:22 UTC
Created attachment 41430 [details]
Proposed patch for simplifying impl do loops

Fixed the bugs, it works now. I will submit it later this day.

@Jerry: This isn't the entire solution, I fear there will have to be changes in trans-io.c to completely fix this PR ;)
Comment 10 Manfred Schwarb 2017-05-28 22:32:45 UTC
As I understand this patch applies to read and write.
How does this optimization behave regarding my pet issue (short array reads)?

I.e.
      program internalread
      integer m
      parameter(m=10000000)
      character value*10
      integer i,j,intvalues(m)

      DO j=1,100
        write(value,'(i3,a5)') j," 5 69"
CC        read(value,*,end=20) (intvalues(i),i=1,m)
        read(value,*,end=20) intvalues
   20   print*,"20"
        write(*,*) j,i
      ENDDO
      end program internalread

On my box this is 2.8s vs. 31s, i.e. the array version is more
than 10 times slower than the implied do loop.
Comment 11 Jerry DeLisle 2017-05-28 23:23:20 UTC
(In reply to Manfred Schwarb from comment #10)
> As I understand this patch applies to read and write.
> How does this optimization behave regarding my pet issue (short array reads)?
> 
> I.e.
>       program internalread
>       integer m
>       parameter(m=10000000)
>       character value*10
>       integer i,j,intvalues(m)
> 
>       DO j=1,100
>         write(value,'(i3,a5)') j," 5 69"
> CC        read(value,*,end=20) (intvalues(i),i=1,m)
>         read(value,*,end=20) intvalues
>    20   print*,"20"
>         write(*,*) j,i
>       ENDDO
>       end program internalread
> 
> On my box this is 2.8s vs. 31s, i.e. the array version is more
> than 10 times slower than the implied do loop.

The patch to 53029 fixes this one. And I think Nicolas patch will improve further the implied do loop version by converting to an array section.  Both patches would be needed for that. We are still testing of course.
Comment 12 Nicolas Koenig 2017-06-05 12:35:44 UTC
Author: koenigni
Date: Mon Jun  5 12:35:11 2017
New Revision: 248877

URL: https://gcc.gnu.org/viewcvs?rev=248877&root=gcc&view=rev
Log:

2017-06-05  Nicolas Koenig  <koenigni@student.ethz.ch>

	PR fortran/35339
	* frontend-passes.c (traverse_io_block): New function.
	(simplify_io_impl_do): New function.
	(optimize_namespace): Invoke gfc_code_walker with
	simplify_io_impl_do.

2017-06-05  Nicolas Koenig  <koenigni@student.ethz.ch>

	PR fortran/35339
	* gfortran.dg/implied_do_io_1.f90: New Test.
	* gfortran.dg/implied_do_io_2.f90: New Test.



Added:
    trunk/gcc/testsuite/gfortran.dg/implied_do_io_1.f90
    trunk/gcc/testsuite/gfortran.dg/implied_do_io_2.f90
Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/frontend-passes.c
    trunk/gcc/testsuite/ChangeLog
Comment 13 Thomas Koenig 2018-02-19 18:22:17 UTC
Author: tkoenig
Date: Mon Feb 19 18:21:45 2018
New Revision: 257814

URL: https://gcc.gnu.org/viewcvs?rev=257814&root=gcc&view=rev
Log:
2018-02-19  Thomas Koenig  <tkoenig@gcc.gnu.org>

	PR fortran/35339
	* frontend-passes.c (traverse_io_block): Remove workaround for
	PR 80945.

2018-02-19  Thomas Koenig  <tkoenig@gcc.gnu.org>

	PR fortran/35339
	* gfortran.dg/implied_do_io_4.f90: New test.


Added:
    trunk/gcc/testsuite/gfortran.dg/implied_do_io_4.f90
Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/frontend-passes.c
    trunk/gcc/testsuite/ChangeLog
Comment 14 Thomas Koenig 2018-02-19 18:23:56 UTC
Fixed (finally).

Closing.
Comment 15 Jerry DeLisle 2018-02-19 19:52:31 UTC
(In reply to Thomas Koenig from comment #14)
> Fixed (finally).
> 
> Closing.

Thanks Thomas!