Bug 78549 - Very slow formatted internal file output
Summary: Very slow formatted internal file output
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libfortran (show other bugs)
Version: 6.2.1
: P3 normal
Target Milestone: 8.2
Assignee: Jerry DeLisle
URL:
Keywords:
: 82938 (view as bug list)
Depends on:
Blocks:
 
Reported: 2016-11-27 12:35 UTC by mecej4
Modified: 2018-06-20 01:11 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-11-28 00:00:00


Attachments
Program to demonstrate bug (272 bytes, text/plain)
2016-11-27 12:35 UTC, mecej4
Details
patch which failes with dtio_14 and dtio_15, among others (1.20 KB, patch)
2017-10-07 16:55 UTC, Thomas Koenig
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description mecej4 2016-11-27 12:35:58 UTC
Created attachment 40163 [details]
Program to demonstrate bug

Formatted internal writes to a string variable are very slow in many recent versions of GFortran, especially the Windows ports. The attached F77 program, and F95 versions of the same, demonstrate the problem. The heart of the problem is the slowness of

     write (str,'(1P,10E12.3)') x

(where x is a real array of dimension 10), compared to the equivalent call to sprintf in C. Unlike in some old Gfortran bug reports, there are no issues related to Windows versus Linux file system and I/O system discrepancies. NO external FILE I/O is involved.

Here are some timings, in seconds, on two systems.

SYSTEM: Linux, XEON E5-2630, GFortran/GCC 6.1.1/F2C

   gfortran -Ofast                     6.4
   F2C and gcc -Ofast                  4.8
   gcc -Ofast                          1.8

SYSTEM: Windows 10-64 laptop, Haswell i5-4200U CPU

   F2C, gcc 6.2 -O2 -march=haswell     7.8
   G77 -O2 -march=pentium             12.7
   gfortran 5.2 32-bit Cygwin         49.6
   gfortran 6.2 Mingw-64, Eq.com      65.6

   gcc 6.2                             5.2
   Ifort 17.0.1                        2.3

Here is the source code of the C program, which I am placing here since I am allowed only one attachment.

   #include <stdio.h>
   int main(){
      char str[121];
      float x[10]={ 7.569, 5.556, -1.640,  9.362,  1.057,
                   -2.385,-9.541,  1.449, -7.885, -1.108};
      int i,j;
      for(i=0; i<1000000; i++){       sprintf(str,"%12.3e%12.3e%12.3e%12.3e%12.3e%12.3e%12.3e%12.3e%12.3e%12.3e",
      x[0],x[1],x[2],x[3],x[4],x[5],x[6],x[7],x[8],x[9]);
      if((i+1)%200000 == 0)printf("%8d %s\n",i+1,str);
      }
}
Comment 1 Martin Liška 2016-11-28 09:52:41 UTC
You can add another attachments after you submit the bug.

I can confirm that it takes (GCC 6.2.1):
gfortran -Ofast pr78549.c  && time ./a.out
...
0m1.524s

gfortran -Ofast pr78549.f90  && time ./a.out
...
0m4.985s
Comment 2 mecej4 2016-11-28 14:03:35 UTC
Thanks, Martin. After posting, I found that I could add more add more attachments, but I was a bit worried that, as in other forums, the initial post could not be edited or modified.

I presume that you ran your confirmation runs on Linux. Even if the Fortran version is three times slower than the C version, as you found, perhaps one could live with that. However, on Windows(+Cygwin,Mingw,etc.) the performance of the Fortran version is terrible, and I think that this reflects poorly on the otherwise fine Gfortran compiler, which often outperforms expensive commercial Fortran compilers in the quality of the generated code.

A collaborator tells me that the problem did not exist in the Windows/MinGW release of Gfortran 4.9.3. I could not test that assertion myself because the installer failed on my Windows 10 system.
Comment 3 Jerry DeLisle 2016-11-28 21:34:39 UTC
We cannot address the Windows(+Cygwin,Mingw,etc.) issues which have more to do with the OS related layers I assume.

I will see what I get on Cygwin and report back in a few days.
Comment 4 mecej4 2016-11-29 02:24:34 UTC
(In reply to Jerry DeLisle from comment #3)
> We cannot address the Windows(+Cygwin,Mingw,etc.) issues which have more to
> do with the OS related layers I assume.
> 
> I will see what I get on Cygwin and report back in a few days.

Fair enough. For what it is worth, here are timings for the same test programs built with Gfortran/GCC 6.2.0 under Windows Subsystem for Linux (Ubuntu 14) on Windows-10:

   GCC 6.2.0      :     1.984 s
   GFortran 6.2.0 :     8.438 s
Comment 5 Jerry DeLisle 2016-11-29 23:23:57 UTC
Assigning to myself so I do not forget to look at it in the pile.

Can you explain what "Windows Subsystem for Linux (Ubuntu 14) on Windows-10" is?
Comment 6 mecej4 2016-11-30 03:06:18 UTC
(In reply to Jerry DeLisle from comment #5)
> 
> Can you explain what "Windows Subsystem for Linux (Ubuntu 14) on Windows-10"
> is?

Sorry if I created a bad portmanteau phrase. 

Here is a link to Microsoft's own description of WSL (Windows Subsystem for Linux) : https://msdn.microsoft.com/en-us/commandline/wsl/about .

This beta product for Windows 10-x64 provides a Bash shell in which character I/O, user-mode Linux applications can be run (only 64-bit a.out supported). I found it fairly simple to install, and it worked well for the tests that I tried. Once Bash has been installed, one can install Ubuntu-14 packages using Apt-Get; in particular, there is a Gfortran 6.2 package that works.

While browsing the sources of LibGfortran, I found the file which seemed to be a candidate for profiling: write_float.def, which I understand you are currently revising.
Comment 7 Jerry DeLisle 2016-11-30 04:01:53 UTC
(In reply to mecej4 from comment #6)
> (In reply to Jerry DeLisle from comment #5)
> > 
> > Can you explain what "Windows Subsystem for Linux (Ubuntu 14) on Windows-10"
> > is?
> 
> Sorry if I created a bad portmanteau phrase. 
> 
> Here is a link to Microsoft's own description of WSL (Windows Subsystem for
> Linux) : https://msdn.microsoft.com/en-us/commandline/wsl/about .
> 
> This beta product for Windows 10-x64 provides a Bash shell in which
> character I/O, user-mode Linux applications can be run (only 64-bit a.out
> supported). I found it fairly simple to install, and it worked well for the
> tests that I tried. Once Bash has been installed, one can install Ubuntu-14
> packages using Apt-Get; in particular, there is a Gfortran 6.2 package that
> works.
> 
> While browsing the sources of LibGfortran, I found the file which seemed to
> be a candidate for profiling: write_float.def, which I understand you are
> currently revising.

Thanks for the explanation. I am done with planned changes to write_float.def. It really is not fair to compare f2c/gcc because f2c does some optimizations we can not do and we have many runtime checks that have to be done to support F95, F2003, F2008. Regardless I am always looking for ways to improve. Unfortunately, some of my ideas would involve significant rewrites of things. (aka changing the entire structure of how we do things in libgfortran)

With that said, I don't think write_float is fundamentally the issue here. We have to handle a plethora of formatting of the float strings, depending on the the format string. decimal status, comma status, blank status, rounding modes, etc. We call snprintf to get the basic numeric strings and build the required output from there. After all that, we send it out.

Regardless, I will be looking at pr78531 (which is a result of doing some speedups for internal units) and I will be thinking more about this PR here.
Comment 8 Jerry DeLisle 2016-11-30 04:05:43 UTC
(In reply to Jerry DeLisle from comment #7)
Correction:
> Regardless, I will be looking at pr78351 (which is a result of doing some
> speedups for internal units) and I will be thinking more about this PR here.
Comment 9 Dominique d'Humieres 2017-10-05 21:03:37 UTC
Gfortran is even slower now (~9.5s), likely revision r251374 (pr78387).
Comment 10 Thomas Koenig 2017-10-06 16:57:23 UTC
Here are some gprof timings:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 28.73      2.06     2.06                             delete_root
 15.76      3.19     1.13                             build_float_string
  5.86      3.61     0.42                             formatted_transfer_scalar_write
  5.30      3.99     0.38                             _gfortrani_parse_format
  4.53      4.32     0.33                             get_float_string
  3.14      4.54     0.23                             data_transfer_init
  3.00      4.76     0.22                             write_float_0
  2.79      4.96     0.20                             next_format0
  2.09      5.11     0.15                             next_char
  1.95      5.25     0.14                             insert
  1.95      5.39     0.14                             parse_format_list
  1.81      5.52     0.13                             _gfortrani_free_format_hash_table
  1.67      5.64     0.12                             get_precision
  1.67      5.76     0.12                             select_buffer
  1.53      5.87     0.11                             get_gfc_unit
  1.26      5.96     0.09                             _gfortrani_next_format
  1.19      6.04     0.09                             _gfortrani_write_block
  1.12      6.12     0.08                             write_float_string
  1.05      6.20     0.08                             _gfortrani_next_record

What is striking is the long time spent in delete_root.
Comment 11 Jerry DeLisle 2017-10-06 17:17:58 UTC
Thanks for getting this profile. I agree, delete_root is to be looked at. Was this profile on trunk? Can you also post one for gcc6 or earlier.
Comment 12 Thomas Koenig 2017-10-06 17:22:23 UTC
This was for trunk.

This patch

Index: io.h
===================================================================
--- io.h        (Revision 253377)
+++ io.h        (Arbeitskopie)
@@ -773,6 +773,7 @@
 extern int newunit_alloc (void);
 internal_proto(newunit_alloc);
 
+extern void delete_unit (gfc_unit *);
 
 /* open.c */
 
Index: transfer.c
===================================================================
--- transfer.c  (Revision 253377)
+++ transfer.c  (Arbeitskopie)
@@ -4169,6 +4169,9 @@
          free_format (dtp);
        }
       unlock_unit (dtp->u.p.current_unit);
+
+      if (is_internal_unit (dtp))
+         delete_unit (dtp->u.p.current_unit);
     }
   library_end ();
 }
Index: unit.c
===================================================================
--- unit.c      (Revision 253377)
+++ unit.c      (Arbeitskopie)
@@ -307,7 +307,7 @@
 
 /* delete_unit()-- Delete a unit from a tree */
 
-static void
+void
 delete_unit (gfc_unit *old)
 {
   unit_root = delete_treap (old, unit_root);

helps only a little:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 26.26      1.88     1.88                             delete_root
 15.78      3.01     1.13                             build_float_string
  5.24      3.39     0.38                             formatted_transfer_scalar_write
  4.89      3.74     0.35                             _gfortrani_parse_format
  4.19      4.04     0.30                             next_format0
  4.12      4.33     0.30                             get_float_string
  3.77      4.60     0.27                             write_float_0
  2.51      4.78     0.18                             get_precision
  2.44      4.96     0.18                             data_transfer_init
  2.23      5.12     0.16                             _gfortrani_free_format_hash_table
Comment 13 Thomas Koenig 2017-10-06 17:54:18 UTC
Here is the timing for gcc 6:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 32.15      0.99     0.99                             output_float
 12.34      1.37     0.38                             formatted_transfer
 11.04      1.71     0.34                             next_format0
  7.14      1.93     0.22                             write_float
  3.73      2.05     0.12                             _gfortrani_write_block
  3.25      2.15     0.10                             next_char
  2.92      2.24     0.09                             data_transfer_init
  2.60      2.32     0.08                             _gfortrani_next_format
  2.60      2.40     0.08                             get_fnode
  2.60      2.48     0.08                             write_integer
  2.27      2.55     0.07                             _gfortrani_parse_format
  2.27      2.62     0.07                             finalize_transfer
  2.27      2.69     0.07                             parse_format_list
  1.95      2.75     0.06                             _gfortrani_next_record
  1.14      2.78     0.04                             _gfortran_transfer_array
  0.97      2.81     0.03                             _gfortrani_mem_alloc_r4

So, this does appear to be a regression for gcc8 at least.

The time is also longer (~20 s for trunk, ~14 s for gcc 6).
Comment 14 Jerry DeLisle 2017-10-06 21:38:55 UTC
The real issue is that to support DTIO with internal units I had to actually use a gfc_unit structure. Before DTIO we never did this.  At the time of doing DTIO I did not have a 'better idea' since by the nature of it, DTIO passes a UNIT to the child procedures. 

I do have something percolating in my head on this that will be better. I have not finished thinking it through yet. I have to do some factoring on the structures.

(In reply to Dominique d'Humieres from comment #9)
> Gfortran is even slower now (~9.5s), likely revision r251374 (pr78387).

This is because the gfc_unit caching scheme I tried before was incompatble with large numbers of threads. I decided that whole approach to the issue was a rabbit down a rabbit hole. I removed it, and we are slower.

Working on it.
Comment 15 Jerry DeLisle 2017-10-06 21:57:21 UTC
For my own baseline:

gcc6: real	0m6.948s

gcc7: real	0m9.906s

gcc8: real	0m10.415s

I backported removal of the caching mentioned in comment #14 to gcc7. The two should be identical except that gcc7 is a released branch. IIRC released branches have debugging code removed. I do not know if this accounts for the difference between gcc7 and gcc8.
Comment 16 Thomas Koenig 2017-10-07 11:23:10 UTC
Jerry,

what do you think of this approach? This creates a local copy
of the gfc_unit, without putting it into the tree.

Index: transfer.c
===================================================================
--- transfer.c  (Revision 253377)
+++ transfer.c  (Arbeitskopie)
@@ -4170,6 +4170,9 @@ st_write_done (st_parameter_dt *dtp)
        }
       unlock_unit (dtp->u.p.current_unit);
     }
+  if (is_internal_unit (dtp))
+    free (dtp->u.p.current_unit);
+
   library_end ();
 }
 
Index: unit.c
===================================================================
--- unit.c      (Revision 253377)
+++ unit.c      (Arbeitskopie)
@@ -225,7 +225,7 @@ insert (gfc_unit *new, gfc_unit *t)
 /* insert_unit()-- Create a new node, insert it into the treap.  */
 
 static gfc_unit *
-insert_unit (int n)
+insert_unit (int n, int internal)
 {
   gfc_unit *u = xcalloc (1, sizeof (gfc_unit));
   u->unit_number = n;
@@ -238,8 +238,11 @@ static gfc_unit *
   __GTHREAD_MUTEX_INIT_FUNCTION (&u->lock);
 #endif
   __gthread_mutex_lock (&u->lock);
-  u->priority = pseudo_random ();
-  unit_root = insert (u, unit_root);
+  if (!internal)
+    {
+      u->priority = pseudo_random ();
+      unit_root = insert (u, unit_root);
+    }
   return u;
 }
 
@@ -316,14 +319,21 @@ delete_unit (gfc_unit *old)
 
 /* get_gfc_unit()-- Given an integer, return a pointer to the unit
    structure.  Returns NULL if the unit does not exist,
-   otherwise returns a locked unit. */
+   otherwise returns a locked unit.  If internal, do not put it
+   in the unit cache.  */
 
 static gfc_unit *
-get_gfc_unit (int n, int do_create)
+get_gfc_unit (int n, int do_create, int internal)
 {
   gfc_unit *p;
   int c, created = 0;
 
+  if (internal)
+    {
+      p = insert_unit (n, 1);
+      return p;
+    }
+  
   __gthread_mutex_lock (&unit_lock);
 retry:
   for (c = 0; c < CACHE_SIZE; c++)
@@ -347,7 +357,7 @@ retry:
 
   if (p == NULL && do_create)
     {
-      p = insert_unit (n);
+      p = insert_unit (n, 0);
       created = 1;
     }
 
@@ -405,7 +415,7 @@ found:
 gfc_unit *
 find_unit (int n)
 {
-  return get_gfc_unit (n, 0);
+  return get_gfc_unit (n, 0, 0);
 }
 
 
@@ -412,7 +422,7 @@ find_unit (int n)
 gfc_unit *
 find_or_create_unit (int n)
 {
-  return get_gfc_unit (n, 1);
+  return get_gfc_unit (n, 1, 0);
 }
 
 
@@ -548,7 +558,7 @@ get_unit (st_parameter_dt *dtp, int do_create)
 
       dtp->u.p.unit_is_internal = 1;
       dtp->common.unit = newunit_alloc ();
-      unit = get_gfc_unit (dtp->common.unit, do_create);
+      unit = get_gfc_unit (dtp->common.unit, do_create, 1);
       set_internal_unit (dtp, unit, kind);
       fbuf_init (unit, 128);
       return unit;
@@ -563,9 +573,9 @@ get_unit (st_parameter_dt *dtp, int do_create)
      is not allowed, such units must be created with
      OPEN(NEWUNIT=...).  */
   if (dtp->common.unit < 0)
-    return get_gfc_unit (dtp->common.unit, 0);
+    return get_gfc_unit (dtp->common.unit, 0, 0);
 
-  return get_gfc_unit (dtp->common.unit, do_create);
+  return get_gfc_unit (dtp->common.unit, do_create, 0);
 }
 
 
@@ -592,7 +602,7 @@ init_units (void)
 
   if (options.stdin_unit >= 0)
     {                          /* STDIN */
-      u = insert_unit (options.stdin_unit);
+      u = insert_unit (options.stdin_unit, 0);
       u->s = input_stream ();
 
       u->flags.action = ACTION_READ;
@@ -624,7 +634,7 @@ init_units (void)
 
   if (options.stdout_unit >= 0)
     {                          /* STDOUT */
-      u = insert_unit (options.stdout_unit);
+      u = insert_unit (options.stdout_unit, 0);
       u->s = output_stream ();
 
       u->flags.action = ACTION_WRITE;
@@ -655,7 +665,7 @@ init_units (void)
 
   if (options.stderr_unit >= 0)
     {                          /* STDERR */
-      u = insert_unit (options.stderr_unit);
+      u = insert_unit (options.stderr_unit, 0);
       u->s = error_stream ();
 
       u->flags.action = ACTION_WRITE;
Comment 17 Thomas Koenig 2017-10-07 11:24:13 UTC
I mean apart from the fact that this creates a huge memory leak :-)
Comment 18 Dominique d'Humieres 2017-10-07 13:32:53 UTC
I do not see any timing difference between 7.2.1 (r253502) and trunk.
Comment 19 Thomas Koenig 2017-10-07 16:55:03 UTC
Created attachment 42320 [details]
patch which failes with dtio_14 and dtio_15, among others

Well, this one doesn't work yet.
Comment 20 Jerry DeLisle 2017-10-07 18:29:47 UTC
(In reply to Thomas Koenig from comment #19)
> Created attachment 42320 [details]
> patch which failes with dtio_14 and dtio_15, among others
> 
> Well, this one doesn't work yet.

Do you want to continue with it or would you like me to pick up from where you are?
Comment 21 Thomas Koenig 2017-10-07 21:46:31 UTC
Jerry,

I think you know more about this than I do, so please go ahead.
Comment 22 Thomas Koenig 2017-11-10 15:29:27 UTC
*** Bug 82938 has been marked as a duplicate of this bug. ***
Comment 23 Jerry DeLisle 2017-11-17 02:28:53 UTC
Well, instrumenting a little bit I see that delete_root is getting called many many many times. So, every call to newunit_alloc is assigning a new unit number which is getting added to the treap and never reused in anyway.

The result of this is that at the end of the program when any units left on the treap are deleted, we loop through over one million deletes.

So I now have a better idea of what we need to do.
Comment 24 Jerry DeLisle 2017-11-19 01:02:48 UTC
After a minor test tweak.

$ gfc -static pr78549.f 
$ time ./a.out 

real	0m24.049s
user	0m22.941s
sys	0m0.962s

$ gfc6 -static pr78549.f 
$ time ./a.out 

real	0m22.838s
user	0m22.718s
sys	0m0.005s

Without the tweak on trunk:

$ gfc -static pr78549.f 
$ time ./a.out 

real	0m28.916s
user	0m27.697s
sys	0m1.024s

With the minor tweak we are still a bit slower than 6, but then we added a few features since then also, so not too bad.  My initial tweak is going to require a little logic to be added.

Next report will be after I have that sorted out.
Comment 25 Jerry DeLisle 2017-11-21 02:17:42 UTC
Author: jvdelisle
Date: Tue Nov 21 02:17:11 2017
New Revision: 254982

URL: https://gcc.gnu.org/viewcvs?rev=254982&root=gcc&view=rev
Log:
2017-11-20  Jerry DeLisle  <jvdelisle@gcc.gnu.org>

	PR libgfortran/78549
	* io/io.h (newunit_free): Add declaration. Clean some whitespace.
	* io/transfer.c (st_read_done, st_write_done): Call newunit_free.
	* io/unit.c (newunit_free): Change type from static void to void.

Modified:
    trunk/libgfortran/ChangeLog
    trunk/libgfortran/io/io.h
    trunk/libgfortran/io/transfer.c
    trunk/libgfortran/io/unit.c
Comment 26 Jerry DeLisle 2017-11-23 17:19:50 UTC
Author: jvdelisle
Date: Thu Nov 23 17:19:18 2017
New Revision: 255108

URL: https://gcc.gnu.org/viewcvs?rev=255108&root=gcc&view=rev
Log:
2017-11-23  Jerry DeLisle  <jvdelisle@gcc.gnu.org>

	Backport from trunk
	PR libgfortran/78549
	* io/io.h (newunit_free): Add declaration. Clean some whitespace.
	* io/transfer.c (st_read_done, st_write_done): Call newunit_free.
	* io/unit.c (newunit_free): Change type from static void to void.

Modified:
    branches/gcc-7-branch/libgfortran/ChangeLog
    branches/gcc-7-branch/libgfortran/io/io.h
    branches/gcc-7-branch/libgfortran/io/transfer.c
    branches/gcc-7-branch/libgfortran/io/unit.c
Comment 27 Jerry DeLisle 2017-11-23 17:47:58 UTC
This should be fixed now fairly well.  At this point there are not a lot of major issues in our own libgfortran library. I will leave this bug report open for a while if any issues arise.

Here is my latest profile. On my machine I am actuall getting slightly better performance on trunk (8) vs. gfortran 6

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self          
 time   seconds   seconds name    
 15.11      4.30     4.30 printf_positional
 14.65      8.47     4.17 __printf_fp_l
  7.13     10.50     2.03 __parse_one_specmb
  6.08     12.23     1.73 build_float_string
  4.16     13.42     1.19 formatted_transfer_scalar_write
  2.39     14.10     0.68 next_format0
  2.28     14.75     0.65 _IO_no_init
  2.23     15.38     0.64 __mpn_mul_1
  2.21     16.01     0.63 ____strtol_l_internal
  2.18     16.63     0.62 vfprintf
  1.86     17.16     0.53 __strchrnul_sse2
  1.79     17.67     0.51 _gfortrani_next_format
  1.77     18.18     0.51 _IO_str_init_static_internal
  1.76     18.68     0.50 __mpn_extract_double
  1.72     19.17     0.49 vsnprintf
  1.72     19.66     0.49 _IO_default_xsputn
  1.69     20.14     0.48 _int_malloc
  1.62     20.60     0.46 __memmove_sse2_unaligned_erms
  1.28     20.96     0.37 get_float_string
  1.12     21.28     0.32 _IO_padn
  1.02     21.57     0.29 __memset_sse2_unaligned
  1.02     21.86     0.29 __wmemset_sse2_unaligned
  0.91     22.12     0.26 flt128_ais
Comment 28 Thomas Koenig 2017-11-23 19:37:09 UTC
No longer a regression.
Comment 29 Manfred Schwarb 2017-11-24 00:00:28 UTC
Here are the results from my test case from PR82938 
(without the "print*,f" statement, which consumes ~1s):

  1.1s for GNU Fortran (GCC) 8.0.0 20170828 (experimental) [trunk revision 251373]
  3.3s for GNU Fortran (GCC) 8.0.0 20170829 (experimental) [trunk revision 251395]
  1.3s for GNU Fortran (GCC) 8.0.0 20171123 (experimental) [trunk revision 255090]
  0.4s for version with Fortran wrapper of C function strtof()

So, with Jerry's patch gfortran is now only a little slower than before
the DTIO patches, so still some minor regression for internal read. 

But there is still some large gap compared to the C routine version...
Comment 30 Jerry DeLisle 2017-12-06 03:55:25 UTC
Not quite fixed yet. I have found:

==17070== 88 bytes in 1 blocks are definitely lost in loss record 3 of 9
==17070==    at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==17070==    by 0x4E580E6: _gfortrani_xcalloc (memory.c:83)
==17070==    by 0x503D64E: _gfortrani_open_internal4 (unix.c:1031)
==17070==    by 0x503BCCB: _gfortrani_set_internal_unit (unit.c:495)
==17070==    by 0x503C2E4: _gfortrani_get_unit (unit.c:556)
==17070==    by 0x503A0FF: data_transfer_init (transfer.c:2680)
==17070==    by 0x40122E: MAIN__ (dtio_14.f90:59)
==17070==    by 0x401442: main (dtio_14.f90:41)

and

==17070==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==17070==    by 0x4E58048: _gfortrani_xmalloc (memory.c:42)
==17070==    by 0x5043F53: _gfortrani_fbuf_init (fbuf.c:41)
==17070==    by 0x503C2F1: _gfortrani_get_unit (unit.c:557)
==17070==    by 0x503A0FF: data_transfer_init (transfer.c:2680)
==17070==    by 0x401004: MAIN__ (dtio_14.f90:51)
==17070==    by 0x401442: main (dtio_14.f90:41)
==17070== 

with valgrind

We are missing freeing these structures.
Comment 31 Jerry DeLisle 2017-12-14 02:31:21 UTC
Author: jvdelisle
Date: Thu Dec 14 02:30:49 2017
New Revision: 255621

URL: https://gcc.gnu.org/viewcvs?rev=255621&root=gcc&view=rev
Log:
2017-12-12  Jerry DeLisle  <jvdelisle@gcc.gnu.org>

        PR libgfortran/78549
        * io/inquire.c (inquire_via_unit): Adjust test for existence for
        pre-connected internal units.
        * io/transfer.c (finalize_transfer): When done with a transfer
        to internal units, free the format buffer and close the stream.
        (st_read_done): Delete freeing the stream, now handled using
        sclose in finalize_transfer. (st_write_done): Likewise.
        * io/unit.c (get_unit): Return NULL for special reserved unit
        numbers, signifying not accessible to the user.
        (init_units): Insert the two special internal units into the
        unit treap. This makes these unit structures available without
        further allocations for later use by internal unit I/O. These
        units are automatically deleted by normal program termination.
        * io/unix.c (mem_close): Add a guard check to protect from double free.

Modified:
    trunk/libgfortran/ChangeLog
    trunk/libgfortran/io/inquire.c
    trunk/libgfortran/io/transfer.c
    trunk/libgfortran/io/unit.c
    trunk/libgfortran/io/unix.c
Comment 32 Jerry DeLisle 2017-12-16 19:59:45 UTC
Author: jvdelisle
Date: Sat Dec 16 19:59:14 2017
New Revision: 255751

URL: https://gcc.gnu.org/viewcvs?rev=255751&root=gcc&view=rev
Log:
2017-12-16  Jerry DeLisle  <jvdelisle@gcc.gnu.org>

	Backport from trunk
	PR libgfortran/78549
	* io/inquire.c (inquire_via_unit): Adjust test for existence for
	pre-connected internal units.
	* io/transfer.c (finalize_transfer): When done with a transfer
	to internal units, free the format buffer and close the stream.
	(st_read_done): Delete freeing the stream, now handled using
	sclose in finalize_transfer. (st_write_done): Likewise.
	* io/unit.c (get_unit): Return NULL for special reserved unit
	numbers, signifying not accessible to the user.
	(init_units): Insert the two special internal units into the
	unit treap. This makes these unit structures available without
	further allocations for later use by internal unit I/O. These
	units are automatically deleted by normal program termination.
	* io/unix.c (mem_close): Add a guard check to protect from double free.

Modified:
    branches/gcc-7-branch/libgfortran/ChangeLog
    branches/gcc-7-branch/libgfortran/io/inquire.c
    branches/gcc-7-branch/libgfortran/io/transfer.c
    branches/gcc-7-branch/libgfortran/io/unit.c
    branches/gcc-7-branch/libgfortran/io/unix.c
Comment 33 Jakub Jelinek 2018-05-02 10:05:22 UTC
GCC 8.1 has been released.
Comment 34 Jerry DeLisle 2018-06-20 01:11:43 UTC
Closing, have done enough.