Bug 23814 - unformatted files from gfortran are incompatible with g77 unformatted files and solaris f95 unformatted files
Status: RESOLVED DUPLICATE of bug 19303
Alias: None
Product: gcc
Classification: Unclassified
Component: fortran
Version: 4.1.0
Importance: P1 critical
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-09-11 03:16 UTC by Rob Ratcliff
Modified: 2005-09-12 00:35 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description Rob Ratcliff 2005-09-11 03:16:37 UTC
I ran into a problem with unformatted files with gfortran. It appears that it is
padding the delimiters to 8-byte boundaries (or the delimiters are 64-bit
longs) rather than using the normal 4-byte delimiter approach.

For instance, I have a program that does this:

       write(1) 1
       end

g77 and Solaris f95 return (ignoring the byte-swapping issues):
od -x fort.1:
0000000 0004 0000 0001 0000 0004 0000

gfortran returns:
0000000 0004 0000 0000 0000 0001 0000 0004 0000
0000020 0000 0000

Is there a flag to make the delimiters 4 bytes rather than 8 bytes?
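
The two dumps above can be reproduced byte for byte with a short Python sketch. It assumes a little-endian machine (consistent with the `od -x` output) and that each record is framed as marker, payload, marker. The helper name `frame_record` is hypothetical and only for illustration.

```python
import struct

def frame_record(payload: bytes, marker_size: int) -> bytes:
    """Wrap payload in leading and trailing length markers (little-endian)."""
    fmt = {4: "<i", 8: "<q"}[marker_size]
    marker = struct.pack(fmt, len(payload))
    return marker + payload + marker

# The payload of "write(1) 1": one default (4-byte) integer with value 1.
payload = struct.pack("<i", 1)

g77_style = frame_record(payload, 4)       # 4 + 4 + 4 = 12 bytes
gfortran_style = frame_record(payload, 8)  # 8 + 4 + 8 = 20 bytes

print(len(g77_style), g77_style.hex())
print(len(gfortran_style), gfortran_style.hex())
```

The payload is identical in both cases; only the width of the length markers around it differs.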
Comment 1 Rob Ratcliff 2005-09-11 04:07:58 UTC
I just found this discussion:
http://gcc.gnu.org/ml/fortran/2005-05/msg00431.html

From the docs, it doesn't look like it has been implemented in mainline yet.
Is it available some other way?

It seems to me that unformatted files on 32 bit machines should be compatible.
I don't know of any other fortran compiler that assumes 64 bit record markers. 
We really need a flag to change the behavior if it is not available already.

BTW, it is pretty typical that aero engineers involved in CFD (computational
fluid dynamics) need to ship around large (50-1000 MB) binary files between
various machines without having to reformat them. Typically they generate files
that are all big-endian using a compiler switch to avoid having to byte swap as
well. So, while we're at it, it'd be great to have a compiler switch that
reversed the byte order of integers (2, 4, 8 byte) and floating point numbers
(4, 8 byte) when they are read from or written to an unformatted file.
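
As a rough illustration of what that byte swapping amounts to, here is a sketch that does it by hand in Python rather than through a compiler switch. It assumes 4-byte record markers and a record that holds only 4-byte integers; `read_int_record` is a hypothetical helper, not part of any compiler or library.

```python
import io
import struct

def read_int_record(stream, byte_order: str):
    """Read one record of 4-byte integers; byte_order is '<' or '>'."""
    (length,) = struct.unpack(byte_order + "i", stream.read(4))
    payload = stream.read(length)
    (trailer,) = struct.unpack(byte_order + "i", stream.read(4))
    if trailer != length:
        raise ValueError("leading and trailing record markers disagree")
    # Markers and data share the same byte order, so both are swapped together.
    return list(struct.unpack(f"{byte_order}{length // 4}i", payload))

# A big-endian record holding the integers 1, 2, 3 (12 payload bytes).
big_endian_file = struct.pack(">i3ii", 12, 1, 2, 3, 12)

values = read_int_record(io.BytesIO(big_endian_file), ">")
print(values)
```

The same function reads a little-endian file when called with `"<"`; the caller has to know, or detect, which order the file was written in.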
Comment 2 Janne Blomqvist 2005-09-11 12:09:13 UTC
Bud Davis is back and working on the pluggable record markers patch. Expect it
to be completed and committed within a few weeks.

There is no simple solution that is right for all situations. Gfortran uses
64-bit record markers by default since we want compatibility between ILP32 and
LP64 platforms (which incidentally g77 doesn't provide), and we want to
support records bigger than 2 GB.

There has been some discussion about a byteswapio patch, but nothing has been
done. Patches are welcome, of course.

And, I would hardly classify the bug as "critical".

If you want portable binary I/O you're probably better off using a library such
as netCDF anyway.
Comment 3 Rob Ratcliff 2005-09-11 13:24:28 UTC
I believe it really is critical since I and many others who may use
gfortran need to interoperate with data generated by legacy codes on the same
system that were compiled with g77, or on other systems (Sun, SGI) compiled with
their native f77 or f90/f95 compilers. Some of the codes are proprietary and many
are from third parties, so it isn't really feasible to force them to use
another binary file library. Plus, I've been working with unformatted FORTRAN
files for 15 years and this is the first time I've had this type of issue with
the structure of an unformatted file.

So given the current situation, I'll have to write a format converter for
unformatted data from any gfortran code to the "standard" g77 format in order to
interoperate.
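
A minimal sketch of what such a converter could look like, written in Python for brevity. It assumes the input uses 8-byte leading and trailing markers as in the dump in the description, that the output should use 4-byte markers, and that both files are little-endian. The name `convert_markers` is hypothetical.

```python
import io
import struct

def convert_markers(src, dst, byte_order: str = "<") -> int:
    """Copy records from src (8-byte markers) to dst (4-byte markers).

    Returns the number of records converted.
    """
    count = 0
    while True:
        head = src.read(8)
        if not head:
            return count  # clean end of file
        if len(head) != 8:
            raise ValueError("truncated leading marker")
        (length,) = struct.unpack(byte_order + "q", head)
        if not 0 <= length <= 0x7FFFFFFF:
            raise ValueError("record length does not fit a 4-byte marker")
        payload = src.read(length)
        tail = src.read(8)
        if len(payload) != length or len(tail) != 8:
            raise ValueError("truncated record")
        if struct.unpack(byte_order + "q", tail)[0] != length:
            raise ValueError("leading and trailing markers disagree")
        small = struct.pack(byte_order + "i", length)
        dst.write(small + payload + small)
        count += 1

# The 20-byte gfortran file from the description, converted in memory.
src = io.BytesIO(bytes.fromhex("0400000000000000" "01000000" "0400000000000000"))
dst = io.BytesIO()
n = convert_markers(src, dst)
converted = dst.getvalue()
print(n, converted.hex())
```

This version reads each record into memory whole, which is simple but not ideal for very large records; a real tool would probably copy the payload in fixed-size chunks.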

I would've thought that the FORTRAN spec would've covered this kind of thing.
Comment 4 Andrew Pinski 2005-09-11 14:27:27 UTC
More than that, this is a dup of bug 19303. Unformatted I/O was never supposed
to be used with different versions of the compiler, or across different targets.
It is just like using write in C.

*** This bug has been marked as a duplicate of 19303 ***
Comment 5 Rob Ratcliff 2005-09-11 20:47:17 UTC
Well, just to warn you, you're going to have a lot of steamed engineers on your
hands when they discover that they either have to recompile all of their FORTRAN
codes on every platform with gfortran, write all of their data (50 - 1000MB
binaries and larger) as ASCII, or convert their files to the traditional g77
format to interoperate with the rest of their processes. What will actually
happen is that users will get burnt once and then they'll drop gfortran like a
hot potato and put in a request to purchase a commercial compiler from Intel or PGI.

Another reason that this feature is so painful is that engineers tend to
pipeline their unformatted files from one process to the next. There can
literally be 10 or more programs in a process that will read from or write to a
given unformatted file. Any program in the process that was compiled with
gfortran will break the process and probably in such a subtle way that it'll
take each user hours to figure out what went wrong. (I spent four hours
on it yesterday discovering what the problem was in my process and that's with
knowing how to read the output from "od".)

Furthermore, when one writes binary in C, you get exactly what your variables
are sized to in your code. If the platform is a 32 bit machine and is IEEE
compliant, you pretty much know that a short is 16bit, an int is 32 bit, a long
is 32 bit and a long long is 64 bit. Developers often even
define macros or new types that guarantee that the variables are the same
lengths independent of 32 bit or 64 bit architectures. There are often also
compiler switches that govern the length of the various primitive types. So if
portability of the data is important to you, the resulting binary file is
interoperable except for big-endian versus little-endian issues (which can be
handled with a flag at the top or by always writing in one endianness). With C
binary files, of course, you don't have to worry about the silly record markers
that muck up the works either.
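
To make the "flag at the top" idea concrete, here is a small Python sketch. The header value `MAGIC`, the two function names, and the choice of 8-byte floats as the payload are all assumptions made for illustration, not an existing file format.

```python
import struct

MAGIC = 0x01020304  # hypothetical header value; reads differently when byte-swapped

def write_with_flag(values, byte_order: str) -> bytes:
    """Serialize 8-byte floats, preceded by a 4-byte byte-order flag."""
    header = struct.pack(byte_order + "I", MAGIC)
    body = struct.pack(f"{byte_order}{len(values)}d", *values)
    return header + body

def read_with_flag(data: bytes):
    """Detect the writer's byte order from the flag, then decode the floats."""
    for byte_order in ("<", ">"):
        if struct.unpack(byte_order + "I", data[:4])[0] == MAGIC:
            count = (len(data) - 4) // 8
            return list(struct.unpack(f"{byte_order}{count}d", data[4:4 + 8 * count]))
    raise ValueError("unrecognized byte-order flag")

big = write_with_flag([1.5, -2.0], ">")
little = write_with_flag([1.5, -2.0], "<")
print(read_with_flag(big), read_with_flag(little))
```

The reader never needs to be told the byte order, since only one interpretation of the first four bytes matches the flag.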

I think the goal of allowing record lengths > 2 GB is a good long-term target,
but having been in the field for many years, I imagine that the current use
cases for record lengths > 2 GB are very, very few compared to those involving
interoperability with other compilers and platforms. The few users requiring
> 2 GB record lengths can easily modify their write statements to output
multiple records rather than one large one.
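
A sketch of that workaround, in Python: one large payload is written as several consecutive records, each small enough for a 4-byte marker. It assumes little-endian 4-byte markers; `write_split_records` is a hypothetical name, and the reader has to know that the records belong together.

```python
import io
import struct

def write_split_records(dst, payload: bytes, max_record: int = 0x7FFFFFFF) -> int:
    """Write payload as consecutive records of at most max_record bytes each.

    Returns the number of records written.
    """
    if max_record <= 0:
        raise ValueError("max_record must be positive")
    count = 0
    for start in range(0, len(payload), max_record):
        chunk = payload[start:start + max_record]
        marker = struct.pack("<i", len(chunk))
        dst.write(marker + chunk + marker)
        count += 1
    return count

# Demonstrated with a tiny limit so the example runs instantly:
# 10 payload bytes with max_record=4 become records of 4, 4, and 2 bytes.
out = io.BytesIO()
n_records = write_split_records(out, bytes(range(10)), max_record=4)
print(n_records, len(out.getvalue()))
```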

Hopefully, once there is a compiler switch in place, everybody will be happy. :-)

p.s. It is too bad the FORTRAN spec (even the 2003) threw in the towel 
on interoperable binary files. It forces everybody to deal with these issues
in different ways using various third party libraries or ad hoc cobbled-together
solutions. As shown by the CORBA standard and others, the specification of
interoperable binary files is completely doable.

I imagine the FORTRAN vendors will continue to ensure that their binary file
format can be completely specified by the user just to meet the needs of their
customers even though the spec doesn't force them to.
Comment 6 kargls 2005-09-11 22:05:09 UTC
(In reply to comment #5)
> 
> Furthermore, when one writes binary in C, you get exactly what your variables
> are sized to in your code. If the platform is a 32 bit machine and is IEEE
> compliant,

What happens when one or the other of these conditions isn't met?

> I imagine the FORTRAN vendors ....

The correct spelling of the name of the language is Fortran.

Your comments #3 and #5 are nice little rants.  Actual code to fix
the problem speaks volumes over your rants.  In particular, you've
been told that Bud Davis is working on the problem.  If there was
an easy solution to the problem, Bud (or one of the others working on
gfortran) would have fixed it long ago.


Comment 7 Rob Ratcliff 2005-09-11 23:22:41 UTC
I'm not sure why I'm getting so much pushback on this silly thing.

I realize that disagreeing with the assumptions made during the design may be
regarded by some as "rants", but what I was attempting to do (perhaps poorly) is
illustrate why simple decisions that might seem fairly benign can have huge
efficiency impacts on a large population of users. There has been a pattern of
these decisions made over the years that have wasted thousands (if not millions)
of hours of people's precious time. (Big Endian vs. Little Endian, \ versus /,
CR vs CR/LF vs LF, 8 byte vs 4 byte markers, etc.)

If you read some of the previous comments, you'll see that some don't think it's
an issue. It really is a problem that should take high priority. I know Bud is
going to apply a variation of the patch he wrote a few months ago soon and I'm
happy about that. I hope there isn't any pushback from the rest of the
developers. I think the default should actually be 4 byte markers, but that's
just my humble opinion.

BTW, I think both spellings of FORTRAN (FORmula TRANslation) 
are correct actually: http://www.ibiblio.org/pub/languages/fortran/ch1-1.html
http://www.engin.umd.umich.edu/CIS/course.des/cis400/fortran/fortran.html
(Not that it really matters in the big scheme of things.)

I'll also post a small C program to convert to the g77 format soon as a
temporary fix until the patch is in place. (I'm completely hammered with
work right now, but I'll try to contribute more in the future. I've already
sent in some code snippets on the little endian/big endian issue.)

Also, if I wanted to be condescended to I'd go talk to my wife. :-)
I hope that we can all keep this professional in the future and 
respect people's time (development, trouble shooting and bug reporting) 
that they put into this to help make a better product for
everybody.
Comment 8 kargls 2005-09-12 00:35:16 UTC
(In reply to comment #7)
> 
> I realize that disagreeing with the assumptions made during the design may be
> regarded by some as "rants", but what I was attempting to do (perhaps poorly) is
> illustrate why simple decisions that might seem fairly benign can have huge
> efficiency impacts on a large population of users.

Why do you think that this was a "simple decision" in the initial design?
The world is moving to 64-bit CPUs, and a 32-bit record marker affects
performance (think about alignment issues).  Bud has thought about this
problem for several months, produced a plausible patch, and then Real Life
got into his way.  A fix to this problem takes time. There is no simple solution. 
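
One way to see the alignment point is to compute where each record's payload starts in the file. The sketch below assumes records that each hold two 8-byte reals; `payload_offsets` is a hypothetical helper. It only shows file offsets, and whether that matters for performance depends on how the runtime buffers I/O, which this thread does not go into.

```python
def payload_offsets(record_sizes, marker_size):
    """Return the file offset at which each record's payload begins."""
    offsets = []
    position = 0
    for size in record_sizes:
        position += marker_size          # skip the leading marker
        offsets.append(position)
        position += size + marker_size   # skip payload and trailing marker
    return offsets

sizes = [16, 16, 16]  # three records of two 8-byte reals each
offsets_4 = payload_offsets(sizes, 4)
offsets_8 = payload_offsets(sizes, 8)

print(offsets_4, [off % 8 == 0 for off in offsets_4])
print(offsets_8, [off % 8 == 0 for off in offsets_8])
```

With 4-byte markers none of these payloads starts on an 8-byte boundary; with 8-byte markers all of them do.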

> If you read some of the previous comments, you'll see that some don't think it's
> an issue. It really is a problem that should take high priority.

This isn't pushback but reality.  There are only a handful of 
volunteers hacking on the code.  What is a high priority to you
may not be very high on some hacker's lists.  To me, fixing the
known bugs in modules is much higher priority than changing a
functioning portion of the compiler. 

> I know Bud is going to apply a variation of the patch he wrote a
> few months ago soon and I'm happy about that. I hope there isn't
> any pushback from the rest of the developers.

I doubt that there will be pushback.  Yes, we will review the code
and make suggestions. But, most of the developers will welcome Bud's
effort.

> I think the default should actually be 4 byte markers, but that's
> just my humble opinion.

I only use Opteron-based systems, where a 64-bit marker is preferred.

> 
> BTW, I think both spellings of FORTRAN (FORmula TRANslation) 
> are correct actually: http://www.ibiblio.org/pub/languages/fortran/ch1-1.html
> http://www.engin.umd.umich.edu/CIS/course.des/cis400/fortran/fortran.html
> (Not that it really matters in the big scheme of things.)

Read the Standard.  It very carefully uses "FORTRAN 77" to identify
specific references to ISO 1539:1980.  Indeed, the passage in 1.6 
says  "Each Fortran International Standard since ISO 1539:1980 (informally
referred to as FORTRAN 77)".  Note, "ORTRAN" actually appears in small
caps.  Everywhere else the Standard carefully uses Fortran.

> I'll also post a small C program to convert to the g77 format soon as a
> temporary fix until the patch is in place.

Thanks.

> (I'm completely hammered with
> work right now, but I'll try to contribute more in the future. I've already
> sent in some code snippets on the little endian/big endian issue.)

So, you can appreciate the demands on the developers. :-)
I would love to devote several hours a week to gfortran, but
time is occupied by Real Life.

> I hope that we can all keep this professional in the future and 
> respect people's time (development, trouble shooting and bug reporting) 
> that they put into this to help make a better product for
> everybody.

Sorry if my comment appeared to be too strong, but your Comment #3 and
#5 appeared to be "preaching to the choir".  We know there's a problem.
Bud is working on it.