I ran into a problem with unformatted files with gfortran. It appears that it is padding the delimiters to 8-byte boundaries (or the delimiters are 64-bit longs) rather than using the normal 4-byte delimiter approach. For instance, I have a program that does this:

    write(1) 1
    end

g77 and Solaris f95 return (ignoring the byte-swapping issues):

    od -x fort.1:
    0000000 0004 0000 0001 0000 0004 0000

gfortran returns:

    0000000 0004 0000 0000 0000 0001 0000 0004 0000
    0000020 0000 0000

Is there a flag to make the delimiters 4 bytes rather than 8 bytes?
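For readers who want to reproduce the two layouts without a Fortran compiler, here is a minimal Python sketch. It assumes a little-endian machine and that each record is simply the payload framed by a leading and a trailing length marker, which is what the `od` dumps above suggest. The helper name `make_record` is made up for illustration.

```python
import struct

def make_record(payload: bytes, marker_size: int = 4, byteorder: str = "<") -> bytes:
    """Frame payload with a leading and a trailing length marker.

    marker_size is 4 (the g77-style layout) or 8 (the layout reported
    for gfortran in this thread); byteorder is a struct prefix.
    """
    fmt = byteorder + {4: "I", 8: "Q"}[marker_size]
    marker = struct.pack(fmt, len(payload))
    return marker + payload + marker

# The payload of "write(1) 1": one default (4-byte) integer with value 1.
payload = struct.pack("<i", 1)

rec4 = make_record(payload, 4)  # 4 + 4 + 4 = 12 bytes
rec8 = make_record(payload, 8)  # 8 + 4 + 8 = 20 bytes

print(len(rec4), rec4.hex(" ", 4))
print(len(rec8), rec8.hex(" ", 4))
```

The 12-byte and 20-byte totals match the sizes of the two dumps above. Note that `od -x` prints 16-bit words, so on a little-endian machine the bytes `04 00` appear as `0004`.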
I just found this discussion: http://gcc.gnu.org/ml/fortran/2005-05/msg00431.html From the docs, it doesn't look like it has been implemented in the mainline yet. Is it available some other way? It seems to me that unformatted files on 32-bit machines should be compatible. I don't know of any other Fortran compiler that assumes 64-bit record markers. We really need a flag to change the behavior if one is not available already. BTW, it is pretty typical that aero engineers involved in CFD (computational fluid dynamics) need to ship large (50-1000 MB) binary files between various machines without having to reformat them. Typically they generate files that are all big-endian, using a compiler switch to avoid having to byte swap as well. So, while we're at it, it'd be great to have a compiler switch that reverses the byte order of integers (2, 4, 8 byte) and floating point numbers (4, 8 byte) when they are read from or written to an unformatted file.
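To make the byte-swapping request concrete, here is a minimal Python sketch of reversing the byte order of every element in a block of same-sized items. The helper name `swap_items` is hypothetical. The caller has to know the element size, which is why a record containing mixed types cannot be swapped blindly after the fact and why doing it inside the I/O library (where the types are known) is attractive.

```python
import struct

def swap_items(data: bytes, item_size: int) -> bytes:
    """Reverse the byte order of each item_size-byte element in data."""
    if len(data) % item_size:
        raise ValueError("data length is not a multiple of item_size")
    return b"".join(
        data[i:i + item_size][::-1] for i in range(0, len(data), item_size)
    )

# Two 8-byte reals written big-endian, as a big-endian machine would store them.
big = struct.pack(">2d", 1.5, -2.0)

# After swapping each 8-byte element they can be read as little-endian values.
little = swap_items(big, 8)
print(struct.unpack("<2d", little))
```

The same helper works for 2-, 4-, and 8-byte integers and for 4-byte reals by changing `item_size`.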
Bud Davis is back and working on the pluggable record markers patch. Expect it to be completed and committed within a few weeks. There is no simple solution that is right for all situations. Gfortran uses 64-bit record markers by default since we want compatibility between 32-bit and 64-bit platforms (which incidentally g77 doesn't provide), and we want to support records bigger than 2 GB. There has been some discussion about a byteswapio patch, but nothing has been done. Patches are welcome, of course. And, I would hardly classify the bug as "critical". If you want portable binary I/O you're probably better off using a library such as netCDF anyway.
I believe it really is critical, since I and many others who may use gfortran need to interoperate with data generated by legacy codes on the same system that were compiled with g77, or on other systems (Sun, SGI) compiled with their native f77 or f90/f95 compilers. Some of the codes are proprietary and many are from third parties, so it isn't really feasible to force them to use another binary file library. Plus, I've been working with unformatted FORTRAN files for 15 years and this is the first time I've had this type of issue with the structure of an unformatted file. So given the current situation, I'll have to write a format converter for unformatted data from any gfortran code to the "standard" g77 format in order to interoperate. I would've thought that the FORTRAN spec would've covered this kind of thing.
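A converter of the kind described here could look roughly like the following Python sketch. It is only an illustration under stated assumptions: sequential unformatted records, each framed by identical leading and trailing length markers, with the same byte order on both sides. The function name `convert_markers` is made up, and the sketch reads each record fully into memory, which may not suit very large records.

```python
import io
import struct

def convert_markers(src, dst, in_size=8, out_size=4, byteorder="<"):
    """Copy records from src to dst, rewriting the length markers.

    src and dst are binary file objects. Returns the number of records copied.
    """
    codes = {4: "I", 8: "Q"}
    in_fmt = byteorder + codes[in_size]
    out_fmt = byteorder + codes[out_size]
    out_max = 2 ** (8 * out_size - 1) - 1  # stay within the signed range
    count = 0
    while True:
        head = src.read(in_size)
        if not head:
            return count  # clean end of file
        if len(head) != in_size:
            raise ValueError("truncated leading marker")
        (length,) = struct.unpack(in_fmt, head)
        if length > out_max:
            raise ValueError("record too long for the output marker size")
        payload = src.read(length)
        if len(payload) != length:
            raise ValueError("truncated record payload")
        tail = src.read(in_size)
        if len(tail) != in_size or struct.unpack(in_fmt, tail)[0] != length:
            raise ValueError("trailing marker does not match leading marker")
        marker = struct.pack(out_fmt, length)
        dst.write(marker + payload + marker)
        count += 1

# The 20-byte file from the original report, converted to 4-byte markers.
src = io.BytesIO(bytes.fromhex("0400000000000000" "01000000" "0400000000000000"))
dst = io.BytesIO()
print(convert_markers(src, dst), dst.getvalue().hex())
```

With real files, `src` and `dst` would be opened in `"rb"` and `"wb"` mode. Passing `in_size=4, out_size=8` converts in the other direction.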
More than that, this is a dup of bug 19303. Unformatted I/O was never supposed to be used with different versions of the compiler, or across different targets. It is just like using write in C. *** This bug has been marked as a duplicate of 19303 ***
Well, just to warn you, you're going to have a lot of steamed engineers on your hands when they discover that they either have to recompile all of their FORTRAN codes on every platform with gfortran, write all of their data (50-1000 MB binaries and larger) as ASCII, or convert their files to the traditional g77 format to interoperate with the rest of their processes. What will actually happen is that users will get burnt once and then they'll drop gfortran like a hot potato and put in a request to purchase a commercial compiler from Intel or PGI. Another reason that this feature is so painful is that engineers tend to pipeline their unformatted files from one process to the next. There can literally be 10 or more programs in a process that will read from or write to a given unformatted file. Any program in the process that was compiled with gfortran will break the process, and probably in such a subtle way that it'll take each user hours to figure out what went wrong. (I spent four hours on it yesterday discovering what the problem was in my process, and that's with knowing how to read the output from "od".) Furthermore, when one writes binary in C, you get exactly what your variables are sized to in your code. If the platform is a 32-bit machine and is IEEE compliant, you pretty much know that a short is 16 bits, an int is 32 bits, a long is 32 bits and a long long is 64 bits. Developers often even define macros or new types that guarantee that the variables are the same lengths independent of 32-bit or 64-bit architectures. There are also often compiler switches that govern the length of the various primitive types. So if portability of the data is important to you, the resulting binary file is interoperable except for big-endian versus little-endian issues (which can be handled with a flag at the top of the file or by always writing in one endianness).
With C binary files, of course, you don't have to worry about the silly record markers that muck up the works either. I think the goal of allowing record lengths > 2 GB is a good long-term target, but having been in the field for many years, I imagine that the current use cases for record lengths > 2 GB are very few compared to those involving interoperability with other compilers and platforms. The few users requiring > 2 GB record lengths can easily modify their write statements to output multiple records rather than one large one. Hopefully, once there is a compiler switch in place, everybody will be happy. :-) p.s. It is too bad the FORTRAN spec (even the 2003 one) threw in the towel on interoperable binary files. It forces everybody to deal with these issues in different ways using various third-party libraries or ad hoc cobbled-together solutions. As shown by the CORBA standard and others, the specification of interoperable binary files is completely doable. I imagine the FORTRAN vendors will continue to ensure that their binary file format can be completely specified by the user just to meet the needs of their customers, even though the spec doesn't force them to.
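The "multiple records instead of one large one" idea can be sketched as follows, again in Python and again as an illustration only. It assumes little-endian 4-byte markers and treats 2**31 - 1 bytes as the largest payload a signed 32-bit marker can describe. The name `write_chunked` is hypothetical, and a reader of such a file would have to know that consecutive records belong together.

```python
import struct

MAX_SIGNED_32 = 2**31 - 1  # largest length a signed 32-bit marker can hold

def write_chunked(payload: bytes, max_len: int = MAX_SIGNED_32) -> bytes:
    """Frame payload as one or more records of at most max_len bytes each.

    An empty payload produces no records in this sketch.
    """
    out = bytearray()
    for start in range(0, len(payload), max_len):
        chunk = payload[start:start + max_len]
        marker = struct.pack("<I", len(chunk))
        out += marker + chunk + marker
    return bytes(out)

# A tiny limit makes the splitting visible: 10 bytes become records of 4, 4 and 2.
framed = write_chunked(b"abcdefghij", max_len=4)
print(len(framed))
```

Each record adds 8 bytes of markers, so the 10-byte payload above is framed into 10 + 3 * 8 = 34 bytes.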
(In reply to comment #5)
> Furthermore, when one writes binary in C, you get exactly what your variables
> are sized to in your code. If the platform is a 32 bit machine and is IEEE
> compliant,

What happens when one or the other of these conditions isn't met?

> I imagine the FORTRAN vendors ....

The correct spelling of the name of the language is Fortran. Your comments #3 and #5 are nice little rants. Actual code to fix the problem speaks volumes over your rants. In particular, you've been told that Bud Davis is working on the problem. If there were an easy solution to the problem, Bud (or one of the others working on gfortran) would have fixed it long ago.
I'm not sure why I'm getting so much pushback on this silly thing. I realize that disagreeing with the assumptions made during the design may be regarded by some as "rants", but what I was attempting to do (perhaps poorly) is illustrate why simple decisions that might seem fairly benign can have huge efficiency impacts on a large population of users. There has been a pattern of such decisions over the years that has wasted thousands (if not millions) of hours of people's precious time. (Big Endian vs. Little Endian, \ versus /, CR vs CR/LF vs LF, 8 byte vs 4 byte markers, etc.) If you read some of the previous comments, you'll see that some don't think it's an issue. It really is a problem that should take high priority. I know Bud is going to apply a variation of the patch he wrote a few months ago soon and I'm happy about that. I hope there isn't any pushback from the rest of the developers. I think the default should actually be 4 byte markers, but that's just my humble opinion. BTW, I think both spellings of FORTRAN (FORmula TRANslation) are correct actually: http://www.ibiblio.org/pub/languages/fortran/ch1-1.html http://www.engin.umd.umich.edu/CIS/course.des/cis400/fortran/fortran.html (Not that it really matters in the big scheme of things.) I'll also post a small C program to convert to the g77 format soon as a temporary fix until the patch is in place. (I'm completely hammered with work right now, but I'll try to contribute more in the future. I've already sent in some code snippets on the little endian/big endian issue.) Also, if I wanted to be condescended to I'd go talk to my wife. :-) I hope that we can all keep this professional in the future and respect the time (development, troubleshooting and bug reporting) that people put into this to help make a better product for everybody.
(In reply to comment #7)
> I realize that disagreeing with the assumptions made during the design may be
> regarded by some as "rants", but what I was attempting to do (perhaps poorly) is
> illustrate why simple decisions that might seem fairly benign can have huge
> efficiency impacts on a large population of users.

Why do you think that this was a "simple decision" in the initial design? The world is moving to 64-bit CPUs, and a 32-bit record marker affects performance (think about alignment issues). Bud has thought about this problem for several months, produced a plausible patch, and then Real Life got in his way. A fix to this problem takes time. There is no simple solution.

> If you read some of the previous comments, you'll see that some don't think it's
> an issue. It really is a problem that should take high priority.

This isn't pushback but reality. There are only a handful of volunteers hacking on the code. What is a high priority to you may not be very high on some hackers' lists. To me, fixing the known bugs in modules is a much higher priority than changing a functioning portion of the compiler.

> I know Bud is going to apply a variation of the patch he wrote a
> few months ago soon and I'm happy about that. I hope there isn't
> any pushback from the rest of the developers.

I doubt that there will be pushback. Yes, we will review the code and make suggestions. But most of the developers will welcome Bud's effort.

> I think the default should actually be 4 byte markers, but that's
> just my humble opinion.

I only use Opteron-based systems, where a 64-bit marker is preferred.

> BTW, I think both spellings of FORTRAN (FORmula TRANslation)
> are correct actually: http://www.ibiblio.org/pub/languages/fortran/ch1-1.html
> http://www.engin.umd.umich.edu/CIS/course.des/cis400/fortran/fortran.html
> (Not that it really matters in the big scheme of things.)

Read the Standard. It very carefully uses "FORTRAN 77" to identify specific references to ISO 1539:1980. Indeed, the passage in 1.6 says "Each Fortran International Standard since ISO 1539:1980 (informally referred to as FORTRAN 77)". Note, "ORTRAN" actually appears in small caps. Everywhere else the Standard carefully uses Fortran.

> I'll also post a small C program to convert to the g77 format soon as a
> temporary fix until the patch is in place.

Thanks.

> (I'm completely hammered with
> work right now, but I'll try to contribute more in the future. I've already
> sent in some code snippets on the little endian/big endian issue.)

So, you can appreciate the demands on the developers. :-) I would love to devote several hours a week to gfortran, but my time is occupied by Real Life.

> I hope that we can all keep this professional in the future and
> respect people's time (development, trouble shooting and bug reporting)
> that they put into this to help make a better product for
> everybody.

Sorry if my comment appeared to be too strong, but your comments #3 and #5 appeared to be "preaching to the choir". We know there's a problem. Bud is working on it.