Bug 43551 - [4.4/4.5 Regression] Buffered direct I/O reads wrong record
Summary: [4.4/4.5 Regression] Buffered direct I/O reads wrong record
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libfortran
Version: 4.5.0
Importance: P4 normal
Target Milestone: 4.4.4
Assignee: Tobias Burnus
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2010-03-27 19:34 UTC by Tobias Burnus
Modified: 2010-03-29 06:20 UTC

See Also:
Host:
Target:
Build:
Known to work: 4.4.0
Known to fail: 4.4.4 4.5.0
Last reconfirmed: 2010-03-28 17:12:33


Description Tobias Burnus 2010-03-27 19:34:09 UTC
Quantum Espresso (http://www.quantum-espresso.org/download.php) is miscompiled, cf. http://www.democritos.it/pipermail/pw_forum/2010-March/016356.html

Fetch the code, configure with F77=gfortran F90=gfortran and compile. Run example02 and compare the output for "ph.x < si.dynG" with the reference, with another compiler, or with gfortran 4.3. Compare e.g. the "Dielectric constant" matrix. The error is about 10%; for the "Effective charges" it is even larger.

I am currently hunting the regression and then will try to reduce the code.
Comment 1 Richard Biener 2010-03-27 19:41:19 UTC
This is very little information.  What target, what set of optimization options?
Comment 2 Tobias Burnus 2010-03-27 19:52:11 UTC
Working: 2009-04-05-r145558
Failing: 2009-04-06-r145580

The ChangeLog shows a couple of I/O patches, and indeed the bug is also present if one uses the working version with the current 4.4.4 libgfortran. Now I need to find a minimal example. The I/O patch I suspect is:
  PR fortran/38654
  http://gcc.gnu.org/viewcvs?view=rev&revision=145571
Comment 3 Tobias Burnus 2010-03-27 20:56:28 UTC
No success so far; I will try it tomorrow. To reproduce: Fetch, as written in comment 0, the source code and untar it.

./configure F77=gfortran F90=gfortran FFLAGS_NOOPT='-O0 -g' BLAS_LIBS=-lblas LAPACK_LIBS=-llapack  FFLAGS='-O1 -g' && make pwall

cd ../espresso-4.1.1; ln -s ../espresso-4.1.2/bin .
cd examples/example02;
./run_example
# abort while: "running the phonon calculation at Gamma for Si..."

You should now have the files you need:
a) results/si.phG.in
b) $HOME/tmp/si.wfc and $HOME/tmp/si.save
(You can delete all other files under $HOME/tmp/ and under results.) Those files are identical with failing and working libgfortran.

Run now: ../../../bin/ph.x < si.phG.in
The wrong output is in ./si.dynG, in stdout, and in $HOME/tmp/_phsi.phsave/data-file.xml*. It can for instance be found in *.xml.1 for the item "<DIELECTRIC_CONSTANT ...". The first value should be around 1.380642769884322E+001; the wrong value is around 1.283252593899003E+001.
Comment 4 Tobias Burnus 2010-03-27 21:43:47 UTC
Just another data point: I have disabled the format caching (format_cache_ok = false in io/format.c), but it did not seem to help (on the trunk).
Comment 5 Richard Biener 2010-03-27 21:55:48 UTC
Fortran.  P4.
Comment 6 Tobias Burnus 2010-03-27 22:22:34 UTC
The XML reading seems to be OK - at least I have added some write statements to iotk_dat.spp (at the "dat =" lines; then "make update" before clean & build) and the output is the same for the 2009-04-05 and the trunk libgfortran.
Comment 7 Jerry DeLisle 2010-03-28 03:02:47 UTC
I am not able to help here until probably Monday night. Turning off format caching is very easy for debugging purposes.  I will keep an eye on this until I can get to a workstation I can use.
Comment 8 Tobias Burnus 2010-03-28 08:19:15 UTC
(In reply to comment #7)
> Turning off format caching is very easy for debugging purposes.
I tried this (see comment 4), but it did not seem to help.

I now added a printf for "*(GFC_REAL_8 *)dest, buffer" in read_f, but "dest" is the same with libgfortran 4.3.x and 4.5.
Comment 9 Tobias Burnus 2010-03-28 08:52:04 UTC
The issue seems to be the buffering. With GFORTRAN_UNBUFFERED_ALL=1 the result is correct; without it, the result is wrong.
Comment 10 Tobias Burnus 2010-03-28 11:56:10 UTC
And the culprit is unit 30 ("$HOME/tmp/_phsi.ebar", alias unit=iuebar), a binary file (672000 bytes) that is automatically deleted (PH/close_phq.f90).

The file is opened with: form='unformatted', access='direct', recl=22400
(= 2800 * DIRECT_IO_FACTOR w/ DIRECT_IO_FACTOR = 8 for real(8).)
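
As a cross-check of the record-length arithmetic, the following minimal sketch asks the runtime for the length of one 2800-element real(8) record via inquire(iolength=...) instead of hard-coding the factor 8. The program name recl_check and the variables rl, rec and nrec are made up for illustration; nrec = 30 is an assumption chosen because 30 * 22400 = 672000 matches the file size quoted above. With gfortran, where recl for unformatted files is counted in bytes, the first value is expected to be 22400; other compilers may count recl in different units.

```fortran
program recl_check
  implicit none
  integer, parameter :: n = 2800     ! elements per record, as in the report
  integer, parameter :: nrec = 30    ! assumed record count, for the arithmetic only
  real(8) :: rec(n)
  integer :: rl

  ! Ask the runtime for the record length of one such record,
  ! in whatever unit this compiler uses for recl.
  inquire(iolength=rl) rec

  print *, 'recl for one record    =', rl
  print *, 'size of', nrec, 'records =', nrec*rl
end program recl_check
```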

 * * *

! Test case:

implicit none
integer, parameter :: size = 2800 ! << needs to be large enough
real(8) :: vec1(size,30), dummy(size)
integer i

CALL RANDOM_NUMBER(vec1)

open(99, file='test.dat', form='unformatted', access='direct', recl=size*8)
do i = 1, 10
  write(99,rec=i) vec1(:,i)
  write(99,rec=i+10) vec1(:,i+10)
  write(99,rec=i+20) vec1(:,i+20)
end do

do i = 1, 10
  read(99,rec=i) dummy
  if (any (dummy /= vec1(:,i))) call abort()
  read(99,rec=i+10) dummy
  if (any (dummy /= vec1(:,i+10))) call abort()
  read(99,rec=i+20) dummy
  if (any (dummy /= vec1(:,i+20))) call abort() ! << aborts here for rec = 21
end do

close(99, status='delete')
end

 * * *

Closer examination shows that reading rec = 1, rec = 11, rec = 21 returns records 1, 11 and 30. That is: the third read returns the *last* record instead of the 21st!
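
One way to see which record a read actually returns is to tag every record with its own number. The sketch below is a hypothetical variant of the test case above (the program name which_record, the file name tagged.dat and the tagging scheme are not from the original). It keeps the same record size and the same interleaved access order, and like the test case it assumes recl is counted in bytes. On a fixed libgfortran it should print nothing; on an affected one it would be expected to report the mismatch described above.

```fortran
program which_record
  implicit none
  integer, parameter :: n = 2800   ! same record size as the test case
  real(8) :: buf(n)
  integer :: i, k, rec

  open(99, file='tagged.dat', form='unformatted', access='direct', recl=n*8)

  ! Same interleaved write order as the test case: i, i+10, i+20.
  do i = 1, 10
    do k = 0, 20, 10
      buf = real(i + k, 8)         ! every element holds the record number
      write(99, rec=i+k) buf
    end do
  end do

  ! Read back in the same order and report any record that comes back
  ! with a tag different from the one requested.
  do i = 1, 10
    do k = 0, 20, 10
      rec = i + k
      read(99, rec=rec) buf
      if (nint(buf(1)) /= rec) then
        print '(a,i0,a,i0)', 'asked for record ', rec, &
              ', got record ', nint(buf(1))
      end if
    end do
  end do

  close(99, status='delete')
end program which_record
```

Unlike the abort() in the test case, this version keeps going after the first mismatch, so it lists every affected record in one run.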
Comment 11 Tobias Burnus 2010-03-28 17:12:33 UTC
Mine. I have a patch: http://gcc.gnu.org/ml/fortran/2010-03/msg00190.html
Comment 12 Tobias Burnus 2010-03-29 06:17:32 UTC
Subject: Bug 43551

Author: burnus
Date: Mon Mar 29 06:17:19 2010
New Revision: 157792

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=157792
Log:
2010-03-29  Tobias Burnus  <burnus@net-b.de>

        PR fortran/43551
        * io/unix.c (buf_write): Set physical_offset after lseek.

2010-03-29  Tobias Burnus  <burnus@net-b.de>

        PR fortran/43551
        * gfortran.dg/direct_io_12.f90: New test.


Added:
    trunk/gcc/testsuite/gfortran.dg/direct_io_12.f90
Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/libgfortran/ChangeLog
    trunk/libgfortran/io/unix.c

Comment 13 Tobias Burnus 2010-03-29 06:18:25 UTC
Subject: Bug 43551

Author: burnus
Date: Mon Mar 29 06:18:16 2010
New Revision: 157793

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=157793
Log:
2010-03-29  Tobias Burnus  <burnus@net-b.de>

        PR fortran/43551
        * io/unix.c (buf_write): Set physical_offset after lseek.

2010-03-29  Tobias Burnus  <burnus@net-b.de>

        PR fortran/43551
        * gfortran.dg/direct_io_12.f90: New test.


Added:
    branches/gcc-4_4-branch/gcc/testsuite/gfortran.dg/direct_io_12.f90
Modified:
    branches/gcc-4_4-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_4-branch/libgfortran/ChangeLog
    branches/gcc-4_4-branch/libgfortran/io/unix.c

Comment 14 Tobias Burnus 2010-03-29 06:20:22 UTC
FIXED.
Reminder: If one has a version older than today's, the workaround is to set the environment variable GFORTRAN_UNBUFFERED_ALL=1