Bug 36806 - [4.4 Regression] I/Os hang at rev. 137631 on darwin9
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.4.0
Importance: P1 blocker
Target Milestone: 4.4.0
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on: 36864
Blocks:
Reported: 2008-07-11 10:55 UTC by Dominique d'Humieres
Modified: 2008-08-06 20:25 UTC
CC List: 6 users

See Also:
Host: *-apple-darwin9
Target: *-apple-darwin9
Build: *-apple-darwin9
Known to work:
Known to fail:
Last reconfirmed: 2008-07-20 18:48:52


Attachments
patch (921 bytes, patch)
2008-07-11 10:59 UTC, Richard Biener

Description Dominique d'Humieres 2008-07-11 10:55:12 UTC
From revision 137644 through 137712 (137615 works), unformatted I/O in Fortran produces executables that hang. For instance, the executable built from the following code

       data=-1
!       print *, 'before'
       write(11) data
!       print *, 'after'
       end

hangs after creating an empty fort.11 file:

[ibook-dhum] f90/bug% time a.out
^C0.000u 0.002s 0:51.34 0.0%    0+0k 0+1io 0pf+0w

If I compile the code with gfortran 4.3.1 and -S, then compile the assembly with gfortran 4.4.0, I get:

Bus error
0.000u 0.002s 0:00.91 0.0%      0+0k 0+1io 0pf+0w

The crash occurs in MAIN__ ()

(gdb) s
main (argc=1, argv=0xbfffdb98) at ../../../gcc-4.4-work/libgfortran/fmain.c:21
21        MAIN__ ();
(gdb) s

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
0x00000000 in ?? ()


If I instead compile with gfortran 4.4.0 and -S, then compile the assembly with gfortran 4.3.1, the executable produces the correct fort.11 file.

So it seems that something is miscompiled in libgfortran. Could it be related to pr36765?
Comment 1 Richard Biener 2008-07-11 10:59:06 UTC
Created attachment 15899 [details]
patch

Could be.  You can try the attached patch.
Comment 2 Richard Biener 2008-07-11 12:58:23 UTC
Which has now been checked in on the trunk.
Comment 3 Dominique d'Humieres 2008-07-11 13:26:39 UTC
> Could be.  You can try the attached patch.

The patch does not fix the problem.

Comment 4 Richard Biener 2008-07-11 13:41:04 UTC
I cannot reproduce this on i686-pc-linux-gnu.
Comment 5 Dominique d'Humieres 2008-07-11 13:53:39 UTC
I suspect the problem is specific to Darwin (probably on ppc as well: the "regress" machine has not posted test results since 2008-07-08 23:54, revision 137630: http://gcc.gnu.org/ml/gcc-testresults/2008-07/msg00776.html).


Comment 6 Andreas Tobler 2008-07-11 13:57:18 UTC
Yes, ppc too.
Comment 7 Andreas Tobler 2008-07-11 18:09:19 UTC
r137631 seems to be the commit where the hang starts. I have successful results from 137630.
Comment 8 Richard Biener 2008-07-11 18:44:31 UTC
That's the PRE rewrite.  I think Danny has access to i686-darwin, but reducing
this would be helpful I guess.
Comment 9 Dominique d'Humieres 2008-07-11 22:10:05 UTC
The executable for the following code

       open(unit=11,form='unformatted')
!       print *, 'open'
       end

also hangs.  Stepping with gdb gives:

Breakpoint 1, main (argc=1, argv=0xbfffdb98) at ../../../gcc-4.4-work/libgfortran/fmain.c:11
11      {
(gdb) s
13        store_exe_path (argv[0]);
(gdb) s
*__gfortran_store_exe_path (argv0=0xbfffdd04 "/Volumes/MacBook/Users/dominiq/Documents/Fortran/g95bench/win/f90/bug/a.out") at ../../../gcc-4.4-work/libgfortran/runtime/main.c:114
114       if (argv0[0] == '/')
(gdb) s
103     {
(gdb) s
114       if (argv0[0] == '/')
(gdb) s
116           exe_path = argv0;
(gdb) s
117           please_free_exe_path_when_done = 0;
(gdb) s
133     }
(gdb) s
main (argc=1, argv=0xbfffdb98) at ../../../gcc-4.4-work/libgfortran/fmain.c:16
16        set_args (argc, argv);
(gdb) s
*__gfortran_set_args (argc=1, argv=0xbfffdb98) at ../../../gcc-4.4-work/libgfortran/runtime/main.c:82
82        argc_save = argc;
(gdb) s
83        argv_save = argv;
(gdb) s
84      }
(gdb) s
main (argc=1, argv=0xbfffdb98) at ../../../gcc-4.4-work/libgfortran/fmain.c:21
21        MAIN__ ();
(gdb) s
25      }
(gdb) s
0x00001ee6 in start ()
(gdb) s
Single stepping until exit from function start, 
which has no line number information.
0x00003014 in dyld_stub_exit ()
(gdb) s
Single stepping until exit from function dyld_stub_exit, 
which has no line number information.
0x8fe18b20 in __dyld_fast_stub_binding_helper_interface ()
(gdb) s
Single stepping until exit from function __dyld_fast_stub_binding_helper_interface, 
which has no line number information.
0x8fe18b22 in __dyld_stub_binding_helper_interface ()
(gdb) s
Single stepping until exit from function __dyld_stub_binding_helper_interface, 
which has no line number information.
0x8fe18b42 in __dyld_misaligned_stack_error ()
(gdb) s
Single stepping until exit from function __dyld_misaligned_stack_error, 
which has no line number information.
0x8fe18b5a in __dyld_stub_binding_helper_interface2 ()
(gdb) s
Single stepping until exit from function __dyld_stub_binding_helper_interface2, 
which has no line number information.
0x8fe06e40 in __dyld__ZN4dyld14bindLazySymbolEPK11mach_headerPm ()
(gdb) s
Single stepping until exit from function __dyld__ZN4dyld14bindLazySymbolEPK11mach_headerPm, 
which has no line number information.
0x8fe0e430 in __dyld__ZNK11ImageLoader15containsAddressEPKv ()
(gdb) s
Single stepping until exit from function __dyld__ZNK11ImageLoader15containsAddressEPKv, 
which has no line number information.
0x8fe13250 in __dyld__ZNK16ImageLoaderMachO13beginSegmentsEv ()
(gdb) s
Single stepping until exit from function __dyld__ZNK16ImageLoaderMachO13beginSegmentsEv, 
which has no line number information.
0x8fe0e457 in __dyld__ZNK11ImageLoader15containsAddressEPKv ()
(gdb) s
Single stepping until exit from function __dyld__ZNK11ImageLoader15containsAddressEPKv, 
which has no line number information.
0x8fe06f22 in __dyld__ZN4dyld14bindLazySymbolEPK11mach_headerPm ()
(gdb) s
Single stepping until exit from function __dyld__ZN4dyld14bindLazySymbolEPK11mach_headerPm, 
which has no line number information.
0x8fe18b6f in __dyld_stub_binding_helper_interface2 ()
(gdb) s
Single stepping until exit from function __dyld_stub_binding_helper_interface2, 
which has no line number information.
0x93b6aeaf in exit ()
(gdb) s
Single stepping until exit from function exit, 
which has no line number information.
0xa0a74539 in dyld_stub___cxa_finalize ()
(gdb) s
Single stepping until exit from function dyld_stub___cxa_finalize, 
which has no line number information.
0x93b6aeeb in __cxa_finalize ()
(gdb) s
Single stepping until exit from function __cxa_finalize, 
which has no line number information.
0x8fe04ee0 in __dyld__ZN4dyld14runTerminatorsEPv ()
(gdb) s
Single stepping until exit from function __dyld__ZN4dyld14runTerminatorsEPv, 
which has no line number information.
0x8fe12ee0 in __dyld__ZN16ImageLoaderMachO13doTerminationERKN11ImageLoader11LinkContextE ()
(gdb) s
Single stepping until exit from function __dyld__ZN16ImageLoaderMachO13doTerminationERKN11ImageLoader11LinkContextE, 
which has no line number information.
cleanup () at ../../../gcc-4.4-work/libgfortran/runtime/main.c:174
174     {
(gdb) s
0x000fa493 in __i686.get_pc_thunk.bx ()
(gdb) s
Single stepping until exit from function __i686.get_pc_thunk.bx, 
which has no line number information.
cleanup () at ../../../gcc-4.4-work/libgfortran/runtime/main.c:175
175       close_units ();
(gdb) s
*__gfortrani_close_units () at ../../../gcc-4.4-work/libgfortran/io/unit.c:688
688       __gthread_mutex_lock (&unit_lock);
(gdb) s
** Simulating stepping into inlined subroutine.  **
694       if (__gthread_active_p ())
(gdb) p __gthread_active_p ()
No symbol "__gthread_active_p" in current context.
(gdb) s
689       while (unit_root != NULL)
(gdb) p unit_root
No symbol "unit_root" in current context.
(gdb) s
695         return __gthrw_(pthread_mutex_lock) (mutex);
(gdb) s
^C
Program received signal SIGINT, Interrupt.
0x93b424ee in semaphore_wait_signal_trap ()

If I uncomment the commented line I get:

Breakpoint 1, main (argc=1, argv=0xbfffdb98) at ../../../gcc-4.4-work/libgfortran/fmain.c:11
11      {
(gdb) s
13        store_exe_path (argv[0]);
(gdb) s
*__gfortran_store_exe_path (argv0=0xbfffdd04 "/Volumes/MacBook/Users/dominiq/Documents/Fortran/g95bench/win/f90/bug/a.out") at ../../../gcc-4.4-work/libgfortran/runtime/main.c:114
114       if (argv0[0] == '/')
(gdb) s
103     {
(gdb) s
114       if (argv0[0] == '/')
(gdb) s
116           exe_path = argv0;
(gdb) s
117           please_free_exe_path_when_done = 0;
(gdb) s
133     }
(gdb) s
main (argc=1, argv=0xbfffdb98) at ../../../gcc-4.4-work/libgfortran/fmain.c:16
16        set_args (argc, argv);
(gdb) s
*__gfortran_set_args (argc=1, argv=0xbfffdb98) at ../../../gcc-4.4-work/libgfortran/runtime/main.c:82
82        argc_save = argc;
(gdb) s
83        argv_save = argv;
(gdb) s
84      }
(gdb) s
main (argc=1, argv=0xbfffdb98) at ../../../gcc-4.4-work/libgfortran/fmain.c:21
21        MAIN__ ();
(gdb) s
^C
Program received signal SIGINT, Interrupt.
0x93b424ee in semaphore_wait_signal_trap ()

From pr30617 I noticed that the "mutex" behavior is not the same on Darwin and on Linux, but I know nothing about that area.

Comment 10 Dominique d'Humieres 2008-07-12 09:39:30 UTC
Formatted OPEN also hangs:

       open(unit=11,form='formatted')
       end

It seems that all I/O involving external files hangs, but 'print *,' and 'write(tmp,*)' (where 'tmp' is an internal buffer) work.

Comment 11 Dominique d'Humieres 2008-07-13 16:19:39 UTC
I confirm that the problem appears at revision 137631 (137630 works fine).

According to gdb, where the hang occurs depends on the code: for the test in comment #10, I get the following backtrace:

#0  0x93b424ee in semaphore_wait_signal_trap ()
#1  0x93b49fc5 in pthread_mutex_lock ()
#2  0x0019c172 in *__gfortrani_close_units () at ../gcc/gthr-posix.h:695
#3  0x000fc291 in cleanup () at ../../../gcc-4.4-work/libgfortran/runtime/main.c:175
#4  0x8fe12fc3 in __dyld__ZN16ImageLoaderMachO13doTerminationERKN11ImageLoader11LinkContextE ()
#5  0x8fe04f7b in __dyld__ZN4dyld14runTerminatorsEPv ()
#6  0x93b6afdc in __cxa_finalize ()
#7  0x93b6aed0 in exit ()
#8  0x00001eef in start ()

If I add 

print *, 'open'
close(11)

before the 'end', I get

#0  0x93b424ee in semaphore_wait_signal_trap ()
#1  0x93b49fc5 in pthread_mutex_lock ()
#2  0x0019c1e5 in get_external_unit (n=6, do_create=1) at ../../../gcc-4.4-work/libgfortran/io/unit.c:288
#3  0x0019a501 in data_transfer_init (dtp=0xbfffd9d8, read_flag=0) at ../../../gcc-4.4-work/libgfortran/io/transfer.c:1828
#4  0x00001efd in MAIN__ ()
#5  0x00001f98 in main (argc=1, argv=0xbfffdb84) at ../../../gcc-4.4-work/libgfortran/fmain.c:21

In all the cases I have looked at, the hang occurs after:

...
(gdb) stepi
0x93b42cf7 in _sysenter_trap ()
(gdb) stepi
^C  
Program received signal SIGINT, Interrupt.
0x93b424ee in semaphore_wait_signal_trap ()
(gdb) disassemble 0x93b42cf0 0x93b42d00
Dump of assembler code from 0x93b42cf0 to 0x93b42d00:
0x93b42cf0 <cerror+56>: inc    %ebx
0x93b42cf2 <cerror+58>: xchg   %ax,%ax
0x93b42cf4 <_sysenter_trap+0>:  pop    %edx
0x93b42cf5 <_sysenter_trap+1>:  mov    %esp,%ecx
0x93b42cf7 <_sysenter_trap+3>:  sysenter 
0x93b42cf9 <_sysenter_trap+5>:  nopl   (%eax)
0x93b42cfc <i386_get_ldt+0>:    mov    $0x6,%eax
End of assembler dump.

Comment 12 Richard Biener 2008-07-18 17:02:10 UTC
What's the status here?
Comment 13 Andreas Tobler 2008-07-18 18:52:16 UTC
Still hanging on x86_64-apple-darwin, rev 137959.

Did I miss a patch to test?
Comment 14 Andreas Tobler 2008-07-18 20:13:37 UTC
As long as 36864 is open, we cannot test this one on i686-apple-darwin.
Comment 15 rguenther@suse.de 2008-07-18 20:23:10 UTC
On Fri, 18 Jul 2008, andreast at gcc dot gnu dot org wrote:

> ------- Comment #13 from andreast at gcc dot gnu dot org  2008-07-18 18:52 -------
> still hanging on x86_64-apple-darwin, rev 137959.
> 
> Did I miss a patch to test ?

There were several PRE fixes, so I was just wondering.

Richard.
Comment 16 Andreas Tobler 2008-07-20 18:40:05 UTC
Still hangs as of rev 138009.
Comment 17 Andreas Tobler 2008-07-24 19:03:15 UTC
Just a comment: on the tuples branch, merged from main on 2008-07-23, the hangs no longer happen, at least on x86_64-apple-darwin. An i686-apple-darwin build is in progress.
Comment 18 Andreas Tobler 2008-07-24 20:23:06 UTC
I confirm that i686-apple-darwin on the tuples branch does _not_ hang in these test cases.

Proposal: I know there are plans to merge tuples to trunk in the next few days, so let's wait until the merge has happened and see how things go, ok?
Comment 19 Dominique d'Humieres 2008-07-25 06:56:57 UTC
> I confirm that i686-apple-darwin on tuples branch does _not_ hang in these test cases

Is this a real fix, i.e., has the cause of the hang been understood and fixed, or does it just happen that the hang is no longer there?

Comment 20 Daniel Berlin 2008-07-25 07:11:30 UTC
Yes.
Comment 21 Dominique d'Humieres 2008-07-25 07:26:30 UTC
> Yes

To the first or the second question? If it is to the first, would it be possible to document what was going on?

Comment 22 rguenther@suse.de 2008-07-25 08:11:12 UTC
On Fri, 25 Jul 2008, dominiq at lps dot ens dot fr wrote:

> ------- Comment #21 from dominiq at lps dot ens dot fr  2008-07-25 07:26 -------
> > Yes
> 
> To the first or the second question? If it is to the first, would it be
> possible to document what was going on?

It's just no longer happening.  While there were fixes for bugs we
understood, AFAIK nobody really analyzed this particular PR.

Richard.
Comment 23 Daniel Berlin 2008-07-25 08:11:56 UTC
The first.
For various reasons, get_external_unit in io/unit.c was being miscompiled.
Comment 24 Thomas Koenig 2008-07-31 11:38:38 UTC
Is this still an issue after the tuples merge?
Comment 25 Andreas Tobler 2008-08-06 20:25:46 UTC
I'd say it is not an issue anymore. I am closing this bug now. If you disagree, please reopen.