[Patch, libfortran] Thread safety and simplification of error printing

Mon May 9 02:31:00 GMT 2011

On Sun, May 8, 2011 at 16:42, N.M. Maclaren <nmm1@cam.ac.uk> wrote:
> On May 8 2011, Janne Blomqvist wrote:
>>>
>>> the error printing functionality (in io/unix.c) st_printf and
>>> st_vprintf are not thread-safe as they use a static buffer. ...
>>
>> While this patch makes error printing thread-safe, it's no longer
>> async-signal-safe as the stderr lock might lead to a deadlock. So I'm
>> retracting this patch and thinking some more about this problem.
>
> It's theoretically insoluble, given the constraints you are working
> under.  Sorry.  It is possible to do reasonably well, but there will
> always be likely scenarios where all you can do is to say "Aargh!
> I give up."

Well, I realize perfection is impossible, so I'm settling for merely
improving the status quo!

> Both I and the VMS people adopted the ratchet design.  You have N
> levels of error recovery, each level allocates all of the resources
> it needs before startup, and any exception during level K increases
> the level to K+1 and calls the level K+1 error handler.  When you
> have an exception at level N, you just die.

To some extent we have a crude version of this, in that when we're
entering many of the fatal error handling functions we do a recursion
check and if that fails, die. Also, in a recent patch of mine
(http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00584.html ) the fatal
signal handler function has been reworked to hopefully deal better
with other signal(s) being delivered before it's done; that code is
modeled after an example in the glibc manual, and I'm a bit unsure if
the recursion check thingy really works or we just end up in an
infinite recursion (that is, do we need to re-set to the default
handler before re-raising? I have a vague memory that the signal
handler for SIGXXX must finish before starting the handler for another
SIGXXX pending signal, which would make the current version safe).

> That imposes the constraint that all diagnostics have a fixed upper
> bound on the resources they need (not just buffer space, but that's
> the main one).  It's a real bummer when the system has some critical
> resources that you can't reserve, so you have to treat an allocation
> failure as an exception, but buffer space is not one such.
>
> That also tackles the thread problem, not very satisfactorily.  If a
> resource needs to be locked, you can try to get it for a bit, and
> then raise a higher exception if you can't.  And, typically, one or
> more of the highest levels are for closing down the process, and
> simply suspend any subsequent threads that call them (i.e. just leave
> them waiting for a lock they won't get).

I think in our case the situation is a bit easier in that we're not
trying to recover from a serious failure, merely print some diagnostic
information without getting stuck in a deadlock.

-- 
Janne Blomqvist