This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

Re: [jwe@bevo.che.wisc.edu: Crashing after complex division by zero]

To: jwe at bevo dot che dot wisc dot edu
Subject: Re: [jwe@bevo.che.wisc.edu: Crashing after complex division by zero]
From: craig at jcb-sc dot com
Date: 16 Jul 1999 09:26:45 -0000
Cc: dmg at bell-labs dot com, bug-octave at bevo dot che dot wisc dot edu, egcs-bugs at egcs dot cygnus dot com, hoffmann at ehmgs2 dot et dot tu-dresden dot de
Cc: craig at jcb-sc dot com
References: <19990714171536.16205.qmail@deer><378E3B3C.AD755950@moene.indiv.nluug.nl><19990715211127.6460.qmail@deer> <14222.38681.147742.489835@tillamook-sharp.bogus.domain>

>On 15-Jul-1999, craig@jcb-sc.com <craig@jcb-sc.com> wrote:
>
>| >John Eaton's Octave example points out why this do-not-abort-on-complex-
>| >divide-by-zero can be useful (especially when coupled to "passing the
>| >buck up"):  Octave could deliver a meaningful message to the user,
>| >instead of dumping core itself; or it could enable a user program to
>| >receive the trap [I do not really know if that can be done in Octave's
>| >language - perhaps John can point that out].
>
>I would just like to be able to get IEEE NaNs and Infs (if
>appropriate, of course) for complex division instead of a call to
>abort.
>
>Perhaps the default behavior on generating NaNs and Infs should be
>something other than continuing, but I don't think the default should
>be to call sig_die inside z_div and c_div.  There are surely other
>places in libf2c where division by zero can occur (check pow_di.cc,
>for example) yet c_div and z_div are the only two routines that call
>sig_die for division by zero.

Yes.  IMO it's probably best overall for a product like g77 to *not*
go to a lot of trouble to override "local" defaults -- the default
behaviors for traps, exceptions, signals, and so on, as provided by
the platform, i.e. the CPU/ABI/OS/low-level-libs combo -- as well as the
choices of numerics made by the underlying iron.  That makes for too
many surprises when users switch among compilers/languages, and risks
getting subtle details wrong.  (Which can affect how well the platform
handles security/robustness issues, for example.)

Better to leave the default decisions to the platform, and notify users
that, if they want *other* decisions, they need to learn how to use the
platform's native facilities to change them (so, okay to offer the hooks
they might need in the compiler/library product to do that).  And if they
don't like those defaults, they can take it up with the platform vendor.
(I'm even beginning to see things like initial alignment of the stack
for 64-bit entities as an example of things we should leave to the
platform to get right.)
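Here is what deferring to the platform's IEEE defaults could mean for the case at hand. The following is a minimal C sketch of complex division that returns infinities or NaNs for a zero divisor instead of calling sig_die. It is not libf2c's actual c_div or z_div. The names `dcomplex` and `cdiv_ieee` are hypothetical, and returning +Inf in both parts for a nonzero numerator is just one simple policy assumed here. The nonzero-divisor path uses Smith's algorithm, which avoids forming |b|^2 directly and so limits spurious overflow.

```c
#include <math.h>

/* Hypothetical stand-in for f2c's doublecomplex: real and imaginary parts. */
typedef struct { double r, i; } dcomplex;

/* Complex division a / b (Smith's algorithm: scale by the larger component
   of b).  A zero divisor does not abort.  Under the policy assumed here,
   nonzero / 0 yields +Inf in both parts and 0 / 0 yields NaN.  The division
   is done at run time, so the usual IEEE flags are raised. */
static dcomplex cdiv_ieee(dcomplex a, dcomplex b)
{
    dcomplex c;
    double abr = fabs(b.r), abi = fabs(b.i), ratio, den;

    if (abr == 0.0 && abi == 0.0) {
        volatile double zero = 0.0;
        double num = (a.r != 0.0 || a.i != 0.0) ? 1.0 : 0.0;

        c.r = c.i = num / zero;   /* 1/0 -> +Inf (divide-by-zero), 0/0 -> NaN (invalid) */
        return c;
    }
    if (abr <= abi) {             /* |Re b| <= |Im b|: scale by Im b */
        ratio = b.r / b.i;
        den = b.i * (1.0 + ratio * ratio);
        c.r = (a.r * ratio + a.i) / den;
        c.i = (a.i * ratio - a.r) / den;
    } else {                      /* |Re b| > |Im b|: scale by Re b */
        ratio = b.i / b.r;
        den = b.r * (1.0 + ratio * ratio);
        c.r = (a.r + a.i * ratio) / den;
        c.i = (a.i - a.r * ratio) / den;
    }
    return c;
}
```

For example, (1+i)/i gives 1-i. By contrast, (1+0i)/(0+0i) gives (+Inf, +Inf) and 0/0 gives NaNs. What to do about those results is then left to the caller, or to whatever trap handling the user has chosen for the platform.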

One of the biggest downsides of this approach is that the compiler
(or library) product doesn't offer, by default, a consistent *cross-platform*
environment.

So, IMO, the *other* "probably best" approach is for a product to
make specific guarantees about its default behavior regardless of
the underlying platform, and follow through on those guarantees.

The downsides of *that* include the risks of encountering a platform
people want supported that *cannot* be persuaded to fit the "mold"
without performing so miserably that users would hate it.  What does
the product then say about itself -- "works the same way on *most* systems"?
In all these discussions, I see much evidence of the slippery slope that
results from making even the least little exception to such a guarantee.

Also, merely *implementing* such a product (porting it to all of the
pertinent platforms) can be a huge challenge, since overriding the
default behavior of the underlying platform implies doing extra testing
and other types of validation that, by choosing to rely on the underlying
defaults instead, could be viewed as having (presumably) been done by
the platform vendor (which, for a system like GNU/Linux, includes its
"proponent culture" ;-).

(Further, when problems occur, the platform vendor is much more likely to
help out when the defaults offered by that vendor remain in force.
Though, if the vendor actually *recommends*, and appears to stand
behind, certain combinations of alternate settings for things like
how exceptions are handled, FPU defaults, and so on, it's reasonable
to consider these an "alternate" default for purposes of this rant,
if that default offers what the *vendor* considers to be a "preferred
environment" for products like the compiler/library under consideration
here.)

Between these two positions (the latter being "extreme" in a *good* way)
lies what libf2c does.  (When a Fortran program starts running, libf2c
normally changes certain defaults in the run-time environment, such as
by setting up signal handlers.  These help with things like flushing
I/O buffers before the program actually dies.)

In that case, I'm quite happy to accept what libf2c does, treating g77
as, in essence, a third-party user of it, for the following reasons:

  -  libf2c's behaviors (in the pertinent regards) have been pretty
     stable for some time, so programmers expect them.

  -  The libf2c maintainer usually changes libf2c only for *good* reasons,
     apparently taking into account a reasonably wide range of systems.

  -  To the extent g77 tries to override, or even simply patch, libf2c,
     its maintainers take responsibility for the different behaviors
     that result.  I'm not confident the present g77 maintainer
     (myself) has nearly enough system-wide knowledge to do that
     properly.  Further, generally speaking, the more differences users
     see between their g77-based ("f2c-compatible") environment and
     their f2c-based environment, the *worse* off they are -- even
     though we can make case-by-case arguments that they're better off,
     e.g. by taking advantage of opportunities f2c can't enjoy due to
     its not being part of a behemoth like gcc.

>FWIW, Octave does not currently give the user a way to control what
>happens for IEEE floating point exceptions, but eventually I hope it
>will.

The more "transparent", or at least "translucent", a product is, the
better for everyone.

So, the more transparent g77/f2c/libf2c become, the easier it is for products
like Octave to "see their way clear" to doing the right thing, in terms
of becoming more "transparent", or becoming more *perfect* anyway,
themselves.

(I know next to nothing about Octave, by the way.  One of my many failings!)

>On systems that have it, sig_die generates SIGIOT, then calls abort.
>So I don't think that catching the signal and then continuing will
>help much.  (If I read the man page correctly, catching SIGABRT then
>returning normally from the signal handler will still result in
>program termination.)

That definitely sounds like bad juju.

Still, as current g77 maintainer, I'll leave this decision to Dave,
wearing his hat as libf2c maintainer.  It *seems* good to me to remove
the code (that calls sig_die on complex division by zero) per your patch.
But if Dave does it, I'll *accept* that it's good; if he decides not to,
I'll accept that, and if he explains why, try to put an appropriate
explanation in the g77 docs.

One thing for sure: reading Kahan's expositions (JAVAhurt.pdf, which
I'm mostly through, though I'm basically ignoring the equations --
which look like gobbledygook due to using pdftotext's output anyway,
so let's all pretend it's not due to my no longer being particularly
proficient at math ;-) has certainly woken me up even more to both
the importance of a product like g77 getting these issues as Right as
it possibly can *and* my own inadequacy, as well as (IMO) that of the
g77 "team" (probably, by extension, the current gcc maintainers), when
it comes to attempting to go our own way, if that way is any
different from relying on the underlying platform's defaults for our
own.  (In this sense, I consider libf2c to be a component of g77's
underlying platform.  It isn't *quite*, since libg2c, g77's version
of libf2c, still uses local patches that change the default behaviors
of libf2c, but they've recently come down in number, and I'd like
to see that trend continue.)

In other words: we had better get a clearer understanding of whether,
and just how and when, we're going to disagree with people like
Kahan regarding what we need to provide at *minimum* in terms of
reasonable behavior, language design, etc.  My impression, in the
discussions I've seen so far, is that hardly anybody who chimes in
has as much understanding as Kahan has.  Given that, I'd recommend
we just "do what he says".  (Whether his story represents enough of
the picture to make that possible, I don't know.)  If we choose to
go our own way, however, we should be prepared to explain why in
as much detail, with as much specification and documentation, as
Kahan provides -- i.e. be prepared to write "What Every Programmer
Must Know About GCC's Floating-Point Arithmetic", and make the
implementation consistent with it.

Certainly, I don't believe ad-hoc hand-waving of various points
Kahan makes should be accepted any longer as a valid way to handle
issues raised by people like him (people who are widely respected
for their experience and understanding in their field), or raised
by others but already adequately commented on by people like him.

In that sense, I guess I'm saying we should view the designs (or
at least writings) of certain people as sort of part of our
"platform".

That way, when people have problems with what *we're* doing, we
can point them in the direction of some Great Person's writings
and/or personage and say "that's where you go to raise these issues;
we're just following orders".

        tq vm, (burley)

P.S. I almost can't believe I'm saying all this, since it goes against my
own desires and opinions circa most of my computing career.  I *love* going
my own way.  And, I'm still entirely capable of doing that, since my desires
and opinions haven't changed much.  But my best attempts at an objective,
rational analysis of what it actually takes to produce a product like gcc
or g77 and make it useful to lots of people have thoroughly persuaded me
to "go against type" (my own) -- so, I end up ranting *against* the NIH
syndrome (in its various forms), despite being generally a practical
proponent of it in so many other situations.

References:
- [jwe@bevo.che.wisc.edu: Crashing after complex division by zero]
  - From: craig
- Re: [jwe@bevo.che.wisc.edu: Crashing after complex division by zero]
  - From: Toon Moene
- Re: [jwe@bevo.che.wisc.edu: Crashing after complex division by zero]
  - From: craig
- Re: [jwe@bevo.che.wisc.edu: Crashing after complex division by zero]
  - From: John W. Eaton
