This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: egcs/Solaris status


>  The program crashes randomly (1 failure out of ~100 runs). 

Some programs have inherent exposures to random behavior.  For example,
games that pick random numbers, programs that are sensitive to timing
of asynchronous events in their *design* (such as transaction-processing
programs), and so on.

But even programs that don't have these inherent exposures can have
bugs that manifest as random problems due to other factors.

Among these other factors are asynchronous *implementations* upon which
the program relies, via library routines it uses, or via the underlying
system architecture.

When working on SPARCs, a substantial source of underlying
asynchronous behavior is the register-window mechanism.  This
mechanism can be triggered at any time, causing memory in the
stack to be written (or read), even if the stack frame
doesn't belong to the currently active procedure.

In particular, if a program has a bug involving an uninitialized
variable, and that variable happens to "live" in the stack frame
at an appropriate location, it might happen to *normally* end up
with the "correct" value *except* when it is asynchronously trashed,
before being used, by a register-window spill operation.
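
To make the mechanism concrete, here is a minimal sketch in C.  It is
a deterministic *model* of the failure, not real SPARC behavior: a
small array stands in for stack memory, and every name in it
(fake_stack, earlier_call, window_spill, buggy_routine, run_once, demo)
is hypothetical.  Real code that reads an uninitialized variable has
undefined behavior, so the sketch simulates the stale slot explicitly
rather than relying on it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model: this array stands in for a region of stack memory. */
#define STACK_WORDS 8
static int fake_stack[STACK_WORDS];

/* An earlier, unrelated call happens to leave 42 behind in slot 3. */
static void earlier_call(void)
{
    fake_stack[3] = 42;
}

/* Models a register-window spill: register contents are dumped to
   memory, and in this model the save area overlaps slot 3. */
static void window_spill(int reg_value)
{
    fake_stack[3] = reg_value;
}

/* The buggy routine: it treats slot 3 as a local variable but never
   stores to it, so it returns whatever was left there. */
static int buggy_routine(void)
{
    return fake_stack[3];
}

/* One simulated run.  spill_happens selects whether the "asynchronous"
   spill lands between the earlier call and the buggy read. */
int run_once(int spill_happens)
{
    memset(fake_stack, 0, sizeof fake_stack);
    earlier_call();
    if (spill_happens)
        window_spill(-1);
    return buggy_routine();
}

/* Prints both outcomes; call this from main() to try the model. */
void demo(void)
{
    int usual = run_once(0);
    int rare = run_once(1);
    assert(usual != rare);
    printf("no spill: %d, spill: %d\n", usual, rare);
}
```

In the model, run_once(0) is the usual case where the stale value
happens to be the "correct" one, and run_once(1) is the rare run where
the spill gets there first.  On real hardware nothing in the program
selects between the two, which is what makes the failure look random.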

Debugging this sort of thing is a nightmare.  And all it takes
to make it start happening, when it never seemed to happen before,
is for a new version of a compiler to decide to allocate some variable
somewhere else, make a stack frame for one or more procedures (perhaps
not directly related to the one containing a bug) smaller or larger,
etc.  Even keeping everything the same except for the order in which
.o files are linked on the command line can make the bug happen more
or less often.  Ditto for running the exact same executable on a
different SPARC implementation -- one that has more or fewer register
windows in the CPU, or that is under a heavier or lighter load, for example.
(That doesn't mean a change in frequency of crashes necessarily points
to the spill being the trigger -- there are other asynchronous events
in most systems, after all.  But for straight numerical code a la
typical Fortran, for example, I'd look at the window spills first.)

Archives of comp.arch (USENET) might provide more info on this problem,
and other net resources might exist that offer advice about how to
search for the source of a bug that might be triggered by a window spill.

        tq vm, (burley)

