This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



Re: Experiencing unreproducible internal compiler errors <<whinge>>


On Fri, Mar 17, 2000 at 09:02:21AM +1030, Matt Lowry wrote:
> 
> On Thu, 16 Mar 2000, Toon Moene wrote:
> 
> > Zack Weinberg wrote:
> > 
> > > On Thu, Mar 16, 2000 at 12:23:38PM +1030, Matt Lowry wrote:
> > 
> > > > I have encountered internal compiler errors with gcc 2.95.2, but they are
> > > > unreproducible in that with all cases the problem did not reappear when
> > > > make was immediately reinvoked.
> > 
> > > 99 times out of 100, this is a hardware problem, such as an
> > > excessively overclocked CPU or a faulty memory board.  Please see
> > > http://www.bitwizard.nl/sig11/
> > 
> > However, in the past we've also seen this behaviour with experimental OS
> > kernels (like the odd-numbered Linux ones).
> > 
> > Remember that a buggy device driver is a great way to change memory you
> > think is private to your process in unpredictable ways.
> > 
> 
> Sorry folks but this explanation just does not wash with me. I appreciate
> the perils of a flakey OS and realise that hardware can do funny things,
> but Mandrake 7.0 runs kernel 2.2.14 and my machine is new with (presumably)
> decent memory on a decent board with a K7 that is not overclocked.

Okay, so that cuts the number of possible causes down by a large
chunk.  You can probably forget buggy device drivers, although I would
look carefully at the release notes for the 2.2.15preX series and see
if any of the bugfixes apply to you.  

But 'decent memory on a decent board' means nothing in the realm of PC
hardware.  You can't rule out a mildly out of spec chip, or a bad
trace that fails 1 time in 10,000 writes, or something like that.

> OK so let's posit it's the hardware. Why isn't every single process running
> on the machine liable to having its memory corrupted or whatever? What's
> so special about GCC that means it and nothing else randomly segfaults or
> has non-deterministic behaviour despite a decidedly deterministic function
> in life? I've yet to see anything else on my machine exhibit this kind of
> behaviour.

GCC may be the most stressful program your machine ever runs.  It runs
the CPU at full throttle for minutes to hours, depending on the size
of the build.  It has almost-random memory access patterns, and its
active set - memory being referenced constantly - can grow to hundreds
of megs.  This puts way more strain on the hardware than any casual
testing utility.  If you'd actually read the FAQ I pointed you at, it
would have explained this in great detail.

What you want to do now is:  Pick a file that causes an ICE.  Rerun
the compile with -v -save-temps, so you get an intermediate file.
Then run the compiler proper about 10,000 times on the same input,
with the same command line, and under control of the 'catchsegv'
utility which ships with Mandrake.  Count the number of times it
fails.  Also, look carefully at the output of the utility for each
failure - it'll look vaguely like this:

*** Segmentation fault
Register dump:

 EAX: bffffae4   EBX: 400f2cb8   ECX: 080483b0   EDX: 00000001
 ESI: 40013010   EDI: 400e3453   EBP: bffffa7c   ESP: bffffa7c

 EIP: 080483b3   EFLAGS: 00010297

 CS: 0023   DS: 002b   ES: 002b   FS: 0000   GS: 0000   SS: 002b

 Trap: 0000000e   Error: 00000006   OldMask: 00000000
 ESP/signal: bffffa7c   CR2: 00000000

Backtrace:
??:0(??)[0x80483b3]
...


If the register dump and backtrace are the same every time it fails,
you might have a case for a bug in GCC.  Otherwise, start swapping
memory boards.
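
The repeat-and-count procedure above could be automated along these
lines. This is only a rough sketch, not a tested tool: the function
names are made up for illustration, and fingerprinting each failure by
hashing its entire output is an assumption that may be too strict if
the dump contains anything that varies harmlessly between runs. The
command to run is whatever compiler-proper command line -v printed,
prefixed with catchsegv.

```python
import hashlib
import subprocess
from collections import Counter


def stress_compile(cmd, runs):
    """Run cmd `runs` times and tally the outcomes.

    Successful runs are counted under the key "ok". Each failing run is
    counted under (exit status, short hash of its combined output), so
    identical dumps land in the same bucket.
    """
    outcomes = Counter()
    for _ in range(runs):
        proc = subprocess.run(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
        )
        if proc.returncode == 0:
            outcomes["ok"] += 1
        else:
            digest = hashlib.sha1(proc.stdout).hexdigest()[:12]
            outcomes[(proc.returncode, digest)] += 1
    return outcomes


def distinct_failures(outcomes):
    """Number of different failure fingerprints seen (0 if none failed)."""
    return sum(1 for key in outcomes if key != "ok")
```

Calling stress_compile(["catchsegv", ...], 10000), with the elided part
filled in from the -v output, gives the failure count directly. One
distinct failure fingerprint points toward a reproducible compiler bug;
many different ones point toward the hardware.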

zw
