Experiencing unreproducible internal compiler errors <<whinge>>
Zack Weinberg
zack@wolery.cumb.org
Thu Mar 16 14:49:00 GMT 2000
On Fri, Mar 17, 2000 at 09:02:21AM +1030, Matt Lowry wrote:
>
> On Thu, 16 Mar 2000, Toon Moene wrote:
>
> > Zack Weinberg wrote:
> >
> > > On Thu, Mar 16, 2000 at 12:23:38PM +1030, Matt Lowry wrote:
> >
> > > > I have encountered internal compiler errors with gcc 2.95.2, but they are
> > > > unreproducible in that in all cases the problem did not reappear when
> > > > make was immediately reinvoked.
> >
> > > 99 times out of 100, this is a hardware problem, such as an
> > > excessively overclocked CPU or a faulty memory board. Please see
> > > http://www.bitwizard.nl/sig11/
> >
> > However, in the past we've also seen this behaviour with experimental OS
> > kernels (like the odd-numbered Linux ones).
> >
> > Remember that a buggy device driver is a great way to change memory you
> > think is private to your process in unpredictable ways.
> >
>
> Sorry folks but this explanation just does not wash with me. I appreciate
> the perils of a flakey OS and realise that hardware can do funny things,
> but Mandrake 7.0 runs kernel 2.2.14 and my machine is new with (presumably)
> decent memory on a decent board with a K7 that is not overclocked.

Okay, so that cuts the number of possible causes down by a large
chunk. You can probably forget buggy device drivers, although I would
look carefully at the release notes for the 2.2.15preX series and see
if any of the bugfixes apply to you.

But 'decent memory on a decent board' means nothing in the realm of PC
hardware. You can't rule out a mildly out of spec chip, or a bad
trace that fails 1 time in 10,000 writes, or something like that.

> OK so let's posit it's the hardware. Why isn't every single process running
> on the machine liable to having its memory corrupted or whatever? What's
> so special about GCC that means it and nothing else randomly segfaults or
> has non-deterministic behaviour despite a decidedly deterministic function
> in life? I've yet to see anything else on my machine exhibit this kind of
> behaviour.

GCC may be the most stressful program your machine ever runs. It runs
the CPU at full throttle for minutes to hours, depending on the size
of the build. It has almost-random memory access patterns, and its
active set - memory being referenced constantly - can grow to hundreds
of megs. This puts way more strain on the hardware than any casual
testing utility. If you'd actually read the FAQ I pointed you at, it
would have explained this in great detail.

What you want to do now is: Pick a file that causes an ICE. Rerun
the compile with -v -save-temps, so you see the exact command lines
gcc runs and get to keep the preprocessed intermediate file.
Then run the compiler proper about 10,000 times on that same input,
with the same command line, and under control of the 'catchsegv'
utility which ships with Mandrake. Count the number of times it
fails. Also, look carefully at the output of the utility for each
failure - it'll look vaguely like this:

*** Segmentation fault
Register dump:
EAX: bffffae4 EBX: 400f2cb8 ECX: 080483b0 EDX: 00000001
ESI: 40013010 EDI: 400e3453 EBP: bffffa7c ESP: bffffa7c
EIP: 080483b3 EFLAGS: 00010297
CS: 0023 DS: 002b ES: 002b FS: 0000 GS: 0000 SS: 002b
Trap: 0000000e Error: 00000006 OldMask: 00000000
ESP/signal: bffffa7c CR2: 00000000
Backtrace:
??:0(??)[0x80483b3]
...

If the register dump and backtrace are the same every time it fails,
you might have a case for a bug in GCC. Otherwise, start swapping
memory boards.
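
The repeat-and-compare loop described above could be sketched roughly
like this in Python. It is only an illustration under a few assumptions:
the names run_repeatedly and verdict are made up for this sketch, any
nonzero exit status is treated as a failure, and the command you pass is
expected to be catchsegv followed by the compiler-proper command line
that gcc -v printed. It captures both output streams (so it does not
matter which one the dump lands on) and hashes the text, so identical
dumps fall into one bucket and differing dumps show up as several.

```python
import collections
import hashlib
import subprocess
import sys


def run_repeatedly(cmd, runs):
    """Run cmd `runs` times; return (failures, Counter keyed by failure signature)."""
    failures = 0
    signatures = collections.Counter()
    for _ in range(runs):
        # Merge stderr into stdout so the dump is captured wherever it is printed.
        proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        if proc.returncode != 0:  # assumption: nonzero exit status means failure
            failures += 1
            digest = hashlib.sha1(proc.stdout).hexdigest()[:12]
            signatures[(proc.returncode, digest)] += 1
    return failures, signatures


def verdict(failures, signatures):
    """Turn the counts into the rule of thumb from the message."""
    if failures == 0:
        return "no failures seen"
    if len(signatures) == 1:
        return "same failure every time: possibly a compiler bug"
    return "failures differ between runs: suspect the hardware"


# Hypothetical invocation: python repeat.py 10000 catchsegv <command from gcc -v>
if __name__ == "__main__" and len(sys.argv) > 2:
    total = int(sys.argv[1])
    failed, sigs = run_repeatedly(sys.argv[2:], total)
    print("%d of %d runs failed" % (failed, total))
    for (status, digest), count in sigs.most_common():
        print("  exit status %s, output hash %s: %d times" % (status, digest, count))
    print(verdict(failed, sigs))
```

The hash only says whether two failures printed the same text; you
still want to read the dumps themselves before drawing conclusions.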
zw