Bug report #4762
Petri Mutka
pmu@sun3.oulu.fi
Wed Nov 7 04:26:00 GMT 2001
Hello,
Thank you for your email.
The history of the bug so far is as follows: while coding a numerical
program that does lengthy calculations (more than 50 hours), I came across
some weird behaviour.
The program has a linked list driving a numerical module with different
parameters. For some unknown reason, every 18th or 42nd result (depending
on the platform) came out as NaN (it took about 5 hours to reach that
point). After three weeks of intense debugging I was able to track the
bug down to two subroutines (the program has more than 10000 lines of
code in it). That's when the weirdness started.
When I examined the calculations producing NaN in a debugger, none of the
variables that the NaN was computed from by adding, subtracting,
multiplying or dividing were NaN themselves. Sometimes sin() gave NaN. A
few times even a variable defined as constant was mysteriously changed to
a NaN.
I checked the code for memory leaks, and found none. The size of the
program was not increasing during execution. I tried memory guardian
libraries (such as Electric Fence), which found nothing. I went through
the code many times and didn't find any leaks or wild pointers; all the
reserved memory areas were released consistently. The NaN just seemed to
pop up as it pleased.
Then I started to search the web for similar bug reports, and found some.
A few reports had been written about NaNs and inconsistent behaviour of
the floating point libraries. That's when I wrote the program in the bug
report and noticed it producing NaNs. I ran it over one weekend, and it
gave about one NaN per 1e9 calculations of sin() or cos(). That's when
I wrote the report.
Afterwards, I gave the code in the report to some friends of mine, and
they were NOT able to reproduce the bug (with Redhat 7.2).
My mention of a stack overflow was only a wild guess, and probably a
wrong one.
However, I was able to get rid of the problem by nuking the RedHat
installation on my workstation and reinstalling the newest version of it
(7.2). On the Solaris platform the problem was dealt with by using a Sun
compiler.
So the root cause of the bug is still unknown. But the original program is
working just fine!
I noticed that my report was, hmmm... less than graciously done, but
that is the result of local paranoia: I have sendmail disabled on my
workstation, and I use another computer for email and such. So, I'm sorry
for the mess.
I hope that these emails are going where they should go...
Best wishes,
Petri Mutka
More information about the Gcc-bugs
mailing list