pch bug fix

Mike Stump mikestump@comcast.net
Thu Jan 2 00:20:00 GMT 2014

On Dec 31, 2013, at 11:46 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Dec 31, 2013 at 10:39:58PM -0800, Mike Stump wrote:
>> In testing for wide-int, we discovered that someone seems to have blown
>> pch

> Thanks for tracking this down, this sounds like PR59436.

To confirm that, one would need either to investigate both issues directly (if it is the same problem, that is just plain hard work), or to statistically work up a probability of failure, run the test case enough times to be 99.9% certain of a change in its result, then drop my patch in and check.  In my case I was near a 30% failure rate, so I could run it 20 times and be better than 99.9% certain.  I didn't judge my work by the probabilistic method; rather, I only used it to double-check after I was certain I understood and had fixed the problem.  I read through the PR and could not rise above 10% confidence that it is the same problem.

> How have you managed to track it down?

:-)  You know, I'd love to teach an AI how to debug gcc; then we could just use it to debug gcc, and it would have found the problem in 20 minutes on my crappy computer (16 cores).  I'd love to use my nice computer (64 cores) on a hard problem with such a program.  One day this will come to pass, but we still seem far from that goal.

I'll give you an overview of how it should be done, and I mostly did this, though at times not quite in this order:

First was the determination that the issue doesn't occur when pch isn't used.  Next was to find the data that is wrong and the address it is at.  You then see whether the wrong data was memory-smashed or created that way purposefully.  In my case, it was smashed by an unrelated data structure.

Next, why?  What formed that address for both pieces of data?  In my case, both were ggc allocated and both addresses were valid.  Wait, that can't be.  Yup, it is the case: the allocations are correct, purposefully given out by ggc, and both simultaneously live.  Wait, that can't be.  Size: it turns out the code allocating the data structure was pch-importing it, and the size it had in mind was way too small, so the write off the end of the allocated data structure clobbered the other, unrelated data structure.

Now, why is the size wrong?  It comes from the pch file that way.  Why?  During initial allocation on the pch build side, the size is correct.  If pch is off, the size is correct.  What?  Walk the writing code, and notice that someone calls strlen on the data in question.  Wait, that's wrong; why does it do that?  Ah, char * is a string.  A char[500], when you GTY atomic it and take its address, is persisted out as a string, but that's wrong.  Make it not use strlen.

The rest of the process is then about fixing it.

I'm old skool, so I remember the days of debugging gcc when we had memory smashers, lifetime issues, and other such fun.  ggc was invented to get rid of a large class of errors in one wave of the magic wand, and today those issues do seem rarer indeed.

> I also wonder why it doesn't seem to affect 4.8 when it also has the same change.

Don't know.  If no one writes to pat_enable after pch load, then, trivially, the clobbering can't happen; though if someone ever reads pat_enable, I can't imagine how it could work.

> Based on the comments in gengtype.c, I'd expect
>  unsigned char *GTY ((atomic, length ("sizeof (struct target_optabs)"))) optabs;

The comments should be fixed...

> to work (or, perhaps instead of sizeof directly call a function that is
> defined in optabs.c and returns that value, perhaps optabs.h isn't included
> where it should be), but unfortunately it seems to be rejected right now.

Yup, I thought the same, and first tried the same.  I too noticed it didn't work.

> So, the question is how hard would it be to support that.

I don't know.  All I know is that if a single other person stubs their toe on this issue, it will be a complete waste of their time trying to track it down.

More information about the Gcc-patches mailing list