A i686-pc-linux-gnu compiler built from gcc-3.4-20040303.tar.bz2 exhibits this: echo "#pragma GCC poison foo" > _dummy.c gcc -E -nostdinc _dummy.c # 1 "_dummy.c" # 1 "<built-in>" # 1 "<command line>" # 1 "_dummy.c" # 4294967218 "_dummy.c" Umm, ah, line number 4294967218 ? This bug was brought to my attention by GAS, when it said: line numbers must be positive; line number -15 rejected The bug is in GCC-3.3.2 but isn't in 3.3.1.
Confirmed, a regression from 3.3.1
This just bit me, too. Alexandre, I strongly suspect your patches http://gcc.gnu.org/ml/gcc-cvs/2003-09/msg00418.html http://gcc.gnu.org/ml/gcc-cvs/2003-09/msg00715.html to be responsible for this. Could you please have a look?
You are right, it's the second cb_line_change, that I added in the second patch, that is the culprit. It doesn't seem to always (or ever?) have accurate line number information in pfile->cur_token after we handle pragma poison. I've taken it out in my local tree and the problem went away. Would you be willing to give the patch a round of testing, perhaps even for the next release out of the 3.3 branch, and post the patch for review by the cpp maintainers? If not, let me know, and I'll try to get to it. Thanks for bring the problem to my attention, and sorry about the lossage.
Alexandre, I tested your suggestion on 3.3 branch, 3.4 branch and mainline. It cured the problem of the submitter with #pragma GCC poison foo It also cured the problem for standard header files that contain #pragma GCC system_header For example, preprocessing the C++ header #include<list> without your patch resulted in # 1 "PR14438A.cc" # 1 "<built-in>" # 1 "<command line>" # 1 "PR14438A.cc" # 1 "/mygcc/bin/../lib/gcc/i686-pc-linux-gnu/3.4.0/../../../../include/c++/3.4.0/list" 1 3 # 64 "/mygcc/bin/../lib/gcc/i686-pc-linux-gnu/3.4.0/../../../../include/c++/3.4.0/list" 3 # 65 "/mygcc/bin/../lib/gcc/i686-pc-linux-gnu/3.4.0/../../../../include/c++/3.4.0/list" 3 # 4294967211 "/mygcc/bin/../lib/gcc/i686-pc-linux-gnu/3.4.0/../../../../include/c++/3.4.0/list" 3 ^^^^^^^^^^ Alas, I couldn't do regression testing, since my DejaGnu seems to be out of date. Maybe it's some different problem, but right know I don't have the time to hunt the problem down. Since it's your patch, anyway, could you please do that and get it reviewed? (I'll attach the patches to the PR, one for the 3.3 and 3.4 branch and one for mainline - which differs only in whitespace.) Btw, I don't have the slightest idea to how to prepare a test for this kind of problem for the testsuite. Any ideas?
Created attachment 5950 [details] patch for 3.3 and 3.4 branch
Created attachment 5951 [details] patch for mainline
I'd like to know why pfile->cur_token is garbage at this point; my suspicion is that this should be fixed there instead. The existing patch would be OK for 3.3/4, though, if it's too hard to fix this properly.
Based on Zack's comments, would you please apply this patch to 3.4 *only*. Leave the bug open for mainline. Alexandre, would you please follow up on that issue so that we can resolve this fully for 3.5?
Subject: Re: [3.3/3.4/3.5 Regression] -E of #pragma has negative line number "mmitchel at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> writes: | Based on Zack's comments, would you please apply this patch to 3.4 *only*. Apply it to gcc-3_3-branch too. | Leave the bug open for mainline. Agreed. -- Gaby
Err... Is it really a problem in 3.4? I didn't think so. I understood it was only broken in 3.3. I believe 3.4 is already fixed in that it didn't have garbage there at that point. I'll check whether the patch is necessary for 3.4 and, if so, check it in, and then proceed to regression-testing the patch for 3.3. I'm pretty sure it's not an issue in mainline, but I'll verify as well.
Subject: Re: [3.3/3.4/3.5 Regression] -E of #pragma has negative line number > Err... Is it really a problem in 3.4? The bug was first reported in a 3.4 snapshot. --- Don
So it is a problem in 3.4 and mainline as well. Sorry, I had misread the bug report.
Subject: Bug 14438 CVSROOT: /cvs/gcc Module name: gcc Branch: gcc-3_3-branch Changes by: aoliva@gcc.gnu.org 2004-03-24 03:18:44 Modified files: gcc : ChangeLog cpplib.c Log message: PR preprocessor/14438 * cpplib.c (do_pragma): Remove line_change call after pragma handler. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_3-branch&r1=1.16114.2.950&r2=1.16114.2.951 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cpplib.c.diff?cvsroot=gcc&only_with_tag=gcc-3_3-branch&r1=1.326.2.3&r2=1.326.2.4
Subject: Bug 14438 CVSROOT: /cvs/gcc Module name: gcc Branch: gcc-3_4-branch Changes by: aoliva@gcc.gnu.org 2004-03-24 03:19:02 Modified files: gcc : ChangeLog cpplib.c Log message: PR preprocessor/14438 * cpplib.c (do_pragma): Remove line_change call after pragma handler. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=2.2326.2.367&r2=2.2326.2.368 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cpplib.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.355.4.1&r2=1.355.4.2
Subject: Bug 14438 CVSROOT: /cvs/gcc Module name: gcc Changes by: aoliva@gcc.gnu.org 2004-03-24 03:19:38 Modified files: gcc : ChangeLog cpplib.c Log message: PR preprocessor/14438 * cpplib.c (do_pragma): Remove line_change call after pragma handler. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.3273&r2=2.3274 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cpplib.c.diff?cvsroot=gcc&r1=1.361&r2=1.362
Regression fixed, still need to investigate, per Zack's request.
Subject: Re: preprocessor/14438 The reason why the patch had introduced a regression was that pfile->cur_token pointed to uninitialized memory, one past the last initialized token. _cpp_lex_direct() doesn't skip to the next line because we're processing a directive, so it just sets result->type to CPP_EOF, along with its correct location, after incrementing pfile->cur_token past result, without any form of initialization to this value. In fact, I'm a bit concerned as to what might happen if we were to call _cpp_lex_direct() an unlimited number of times while processing a directive; I don't see anything that would prevent cur_token from running past the memory area reserved for it. I suppose one way to fix this issue would be to decrement cur_token again when returning in the CPP_EOF case, such that we'd return the same EOF token over and over; another would be to just refrain from dereferencing cur_token after reading an EOF token, but how would we tell? And, more importantly, why would we want to? Clearly we don't need it in the case I thought we did when I added the code I just removed; I don't think we need it elsewhere. Is this analysis enough for us to close this bug report now?
>I suppose one way to fix this issue would be to decrement cur_token >again when returning in the CPP_EOF case, such that we'd return the >same EOF token over and over; Actually, that sounds like a fairly robust thing to do -- in _cpp_lex_direct, perhaps?
Subject: Re: preprocessor/14438 Alexandre Oliva <aoliva@redhat.com> writes: > The reason why the patch had introduced a regression was that > pfile->cur_token pointed to uninitialized memory, one past the last > initialized token. _cpp_lex_direct() doesn't skip to the next line > because we're processing a directive, so it just sets result->type to > CPP_EOF, along with its correct location, after incrementing > pfile->cur_token past result, without any form of initialization to > this value. In fact, I'm a bit concerned as to what might happen if > we were to call _cpp_lex_direct() an unlimited number of times while > processing a directive; I don't see anything that would prevent > cur_token from running past the memory area reserved for it. Oh, ugh. > I suppose one way to fix this issue would be to decrement cur_token > again when returning in the CPP_EOF case, such that we'd return the > same EOF token over and over; another would be to just refrain from > dereferencing cur_token after reading an EOF token, but how would we > tell? And, more importantly, why would we want to? Clearly we don't > need it in the case I thought we did when I added the code I just > removed; I don't think we need it elsewhere. Suggest you add defensive code to prevent running cur_token off the end of the buffer, as you suggest, and then go ahead and close the bug report. zw
Subject: Re: preprocessor/14438 On Mar 24, 2004, Zack Weinberg <zack@codesourcery.com> wrote: > Alexandre Oliva <aoliva@redhat.com> writes: >> In fact, I'm a bit concerned as to what might happen if >> we were to call _cpp_lex_direct() an unlimited number of times while >> processing a directive; I don't see anything that would prevent >> cur_token from running past the memory area reserved for it. > Oh, ugh. > Suggest you add defensive code to prevent running cur_token off the > end of the buffer, as you suggest, and then go ahead and close the bug > report. It turns out that the most common caller of _cpp_lex_direct(), cpp_lex_token(), already takes care of that, and other callers seem to take care of buffer management already, except perhaps for the code that reads the initial file and directory name, but those should be the first few tokens, therefore they should fall within the initially allocated buffer and never fail. I'm a bit wary of introducing overhead in _cpp_lex_direct(), since it's quite performance critical, and it appears that we're safe, so I'm leaning towards leaving it as is. If you agree, please go ahead and mark the bug as closed. Otherwise, I suggest that you, being more familiar with the cpp code, take it over and refactor it a big to make it safer without introducing unnecessary overhead in the common paths.
I will need to study the code a bit to figure out whether checking in _cpp_lex_direct is appropriate or not. Can't say when I will get to this, but given what you say it is not critical. Reassigning to myself, retargeting for 3.5.0, updating title and priority.
Comment on attachment 5950 [details] patch for 3.3 and 3.4 branch obsoleting patches which were checked in and do not address the current ambit of this bug report.
Comment on attachment 5951 [details] patch for mainline obsoleting patches which were checked in and do not address the current ambit of this bug report.
Unassigning from Zack since he is now gone from GCC development.
FWIW I've been looking at this area for PR 29966. I don't see how overflow checking could be added to _cpp_lex_direct. However I did end up adding an assertion to _cpp_lex_token, to try to expose a hidden assumption: if (pfile->cur_token < pfile->cur_run->base || pfile->cur_token >= pfile->cur_run->limit) abort (); I also looked at the other assignments to cur_token and (aside from the one fixed in my forthcoming patch) they all look ok. So, I think this PR can be closed. Please reopen if you think I've done this in error.