This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: -traditional comment elimination (yuk...)
- To: Neil Booth <NeilB at earthling dot net>
- Subject: Re: -traditional comment elimination (yuk...)
- From: Dave Brolley <brolley at redhat dot com>
- Date: Thu, 11 May 2000 11:49:08 -0400
- CC: Zack Weinberg <zack at wolery dot cumb dot org>, gcc at gcc dot gnu dot org
- Organization: Cygnus Solutions, a Red Hat Company
- References: <E12pqC9-0007mW-00@monkey.rosenet.ne.jp>
Neil Booth wrote:
>
> Zack,
>
> A quickie about exactly how comment removal works -traditionally. I
> think I've discovered an obscure bug in the preprocessor,
> (interestingly cccp seems to be OK) and we only get away with it
> because the front end does its own lexing. It also reveals a bug in
> my current lexer. I realised this when thinking about handling
> removal of escaped newlines within parse_number and parse_string.
>
> Input          New Lexer      Current CPP    Current Front End
>
> &/**/&         <&&>           <&><&>         <&&>
> foo/**/bar     <foo><bar>     <foo><bar>     <foobar>
> 123/**/.456    <123><.456>    <123><.456>    <123.456>
Comment removal in -traditional and the associated token pasting
effects can almost always be explained as the result of the
compiler re-lexing the textual output of the preprocessor. ANSI
changed this by redefining the output of the preprocessor as a
stream of tokens.
So, whether two tokens separated only by a comment get combined
with -traditional depends on the context:
if (1 &/**/& 1)
does not cause an error because the -traditional preprocessor
outputs
if (1 && 1)
which gets re-lexed by the compiler as <if><(><1><&&><1><)>.
However,
#if 1 &/**/& 1
causes an error because the #if line is interpreted right away by
the preprocessor as <#><if><1><&><&><1> and is never re-lexed by
the compiler.
Also note examples like
double f = 1.0/**/e+10;
where 1.0e+10 becomes one token with -traditional, although before
re-lexing by the compiler it was four separate tokens:
<1.0><e><+><10>
It is also possible for pasting of adjacent text to *create* new
comments with -traditional:
//**/* this is a comment in open text but not on #if */
The easiest way to implement these effects in all of their gory
complexity is to actually re-lex complete blocks of
non-whitespace which are separated only by comments *when that
text would have -traditionally been re-lexed by the compiler*.
Also, note that all of these pasting effects disappear if -C is
used to preserve comments in the textual output of the
preprocessor.
It's kind of a catch-22. A token-based integrated preprocessor
makes it easy to implement ANSI but hard to implement
-traditional. A text-based two-pass preprocessor/compiler makes
it easy to implement -traditional. I found this to be one of the
trickiest areas of the last preprocessor I worked on.
Dave