This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: -traditional comment elimination (yuk...)




Neil Booth wrote:
> 
> Zack,
> 
> A quickie about exactly how comment removal works -traditionally.  I
> think I've discovered an obscure bug in the preprocessor,
> (interestingly cccp seems to be OK) and we only get away with it
> because the front end does its own lexing.  It also reveals a bug in
> my current lexer.  I realised this when thinking about handling
> removal of escaped newlines within parse_number and parse_string.
> 
> Input        New Lexer    Current CPP    Current Front End
> 
> &/**/&          <&&>         <&><&>           <&&>
> foo/**/bar   <foo><bar>    <foo><bar>       <foobar>
> 123/**/.456  <123><.456>   <123><.456>      <123.456>

Comment removal in -traditional and the associated token pasting
effects can almost always be explained as the result of the
compiler re-lexing the textual output of the preprocessor. ANSI
changed this by redefining the output of the preprocessor as a
stream of tokens.

So, whether two tokens separated only by a comment get combined
with -traditional depends on the context:

if (1 &/**/& 1)

does not cause an error because the -traditional preprocessor
outputs

if (1 && 1)

which gets re-lexed by the compiler as <if><(><1><&&><1><)>.
However,

#if 1 &/**/& 1

causes an error because the #if line is interpreted right away by
the preprocessor as <#><if><1><&><&><1> and is never re-lexed by
the compiler. 

Also note examples like

double f = 1.0/**/e+10;

where 1.0e+10 becomes one token with -traditional, although before
re-lexing by the compiler it was four separate tokens
<1.0><e><+><10>.

It is also possible for pasting of adjacent text to *create* new
comments with -traditional:

//**/* this is a comment in open text but not on #if */

The easiest way to implement these effects in all of their gory
complexity is to actually re-lex complete blocks of
non-whitespace which are separated only by comments *when that
text would have -traditionally been re-lexed by the compiler*.
Also, note that all of these pasting effects disappear if -C is
used to preserve comments in the textual output of the
preprocessor.

It's kind of a catch-22. A token-based integrated preprocessor
makes it easy to implement ANSI but hard to implement
-traditional. A text-based two-pass preprocessor/compiler makes
it easy to implement -traditional. I found this to be one of the
trickiest areas of the last preprocessor I worked on.

Dave
