This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
cppinternals.texi: Clarify token spacing chapter

To: gcc-patches at gcc dot gnu dot org
Subject: cppinternals.texi: Clarify token spacing chapter
From: Neil Booth <neil at daikokuya dot demon dot co dot uk>
Date: Sat, 6 Oct 2001 12:28:07 +0100
Cc: Zack Weinberg <zack at codesourcery dot com>
I re-read it and it wasn't terribly clear.  There were also a few
important lines missing.

Neil.

	* doc/cppinternals.texi: Update.

Index: doc/cppinternals.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/cppinternals.texi,v
retrieving revision 1.10
diff -u -p -r1.10 cppinternals.texi
--- cppinternals.texi	2001/10/05 20:09:42	1.10
+++ cppinternals.texi	2001/10/06 11:25:48
@@ -41,7 +41,7 @@ into another language, under the above c
 @titlepage
 @c @finalout
 @title Cpplib Internals
-@subtitle Last revised September 2001
+@subtitle Last revised October 2001
 @subtitle for GCC version 3.1
 @author Neil Booth
 @page
@@ -71,7 +71,7 @@ into another language, under the above c
 @chapter Cpplib---the core of the GNU C Preprocessor
 
 The GNU C preprocessor in GCC 3.x has been completely rewritten.  It is
-now implemented as a library, cpplib, so it can be easily shared between
+now implemented as a library, @dfn{cpplib}, so it can be easily shared between
 a stand-alone preprocessor, and a preprocessor integrated with the C,
 C++ and Objective-C front ends.  It is also available for use by other
 programs, though this is not recommended as its exposed interface has
@@ -498,12 +498,13 @@ both for aesthetic reasons and because i
 still try to abuse the preprocessor for things like Fortran source and
 Makefiles.
 
-For now, just notice that the only places we need to be careful about
-@dfn{paste avoidance} are when tokens are added (or removed) from the
-original token stream.  This only occurs because of macro expansion, but
-care is needed in many places: before @strong{and} after each macro
-replacement, each argument replacement, and additionally each token
-created by the @samp{#} and @samp{##} operators.
+For now, just notice that when tokens are added to (or removed from, as
+shown by the @code{EMPTY} example) the original lexed token stream, we
+need to check for accidental token pasting.  We call this @dfn{paste
+avoidance}.  Token addition and removal can only occur because of macro
+expansion, but accidental pasting can occur in many places: both before
+and after each macro replacement, each argument replacement, and
+additionally each token created by the @samp{#} and @samp{##} operators.
 
 Let's look at how the preprocessor gets whitespace output correct
 normally.  The @code{cpp_token} structure contains a flags byte, and one
@@ -512,7 +513,7 @@ indicates that the token was preceded by
 than a new line.  The stand-alone preprocessor can use this flag to
 decide whether to insert a space between tokens in the output.
 
-Now consider the following:
+Now consider the result of the following macro expansion:
 
 @smallexample
 #define add(x, y, z) x + y +z;
@@ -524,20 +525,21 @@ The interesting thing here is that the t
 output with a preceding space, and @samp{3} is output without a
 preceding space, but when lexed none of these tokens had that property.
 Careful consideration reveals that @samp{1} gets its preceding
-whitespace from the space preceding @samp{add} in the macro
-@emph{invocation}, @samp{2} gets its whitespace from the space preceding
-the parameter @samp{y} in the macro @emph{replacement list}, and
-@samp{3} has no preceding space because parameter @samp{z} has none in
-the replacement list.
+whitespace from the space preceding @samp{add} in the macro invocation,
+@emph{not} the replacement list.  @samp{2} gets its whitespace from the
+space preceding the parameter @samp{y} in the macro replacement list,
+and @samp{3} has no preceding space because parameter @samp{z} has none
+in the replacement list.
 
 Once lexed, tokens are effectively fixed and cannot be altered, since
 pointers to them might be held in many places, in particular by
 in-progress macro expansions.  So instead of modifying the two tokens
 above, the preprocessor inserts a special token, which I call a
-@dfn{padding token}, into the token stream in front of every macro
-expansion and expanded macro argument, to indicate that the subsequent
-token should assume its @code{PREV_WHITE} flag from a different
-@dfn{source token}.  In the above example, the source tokens are
+@dfn{padding token}, into the token stream to indicate that spacing of
+the subsequent token is special.  The preprocessor inserts padding
+tokens in front of every macro expansion and expanded macro argument.
+These point to a @dfn{source token} from which the subsequent real token
+should inherit its spacing.  In the above example, the source tokens are
 @samp{add} in the macro invocation, and @samp{y} and @samp{z} in the
 macro replacement list, respectively.
 
@@ -551,11 +553,15 @@ a macro's first replacement token expand
         @expansion{} [baz]
 @end smallexample
 
-Here, two padding tokens with sources @samp{foo} between the brackets,
-and @samp{bar} from foo's replacement list, are generated.  Clearly the
-first padding token is the one that matters.  But what if we happen to
-leave a macro expansion?  Adjusting the above example slightly:
+Here, two padding tokens are generated.  Their sources are the
+@samp{foo} token between the brackets and the @samp{bar} token from
+foo's replacement list, respectively.  Clearly the first padding token
+is the one we should use, so our output code should contain a rule that
+the first padding token in a sequence is the one that matters.
 
+But what if we happen to leave a macro expansion?  Adjusting the above
+example slightly:
+
 @smallexample
 #define foo bar
 #define bar EMPTY baz
@@ -563,34 +569,42 @@ leave a macro expansion?  Adjusting the 
 [foo] EMPTY;
         @expansion{} [ baz] ;
 @end smallexample
+
+As shown, now there should be a space before @samp{baz} and the
+semicolon in the output.
 
-As shown, now there should be a space before baz and the semicolon.  Our
-initial algorithm fails for the former, because we would see three
-padding tokens, one per macro invocation, followed by @samp{baz}, which
-would have inherit its spacing from the original source, @samp{foo},
-which has no leading space.  Note that it is vital that cpplib get
-spacing correct in these examples, since any of these macro expansions
-could be stringified, where spacing matters.
-
-So, I have demonstrated that not just entering macro and argument
-expansions, but leaving them requires special handling too.  So cpplib
-inserts a padding token with a @code{NULL} source token when leaving
-macro expansions and after each replaced argument in a macro's
-replacement list.  It also inserts appropriate padding tokens on either
-side of tokens created by the @samp{#} and @samp{##} operators.
-
-Now we can see the relationship with paste avoidance: we have to be
-careful about paste avoidance in exactly the same locations we take care
-to get white space correct.  This makes implementation of paste
-avoidance easy: wherever the stand-alone preprocessor is fixing up
-spacing because of padding tokens, and it turns out that no space is
-needed, it has to take the extra step to check that a space is not
-needed after all to avoid an accidental paste.  The function
-@code{cpp_avoid_paste} advises whether a space is required between two
-consecutive tokens.  To avoid excessive spacing, it tries hard to only
-require a space if one is likely to be necessary, but for reasons of
-efficiency it is slightly conservative and might recommend a space where
-one is not strictly needed.
+The rule we decided on above fails for @samp{baz}: we generate three
+padding tokens, one per macro invocation, before the token @samp{baz}.
+We would then have it take its spacing from the first of these, which
+carries source token @samp{foo} with no leading space.
+
+It is vital that cpplib get spacing correct in these examples since any
+of these macro expansions could be stringified, where spacing matters.
+
+So, this demonstrates that not just entering macro and argument
+expansions, but also leaving them, requires special handling.  I made
+cpplib insert a padding token with a @code{NULL} source token when
+leaving macro expansions, as well as after each replaced argument in a
+macro's replacement list.  It also inserts appropriate padding tokens on
+either side of tokens created by the @samp{#} and @samp{##} operators.
+I expanded the rule so that, if we see a padding token with a
+@code{NULL} source token, @emph{and} the source token remembered so far
+has no leading space, then we behave as if we have seen no padding
+tokens at all.  A quick check shows this rule will then get the above
+example correct as well.
+
+Now a relationship with paste avoidance is apparent: we have to be
+careful about paste avoidance in exactly the same locations we have
+padding tokens in order to get white space correct.  This makes
+implementation of paste avoidance easy: wherever the stand-alone
+preprocessor is fixing up spacing because of padding tokens, and it
+turns out that no space is needed, it has to take the extra step to
+check that a space is not needed after all to avoid an accidental paste.
+The function @code{cpp_avoid_paste} advises whether a space is required
+between two consecutive tokens.  To avoid excessive spacing, it tries
+hard to only require a space if one is likely to be necessary, but for
+reasons of efficiency it is slightly conservative and might recommend a
+space where one is not strictly needed.
 
 @node Line Numbering
 @unnumbered Line numbering
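
For anyone reviewing the wording, here is a minimal sketch of the output
rule the revised text describes.  It is a toy model, not cpplib's actual
code: struct tok, TOK_NORMAL, TOK_PADDING, append, render and
demo_empty_example are all hypothetical names made up for illustration,
and only PREV_WHITE is borrowed from the text.  It assumes that a real
token with no remembered source uses its own PREV_WHITE flag, and it
leaves out paste avoidance entirely.

```c
#include <stddef.h>
#include <string.h>

#define PREV_WHITE 1u   /* token was preceded by whitespace */

/* Hypothetical, simplified token kinds; not cpplib's own types.  */
enum tok_type { TOK_NORMAL, TOK_PADDING };

struct tok
{
  enum tok_type type;
  unsigned flags;            /* PREV_WHITE or 0 */
  const char *spelling;      /* NULL for padding tokens */
  const struct tok *source;  /* padding tokens only; may be NULL */
};

/* Append S to OUT (capacity CAP, current length *LEN), truncating
   if the buffer is full.  */
static void
append (char *out, size_t cap, size_t *len, const char *s)
{
  size_t n = strlen (s);

  if (*len + n >= cap)
    n = cap - 1 - *len;
  memcpy (out + *len, s, n);
  *len += n;
  out[*len] = '\0';
}

/* Spell TOKS[0..N) into OUT, which must have capacity CAP > 0.  */
char *
render (const struct tok *toks, size_t n, char *out, size_t cap)
{
  const struct tok *source = NULL;  /* remembered padding source */
  size_t len = 0;

  out[0] = '\0';
  for (size_t i = 0; i < n; i++)
    {
      const struct tok *t = &toks[i];

      if (t->type == TOK_PADDING)
        {
          /* The first padding token in a sequence wins.  A padding
             token with a NULL source resets things when the
             remembered source has no leading space.  */
          if (source == NULL
              || (!(source->flags & PREV_WHITE) && t->source == NULL))
            source = t->source;
          continue;
        }

      /* No usable padding seen: the token supplies its own spacing
         (an assumption of this sketch).  */
      if (source == NULL)
        source = t;
      if (source->flags & PREV_WHITE)
        append (out, cap, &len, " ");
      append (out, cap, &len, t->spelling);
      source = NULL;
    }
  return out;
}

/* Token stream for "[foo] EMPTY;" with "#define foo bar",
   "#define bar EMPTY baz" and an empty EMPTY, as in the text.  */
char *
demo_empty_example (char *buf, size_t cap)
{
  static const struct tok foo = { TOK_NORMAL, 0, "foo", NULL };
  static const struct tok bar = { TOK_NORMAL, 0, "bar", NULL };
  static const struct tok empty_in_bar = { TOK_NORMAL, 0, "EMPTY", NULL };
  static const struct tok empty_in_line
    = { TOK_NORMAL, PREV_WHITE, "EMPTY", NULL };

  const struct tok toks[] = {
    { TOK_NORMAL, 0, "[", NULL },
    { TOK_PADDING, 0, NULL, &foo },           /* entering foo */
    { TOK_PADDING, 0, NULL, &bar },           /* entering bar */
    { TOK_PADDING, 0, NULL, &empty_in_bar },  /* entering EMPTY */
    { TOK_PADDING, 0, NULL, NULL },           /* leaving EMPTY */
    { TOK_NORMAL, PREV_WHITE, "baz", NULL },
    { TOK_PADDING, 0, NULL, NULL },           /* leaving bar */
    { TOK_PADDING, 0, NULL, NULL },           /* leaving foo */
    { TOK_NORMAL, 0, "]", NULL },
    { TOK_PADDING, 0, NULL, &empty_in_line }, /* entering EMPTY */
    { TOK_PADDING, 0, NULL, NULL },           /* leaving EMPTY */
    { TOK_NORMAL, 0, ";", NULL },
  };

  return render (toks, sizeof toks / sizeof toks[0], buf, cap);
}
```

Calling demo_empty_example with a 64-byte buffer yields "[ baz] ;",
matching the expansion shown in the example.  Dropping the NULL-source
clause from the padding test makes baz inherit its spacing from foo and
lose its space, which is the failure the text walks through.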