This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



RFC: cpplib: multiline strings


I'd like to get rid of them to be honest, but I know that's not an
option.  Instead, I'd like to propose that we make it undefined
behaviour if the first non-whitespace (in the C sense, i.e. including
C-style comments) character on the second or subsequent lines of a
multiline string is a # (or its digraphed equivalent).  Thus

"foo
bar"

is fine, but

"foo
# bar"

gives undefined behaviour.  This is no great loss: it can be re-written

"foo
" "# bar"

instead.
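
To illustrate why nothing is lost by the rewrite, here is a minimal sketch
in standard C. It relies on adjacent string literals being concatenated
during translation. The embedded newline is spelled \n so the example does
not itself depend on the multiline string extension; the names
single_literal, split_literal and literals_match are made up for the
example.

```c
#include <string.h>

/* One literal holding "foo", a newline, then "# bar".  The newline is
   written as \n so that this compiles without the extension.  */
static const char single_literal[] = "foo\n# bar";

/* The same contents split so that no literal continues onto a line
   that begins with '#'.  Adjacent literals are joined by the compiler.  */
static const char split_literal[] = "foo\n" "# bar";

/* Returns nonzero when both spellings produce identical strings.  */
static int
literals_match (void)
{
  return strcmp (single_literal, split_literal) == 0;
}
```

Both arrays hold the same ten bytes (nine characters plus the terminating
NUL), so code that uses the string cannot tell the two spellings apart.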

The reason is that such strings prevent an important optimization in
the preprocessor that could yield non-trivial speed gains.  If we are
skipping through the false part of a preprocessor conditional, the
standard says we are only to analyse the skipped section for
conditionals that might terminate the skipping, such as #else and
#endif.

Currently the preprocessor tokenizes everything - and this is a lot of
overhead.  It includes trigraph checking and replacement, memory
allocation and copying for each string / number / identifier, hashing
identifiers etc. etc.  Instead, it should just be able to scan for
(unescaped) newlines, and see if the first token after the newline is
a CPP_HASH, i.e. starts a directive.  If not, or it's an uninteresting
directive like #define, we can just skip the line without having to
tokenize it.
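
The scan described above might look roughly like the following. This is
only a sketch under simplifying assumptions, not cpplib code: the helper
names (directive_name, name_is, skip_false_group) are invented, only
spaces and tabs count as whitespace, and comments, trigraphs and the %:
digraph are ignored. It walks logical lines, looks at a line only if its
first non-blank character is '#', and tracks nested conditionals so that
an inner #endif does not end the skip.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* If the line starting at P is a directive, return a pointer to the
   directive name; otherwise return NULL.  */
static const char *
directive_name (const char *p)
{
  while (*p == ' ' || *p == '\t')
    p++;
  if (*p != '#')
    return NULL;
  p++;
  while (*p == ' ' || *p == '\t')
    p++;
  return p;
}

/* Nonzero if the identifier at NAME is exactly WORD.  */
static int
name_is (const char *name, const char *word)
{
  size_t len = strlen (word);
  unsigned char next;

  if (strncmp (name, word, len) != 0)
    return 0;
  next = (unsigned char) name[len];
  return !(isalnum (next) || next == '_');
}

/* BUF points at the first line of a skipped group.  Return the start
   of the line holding the #else, #elif or #endif that ends the group,
   or NULL if the buffer ends first.  */
static const char *
skip_false_group (const char *buf)
{
  const char *line = buf;
  int depth = 0;

  while (*line != '\0')
    {
      const char *name = directive_name (line);
      const char *nl = line;

      if (name != NULL)
        {
          if (name_is (name, "if") || name_is (name, "ifdef")
              || name_is (name, "ifndef"))
            depth++;
          else if (name_is (name, "endif"))
            {
              if (depth == 0)
                return line;
              depth--;
            }
          else if (depth == 0
                   && (name_is (name, "else") || name_is (name, "elif")))
            return line;
        }

      /* Advance to the next unescaped newline without tokenizing.  */
      for (;;)
        {
          nl = strchr (nl, '\n');
          if (nl == NULL)
            return NULL;
          if (nl > line && nl[-1] == '\\')
            {
              nl++;   /* Backslash-newline: the logical line continues.  */
              continue;
            }
          break;
        }
      line = nl + 1;
    }
  return NULL;
}
```

Uninteresting lines cost one strchr call and a glance at their first
non-blank character; nothing is allocated, copied or hashed.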

...except we can't, because we might be in the middle of a multi-line
string (aargh, damned extensions).  Hence the request, because
unfortunately the only clean and simple way to find and keep track of
multi-line strings when skipping is to do a full tokenization of the
skipped region.

I'm investigating (again) a new lexer for cpplib, which would allow
this optimization to be easily implemented (it is impractical in the
current lexer), but things like

"Some directives, like #else or
#endif can terminate skipped blocks"

would thwart it.
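
To make the failure concrete, here is a small sketch of a purely
line-based check applied to that text. The function first_hash_line and
the array skipped_text are hypothetical names for the example, and the
string's line break is spelled \n so the example is standard C. The check
has no idea it is inside a string literal, so it reports the second line
as a directive.

```c
#include <stddef.h>

/* Return the offset of the first line in TEXT whose first non-blank
   character is '#', or -1 if there is no such line.  No attempt is
   made to track string literals, which is exactly the problem.  */
static long
first_hash_line (const char *text)
{
  size_t i = 0;

  while (text[i] != '\0')
    {
      size_t j = i;

      while (text[j] == ' ' || text[j] == '\t')
        j++;
      if (text[j] == '#')
        return (long) i;
      while (text[j] != '\0' && text[j] != '\n')
        j++;
      i = (text[j] == '\n') ? j + 1 : j;
    }
  return -1;
}

/* The two source lines from the example above, quotes included.  */
static const char skipped_text[] =
  "\"Some directives, like #else or\n"
  "#endif can terminate skipped blocks\"\n";
```

Calling first_hash_line (skipped_text) returns the offset of the line
beginning with #endif, so a skipper built on this check would end the
skipped block in the middle of the string.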

Neil.
