This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
RFC: cpplib: multiline strings
- To: gcc at gcc dot gnu dot org
- Subject: RFC: cpplib: multiline strings
- From: Neil Booth <NeilB at earthling dot net>
- Date: Sun, 20 Aug 2000 09:54:20 +0100
- Bcc: Neil Booth <neil at daikokuya dot demon dot co dot uk>
- Cc: Zack Weinberg <zack at wolery dot cumb dot org>
I'd like to get rid of them to be honest, but I know that's not an
option. Instead, I'd like to propose that we make it undefined
behaviour if the first non-whitespace (in the C sense, i.e. including
C-style comments) character on the second or subsequent lines of a
multiline string is a # (or its digraphed equivalent). Thus
"foo
bar"
is fine, but
"foo
# bar"
gives undefined behaviour. This is no great loss: it can be re-written
"foo
" "# bar"
instead.
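The rewrite relies on adjacent string literals being concatenated by the compiler. A minimal sketch, assuming a compiler that does not accept the multi-line string extension, so the embedded newline is spelled with the `\n` escape; the array name `rewritten` is only illustrative:

```c
#include <string.h>

/* Two adjacent string literals are joined into one by the compiler.
   The '#' sits inside the second literal, after its opening quote,
   so it is never the first non-whitespace character on a source line. */
static const char rewritten[] = "foo\n" "# bar";
```

The resulting array holds the same characters as the single string "foo", newline, "# bar".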
The reason for this is that multi-line strings prevent an important optimization in
the preprocessor that could realize non-trivial speed gains. If we are
skipping through the false part of a preprocessor conditional, the
standard says we are only to analyse the skipped section for
conditionals that might terminate the skipping, such as #else and
#endif.
Currently the preprocessor tokenizes everything - and this is a lot of
overhead. It includes trigraph checking and replacement, memory
allocation and copying for each string / number / identifier, hashing
identifiers etc. etc. Instead, it should just be able to scan for
(unescaped) newlines, and see if the first token after the newline is
a CPP_HASH, i.e. starts a directive. If not, or it's an uninteresting
directive like #define, we can just skip the line without having to
tokenize it.
...except we can't, because we might be in the middle of a multi-line
string (aargh, damned extensions). Hence the request, because
unfortunately the only clean and simple way to find and keep track of
multi-line strings when skipping is to do a full tokenization of the
skipped region.
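To make the problem concrete, here is a rough sketch in C of the line-oriented skip described above. It is not cpplib code: the names `find_skip_end` and `is_kw` are hypothetical, and the sketch deliberately ignores comments, trigraphs and digraphs. It treats backslash-newline as a continuation, looks only at the first token of each line, and knows nothing about strings.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: is the n-character name at p equal to kw? */
static int is_kw(const char *p, size_t n, const char *kw)
{
    return strlen(kw) == n && memcmp(p, kw, n) == 0;
}

/* Scan a skipped region without tokenizing it.  Returns the offset of
   the '#' of the directive that ends the skipping (#else, #elif or the
   matching #endif), or -1 if there is none.  Assumption: no comments,
   trigraphs, digraphs or multi-line strings appear in buf. */
long find_skip_end(const char *buf)
{
    size_t i = 0;
    int depth = 0;          /* nesting of conditionals inside the region */
    int at_line_start = 1;

    while (buf[i] != '\0') {
        if (at_line_start) {
            size_t j = i;
            while (buf[j] == ' ' || buf[j] == '\t')
                j++;
            if (buf[j] == '#') {
                size_t hash = j++;
                while (buf[j] == ' ' || buf[j] == '\t')
                    j++;
                const char *name = buf + j;
                size_t n = 0;
                while (isalpha((unsigned char)name[n]))
                    n++;
                if (is_kw(name, n, "if") || is_kw(name, n, "ifdef")
                    || is_kw(name, n, "ifndef")) {
                    depth++;
                } else if (is_kw(name, n, "endif")) {
                    if (depth == 0)
                        return (long)hash;
                    depth--;
                } else if ((is_kw(name, n, "else") || is_kw(name, n, "elif"))
                           && depth == 0) {
                    return (long)hash;
                }
                /* Anything else (#define etc.) is uninteresting. */
                i = j + n;
            }
            at_line_start = 0;
            continue;       /* re-check for end of buffer */
        }
        if (buf[i] == '\\' && buf[i + 1] == '\n') {
            i += 2;         /* escaped newline: still the same line */
            continue;
        }
        if (buf[i] == '\n')
            at_line_start = 1;
        i++;
    }
    return -1;
}
```

Given a region whose text contains a multi-line string with a line beginning `#endif`, followed later by the real `#endif`, this sketch returns the offset of the `#endif` inside the string. It cannot tell the two apart without tracking string state, which is exactly the difficulty.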
I'm investigating (again) a new lexer for cpplib, which would allow
this optimization to be easily implemented (it is impractical in the
current lexer), but things like
"Some directives, like #else or
#endif can terminate skipped blocks"
would thwart it.
Neil.