This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
cpplib project web page update
- To: gcc-patches at gcc dot gnu dot org
- Subject: cpplib project web page update
- From: Neil Booth <NeilB at earthling dot net>
- Date: Fri, 5 May 2000 18:01:04 +0900
- Cc: Zack Weinberg <zack at wolery dot cumb dot org>
This is a suggested refresh to cover progress over the past 3 months.
Anything I've missed or got wrong, Zack?
Neil.
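
To illustrate the backslash-whitespace-newline warning described in the patch below, here is a minimal sketch in C. It is not cpplib's actual code: the function name backslash_then_blanks is hypothetical, and it assumes the caller passes one physical line with the terminating newline already removed. It reports the case where a backslash is followed only by spaces or tabs, which looks like a line continuation but is not one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of cpplib.  LINE is one physical
   source line without its terminating newline.  Returns 1 if the
   line ends in a backslash followed by one or more spaces or tabs,
   and 0 otherwise.  */
int
backslash_then_blanks (const char *line)
{
  size_t len = strlen (line);
  size_t end = len;

  /* Step back over trailing horizontal whitespace.  */
  while (end > 0 && (line[end - 1] == ' ' || line[end - 1] == '\t'))
    end--;

  /* No trailing whitespace at all: the line either ends in a genuine
     continuation backslash or has no backslash to worry about.  */
  if (end == len)
    return 0;

  /* Trailing whitespace was present; warn only if a backslash
     immediately precedes it.  */
  return end > 0 && line[end - 1] == '\\';
}
```

A caller would run this on each physical line before splicing continued lines, and emit a warning when it returns 1.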
--- proj-cpplib.html Fri Mar 17 08:28:09 2000
+++ proj-cpplib.new.html Fri May 5 17:56:09 2000
@@ -1,12 +1,5 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
-
-
-
-
-
-
-
<html>
<head>
@@ -74,31 +67,65 @@
href="mailto:gcc-bugs@gcc.gnu.org">gcc-bugs@gcc.gnu.org</a>. But
please read the rest of this document first!
-<h2>Known Bugs</h2>
+<h2>Fixed Bugs and Nits</h2>
+
+<p>These have either been fixed in the latest snapshot, or are
+awaiting uncommenting of the relevant code.
<p><ol>
<li>Under some conditions the line numbers seen by the compiler
- proper are incorrect. It shows up most obviously as bad line
- numbers in warnings when bootstrapping the compiler. I have not
- been able to reproduce this with an input file of less than a
- couple thousand lines. Help would be greatly appreciated.
-
- <li>cpplib will silently mangle input files containing ASCII NUL.
- The cause of the bug is well known, but we weren't able to come
- to consensus on what to do about it. My personal preference is
- to issue a warning and strip the NUL; other people feel it
- should be preserved or considered a hard error.
+ proper were incorrect.
+
+ <li>cpplib used to silently mangle input files containing ASCII NUL.
+ Handling now depends on the context. In comments, NULs are
+ ignored. In string and character constants, they are warned
+ about but preserved. Anywhere else they are treated as whitespace,
+ and a warning is emitted.
+
+ <li>Trigraphs no longer provoke warnings within comments.
+
+ <li>C89 Amendment 1 "alternate spellings" of punctuators are now
+ recognized. These are
+<pre> <: :> <% %> %: %:%:</pre>
+ which correspond, respectively, to
+<pre> [ ] { } # ##</pre>
+
+ <li>Someone once requested warnings about stray whitespace in the
+ input under various circumstances. With -traditional, cpplib
+ now warns about directives available before C89 that have
+ initial whitespace, and conversely about directives introduced
+ since then that lack initial whitespace.
+ <p> Additionally, cpplib now warns if pure whitespace separates
+ a backslash from a subsequent newline character. This looks
+ like a line continuation sequence, but isn't.
+
+ <li>The handling of <code>#define</code> and <code>#if</code> now
+ uses the same lexical analysis code as the rest of cpplib. This
+ is essential to adding support for the new preprocessor features
+ in C9x and C89 Amendment 1.
+
+ <li>cpplib no longer makes two separate passes over the input file,
+ which hopefully will lead to improved performance.
+
+ <li>The code is now stricter in its use of <code>char *</code> and
+ <code>unsigned char *</code> for improved consistency.
+
+ <li>The lexer now parses tokens a logical line at a time. The
+ resulting token lists should be useable pretty much directly by
+ the C or C++ front ends, so when linked up they won't have to do
+ any rescanning of tokens.
+</ol>
+
+<h2>Known Bugs</h2>
+
+<p><ol>
<li>Character sets that are <em>not</em> strict supersets of ASCII
may cause cpplib to mangle the input file, even in comments or
strings. Unfortunately, that includes important character sets
such as Shift JIS and UCS2. (Please see the discussion of <a
href="#charset">character set issues</a>, below.)
- <li>Trigraphs provoke warnings everywhere in the input file, even in
- comments. This is obnoxious, but difficult to fix due to the
- brain-dead semantics of trigraphs and backslash-newline.
-
<li>Code that does perverse things with directives inside macro
arguments can cause the preprocessor to dump core. cccp dealt
with this by disallowing all directives there, but it might be
@@ -109,15 +136,6 @@
<h2>Missing User-visible Features</h2>
<p><ol>
-
- <li>C89 Amendment 1 "alternate spellings" of punctuators are not
- recognized. These are
-<pre> <: :> <% %> %: %:%:</pre>
- which correspond, respectively, to
-<pre> [ ] { } # ##</pre>
- The preprocessor must be aware of all of them, even though it
- uses only <code>%:</code> and <code>%:%:</code> itself.
-
<li>Character sets that are strict supersets of ASCII are safe to
use, but extended characters cannot appear in identifiers. This
has to be coordinated with the front end, and requires library
@@ -144,11 +162,6 @@
and not in a reloadable format. The front end must cooperate
also.
- <li>Someone once requested warnings about stray whitespace in the
- input, notably trailing whitespace after a backslash. If that
- happens, you have something that looks like a line-continuation
- backslash, but isn't.
-
<li>Better support for languages other than C would be nice. People
want to preprocess Fortran, Chill, and assembly language. Chill
has been kludged in, Fortran and assembly still have serious
@@ -165,18 +178,6 @@
<h2>Internal work that needs doing</h2>
<ol>
- <li>The handling of <code>#define</code> and <code>#if</code> must
- be fixed so it uses the same lexical analysis code as the rest of
- cpplib (i.e. <code>cpp_get_token</code>). This is essential to
- adding support for the new preprocessor features in C9x and C89
- Amendment 1.
-
- <li>cpplib makes two separate passes over the input file, which
- causes a number of headaches, such as the trigraph warnings
- inside comments. It's also a performance problem. Semantic
- issues make a one-pass lexer impractical, but a two pass scheme
- with the first pass called coroutine fashion from the first
- should work better.
<li>The macro expander could use a total rewrite. We currently
re-tokenize macros every time they are expanded. It'd be better
@@ -188,10 +189,6 @@
and <code>long</code> are used interchangeably; this is worse,
but I think most of the instances have been removed.
- <li>Likewise, the code uses <code>char *</code>, <code>unsigned char
- *</code>, and <code>U_CHAR *</code> interchangeably. This is
- more of a consistency issue and annoyance than a real problem.
-
<li>VMS support has suffered extreme bit rot. There may be problems
with support for DOS, Windows, MVS, and other non-Unixy
platforms. I can fix none of these myself.
@@ -205,18 +202,10 @@
<ol>
- <li>The lexer should do more work - enough that when cpplib is
- linked into the C or C++ front end, the front end doesn't have
- to do any rescanning of tokens.
-
<li>The library interface needs to be tidied up. Internal
implementation details are exposed all over the place.
Extracting all the information the library provides is
difficult.
-
- <li><code>cpp_get_token</code> must be changed to return exactly one
- token per invocation. For performance, there should be a
- <code>cpp_get_tokens</code> call that returns a lineful.
<li>Front ends need to use cpplib's line and column numbering
interface directly. cpplib needs to stop inserting #line