This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



cpplib project web page update


This is a suggested refresh to cover progress over the past 3 months.
Anything I've missed or got wrong, Zack?

Neil.

--- proj-cpplib.html	Fri Mar 17 08:28:09 2000
+++ proj-cpplib.new.html	Fri May  5 17:56:09 2000
@@ -1,12 +1,5 @@
 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
 
-
-
-
-
-
-
-
 <html>
 
 <head>
@@ -74,31 +67,65 @@
 href="mailto:gcc-bugs@gcc.gnu.org">gcc-bugs@gcc.gnu.org</a>.  But
 please read the rest of this document first!
 
-<h2>Known Bugs</h2>
+<h2>Fixed Bugs and Nits</h2>
+
+<p>These have either been fixed in the latest snapshot, or are
+awaiting uncommenting of the relevant code.
 
 <p><ol>
   <li>Under some conditions the line numbers seen by the compiler
-      proper are incorrect.  It shows up most obviously as bad line
-      numbers in warnings when bootstrapping the compiler.  I have not
-      been able to reproduce this with an input file of less than a
-      couple thousand lines.  Help would be greatly appreciated.
-
-  <li>cpplib will silently mangle input files containing ASCII NUL.
-      The cause of the bug is well known, but we weren't able to come
-      to consensus on what to do about it.  My personal preference is
-      to issue a warning and strip the NUL; other people feel it
-      should be preserved or considered a hard error.
+      proper were incorrect.
+
+  <li>cpplib used to silently mangle input files containing ASCII NUL.
+      Handling of NULs now depends on the context.  In comments, they
+      are ignored.  In string and character constants, they are warned
+      about but preserved.  Anywhere else they are treated as whitespace,
+      and a warning is emitted.
+
+  <li>Trigraphs no longer provoke warnings within comments.
+
+  <li>C89 Amendment 1 "alternate spellings" of punctuators are now
+      recognized. These are
+<pre>		&lt;:  :&gt;  &lt;%  %&gt;  %:  %:%:</pre>
+      which correspond, respectively, to
+<pre>		[   ]   {   }   #   ##</pre>
+
+  <li>Someone once requested warnings about stray whitespace in the
+      input under various circumstances.  With -traditional, cpplib
+      now warns about indented directives that were available before
+      C89, and conversely warns about directives that were unavailable
+      then if they are not indented.
+      <p> Additionally, cpplib now warns if pure whitespace separates
+      a backslash from a subsequent newline character.  This looks
+      like a line continuation sequence, but isn't.
+
+  <li>The handling of <code>#define</code> and <code>#if</code> now
+      uses the same lexical analysis code as the rest of cpplib.  This
+      is essential to adding support for the new preprocessor features
+      in C9x and C89 Amendment 1.
+
+  <li>cpplib no longer makes two separate passes over the input file,
+      which hopefully will lead to improved performance.
+
+  <li>The code is now stricter in its use of <code>char *</code> and
+      <code>unsigned char *</code> for improved consistency.
+
+  <li>The lexer now parses tokens a logical line at a time.  The
+      resulting token lists should be useable pretty much directly by
+      the C or C++ front ends, so when linked up they won't have to do
+      any rescanning of tokens.
 
+</ol>
+
+<h2>Known Bugs</h2>
+
+<p><ol>
   <li>Character sets that are <em>not</em> strict supersets of ASCII
       may cause cpplib to mangle the input file, even in comments or
       strings.  Unfortunately, that includes important character sets
       such as Shift JIS and UCS2.  (Please see the discussion of <a
       href="#charset">character set issues</a>, below.)
 
-  <li>Trigraphs provoke warnings everywhere in the input file, even in
-      comments.  This is obnoxious, but difficult to fix due to the
-      brain-dead semantics of trigraphs and backslash-newline.
-
   <li>Code that does perverse things with directives inside macro
       arguments can cause the preprocessor to dump core.  cccp dealt
       with this by disallowing all directives there, but it might be
@@ -109,15 +136,6 @@
 <h2>Missing User-visible Features</h2>
 
 <p><ol>
-
-  <li>C89 Amendment 1 "alternate spellings" of punctuators are not
-      recognized. These are
-<pre>		&lt;:  :&gt;  &lt;%  %&gt;  %:  %:%:</pre>
-      which correspond, respectively, to
-<pre>		[   ]   {   }   #   ##</pre>
-      The preprocessor must be aware of all of them, even though it
-      uses only <code>%:</code> and <code>%:%:</code> itself.
-
   <li>Character sets that are strict supersets of ASCII are safe to
       use, but extended characters cannot appear in identifiers.  This
       has to be coordinated with the front end, and requires library
@@ -144,11 +162,6 @@
       and not in a reloadable format.  The front end must cooperate
       also.
 
-  <li>Someone once requested warnings about stray whitespace in the
-      input, notably trailing whitespace after a backslash.  If that
-      happens, you have something that looks like a line-continuation
-      backslash, but isn't.
-
   <li>Better support for languages other than C would be nice.  People
       want to preprocess Fortran, Chill, and assembly language.  Chill
       has been kludged in, Fortran and assembly still have serious
@@ -165,18 +178,6 @@
 <h2>Internal work that needs doing</h2>
 
 <ol>
-  <li>The handling of <code>#define</code> and <code>#if</code> must
-      be fixed so it uses the same lexical analysis code as the rest of
-      cpplib (i.e. <code>cpp_get_token</code>).  This is essential to
-      adding support for the new preprocessor features in C9x and C89
-      Amendment 1.
-
-  <li>cpplib makes two separate passes over the input file, which
-      causes a number of headaches, such as the trigraph warnings
-      inside comments.  It's also a performance problem.  Semantic
-      issues make a one-pass lexer impractical, but a two pass scheme
-      with the first pass called coroutine fashion from the first
-      should work better.
 
   <li>The macro expander could use a total rewrite.  We currently
       re-tokenize macros every time they are expanded.  It'd be better
@@ -188,10 +189,6 @@
       and <code>long</code> are used interchangeably; this is worse,
       but I think most of the instances have been removed.
 
-  <li>Likewise, the code uses <code>char *</code>, <code>unsigned char
-      *</code>, and <code>U_CHAR *</code> interchangeably.  This is
-      more of a consistency issue and annoyance than a real problem.
-
   <li>VMS support has suffered extreme bit rot.  There may be problems
       with support for DOS, Windows, MVS, and other non-Unixy
       platforms.  I can fix none of these myself.
@@ -205,18 +202,10 @@
 
 <ol>
 
-  <li>The lexer should do more work - enough that when cpplib is
-      linked into the C or C++ front end, the front end doesn't have
-      to do any rescanning of tokens.
-
   <li>The library interface needs to be tidied up.  Internal
       implementation details are exposed all over the place.
       Extracting all the information the library provides is
       difficult.  
-
-  <li><code>cpp_get_token</code> must be changed to return exactly one
-      token per invocation.  For performance, there should be a
-      <code>cpp_get_tokens</code> call that returns a lineful.
 
   <li>Front ends need to use cpplib's line and column numbering
       interface directly.  cpplib needs to stop inserting #line

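For anyone reading the patch without the standard to hand, here is a
minimal sketch of two of the behaviours described above: the mapping
from the C89 Amendment 1 alternate spellings to their ordinary
punctuators, and the check for a backslash separated from the newline
by whitespace.  Both function names are hypothetical and this is not
cpplib's code; the second function also assumes "whitespace" means
only spaces and tabs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of cpplib: map a C89 Amendment 1
   alternate spelling to the punctuator it stands for.  Returns the
   argument unchanged if it is not one of the six digraphs.  */
static const char *
canonical_punctuator (const char *spelling)
{
  static const struct { const char *alt; const char *canon; } map[] = {
    { "<:", "[" }, { ":>", "]" },
    { "<%", "{" }, { "%>", "}" },
    { "%:", "#" }, { "%:%:", "##" }
  };
  size_t i;

  for (i = 0; i < sizeof map / sizeof map[0]; i++)
    if (strcmp (spelling, map[i].alt) == 0)
      return map[i].canon;
  return spelling;
}

/* Hypothetical helper, not part of cpplib: return 1 if LINE ends in a
   backslash followed by one or more spaces or tabs (assumption: only
   these two count as whitespace), optionally followed by a newline.
   Such a line looks like a continuation but is not one.  */
static int
has_false_continuation (const char *line)
{
  size_t end = strlen (line);
  size_t ws;

  /* Ignore the terminating newline, if there is one.  */
  if (end > 0 && line[end - 1] == '\n')
    end--;

  /* Walk back over trailing spaces and tabs.  */
  ws = end;
  while (ws > 0 && (line[ws - 1] == ' ' || line[ws - 1] == '\t'))
    ws--;

  /* A false continuation needs at least one whitespace character
     and a backslash immediately before it.  */
  return ws < end && ws > 0 && line[ws - 1] == '\\';
}
```

With these definitions, canonical_punctuator ("%:%:") gives "##", and
has_false_continuation is nonzero for a line ending in backslash,
space, newline but zero for a genuine backslash-newline.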