This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: ideas for cpplib


On Tue, 13 Oct 1998 10:27:31 -0700, Per Bothner wrote:
>> A trial implementation didn't speed up noticeably.  This is because
>> cpplib is bound by directory search and I/O in my usual test
>> (compiling glibc).  A less disk-intensive compile might benefit more.
>
>It is well known that cpplib's file-searching mechanism is bad.
>I basically copied the approach from cccp.c, which had lots of
>redundancies and overheads.  Then in December 1995 Paul Eggert
>and I redid how this was handled for cccp, but we never got
>around to making the similar fixes to cpplib.
>
>I believe the first thing to do if you want to improve cpplib
>performance is merge in the data structure and improvements
>that were made to cccp around December 1995.  (It's a looong
>ChangeLog entry.)

I can't find this ChangeLog entry.  The gcc 2.7.2 ChangeLog stops at Nov 95,
and CVS only goes back to 1997 or so.

Current cccp does almost as badly as cpplib on glibc.  Short of saving
stat() information across compiles, I don't think anything will help much;
the problem is having to examine something like 30 directories on every
#include, most of which don't have the header it's looking for.  Multiply by
ten to twenty incestuous headers per source file and 2000 source files,
and...
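
To make the cost concrete, here is a minimal sketch of a linear include
search that counts its stat() probes.  It is not the code in cccp or cpplib;
find_header and its parameters are hypothetical names made up for this
illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper, not cpplib's real search code: try each directory
   in order with stat() and count how many probes were needed.
   Returns the index of the directory that has NAME, or -1.  */
int
find_header (const char *const *dirs, int ndirs, const char *name,
             char *path, size_t pathlen, unsigned long *probes)
{
  struct stat st;
  int i;

  assert (dirs != NULL && name != NULL && path != NULL && probes != NULL);
  for (i = 0; i < ndirs; i++)
    {
      int n = snprintf (path, pathlen, "%s/%s", dirs[i], name);
      if (n < 0 || (size_t) n >= pathlen)
        continue;               /* path does not fit in the buffer; skip */
      (*probes)++;              /* one system call per directory tried */
      if (stat (path, &st) == 0)
        return i;               /* found in dirs[i] */
    }
  return -1;                    /* not present in any directory */
}
```

Every miss costs one system call, so a header that lives near the end of a
30-entry path costs close to 30 probes.  With the figures above, the worst
case is on the order of 30 x 20 x 2000 = 1,200,000 probes for the whole
build.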

>> * Profiling indicates an overwhelming amount of the CPU time consumed
>> by cpplib is in adjust_position() and its immediate callers
>> (update_position, cpp_buf_line_and_col). These functions could go
>> away if pfile->buffer->lineno were kept up-to-date in all the places
>> that read from the buffer.
>
>Please note that then you also have to keep colno up-to-date.
>It might be better to just note the most recent previous '\n';
>that way you need to calculate the column number only when requested.
>
>>One problem with doing backslash-newline in a pre-scan is that you lose the
>>information you need to get line numbers correct.
>
>Please remember that one of the goals of cpplib is to get correct
>*column number* information, not only line numbers.  Many of the
>cpplib design decisions are because I think column numbers are
>important.  Note that for a stand-alone cpp program it would
>probably be too painful to pass column numbers to cc1/cc1plus,
>but the goal is that when cpplib is integrated directly into
>the C/C++ lexer, we should be able to get correct column
>numbers in error messages, etc.

Hm.  Trouble with that is that backslash-newline can appear anywhere, even
in the middle of a token.  I thought about replacing it with an escape
that meant 'bump line number here', postponed till whitespace.  That would
be the only way to get lines and cols right in the middle of a long series
of escaped newlines.
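
For reference, the "note the most recent previous '\n'" suggestion quoted
above might look roughly like this.  It is only a sketch: struct pos_tracker
and the pos_* functions are hypothetical names, not anything in cpplib, and
it deliberately ignores backslash-newline, which is exactly the case that
makes the real problem hard.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical position tracker, not a cpplib structure.  */
struct pos_tracker
{
  const char *line_start;   /* first byte after the most recent '\n' */
  long lineno;              /* 1-based number of the current line */
};

void
pos_init (struct pos_tracker *p, const char *buf)
{
  p->line_start = buf;
  p->lineno = 1;
}

/* Call once for each byte consumed.  Only a newline does any work.  */
void
pos_advance (struct pos_tracker *p, const char *cur)
{
  if (*cur == '\n')
    {
      p->lineno++;
      p->line_start = cur + 1;
    }
}

/* 1-based column of CUR, computed only when somebody asks for it.  */
long
pos_column (const struct pos_tracker *p, const char *cur)
{
  assert (cur >= p->line_start);
  return (long) (cur - p->line_start) + 1;
}
```

The per-character cost is one comparison, and the subtraction happens only
when a diagnostic needs a column.  An escaped newline in the middle of a
token would still need something like the 'bump line number here' escape to
keep both numbers right.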

The commonest use of escaped newlines is big hairy macros.  cc1 doesn't even
try to look into big hairy macros for error messages; you get the error on
the line where the macro was called.  If cpplib could fix that, that would
be nice.

>In the case of trigraphs, their use is so obscure that I would not
>be terribly upset if we don't get the column numbers right.
>Still, I would prefer to do the right thing, if it does not
>hurt performance in the normal case.  I think that is not too
>difficult.  One idea is to modify CPP_BUF_GET (and related
>macros) so that it calls a function when (BUFFER)->cur >= (BUFFER)->rlimit;
>that function can handle backslashes, trigraphs, digraphs,
>newline conversions, even Unicode conversion.  (I think cpplib
>should automatically handle any of CR, LF, and CRLF.)

I think I've got this handled in my safe_file_read rewrite already.
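
As an illustration of the newline-conversion part only, here is a minimal
sketch that rewrites CR, LF, and CRLF line endings to a single '\n' in
place.  It is not the safe_file_read code; normalize_newlines is a
hypothetical name, and the sketch assumes the whole text is already in one
buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert CR and CRLF to '\n' in place.
   Returns the new length, which is never larger than LEN.  */
size_t
normalize_newlines (char *buf, size_t len)
{
  size_t in = 0, out = 0;

  assert (buf != NULL || len == 0);
  while (in < len)
    {
      char c = buf[in++];
      if (c == '\r')
        {
          if (in < len && buf[in] == '\n')
            in++;               /* CRLF: swallow the LF as well */
          c = '\n';             /* a lone CR also becomes a newline */
        }
      buf[out++] = c;
    }
  return out;
}
```

A reader that fills the buffer in pieces would also have to remember a CR
that falls on the last byte of one piece, so that an LF at the start of the
next piece is not counted as a second line ending.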

>> It turns out to be a huge memory win, because I don't read the file
>> all at once, I use stdio and enlarge the intermediate buffer as
>> necessary.  combine.c is 404K, half of which is comments.  The current
>> cpp allocates a buffer the size of the file; my code needs only 252K.
>
>Modifying CPP_BUF_GET also allows you to play with different buffer
>policies.  On the other hand, I can't see that saving 152K is
>very important these days;  speed is more important.  It seems
>like memory mapping the input file would be much better.  That
>is another motivation for not modifying the input buffer.

On a 32MB machine (pretty common these days) running X and a 2.0 kernel,
things start hitting swap at around 2 megs of working set.  cc1 wants about
half of that, the assembler (as) wants two to five hundred K, and a big make
instance (like the ones in glibc) easily consumes the rest.  So a 152K memory
win is definitely worth it.
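
The "use stdio and enlarge the intermediate buffer as necessary" approach
quoted above can be sketched as follows.  This is an assumption about the
general shape, not the actual patch: read_growing is a hypothetical name,
and the sketch copies every byte, so any saving would have to come from
filtering done during the copy, which is not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read all of FP into a malloc'd buffer that
   doubles whenever it fills.  Stores the byte count in *LENP.
   Returns NULL on allocation or read failure.  */
char *
read_growing (FILE *fp, size_t *lenp)
{
  size_t cap = 4096, len = 0, n;
  char *buf;

  assert (fp != NULL && lenp != NULL);
  buf = malloc (cap);
  if (buf == NULL)
    return NULL;

  while ((n = fread (buf + len, 1, cap - len, fp)) > 0)
    {
      len += n;
      if (len == cap)
        {
          char *bigger = realloc (buf, cap * 2);
          if (bigger == NULL)
            {
              free (buf);
              return NULL;
            }
          buf = bigger;
          cap *= 2;
        }
    }
  if (ferror (fp))
    {
      free (buf);
      return NULL;
    }
  *lenp = len;
  return buf;
}
```

The caller owns the returned buffer and frees it with free().  Doubling
keeps the number of realloc calls logarithmic in the file size, at the
price of up to 2x slack in the final allocation.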

Ulrich says he wants to put mmap of read-only files into stdio, and I'd
rather not worry about whether we have mmap support inside cpplib.

zw

