This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: ideas for cpplib


> A trial implementation didn't speed up noticeably.  This is because
> cpplib is bound by directory search and I/O in my usual test
> (compiling glibc).  A less disk-intensive compile might benefit more.

It is well known that cpplib's file-searching mechanism is bad.
I basically copied the approach from cccp.c, which had lots of
redundancies and overhead.  Then in December 1995 Paul Eggert
and I reworked how this was handled in cccp, but we never got
around to making similar fixes to cpplib.

I believe the first thing to do if you want to improve cpplib
performance is to merge in the data structures and improvements
that were made to cccp around December 1995.  (It's a looong
ChangeLog entry.)

> * Profiling indicates an overwhelming amount of the CPU time consumed
> by cpplib is in adjust_position() and its immediate callers
> (update_position, cpp_buf_line_and_col). These functions could go
> away if pfile->buffer->lineno were kept up-to-date in all the places
> that read from the buffer.

Please note that you would then also have to keep colno up to date.
It might be better to just record the position of the most recent '\n';
that way you need to calculate the column number only when requested.
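A minimal C sketch of the "remember the last newline" idea, assuming a simple in-memory buffer.  The struct and function names (sketch_buffer, sketch_getc, sketch_column, sketch_init) are hypothetical and are not cpplib's actual data structures; columns are 1-based and a tab counts as one column.

```c
#include <stddef.h>

/* Hypothetical buffer layout for illustration only; not cpplib's struct. */
struct sketch_buffer {
    const char *cur;        /* next character to read */
    const char *rlimit;     /* end of valid data */
    long lineno;            /* 1-based line number of *cur */
    const char *line_base;  /* first character of the current line */
};

static void sketch_init(struct sketch_buffer *b, const char *text, size_t len)
{
    b->cur = text;
    b->rlimit = text + len;
    b->lineno = 1;
    b->line_base = text;
}

/* Read one character, keeping lineno and line_base current.
   Returns -1 at the end of the buffer. */
static int sketch_getc(struct sketch_buffer *b)
{
    if (b->cur >= b->rlimit)
        return -1;
    int c = (unsigned char) *b->cur++;
    if (c == '\n') {
        b->lineno++;
        b->line_base = b->cur;  /* the next line starts after the '\n' */
    }
    return c;
}

/* Column of the next character to be read, computed only on request
   (1-based; a tab counts as a single column in this sketch). */
static long sketch_column(const struct sketch_buffer *b)
{
    return (long) (b->cur - b->line_base) + 1;
}
```

The per-character cost is one pointer store on each '\n'; the subtraction in sketch_column runs only when something such as a diagnostic actually asks for a column.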

>One problem with doing backslash-newline in a pre-scan is that you lose the
>information you need to get line numbers correct.

Please remember that one of the goals of cpplib is to get correct
*column number* information, not only line numbers.  Many of the
cpplib design decisions were made because I think column numbers are
important.  Note that for a stand-alone cpp program it would
probably be too painful to pass column numbers to cc1/cc1plus,
but the goal is that when cpplib is integrated directly into
the C/C++ lexer, we should be able to get correct column
numbers in error messages, etc.

In the case of trigraphs, their use is so obscure that I would not
be terribly upset if we don't get the column numbers right.
Still, I would prefer to do the right thing, if it does not
hurt performance in the normal case.  I think that is not too
difficult.  One idea is to modify CPP_BUF_GET (and related
macros) so that it calls a function when (BUFFER)->cur >= (BUFFER)->rlimit;
that function can handle backslashes, trigraphs, digraphs,
newline conversions, even Unicode conversion.  (I think cpplib
should automatically handle any of CR, LF, and CRLF line endings.)
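One possible reading of the CPP_BUF_GET idea, sketched in C under stated assumptions: rlimit is set to the next raw character that needs attention (a backslash, a CR, or a '?'), so ordinary characters cost one compare and one load, and a slow-path function does trigraph replacement, CR/CRLF-to-LF conversion, and backslash-newline splicing without ever writing to the input.  All names here (raw_buffer, BUF_GET, buf_get_slow, decode1, set_rlimit, buf_init) are hypothetical, and digraphs and Unicode are not handled.

```c
#include <stddef.h>
#include <string.h>

struct raw_buffer {
    const char *cur;     /* next raw character */
    const char *rlimit;  /* the fast path is safe up to here */
    const char *end;     /* end of the (unmodified) raw input */
};

/* Characters the slow path must inspect. */
static int is_special(char c)
{
    return c == '\\' || c == '\r' || c == '?';
}

/* Move rlimit to the next special character at or after cur. */
static void set_rlimit(struct raw_buffer *b)
{
    const char *p = b->cur;
    while (p < b->end && !is_special(*p))
        p++;
    b->rlimit = p;
}

static void buf_init(struct raw_buffer *b, const char *text, size_t len)
{
    b->cur = text;
    b->end = text + len;
    set_rlimit(b);
}

/* Decode one character at p: trigraphs, and CR or CRLF become '\n'.
   Stores the result in *out and returns the number of raw bytes used. */
static size_t decode1(const char *p, const char *end, int *out)
{
    static const char from[] = "=/'()!<>-";
    static const char to[]   = "#\\^[]|{}~";
    if (*p == '\r') {
        *out = '\n';
        return (p + 1 < end && p[1] == '\n') ? 2 : 1;
    }
    if (*p == '?' && p + 2 < end && p[1] == '?' && p[2] != '\0') {
        const char *hit = strchr(from, p[2]);
        if (hit != NULL) {
            *out = to[hit - from];
            return 3;
        }
    }
    *out = (unsigned char) *p;
    return 1;
}

/* Slow path: called only when cur has reached rlimit. */
static int buf_get_slow(struct raw_buffer *b)
{
    while (b->cur < b->end) {
        int c, next;
        size_t n = decode1(b->cur, b->end, &c);
        if (c == '\\' && b->cur + n < b->end) {
            size_t m = decode1(b->cur + n, b->end, &next);
            if (next == '\n') {  /* line splice: drop backslash and newline */
                b->cur += n + m;
                set_rlimit(b);
                continue;
            }
        }
        b->cur += n;
        set_rlimit(b);
        return c;
    }
    return -1;  /* end of input */
}

/* Fast path; note that B is evaluated more than once. */
#define BUF_GET(B) \
    ((B)->cur < (B)->rlimit ? (unsigned char) *(B)->cur++ : buf_get_slow(B))
```

Because cur always points into the raw input, a column could still be derived from raw positions, and the same code works on a read-only buffer.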

> It turns out to be a huge memory win, because I don't read the file
> all at once, I use stdio and enlarge the intermediate buffer as
> necessary.  combine.c is 404K half of which is comments.  The current
> cpp allocates a buffer the size of the file; my code needs only 252K.

Modifying CPP_BUF_GET also allows you to play with different buffer
policies.  On the other hand, I can't see that saving 152K is
very important these days; speed is more important.  Memory-mapping
the input file seems like a much better approach, and since a
read-only mapping cannot be modified in place, that is another
motivation for not modifying the input buffer.
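A minimal POSIX sketch of mapping a source file read-only, assuming a system that provides mmap(2); map_source and unmap_source are hypothetical helper names.  Because the mapping is PROT_READ, any scheme that rewrites the buffer in place (for example, deleting backslash-newlines in a pre-scan) would fault on it.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map PATH read-only.  Returns the base address and stores the size in
   *len, or returns NULL on failure.  Empty files also return NULL here,
   since mmap rejects a zero length; a real caller would special-case them. */
static const char *map_source(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;
    *len = (size_t) st.st_size;
    return p;
}

static void unmap_source(const char *base, size_t len)
{
    munmap((void *) base, len);
}
```

Pages are read from disk on demand, so no intermediate buffer has to be sized or grown.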

	--Per Bothner
Cygnus Solutions     bothner@cygnus.com     http://www.cygnus.com/~bothner



