This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: ideas for cpplib
- To: Zack Weinberg <zack at rabi dot columbia dot edu>
- Subject: Re: ideas for cpplib
- From: Per Bothner <bothner at cygnus dot com>
- Date: Tue, 13 Oct 1998 10:27:31 -0700
- cc: Dave Brolley <brolley at cygnus dot com>, egcs at cygnus dot com
> A trial implementation didn't speed up noticeably. This is because
> cpplib is bound by directory search and I/O in my usual test
> (compiling glibc). A less disk-intensive compile might benefit more.
It is well known that cpplib's file-searching mechanism is bad.
I basically copied the approach from cccp.c, which had lots of
redundancies and overheads. Then in December 1995 Paul Eggert
and I redid how this was handled for cccp, but we never got
around to making the similar fixes to cpplib.
I believe the first thing to do if you want to improve cpplib
performance is merge in the data structure and improvements
that were made to cccp around December 1995. (It's a looong
ChangeLog entry.)
> * Profiling indicates an overwhelming amount of the CPU time consumed
> by cpplib is in adjust_position() and its immediate callers
> (update_position, cpp_buf_line_and_col). These functions could go
> away if pfile->buffer->lineno were kept up-to-date in all the places
> that read from the buffer.
Please note that then you also have to keep colno up-to-date.
It might be better to just note the most recent previous '\n';
that way you need to calculate the column number only when requested.
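A minimal sketch of that bookkeeping, for illustration only. The struct pos_buf and its functions are hypothetical names, not cpplib's actual cpp_buffer; the point is just that the reader remembers where the current line starts and the column falls out as a pointer subtraction when someone asks for it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scanner state; not cpplib's real cpp_buffer layout. */
struct pos_buf {
    const char *cur;        /* next character to read */
    const char *rlimit;     /* one past the last character */
    const char *line_start; /* first character after the most recent '\n' */
    long lineno;            /* 1-based line number of *cur */
};

static void pos_buf_init(struct pos_buf *b, const char *text, size_t len)
{
    b->cur = b->line_start = text;
    b->rlimit = text + len;
    b->lineno = 1;
}

/* Read one character, or return -1 at end of buffer.
   The only per-character cost for position tracking is the '\n' test. */
static int pos_buf_get(struct pos_buf *b)
{
    int c;

    assert(b->cur <= b->rlimit);
    if (b->cur == b->rlimit)
        return -1;
    c = (unsigned char) *b->cur++;
    if (c == '\n') {
        b->lineno++;
        b->line_start = b->cur;
    }
    return c;
}

/* 1-based column of the next character to read, computed on demand. */
static long pos_buf_column(const struct pos_buf *b)
{
    return (long) (b->cur - b->line_start) + 1;
}
```

Nothing here maintains a column counter; a diagnostic routine would call pos_buf_column() only at the moment it needs to print a position.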
>One problem with doing backslash-newline in a pre-scan is that you lose the
>information you need to get line numbers correct.
Please remember that one of the goals of cpplib is to get correct
*column number* information, not only line numbers. Many of the
cpplib design decisions are because I think column numbers are
important. Note that for a stand-alone cpp program it would
probably be too painful to pass column numbers to cc1/cc1plus,
but the goal is that when cpplib is integrated directly into
the C/C++ lexer, we should be able to get correct column
numbers in error messages, etc.
In the case of trigraphs, their use is so obscure that I would not
be terribly upset if we don't get the column numbers right.
Still, I would prefer to do the right thing, if it does not
hurt performance in the normal case. I think that is not too
difficult. One idea is to modify CPP_BUF_GET (and related
macros) so that it calls a function when (BUFFER)->cur >= (BUFFER)->rlimit;
that function can handle backslashes, trigraphs, digraphs,
newline conversions, even Unicode conversion. (I think cpplib
should automatically handle any of CR, LF, and CRLF.)
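Roughly what I have in mind, as a sketch and not a patch. The names sk_buffer, SK_BUF_GET and sk_refill are made up for the example (only cur and rlimit come from the existing macros), and the refill function here handles just backslash-newline and CR/CRLF conversion; trigraphs and the rest would go in the same place. It also assumes the raw input is left untouched and translated characters go to a small separate window.

```c
#include <assert.h>
#include <stddef.h>

#define WINDOW_SIZE 16  /* deliberately tiny so the slow path runs often */

/* Hypothetical buffer; only the cur/rlimit names follow cpplib. */
struct sk_buffer {
    const char *raw;          /* next unread byte of the untouched input */
    const char *raw_end;      /* one past the end of the input */
    char window[WINDOW_SIZE]; /* translated characters */
    char *cur;                /* next translated character */
    char *rlimit;             /* one past the last translated character */
};

static int sk_refill(struct sk_buffer *b);

/* Fast path: one comparison and a load. The slow path is out of line. */
#define SK_BUF_GET(B) \
    ((B)->cur < (B)->rlimit ? (unsigned char) *(B)->cur++ : sk_refill(B))

static void sk_init(struct sk_buffer *b, const char *text, size_t len)
{
    b->raw = text;
    b->raw_end = text + len;
    b->cur = b->rlimit = b->window;  /* empty window forces a refill */
}

/* Translate the next chunk of raw input into the window and return
   its first character, or -1 when the input is exhausted. */
static int sk_refill(struct sk_buffer *b)
{
    char *out = b->window;

    assert(b->cur >= b->rlimit);
    while (b->raw < b->raw_end && out < b->window + WINDOW_SIZE) {
        char c = *b->raw++;
        if (c == '\r') {
            /* CR or CRLF becomes a single '\n'. */
            if (b->raw < b->raw_end && *b->raw == '\n')
                b->raw++;
            c = '\n';
        } else if (c == '\\' && b->raw < b->raw_end
                   && (*b->raw == '\n' || *b->raw == '\r')) {
            /* Backslash-newline: drop the backslash and the line ending. */
            char nl = *b->raw++;
            if (nl == '\r' && b->raw < b->raw_end && *b->raw == '\n')
                b->raw++;
            continue;
        }
        *out++ = c;
    }
    b->cur = b->window;
    b->rlimit = out;
    if (b->cur == b->rlimit)
        return -1;
    return (unsigned char) *b->cur++;
}
```

The sketch does not record how many raw characters each translated character came from; a real version would have to note that in the refill function to keep line and column numbers right.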
> It turns out to be a huge memory win, because I don't read the file
> all at once, I use stdio and enlarge the intermediate buffer as
> necessary. combine.c is 404K half of which is comments. The current
> cpp allocates a buffer the size of the file; my code needs only 252K.
Modifying CPP_BUF_GET also allows you to play with different buffer
policies. On the other hand, I can't see that saving 152K is
very important these days; speed is more important. It seems
like memory mapping the input file would be much better. That
is another motivation for not modifying the input buffer.
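To make the memory-mapping idea concrete, here is a sketch assuming a POSIX system with mmap(). The struct mapped_file and the functions map_source/unmap_source are hypothetical names for the example, not anything in cpplib. The mapping is read-only, which is exactly why the reader must not edit the buffer in place.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical read-only view of a source file. */
struct mapped_file {
    const char *buf;    /* first byte, or NULL for an empty file */
    const char *rlimit; /* one past the last byte */
    size_t len;         /* file size in bytes */
};

/* Map PATH read-only. Returns 0 on success, -1 on failure. */
static int map_source(const char *path, struct mapped_file *mf)
{
    struct stat st;
    void *p;
    int fd;

    assert(path != NULL && mf != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    mf->len = (size_t) st.st_size;
    mf->buf = mf->rlimit = NULL;
    if (mf->len > 0) {  /* mmap rejects a zero-length mapping */
        /* PROT_READ: the pages cannot be modified in place. */
        p = mmap(NULL, mf->len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        mf->buf = p;
        mf->rlimit = mf->buf + mf->len;
    }
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    return 0;
}

static void unmap_source(struct mapped_file *mf)
{
    if (mf->buf != NULL)
        munmap((void *) mf->buf, mf->len);
    mf->buf = mf->rlimit = NULL;
    mf->len = 0;
}
```

No buffer the size of the file is allocated or copied; the kernel pages the file in as the scanner walks from buf to rlimit.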
--Per Bothner
Cygnus Solutions bothner@cygnus.com http://www.cygnus.com/~bothner