This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the EGCS project.
Re: Patch to improve cpplib's C++ support
- To: Jason Merrill <jason@cygnus.com>
- Subject: Re: Patch to improve cpplib's C++ support
- From: Zack Weinberg <zack@rabi.columbia.edu>
- Date: Mon, 12 Jul 1999 22:11:05 -0400
- cc: egcs-patches@egcs.cygnus.com
On 12 Jul 1999 11:39:09 -0700, Jason Merrill wrote:
>>>>>> Zack Weinberg <zack@rabi.columbia.edu> writes:
>
> > On 12 Jul 1999 00:54:06 -0700, Jason Merrill wrote:
> >>>>>>> Zack Weinberg <zack@rabi.columbia.edu> writes:
> >>
> >> > However, I'm working on a revised API to cpplib that is going to behave
> >> > differently in both the affected areas. The big difference is that you
> >> > make one function call and get handed an entire line split up into
> >> > tokens. You then can walk through it at your leisure.
>
> > You don't have to know about this in the front-end unless you want to.
> > Think of it as a buffering mechanism to avoid having to call
> > cpp_get_token for every single token in the program - the overhead is
> > substantial.
>
>So how would the line be split up under your new scheme? It seems like the
>overhead would be caused by the tokenization, not the call itself or
>copying into token_buffer.
That was what I thought initially; then I profiled it...
cpp_get_token is a very large function with an expensive stack frame, and most
of the time a call does very little work. If each call instead iterates through
the big switch many times, producing a whole line of tokens, the call cost is
amortized nicely. (I'm not certain how this interacts with macro expansion yet.)
Also, scanning an entire line to find all the token boundaries and then
copying the whole line to a string buffer in one go is ~25% faster than
copying each token as you go. The big win, I think, is that this halves the
number of byte accesses to main memory.
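To make the access pattern concrete, here is a minimal sketch of the two-phase approach under toy assumptions: a token is just a run of non-blank characters, and each token is recorded as an offset and length into the copied line rather than being copied itself. The names `token_span`, `scan_line` and `tokenize_line` are hypothetical and do not come from cpplib.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token record: an offset into the copied line, not a copy. */
struct token_span {
    size_t start;
    size_t len;
};

/* Phase 1: find token boundaries in LINE (length N) without copying.
   Toy rule: a token is a maximal run of non-whitespace characters.
   Stores at most MAX spans in TOKS and returns how many were stored. */
static size_t scan_line(const char *line, size_t n,
                        struct token_span *toks, size_t max)
{
    size_t i = 0, count = 0;
    while (i < n && count < max) {
        while (i < n && isspace((unsigned char)line[i]))
            i++;
        if (i == n)
            break;
        size_t start = i;
        while (i < n && !isspace((unsigned char)line[i]))
            i++;
        toks[count].start = start;
        toks[count].len = i - start;
        count++;
    }
    return count;
}

/* Phase 2: copy the whole line into BUF with a single memcpy; tokens are
   then read through their offsets into BUF. BUF must hold N + 1 bytes. */
static size_t tokenize_line(const char *line, size_t n, char *buf,
                            struct token_span *toks, size_t max)
{
    assert(line != NULL && buf != NULL);
    size_t count = scan_line(line, n, toks, max);
    memcpy(buf, line, n);
    buf[n] = '\0';
    return count;
}
```

Whether this reproduces the ~25% figure depends on the real tokenizer and the machine; the sketch only shows the shape: one read-only scan, then one bulk `memcpy` instead of many small per-token copies.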
This architecture should also let me tokenize macros once, at #define time.
Expansion becomes a trivial substitution operation (almost).
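One way to picture the (almost) trivial substitution is a macro body stored as a token array in which parameter references were already resolved to argument indices when the `#define` was processed. This sketch assumes one token per argument and ignores rescanning, `#` stringizing and `##` pasting, which is where the "almost" comes in. `body_tok`, `macro_def` and `expand` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pre-tokenized macro body element: either a literal token
   or a reference to the Nth macro parameter, resolved at #define time. */
struct body_tok {
    int param;          /* -1 for a literal token, else parameter index */
    const char *text;   /* spelling of a literal token (unused for params) */
};

struct macro_def {
    size_t nparams;
    size_t nbody;
    const struct body_tok *body;
};

/* Expand MAC, where ARGS[i] is the single token passed as argument i.
   Writes at most MAX token spellings to OUT and returns how many were
   written. No rescanning, stringizing or pasting: pure substitution. */
static size_t expand(const struct macro_def *mac, const char *const *args,
                     const char **out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < mac->nbody && n < max; i++) {
        const struct body_tok *t = &mac->body[i];
        if (t->param >= 0) {
            assert((size_t)t->param < mac->nparams);
            out[n++] = args[t->param];
        } else {
            out[n++] = t->text;
        }
    }
    return n;
}
```

For `#define ADD(a, b) (a + b)`, the body would be stored as `(`, param 0, `+`, param 1, `)`, so expanding `ADD(x, 42)` just copies `(`, `x`, `+`, `42`, `)` without re-lexing the definition.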
From the front end's point of view, the way it'll work is that cpp_get_token
will be a macro that walks down a list of tokens. Each buffer has its own
list. When you run out of tokens on a list, a refill function gets called,
which scans the next chunk of input. It's the same principle as stdio
input buffering.
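The stdio analogy can be sketched like this: a macro serves tokens from the current list, and only when the list is empty does it call an out-of-line refill function, much as `getc` falls back to a buffer-filling routine. Everything here is assumed for illustration: `tok_buffer`, `GET_TOKEN` and `refill` are hypothetical, each refill scans one input line, and the toy tokenizer treats every non-blank character as a token.

```c
#include <assert.h>
#include <stddef.h>

#define TOK_EOF (-1)
#define CHUNK 64

/* Hypothetical per-buffer token list, refilled one line at a time. */
struct tok_buffer {
    const char *input;      /* remaining unscanned input */
    int toks[CHUNK];        /* tokens produced by the last refill */
    size_t cur, lim;        /* next token to hand out, tokens available */
};

/* Slow path: scan the next non-empty line of input into b->toks and
   return its first token, or TOK_EOF when the input is exhausted.
   A line longer than CHUNK tokens is continued on the next refill. */
static int refill(struct tok_buffer *b)
{
    assert(b->input != NULL);
    b->cur = b->lim = 0;
    while (b->lim == 0) {
        if (*b->input == '\0')
            return TOK_EOF;
        while (*b->input != '\0' && *b->input != '\n' && b->lim < CHUNK) {
            char c = *b->input++;
            if (c != ' ' && c != '\t')
                b->toks[b->lim++] = (unsigned char)c;
        }
        if (*b->input == '\n')
            b->input++;
    }
    return b->toks[b->cur++];
}

/* Fast path is a macro: no function call while tokens remain. */
#define GET_TOKEN(b) \
    ((b)->cur < (b)->lim ? (b)->toks[(b)->cur++] : refill(b))
```

Like `getc`, `GET_TOKEN` evaluates its argument more than once, so it should be given a plain pointer rather than an expression with side effects.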
> > Another thing: white space doesn't generate tokens in the new scheme.
> > Is that likely to cause problems?
>
>Nope, sounds good. Will that apply to newlines, too?
Newline still gets a token all to itself, but a series of lines that are
effectively blank (whitespace, comments, and directives that don't produce
output) will get just one token for the whole thing.
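A toy sketch of the whitespace and newline rules described above, assuming that words are runs of non-blank characters and that any line beginning with `#` is a directive producing no output (a simplification, since some directives such as `#include` do produce output). The `tok_kind` enum and `tokenize` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Toy token kinds for this sketch only. */
enum tok_kind { TK_WORD, TK_NEWLINE };

/* Tokenize S into KINDS (at most MAX entries) and return the count.
   Whitespace produces no tokens. A newline produces TK_NEWLINE only if
   the previous token was not already a newline, so a run of effectively
   blank lines (blank or '#' directive lines) collapses into one token. */
static size_t tokenize(const char *s, enum tok_kind *kinds, size_t max)
{
    size_t n = 0;
    int line_start = 1;
    assert(s != NULL);
    while (*s != '\0' && n < max) {
        if (*s == '\n') {
            if (n > 0 && kinds[n - 1] != TK_NEWLINE)
                kinds[n++] = TK_NEWLINE;
            line_start = 1;
            s++;
        } else if (isspace((unsigned char)*s)) {
            s++;                        /* whitespace: no token */
        } else if (line_start && *s == '#') {
            while (*s != '\0' && *s != '\n')
                s++;                    /* directive: assumed to emit nothing */
        } else {
            kinds[n++] = TK_WORD;
            while (*s != '\0' && !isspace((unsigned char)*s))
                s++;
            line_start = 0;
        }
    }
    return n;
}
```

This sketch does not track line numbers; a real implementation would presumably still need to know how many source lines a collapsed newline token stands for, so that diagnostics and line markers stay correct.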
> >> > As for redirected_input_p, the plan is to make the caller pop the buffer all
> >> > the time (except for macro buffers).
> >>
> >> Huh? Why should the frontend handle #includes?
>
> > Given that you need the ability to pop the buffer yourself sometimes, I
> > think it's cleaner to ask you to do it all the time. Compare "if you
> > get a POP token, call cpp_pop_buffer before calling cpp_get_token again"
> > to "if you get a POP token, you may or may not be expected to call
> > cpp_pop_buffer, depending on a magic flag whose name doesn't sound like
> > it has to do with buffer popping".
>
>The flag can certainly be renamed, and the patch returns EOF at the end of
>an artificial buffer, not POP. It could be changed to return something
>unique, if that seems appropriate. It just seems to me that the frontend
>should see the same series of tokens whether it's reading from the original
>sources or from preprocessed output.
It doesn't do that now; you get a POP token at the end of every included
file when reading the original source, and not when rereading .i files. The
front ends ignore POP tokens now, so it doesn't matter. I could make it so
that you never saw a POP unless you asked for one for a specific buffer,
using ->redirected_input_p or a similar mechanism. Would that be
preferable?
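The two options discussed here, POP always handed to the caller versus POP only for buffers that ask for it, can be modeled with a small buffer stack. This is a mock, not cpplib: `reader`, `get_token`, `push_buffer`, `pop_buffer`, `read_all` and the per-buffer `report_pop` flag are hypothetical names, and each buffer is a string whose characters are its tokens. When `report_pop` is set, the caller must call `pop_buffer` after seeing `TOK_POP` and before reading again; when it is clear, the reader pops silently.

```c
#include <assert.h>
#include <stddef.h>

enum { TOK_EOF = -1, TOK_POP = -2 };
#define MAX_DEPTH 8

/* Hypothetical input buffer: each character of the string is a token. */
struct buffer {
    const char *pos;
    int report_pop;   /* surface TOK_POP at the end of this buffer? */
};

struct reader {
    struct buffer stack[MAX_DEPTH];
    size_t depth;
};

static void push_buffer(struct reader *r, const char *text, int report_pop)
{
    assert(r->depth < MAX_DEPTH);
    r->stack[r->depth].pos = text;
    r->stack[r->depth].report_pop = report_pop;
    r->depth++;
}

static void pop_buffer(struct reader *r)
{
    assert(r->depth > 0);
    r->depth--;
}

/* Return the next token. At the end of a nested buffer, either return
   TOK_POP (report_pop set: the caller must pop) or pop silently and keep
   reading from the enclosing buffer. The outermost buffer ends in EOF. */
static int get_token(struct reader *r)
{
    while (r->depth > 0) {
        struct buffer *b = &r->stack[r->depth - 1];
        if (*b->pos != '\0')
            return (unsigned char)*b->pos++;
        if (r->depth == 1)
            return TOK_EOF;
        if (b->report_pop)
            return TOK_POP;
        pop_buffer(r);
    }
    return TOK_EOF;
}

/* Front-end side of the protocol: on TOK_POP, pop before reading on. */
static size_t read_all(struct reader *r, char *out, size_t max, size_t *pops)
{
    size_t n = 0;
    int t;
    *pops = 0;
    while (n < max && (t = get_token(r)) != TOK_EOF) {
        if (t == TOK_POP) {
            pop_buffer(r);
            ++*pops;
            continue;
        }
        out[n++] = (char)t;
    }
    return n;
}
```

Either way the front end receives the same sequence of ordinary tokens; the flag only decides whether it also sees the end-of-include markers and has to act on them.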
What cpplib does when fed an already-preprocessed file is a separate issue,
and it's perhaps the number one reason why I don't recommend people use
--enable-c-cpplib right now. It runs the entire preprocessor over again.
This is dead wrong - you can write programs whose meaning will change
depending on whether they were preprocessed separately.
zw