This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Precompiled headers (or other speed ups)


On Wed, Jan 10, 2001 at 05:59:36PM -0000, David Korn wrote:
> 
> ><<  Well, only really surprising if you assume the parsing code is well
> >written :-) After all, the mid/backend is mostly doing memory-memory
> >transforms on the trees, and that's going to run as fast as your cache
> >and cpu can handle.  Parsers, OTOH, have to I/O.  Badly written parsers
> >might even use c stdio functions to fgetc the entire file one char at a
> >time....
> >>>
> >
> >Sure, but that presumably is an easy thing to fix. In GNAT, the entire
> >file is read with a single read, and then all access to the file is to
> >the in memory copy, which stays around throughout the compilation.
> 
>   Oh yeah; I was just suggesting it wouldn't be *surprising*, not that it
> would be 'right'!  Last time I looked at bison (or was it lex or yacc? I
> forget) was around '95 and it did tend to turn out fairly sucky code (in
> terms of f-stdio) if you didn't know what you were doing and how to hook
> it into a memory buffer.  I haven't looked at the gcc parsers yet but I
> would hope that they don't have such obvious design flaws.....
> 
>   Another thing that could eat time in a parser would be if there were
> numerous shift-reduce conflicts that caused a lot of backtracking on fairly
> common constructs.  That's just a for-example also :)

C++ reads files through cpplib, which is naturally as fast as I and
Neil Booth have been able to make it.  It does indeed read the entire
file at once and process it in memory.  The tokenizer is, in my tests,
consistently faster than the analogue in GCC 2.95.  Macro expansion
can still be a bottleneck under some conditions.

The parser, on the other hand, is not efficient at all.  By the parser
I mean specifically the code which takes in the token stream from
cpplib and generates a tree representation.  We spend a great deal of
time in yyparse itself, which I suspect means there is indeed a lot of
backtracking involved.  Its immediate subroutines are expensive also.

I don't know enough about C++ or its implementation in GCC to say for
sure what the ultimate cause is.  I've been told that C++ has an
extremely ambiguous grammar and the parser has to accumulate entire
statements on the stack before it can begin to match anything.
There's an ad-hoc layer in between the lexer and the parser
(cp/spew.c) whose purpose is to rearrange the order in which certain
constructs appear so they are easier to parse; you can imagine the
mess required to implement that.

It isn't merely a matter of efficiency; there are some parser bugs in
the current implementation that have been pushed aside as too
difficult to fix.  A rewrite has been discussed numerous times, e.g. 
http://gcc.gnu.org/ml/gcc/2000-10/msg00573.html.

zw

