This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
The integrated preprocessor
- To: gcc-patches at gcc dot gnu dot org
- Subject: The integrated preprocessor
- From: Zack Weinberg <zack at wolery dot cumb dot org>
- Date: Wed, 23 Aug 2000 00:05:21 -0700
This patch makes it possible to integrate cpplib into the C, C++, and
Objective C front ends. Currently the preprocessor tokenizes
everything, then turns it back into text, writes it out to a file (or
pipe), and the compiler reads it in again and re-tokenizes it. With
this patch, the parser can take the token stream directly from the
preprocessor, eliminating a substantial chunk of overhead. In my
tests this is good for 1-10% speedup, depending on what you're doing
with it. The improvement will be more noticeable if you use an
operating system with high I/O and multitasking overhead.
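
To illustrate the direct hand-off described above, here is a minimal sketch in C. It is not cpplib code: every name in it (toy_reader, toy_get_token, toy_yylex, and the PARSE_* codes) is hypothetical, and the tokenizer is deliberately trivial. It only shows the shape of the idea: the parser-facing function asks the tokenizer for the next token and maps it to a parser code, with no trip through text in between.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds; these are not cpplib's real token types. */
enum toy_type { TOY_EOF, TOY_NAME, TOY_NUMBER, TOY_OTHER };

struct toy_token {
    enum toy_type type;
    const char *start;   /* points into the source buffer, no copy made */
    size_t len;
};

/* Stand-in for the preprocessor's state: just a read position. */
struct toy_reader {
    const char *cur;
};

/* Produce the next token from the buffer. */
static struct toy_token toy_get_token(struct toy_reader *r)
{
    struct toy_token tok;

    assert(r != NULL && r->cur != NULL);
    while (isspace((unsigned char)*r->cur))
        r->cur++;
    tok.start = r->cur;
    if (*r->cur == '\0') {
        tok.type = TOY_EOF;
    } else if (isalpha((unsigned char)*r->cur) || *r->cur == '_') {
        tok.type = TOY_NAME;
        while (isalnum((unsigned char)*r->cur) || *r->cur == '_')
            r->cur++;
    } else if (isdigit((unsigned char)*r->cur)) {
        tok.type = TOY_NUMBER;
        while (isdigit((unsigned char)*r->cur))
            r->cur++;
    } else {
        tok.type = TOY_OTHER;          /* single-character punctuation */
        r->cur++;
    }
    tok.len = (size_t)(r->cur - tok.start);
    return tok;
}

/* Parser-side codes, in the style a yacc grammar would define them. */
enum { PARSE_EOF = 0, PARSE_IDENTIFIER = 258, PARSE_CONSTANT = 259 };

/* The "thin wrapper": take a token straight from the tokenizer and
   translate its kind into the code the parser expects. */
static int toy_yylex(struct toy_reader *r, struct toy_token *out)
{
    *out = toy_get_token(r);
    switch (out->type) {
    case TOY_EOF:    return PARSE_EOF;
    case TOY_NAME:   return PARSE_IDENTIFIER;
    case TOY_NUMBER: return PARSE_CONSTANT;
    default:         return (unsigned char)out->start[0];
    }
}
```

A caller would initialize a toy_reader with a pointer to the source text and call toy_yylex() until it returns PARSE_EOF. The separate-process arrangement does the same tokenizing work twice, with a write and a read of the whole translation unit in the middle.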
Further improvements are possible once integration is the default.
For example, we can share the identifier hash table between cpplib and
the compiler proper, which will save a hefty chunk of memory and
another 1-10% (no joke!) of compile time in some cases. We can give
better error messages for bugs inside complex macros, and more
detailed debugging information. And so on.
The patch is not quite ready for prime time. I suspect it breaks some
obscure C++ front end features, and I know it breaks target-specific
pragmas. Also, -g3 may not do anything more than -g. It does,
however, pass the entire C and C++ test suite with and without
--enable-c-cpplib. Caveat: there seems to be something horribly wrong
with C++ diagnostics right now, so this exact patch has not been run
through the C++ test suite; I believe the changes between it and the
iteration that was tested affect only plain C, but I Could Be Wrong. Also I
may have missed a regression in the flood of expected unexpected C
failures.
You will note that the gperf tables for C and C++ keywords have been
removed. Instead, keywords live in the normal hash table. This may
seem like a strange choice. It is preparation for sharing the hash
table with the preprocessor - all identifiers, keyword or not, have to
be looked up in that table, so we might as well store keyword-ness
there too. It also simplifies the logic in yylex(), which is now a
thin wrapper around cpp_get_token() in the --enable-c-cpplib case.
That is, it's a thin wrapper in C, and a thick one in C++. C++ does a
fair amount of black magic manipulations of tokens in between reading
them in and handing them to yyparse(). That code - cp/lex.c,
cp/spew.c, cp/input.c - has been heavily rewritten. It does pass the
C++ test suite, as I have said, but beware. The common logic of
numeric constant, character constant, and string literal translation
is now shared between C and C++.
As a side effect of those changes, RID_* constants had to be defined
for every reserved word in all three languages, and they all had to be
merged into the same enumeration.
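
As a rough picture of what "keywords live in the normal hash table" means, here is a small sketch in C. It is an illustration under assumptions, not the patch's code: the toy_ident structure, the TOY_RID_* values, and all function names are hypothetical stand-ins for the real identifier nodes and RID_* constants. Each identifier entry carries a reserved-word code, so the single lookup every identifier needs anyway also answers "is this a keyword?", and one enumeration covers the keywords of every language.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the merged RID_* enumeration. */
enum toy_rid {
    TOY_RID_NONE = 0,   /* ordinary identifier */
    TOY_RID_INT,
    TOY_RID_RETURN,
    TOY_RID_CLASS,      /* reserved only when compiling C++ */
    TOY_RID_MAX
};

struct toy_ident {
    struct toy_ident *next;   /* hash chain */
    char *name;
    enum toy_rid rid;         /* keyword-ness stored with the identifier */
};

#define TOY_NBUCKETS 64
static struct toy_ident *toy_table[TOY_NBUCKETS];

static unsigned toy_hash(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % TOY_NBUCKETS;
}

/* Find the entry for NAME, creating it on first use. */
static struct toy_ident *toy_get_identifier(const char *name)
{
    unsigned h;
    struct toy_ident *id;

    assert(name != NULL);
    h = toy_hash(name);
    for (id = toy_table[h]; id != NULL; id = id->next)
        if (strcmp(id->name, name) == 0)
            return id;

    id = malloc(sizeof *id);
    if (id == NULL)
        abort();
    id->name = malloc(strlen(name) + 1);
    if (id->name == NULL)
        abort();
    strcpy(id->name, name);
    id->rid = TOY_RID_NONE;
    id->next = toy_table[h];
    toy_table[h] = id;
    return id;
}

/* Enter the reserved words once at startup; the set depends on the
   language being compiled, but the codes share one enumeration. */
static void toy_init_reswords(int cplusplus)
{
    toy_get_identifier("int")->rid = TOY_RID_INT;
    toy_get_identifier("return")->rid = TOY_RID_RETURN;
    if (cplusplus)
        toy_get_identifier("class")->rid = TOY_RID_CLASS;
}

static int toy_is_reserved_word(const struct toy_ident *id)
{
    return id->rid != TOY_RID_NONE;
}
```

With this layout the lexer needs no separate perfect-hash lookup: it looks the spelling up once and inspects the rid field of the result.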
c-pragma.c has been rewritten from scratch. On the up side, it is now
comprehensible. On the down side, it now insists on routines from
c-lex.c, which means Chill can't use it anymore. I doubt anyone
cares. (Fortran threatens to use it, but that code has been ifdefed
out since forever.)
As previously mentioned, target-specific pragmas are broken. To be
specific, the HANDLE_PRAGMA() macro will be silently ignored
(HANDLE_SYSV_PRAGMA, HANDLE_PRAGMA_PACK, and
HANDLE_PRAGMA_PACK_PUSH_POP are not affected). I will be
replacing this with a new mechanism, which will work like the new
c-pragma.c. The affected targets are: arm, c4x, h8300, i370, i960,
sh, and v850.
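
For readers unfamiliar with the registry style of pragma handling, the following C sketch shows the general pattern. It is an assumption-laden illustration only: the names (toy_register_pragma, toy_dispatch_pragma, the handlers) are hypothetical, and this is neither cpplib's actual interface nor the replacement mechanism promised above. The idea is that each pragma name is registered together with a handler function, and the code that sees a #pragma line just looks the name up and calls the handler. A target would add its own pragmas by registering more entries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A handler receives the text that follows the pragma name. */
typedef void (*toy_pragma_handler)(const char *rest);

struct toy_pragma_entry {
    const char *name;
    toy_pragma_handler handler;
};

#define TOY_MAX_PRAGMAS 16
static struct toy_pragma_entry toy_pragmas[TOY_MAX_PRAGMAS];
static size_t toy_n_pragmas;

/* Add NAME to the registry.  Returns 0 on success, -1 if it is full. */
static int toy_register_pragma(const char *name, toy_pragma_handler handler)
{
    assert(name != NULL && handler != NULL);
    if (toy_n_pragmas == TOY_MAX_PRAGMAS)
        return -1;
    toy_pragmas[toy_n_pragmas].name = name;
    toy_pragmas[toy_n_pragmas].handler = handler;
    toy_n_pragmas++;
    return 0;
}

/* TEXT is what follows "#pragma".  Returns 1 if a registered handler
   ran, 0 if the pragma is unknown (and can be ignored or warned about). */
static int toy_dispatch_pragma(const char *text)
{
    size_t i, len;

    while (*text == ' ' || *text == '\t')
        text++;
    len = strcspn(text, " \t\n(");     /* the name ends at space or '(' */
    for (i = 0; i < toy_n_pragmas; i++) {
        if (strlen(toy_pragmas[i].name) == len
            && strncmp(toy_pragmas[i].name, text, len) == 0) {
            toy_pragmas[i].handler(text + len);
            return 1;
        }
    }
    return 0;
}

/* Example handlers that only count invocations. */
static int toy_pack_seen;
static int toy_target_seen;

static void toy_handle_pack(const char *rest)
{
    (void)rest;
    toy_pack_seen++;
}

static void toy_handle_target(const char *rest)
{
    (void)rest;
    toy_target_seen++;
}
```

Generic code would register "pack" at startup, and a target's initialization code would register its own names the same way, so no target macro has to be consulted from inside the lexer.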
If you use --enable-c-cpplib, you will find that -traditional is
ignored, and -save-temps does not get you an .i file. This is because
I have not yet done the specs hackery to get them to work. It's not
hard, just a moderate pain.
Some of the dependencies in the Makefiles may be incorrect.
I do not recommend using --enable-c-cpplib in production yet. The
patch by itself should be safe as long as you don't need
target-specific pragmas or a couple of C++ features (-fdetailed-statistics
and whatever the GNU_xref_foo() routines do). All the testing you can
manage, in either mode, will be greatly appreciated.
zw
* Makefile.in: Kill all references to c-parse.gperf,
c-gperf.h, and c-parse.h. Remove -d from yacc command line.
* cpphash.h (IN_I): New directive flag.
* cpplib.c (DIRECTIVE_TABLE): Mark #define, #undef, #pragma,
and #ident with IN_I.
(_cpp_check_directive): Obey IN_I directives even if
-fpreprocessed. Do not issue any warnings in that case.
* cpplex.c (_cpp_get_token): Expand no macros if -fpreprocessed.
* c-lex.c: Don't include c-parse.h. Do include timevar.h.
Elide lots of unnecessary code if USE_CPPLIB. Delete code
rendered unnecessary by new architecture. Move routines not
shared with C++ to c-parse.in. Maintain a local idea of the
line number.
[USE_CPPLIB]: Declare and register callbacks for #ident and
for entering/leaving files.
(init_c_lex, c_lex): Are now the entry points to this file.
(check_newline): Break out directive handling to
process_directive.
(read_ucs, is_extended_char, utf8_extend_token): Moved here
from C++ front end.
(readescape, parse_float): Overhaul.
(lex_number, lex_string, lex_charconst): Break out of c_lex
(née yylex).
* c-lex.h: Update prototypes.
* c-tree.h (struct lang_identifier): Add rid_code and
rid_yycode fields.
(C_IS_RESERVED_WORD, C_RID_CODE, C_RID_YYCODE): New.
* c-pragma.c: Rewrite parsing logic to fit with cpplib's
#pragma registry. Provide dummy implementation of that
interface if !USE_CPPLIB.
* c-pragma.h: Update to match.
* c-common.c (parse_options, cpp_token): Don't declare.
(yy_cur, yy_lim, GETC, UNGETC, yy_get_token): Delete.
(get_directive_line): Kill USE_CPPLIB variants.
* c-common.h: Define RID_ constants for every keyword in C,
C++, and Objective C.
(extract_interface_info): Declare.
* c-decl.c (c_decode_option): Recognize -lang-objc here.
(print_lang_identifier): Report reserved words as such.
(grokdeclarator): Update for new RID scheme.
(extract_interface_info): Define a dummy.
* c-lang.c (yy_cur, parse_options): Don't declare.
(lang_init_options [USE_CPPLIB]): Call cpp_init, not cpp_options_init.
(lang_init): Don't call check_newline if USE_CPPLIB.
* c-parse.in: Include c-pragma.h. Remove unnecessary calls to
reinit_parse_for_function and/or position_after_white_space.
(save_filename, save_lineno): Look ahead before saving.
(reservedwords): No need to call get_identifier.
(init_parse, finish_parse, yyerror, yylex, yyprint,
make_pointer_declarator): Are now here for C/ObjC.
* c-gperf.h, c-parse.gperf: Delete.
* gcc.c (C specs): Use %(trad_capable_cpp) for -E|-M|-MM case
#if USE_CPPLIB.
* timevar.def (TV_CPP, TV_LEX): New.
cp:
* Make-lang.in, Makefile.in: Remove all references to input.c,
gxx.gperf, and hash.h. Add ../c-lex.o to C_OBJS.
* gxx.gperf, hash.h, input.c: Delete.
* lang-specs.h: Pass -lang-c++ to cc1plus so cpplib is
initialized properly.
* class.c (fixup_pending_inline): Take a tree, not a
struct pending_inline *. All callers changed.
(init_class_processing): Set RID_PUBLIC, RID_PRIVATE,
RID_PROTECTED entries in ridpointers[] array here.
* decl.c (duplicate_decls): Do not refer to struct
pending_inline.
(record_builtin_type, init_decl_processing): Use RID_MAX not
CP_RID_MAX.
(grokdeclarator): Use C_IS_RESERVED_WORD.
* decl2.c (lang_decode_option): Ignore -lang-c++ for sake of
cpplib.
(grok_x_components): Do not inspect pending_inlines chain.
* cp-tree.h (struct lang_identifier): Add rid_code and
rid_yycode entries.
(C_IS_RESERVED_WORD, C_RID_CODE, C_RID_YYCODE): New.
(flag_no_gnu_keywords, flag_operator_names): Declare.
(DEFARG_LENGTH, struct pending_inline): Kill.
Update prototypes.
* lex.h: Expunge cp_rid. Rewrite RIDBIT macros to use just a
single 32-bit word.
* parse.y: Call do_pending_inlines unconditionally.
reinit_parse_for_method is now snarf_method. fn.defpen is no
longer necessary. Remove unnecessary <itype> annotation on
SCOPE. Do not refer to end_of_file or struct pending_inline.
* semantics.c (begin_inline_definitions): Call
do_pending_inlines unconditionally.
* lex.c: Remove all code now shared with C front end.
Initialize cpplib properly if USE_CPPLIB. Put reserved words
into the get_identifier table. Rewrite pragma handling to
work with the registry. Move code to save tokens for later
processing to spew.c.
* spew.c: Rewrite everything in terms of token streams instead
of text. Move routines here from lex.c / input.c as
appropriate. GC-mark trees hanging off the pending inlines
chain.
objc:
* lang-specs.h: Use %(trad_capable_cpp) for -E|-M|-MM case
#if USE_CPPLIB.
* objc-act.c: Don't mention yy_cur or parse_options.
Initialize cpplib properly. Force lineno to 0 after first
call to check_newline. Don't handle -lang-objc here.
Move forget_protocol_qualifiers and
remember_protocol_qualifiers here.
testsuite:
* g++.old-deja/g++.benjamin/13478.C: Put meaningful tags on
ERROR markers.
* g++.old-deja/g++.brendan/crash8.C: Move ERROR marker up one line.
* gcc.dg/cpp/unc1.c, gcc.dg/cpp/unc2.c, gcc.dg/cpp/unc3.c:
Add dg-do preprocess marker.
* gcc.dg/cpp/unc4.c: Adjust line number in dg-error line.
* gcc.dg/noncompile/const-ll-1.c: Generalize error regexp.
Attachment: d.intpp.bz2