This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



The integrated preprocessor


This patch makes it possible to integrate cpplib into the C, C++, and
Objective C front ends.  Currently the preprocessor tokenizes
everything, then turns it back into text, writes it out to a file (or
pipe), and the compiler reads it in again and re-tokenizes it.  With
this patch, the parser can take the token stream directly from the
preprocessor, eliminating a substantial chunk of overhead.  In my
tests this is good for 1-10% speedup, depending on what you're doing
with it.  The improvement will be more noticeable if you use an
operating system with high I/O and multitasking overhead.
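To illustrate where the overhead comes from, here is a minimal sketch in C.  It is not cpplib code: toy_lex, parse_via_text, and parse_direct are hypothetical names, and the "parser" merely counts tokens.  The only point is that the text round trip lexes every token twice, while the integrated path lexes it once.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy token; this is not cpplib's real token type. */
struct toy_token {
    char text[32];
};

/* Counts every token produced by toy_lex, across all passes. */
static unsigned long lex_calls;

/* Lex one token from *p: a run of alphanumerics/underscores, or a single
   other character.  Overlong runs are split; good enough for a toy.
   Returns 0 at end of input, 1 otherwise. */
static int toy_lex(const char **p, struct toy_token *tok)
{
    const char *s = *p;
    size_t n = 0;

    while (isspace((unsigned char)*s))
        s++;
    if (*s == '\0') {
        *p = s;
        return 0;
    }
    if (isalnum((unsigned char)*s) || *s == '_') {
        while ((isalnum((unsigned char)*s) || *s == '_')
               && n < sizeof tok->text - 1)
            tok->text[n++] = *s++;
    } else {
        tok->text[n++] = *s++;
    }
    tok->text[n] = '\0';
    lex_calls++;
    *p = s;
    return 1;
}

/* Old scheme: lex, spell the tokens back out as text, then lex that
   text a second time on the "compiler" side. */
static unsigned parse_via_text(const char *src)
{
    char buf[1024];
    size_t used = 0;
    struct toy_token tok;
    const char *p = src;
    unsigned parsed = 0;

    buf[0] = '\0';
    while (toy_lex(&p, &tok)) {
        int n = snprintf(buf + used, sizeof buf - used, "%s ", tok.text);
        if (n < 0 || (size_t)n >= sizeof buf - used)
            break;              /* toy buffer is full */
        used += (size_t)n;
    }
    p = buf;
    while (toy_lex(&p, &tok))
        parsed++;               /* stand-in for the parser */
    return parsed;
}

/* Integrated scheme: the parser pulls tokens straight from the lexer. */
static unsigned parse_direct(const char *src)
{
    struct toy_token tok;
    const char *p = src;
    unsigned parsed = 0;

    while (toy_lex(&p, &tok))
        parsed++;
    return parsed;
}
```

Both functions see the same tokens for a given input, but parse_via_text bumps lex_calls twice per token and also pays for formatting and copying the intermediate text.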

Further improvements are possible once integration is the default.
For example, we can share the identifier hash table between cpplib and
the compiler proper, which will save a hefty chunk of memory and
another 1-10% (no joke!) of compile time in some cases.  We can give
better error messages for bugs inside complex macros, and more
detailed debugging information.  And so on.

The patch is not quite ready for prime time.  I suspect it breaks some
obscure C++ front end features, and I know it breaks target-specific
pragmas.  Also, -g3 may not do anything more than -g.  It does,
however, pass the entire C and C++ test suite with and without
--enable-c-cpplib.  Caveat: there seems to be something horribly wrong
with C++ diagnostics right now, so this exact patch has not been run
through the C++ test suite; I believe the changes between it and the
iteration that was tested only affect plain C, but I Could Be Wrong.
Also I may have missed a regression in the flood of expected
unexpected C failures.

You will note that the gperf tables for C and C++ keywords have been
removed.  Instead, keywords live in the normal hash table.  This may
seem like a strange choice.  It is preparation for sharing the hash
table with the preprocessor - all identifiers, keyword or not, have to
be looked up in that table, so we might as well store keyword-ness
there too.  It also simplifies the logic in yylex(), which is now a
thin wrapper around cpp_get_token() in the --enable-c-cpplib case.
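A rough sketch of the idea, in C.  None of this is the patch's actual code: the toy_ names, the tiny keyword set, and the hash function are all made up for illustration.  It only shows keyword-ness being stored on the identifier node, so that one lookup answers both "which identifier is this?" and "is it a reserved word?".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical keyword codes, with keywords from more than one language
   in a single enumeration.  TOY_RID_NONE marks a plain identifier. */
enum toy_rid {
    TOY_RID_NONE = 0,
    TOY_RID_INT,
    TOY_RID_RETURN,
    TOY_RID_CLASS,      /* C++ only in this toy */
    TOY_RID_MAX
};

/* One node per distinct identifier spelling. */
struct toy_ident {
    struct toy_ident *next;     /* hash chain */
    enum toy_rid rid_code;      /* keyword-ness lives on the node */
    char name[];                /* C99 flexible array member */
};

#define TOY_HASH_SIZE 64
static struct toy_ident *toy_table[TOY_HASH_SIZE];

#define TOY_IS_RESERVED_WORD(id) ((id)->rid_code != TOY_RID_NONE)

static unsigned toy_hash(const char *s)
{
    unsigned h = 0;

    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h % TOY_HASH_SIZE;
}

/* Find the unique node for NAME, creating it if necessary. */
static struct toy_ident *toy_get_identifier(const char *name)
{
    unsigned h = toy_hash(name);
    size_t len = strlen(name);
    struct toy_ident *id;

    for (id = toy_table[h]; id; id = id->next)
        if (strcmp(id->name, name) == 0)
            return id;

    id = malloc(sizeof *id + len + 1);
    if (!id)
        abort();
    memcpy(id->name, name, len + 1);
    id->rid_code = TOY_RID_NONE;
    id->next = toy_table[h];
    toy_table[h] = id;
    return id;
}

/* Enter the reserved words once at startup; which ones depends on the
   language being compiled. */
static void toy_init_reserved_words(int cplusplus)
{
    toy_get_identifier("int")->rid_code = TOY_RID_INT;
    toy_get_identifier("return")->rid_code = TOY_RID_RETURN;
    if (cplusplus)
        toy_get_identifier("class")->rid_code = TOY_RID_CLASS;
}
```

With a table like this, a lexer would call toy_get_identifier on each identifier it sees and then test TOY_IS_RESERVED_WORD on the result, with no separate perfect-hash lookup.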

That is, it's a thin wrapper in C, and a thick one in C++.  C++ does a
fair amount of black magic manipulations of tokens in between reading
them in and handing them to yyparse().  That code - cp/lex.c,
cp/spew.c, cp/input.c - has been heavily rewritten.  It does pass the
C++ test suite, as I have said, but beware.  The common logic of
numeric constant, character constant, and string literal translation
is now shared between C and C++.

As a side effect of those changes, RID_* constants had to be defined
for every reserved word in all three languages, and they all had to be
merged into the same enumeration.

c-pragma.c has been rewritten from scratch.  On the up side, it is now
comprehensible.  On the down side, it now insists on routines from
c-lex.c, which means Chill can't use it anymore.  I doubt anyone
cares.  (Fortran threatens to use it, but that code has been ifdefed
out since forever.)

As previously mentioned, target-specific pragmas are broken.  To be
specific, the HANDLE_PRAGMA() macro will be silently ignored (but not
HANDLE_SYSV_PRAGMA or HANDLE_PRAGMA_PACK(_PUSH_POP)).  I will be
replacing this with a new mechanism, which will work like the new
c-pragma.c.  The affected targets are: arm, c4x, h8300, i370, i960,
sh, and v850.
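For readers unfamiliar with the registry style of pragma handling, here is a small C sketch of the general shape.  It is an assumption-laden illustration, not cpplib's interface and not the replacement mechanism itself: toy_register_pragma, toy_dispatch_pragma, and the sample handler are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical handler type: receives the text after the pragma name. */
typedef void (*toy_pragma_handler)(const char *args);

struct toy_pragma_entry {
    const char *name;
    toy_pragma_handler handler;
};

#define TOY_MAX_PRAGMAS 16
static struct toy_pragma_entry toy_pragmas[TOY_MAX_PRAGMAS];
static int toy_n_pragmas;

/* Register HANDLER for "#pragma NAME".  Returns 0 on success, -1 if the
   fixed-size toy table is full. */
static int toy_register_pragma(const char *name, toy_pragma_handler handler)
{
    if (toy_n_pragmas == TOY_MAX_PRAGMAS)
        return -1;
    toy_pragmas[toy_n_pragmas].name = name;
    toy_pragmas[toy_n_pragmas].handler = handler;
    toy_n_pragmas++;
    return 0;
}

/* Dispatch the text that follows "#pragma".  Returns 1 if a registered
   handler ran, 0 if the pragma is unknown (and therefore ignored). */
static int toy_dispatch_pragma(const char *line)
{
    size_t len;
    int i;

    while (*line == ' ' || *line == '\t')
        line++;
    len = strcspn(line, " \t(");        /* length of the pragma name */

    for (i = 0; i < toy_n_pragmas; i++)
        if (strlen(toy_pragmas[i].name) == len
            && strncmp(toy_pragmas[i].name, line, len) == 0) {
            toy_pragmas[i].handler(line + len);
            return 1;
        }
    return 0;
}

/* Sample handler: just counts how often it was invoked. */
static int toy_pack_seen;

static void toy_handle_pack(const char *args)
{
    (void)args;
    toy_pack_seen++;
}
```

Under this scheme a target would register its own handlers at startup instead of defining a macro that the lexer has to know about.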

If you use --enable-c-cpplib, you will find that -traditional is
ignored, and -save-temps does not get you an .i file.  This is because
I have not yet done the specs hackery to get them to work.  It's not
hard, just a moderate pain.

Some of the dependencies in the Makefiles may be incorrect.

I do not recommend using --enable-c-cpplib in production yet.  The
patch by itself should be safe as long as you don't need
target-specific pragmas or a couple of C++ features (-fdetailed-statistics
and whatever the GNU_xref_foo() routines do).  All the testing you can
manage, in either mode, will be greatly appreciated.

zw

	* Makefile.in: Kill all references to c-parse.gperf,
	c-gperf.h, and c-parse.h.  Remove -d from yacc command line.

	* cpphash.h (IN_I): New directive flag.
	* cpplib.c (DIRECTIVE_TABLE): Mark #define, #undef, #pragma,
	and #ident with IN_I.
	(_cpp_check_directive): Obey IN_I directives even if
	-fpreprocessed.  Do not issue any warnings in that case.
	* cpplex.c (_cpp_get_token): Expand no macros if -fpreprocessed.

	* c-lex.c: Don't include c-parse.h.  Do include timevar.h.
	Elide lots of unnecessary code if USE_CPPLIB.  Delete code
	rendered unnecessary by new architecture.  Move routines not
	shared with C++ to c-parse.in.  Maintain a local idea of the
	line number.
	[USE_CPPLIB]: Declare and register callbacks for #ident and
	for entering/leaving files.
	(init_c_lex, c_lex): Are now the entry points to this file.
	(check_newline): Break out directive handling to
	process_directive.
	(read_ucs, is_extended_char, utf8_extend_token): Moved here 
	from C++ front end.
	(readescape, parse_float): Overhaul.
	(lex_number, lex_string, lex_charconst): Break out of c_lex
	(n'ee yylex).
	* c-lex.h: Update prototypes.

	* c-tree.h (struct lang_identifier): Add rid_code and
	rid_yycode fields.
	(C_IS_RESERVED_WORD, C_RID_CODE, C_RID_YYCODE): New.

	* c-pragma.c: Rewrite parsing logic to fit with cpplib's
	#pragma registry.  Provide dummy implementation of that
	interface if !USE_CPPLIB.
	* c-pragma.h: Update to match.

	* c-common.c (parse_options, cpp_token): Don't declare.
	(yy_cur, yy_lim, GETC, UNGETC, yy_get_token): Delete.
	(get_directive_line): Kill USE_CPPLIB variants.
	* c-common.h: Define RID_ constants for every keyword in C,
	C++, and Objective C.
	(extract_interface_info): Declare.
	* c-decl.c (c_decode_option): Recognize -lang-objc here.
	(print_lang_identifier): Report reserved words as such.
	(grokdeclarator): Update for new RID scheme.
	(extract_interface_info): Define a dummy.
	* c-lang.c (yy_cur, parse_options): Don't declare.
	(lang_init_options [USE_CPPLIB]): Call cpp_init, not cpp_options_init.
	(lang_init): Don't call check_newline if USE_CPPLIB.
	* c-parse.in: Include c-pragma.h. Remove unnecessary calls to
	reinit_parse_for_function and/or position_after_white_space.
	(save_filename, save_lineno): Look ahead before saving.
	(reservedwords): No need to call get_identifier.
	(init_parse, finish_parse, yyerror, yylex, yyprint,
	make_pointer_declarator): Are now here for C/ObjC.

	* c-gperf.h, c-parse.gperf: Delete.

	* gcc.c (C specs): Use %(trad_capable_cpp) for -E|-M|-MM case
	#if USE_CPPLIB.
	* timevar.def (TV_CPP, TV_LEX): New.

cp:
	* Make-lang.in, Makefile.in: Remove all references to input.c,
	gxx.gperf, and hash.h.  Add ../c-lex.o to C_OBJS.
	* gxx.gperf, hash.h, input.c: Delete.
	* lang-specs.h: Pass -lang-c++ to cc1plus so cpplib is
	initialized properly.

	* class.c (fixup_pending_inline): Take a tree, not a
	struct pending_inline *.  All callers changed.
	(init_class_processing): Set RID_PUBLIC, RID_PRIVATE,
	RID_PROTECTED entries in ridpointers[] array here.
	* decl.c (duplicate_decls): Do not refer to struct
	pending_inline.
	(record_builtin_type, init_decl_processing): Use RID_MAX not
	CP_RID_MAX.
	(grokdeclarator): Use C_IS_RESERVED_WORD.
	* decl2.c (lang_decode_option): Ignore -lang-c++ for sake of
	cpplib.
	(grok_x_components): Do not inspect pending_inlines chain.

	* cp-tree.h (struct lang_identifier): Add rid_code and
	rid_yycode entries.
	(C_IS_RESERVED_WORD, C_RID_CODE, C_RID_YYCODE): New.
	(flag_no_gnu_keywords, flag_operator_names): Declare.
	(DEFARG_LENGTH, struct pending_inline): Kill.
	Update prototypes.
	* lex.h: Expunge cp_rid.  Rewrite RIDBIT macros to use just a
	single 32-bit word.
	* parse.y: Call do_pending_inlines unconditionally.
	reinit_parse_for_method is now snarf_method.  fn.defpen is no
	longer necessary.  Remove unnecessary <itype> annotation on
	SCOPE.  Do not refer to end_of_file or struct pending_inline.
	* semantics.c (begin_inline_definitions): Call
	do_pending_inlines unconditionally.

	* lex.c: Remove all code now shared with C front end.
	Initialize cpplib properly if USE_CPPLIB.  Put reserved words
	into the get_identifier table.  Rewrite pragma handling to
	work with the registry.  Move code to save tokens for later
	processing to spew.c.

	* spew.c: Rewrite everything in terms of token streams instead
	of text.  Move routines here from lex.c / input.c as
	appropriate.  GC-mark trees hanging off the pending inlines
	chain.

objc:
	* lang-specs.h: Use %(trad_capable_cpp) for -E|-M|-MM case
	#if USE_CPPLIB.
	* objc-act.c: Don't mention yy_cur or parse_options.
	Initialize cpplib properly.  Force lineno to 0 after first
	call to check_newline.  Don't handle -lang-objc here.
	Move forget_protocol_qualifiers and
	remember_protocol_qualifiers here.

testsuite:
	* g++.old-deja/g++.benjamin/13478.C: Put meaningful tags on
	ERROR markers.
	* g++.old-deja/g++.brendan/crash8.C: Move ERROR marker up one line.
	* gcc.dg/cpp/unc1.c, gcc.dg/cpp/unc2.c, gcc.dg/cpp/unc3.c:
	Add dg-do preprocess marker.
	* gcc.dg/cpp/unc4.c: Adjust line number in dg-error line.
	* gcc.dg/noncompile/const-ll-1.c: Generalize error regexp.

d.intpp.bz2

