This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



treelang patch part 4 of 6


+
+ @ifclear INTERNALS
+ This manual documents how to run and install @code{treelang},
+ as well as its new features and incompatibilities, and how to report
+ bugs.
+ It corresponds to the @value{which-treelang} version of @code{treelang}.
+ @end ifclear
+ @ifclear USING
+ This manual documents how to maintain @code{treelang}, as well as its
+ new features and incompatibilities, and how to report bugs.  It
+ corresponds to the @value{which-treelang} version of @code{treelang}.
+ @end ifclear
+
+ @end ifinfo
+
+ @ifset DEVELOPMENT
+ @emph{Warning:} This document is still under development, and might not
+ accurately reflect the @code{treelang} code base of which it is a part.
+ @end ifset
+
+ @menu
+ * Copying::
+ * Contributors::
+ * GNU Free Documentation License::
+ * Funding::
+ * Getting Started::
+ * What is GNU Treelang?::
+ * Lexical Syntax::
+ * Parsing Syntax::
+ * Compiler Overview::
+ * TREELANG and GCC::
+ * Compiler::
+ * Other Languages::
+ * treelang internals::
+ * Open Questions::
+ * Bugs::
+ * Service::
+ * Projects::
+ * Index::
+
+ @detailmenu
+  --- The Detailed Node Listing ---
+
+ Other Languages
+
+ * Interoperating with C and C++::
+
+ treelang internals
+
+ * treelang files::
+ * treelang compiler interfaces::
+ * Hints and tips::
+
+ treelang compiler interfaces
+
+ * treelang driver::
+ * treelang main compiler::
+
+ treelang main compiler
+
+ * Interfacing to toplev.c::
+ * Interfacing to the garbage collection::
+ * Interfacing to the code generation code. ::
+
+ Reporting Bugs
+
+ * Sending Patches::
+
+ @end detailmenu
+ @end menu
+
+ @include gpl.texi
+
+ @include fdl.texi
+
+ @node Contributors
+
+ @unnumbered Contributors to GNU Treelang
+ @cindex contributors
+ @cindex credits
+
+ Treelang was based on 'toy' by Richard Kenner, and also uses code from
+ the GCC core code tree. Tim Josling first created the language and
+ documentation, based on the GCC Fortran compiler's documentation
+ framework.
+
+ @itemize @bullet
+ @item
+ The packaging and compiler portions of GNU Treelang are based largely
+ on the GCC compiler.
+ @xref{Contributors,,Contributors to GCC,GCC,Using and Maintaining GCC},
+ for more information.
+
+ @item
+ There is no specific run-time library for treelang, other than the
+ standard C runtime.
+
+ @item
+ It would have been difficult to build treelang without access to Joachim
+ Nadler's guide to writing a front end to GCC (written in German). A
+ translation of this document into English is available via the
+ CobolForGCC project or via the documentation links from the GCC home
+ page @uref{http://GCC.gnu.org}.
+ @end itemize
+
+ @include funding.texi
+
+ @node Getting Started
+ @chapter Getting Started
+ @cindex getting started
+ @cindex new users
+ @cindex newbies
+ @cindex beginners
+
+ Treelang is a sample language, useful only to help people understand how
+ to implement a new language front end to GCC. It is not a useful
+ language in itself, other than as an example or basis for building a new
+ language. Therefore only language developers are likely to have an
+ interest in it.
+
+ This manual assumes familiarity with GCC, which you can obtain by using
+ it and by reading the manual @samp{Using and Porting GCC}.
+
+ To install treelang, follow the GCC installation instructions, taking
+ care to include treelang in the list of languages given to
+ @samp{--enable-languages} at the configure step.
+
+ If you're generally curious about the future of
+ @code{treelang}, see @ref{Projects}.
+ If you're curious about its past,
+ see @ref{Contributors}.
+
+ To see a few of the questions maintainers of @code{treelang} have,
+ and that you might be able to answer,
+ see @ref{Open Questions}.
+
+ @ifset USING
+ @node What is GNU Treelang?, Lexical Syntax, Getting Started, Top
+ @chapter What is GNU Treelang?
+ @cindex concepts, basic
+ @cindex basic concepts
+
+ GNU Treelang, or @code{treelang}, was initially designed as a free
+ replacement for, or alternative to, the 'toy' language, one that is
+ amenable to inclusion within the GCC source tree.
+
+ @code{treelang} is largely a cut-down version of C, designed to showcase
+ the features of the GCC code generation back end. Only those features
+ that are directly supported by the GCC code generation back end are
+ implemented, and each is implemented in the manner that is easiest and
+ clearest to implement. Not all, or even most, code generation back end
+ features are implemented. The intention is to add features incrementally
+ until most features of the GCC back end are implemented in treelang.
+
+ The main features missing are structures, arrays and pointers.
+
+ A sample program follows:
+
+ @example
+ // function prototypes
+ // function 'add' taking two ints and returning an int
+ external_definition int add(int arg1, int arg2);
+ external_definition int subtract(int arg3, int arg4);
+ external_definition int first_nonzero(int arg5, int arg6);
+ external_definition int double_plus_one(int arg7);
+
+ // function definition
+ add
+ @{
+ // return the sum of arg1 and arg2
+   return arg1 + arg2;
+ @}
+
+
+ subtract
+ @{
+   return arg3 - arg4;
+ @}
+
+ double_plus_one
+ @{
+ // aaa is a variable, of type integer and allocated at the start of the function
+   automatic int aaa;
+ // set aaa to the value returned from add, when passed arg7 and arg7 as the two parameters
+   aaa=add(arg7, arg7);
+   aaa=add(aaa, aaa);
+   aaa=subtract(subtract(aaa, arg7), arg7) + 1;
+   return aaa;
+ @}
+
+ first_nonzero
+ @{
+ // C-like if statement
+   if (arg5)
+     @{
+       return arg5;
+     @}
+   else
+     @{
+     @}
+   return arg6;
+ @}
+ @end example
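Because treelang is largely a cut-down C, the sample program maps almost line for line onto C. The following hand translation is only an illustrative sketch, not output of the treelang compiler; it assumes external_definition corresponds to an ordinary non-static C function and automatic to an ordinary local variable, which matches the storage descriptions in this manual but has not been checked against generated code.

```c
/* external_definition is assumed to map to an ordinary (non-static)
   C function with external linkage. */
int add(int arg1, int arg2);
int subtract(int arg3, int arg4);
int first_nonzero(int arg5, int arg6);
int double_plus_one(int arg7);

int add(int arg1, int arg2)
{
  /* return the sum of arg1 and arg2 */
  return arg1 + arg2;
}

int subtract(int arg3, int arg4)
{
  return arg3 - arg4;
}

int double_plus_one(int arg7)
{
  /* 'automatic' is assumed to map to an ordinary C local variable. */
  int aaa;
  aaa = add(arg7, arg7);                         /* 2 * arg7 */
  aaa = add(aaa, aaa);                           /* 4 * arg7 */
  aaa = subtract(subtract(aaa, arg7), arg7) + 1; /* 2 * arg7 + 1 */
  return aaa;
}

int first_nonzero(int arg5, int arg6)
{
  /* treelang requires both braces and the else, even when empty. */
  if (arg5)
    {
      return arg5;
    }
  else
    {
    }
  return arg6;
}
```

For any argument x, double_plus_one(x) computes 2x + 1: the two calls to add give 4x, and the two nested subtractions remove x twice before 1 is added. So double_plus_one(3) is 7.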
+
+ @node Lexical Syntax, Parsing Syntax, What is GNU Treelang?, Top
+ @chapter Lexical Syntax
+ @cindex Lexical Syntax
+
+ Treelang programs consist of whitespace, comments, keywords and names.
+ @itemize @bullet
+
+ @item
+ Whitespace consists of the space character and the end of line
+ character. Tabs are not allowed. Line terminations are as defined by the
+ standard C library. Whitespace is ignored except within comments,
+ and where it separates parts of the program. In the example below, A and
+ B are two separate names separated by whitespace.
+
+ @smallexample
+ A B
+ @end smallexample
+
+ @item
+ Comments consist of @samp{//} followed by any characters up to the end
+ of the line. C style comments (@samp{/* */}) are not supported. For
+ example, the assignment below is followed by a not very helpful comment.
+
+ @smallexample
+ x=1; // Set X to 1
+ @end smallexample
+
+ @item
+ Keywords consist of any reserved words or symbols as described
+ later. The list of keywords follows:
+
+ @smallexample
+ @{ - used to start the statements in a function
+ @} - used to end the statements in a function
+ ( - start list of function arguments, or to change the precedence of operators in an expression
+ ) - end list or prioritised operators in expression
+ , - used to separate parameters in a function prototype or in a function call
+ ; - used to end a statement
+ + - addition
+ - - subtraction
+ = - assignment
+ == - equality test
+ if - begin IF statement
+ else - begin 'else' portion of IF statement
+ static - indicate variable is permanent, or function has file scope only
+ automatic - indicate that variable is allocated for the life of the function
+ external_reference - indicate that variable or function is defined in another file
+ external_definition - indicate that variable or function is to be accessible from other files
+ int - variable is an integer (same as C int)
+ char - variable is a character (same as C char)
+ unsigned - variable is unsigned. If this is not present, the variable is signed
+ return - start function return statement
+ void - used as function type to indicate function returns nothing
+ @end smallexample
+
+
+ @item
+ Names consist of a letter or @samp{_} followed by any number of letters,
+ digits or @samp{_} characters. @samp{$} is not allowed in a name. All
+ names must be globally unique (the same name may not be used twice in
+ any context) and must not be a keyword. Names and keywords are case
+ sensitive. For example:
+
+ @smallexample
+ a A _a a_ IF_X
+ @end smallexample
+
+ are all different names.
+
+ @end itemize
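The name and comment rules above can be checked with a few lines of C. This is a hypothetical sketch, not code from the treelang lexer (which is generated by flex from lex.l): is_keyword, is_valid_name and code_length are illustrative names, "letter" is taken to mean what isalpha and isalnum report in the default C locale, and the comment search can use a plain strstr because treelang has no string literals in which // could legitimately appear.

```c
#include <ctype.h>
#include <string.h>

/* Reserved words from the keyword list above (matching is case sensitive). */
static const char *const treelang_keywords[] = {
  "if", "else", "static", "automatic", "external_reference",
  "external_definition", "int", "char", "unsigned", "return", "void"
};

/* Return 1 if S is a treelang keyword, 0 otherwise. */
int is_keyword(const char *s)
{
  size_t i;

  for (i = 0; i < sizeof treelang_keywords / sizeof treelang_keywords[0]; i++)
    if (strcmp(s, treelang_keywords[i]) == 0)
      return 1;
  return 0;
}

/* Return 1 if S is a well-formed name: a letter or '_' followed by
   letters, digits or '_' (so '$' is rejected), and not a keyword. */
int is_valid_name(const char *s)
{
  const unsigned char *p = (const unsigned char *) s;

  if (!(isalpha(*p) || *p == '_'))
    return 0;
  for (p++; *p; p++)
    if (!(isalnum(*p) || *p == '_'))
      return 0;
  return !is_keyword(s);
}

/* Return the number of characters in LINE before a "//" comment,
   or the whole length of LINE if it has no comment. */
size_t code_length(const char *line)
{
  const char *c = strstr(line, "//");

  return c ? (size_t) (c - line) : strlen(line);
}
```

With these rules, IF is an ordinary name while if is rejected as a keyword, and code_length("x=1; // Set X to 1") is 5.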
+
+ @node Parsing Syntax, Compiler Overview, Lexical Syntax, Top
+ @chapter Parsing Syntax
+ @cindex Parsing Syntax
+
+ Declarations are built up from the lexical elements described above. A
+ file may contain one or more declarations.
+
+ @itemize @bullet
+
+ @item
+ declaration: variable_declaration OR function_prototype OR
+ function_declaration
+
+ @item
+ function_prototype: storage type NAME ( parameter_list )
+
+ @smallexample
+ static int add (int a, int b)
+ @end smallexample
+
+ @item
+ variable_declaration: storage type NAME initial;
+
+ Example:
+
+ @smallexample
+ int temp1=1;
+ @end smallexample
+
+ A variable declaration can be outside a function, or at the start of a
+ function.
+
+ @item
+ storage: automatic OR static OR external_reference OR external_definition
+
+ This defines the scope, duration and visibility of a function or
+ variable.
+
+ @enumerate 1
+
+ @item
+ automatic: This means a variable is allocated at the start of the
+ function and released when the function returns. This can only be used
+ for variables within functions. It cannot be used for functions.
+
+ @item
+ static: This means a variable is allocated at the start of the program
+ and remains allocated until the program as a whole ends. For a function,
+ it means that the function is only visible within the current file.
+
+ @item
+ external_definition: For a variable, which must be defined outside a
+ function, it means that the variable is visible from other files. For a
+ function, it means that the function is visible from another file.
+
+ @item
+ external_reference: For a variable, which must be defined outside a
+ function, it means that the variable is defined in another file. For a
+ function, it means that the function is defined in another file.
+
+ @end enumerate
+
+ @item
+ type: int OR unsigned int OR char OR unsigned char OR void
+
+ This defines the data type of a variable or the return type of a
+ function.
+
+ @enumerate a
+
+ @item
+ int: The variable is a signed integer. The function returns a signed
+ integer.
+
+ @item
+ unsigned int: The variable is an unsigned integer. The function returns
+ an unsigned integer.
+
+ @item
+ char: The variable is a signed character. The function returns a signed
+ character.
+
+ @item
+ unsigned char: The variable is an unsigned character. The function
+ returns an unsigned character.
+
+ @item
+ void: The function returns nothing. This type is only used for functions.
+
+ @end enumerate
+
+ @item
+ parameter_list: parameter [, parameter]...
+
+ @item
+ parameter: variable_declaration
+
+ The variable declarations must not have initialisations.
+
+ @item
+ initial: = value
+
+ @item
+ value: integer_constant
+
+ For example:
+
+ @smallexample
+ 1  +2  -3
+ @end smallexample
+
+ @item
+ function_declaration: name @{variable_declarations statements @}
+
+ A function consists of the function name then the declarations (if any)
+ and statements (if any) within one pair of braces.
+
+ The details of the function arguments come from the function
+ prototype. The function prototype must precede the function declaration
+ in the file.
+
+ @item
+ statement: if_statement OR expression_statement OR return_statement
+
+ @item
+ if_statement: if (expression) @{ statements @} else @{ statements @}
+
+ The first set of statements is executed if the expression is
+ non-zero. Otherwise the second set of statements is executed. Either
+ list of statements may be empty, but both sets of braces and the else
+ must be present.
+
+ @smallexample
+ if (a==b)
+ @{
+ // nothing
+ @}
+ else
+ @{
+ a=b;
+ @}
+ @end smallexample
+
+ @item
+ expression_statement: expression;
+
+ The expression is evaluated, and any side effects, such as assignments
+ or function calls, take effect.
+
+ @item
+ return_statement: return expression_opt;
+
+ Returns from the function. If the function is void, the expression must
+ be absent, and if the function is not void the expression must be
+ present.
+
+ @item
+ expression: variable OR integer_constant OR expression+expression OR expression-expression
+  OR expression==expression OR (expression) OR variable=expression OR function_call
+
+ An expression can be a constant or a variable reference or a
+ function_call. Expressions can be combined as a sum of two expressions
+ or the difference of two expressions, or an equality test of two
+ expressions. An assignment is also an expression. Expressions and
+ operator precedence work as in C.
+
+ @item
+ function_call: function_name (comma_separated_expressions)
+
+ This invokes the function, passing to it the values of the expressions
+ as actual parameters.
+
+ @end itemize
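To make the precedence rule concrete, here is a small recursive descent evaluator for the constant subset of the expression grammar: integer constants, parentheses, binary + and -, and ==. It is a hypothetical sketch rather than the treelang parser (which is generated by bison from parse.y); eval_expression and its helpers are illustrative names, variables, assignment and function calls are omitted, and the input is assumed to be well formed. As in C, + and - bind more tightly than ==, and both levels associate to the left.

```c
#include <ctype.h>

/* Current position in the expression text being evaluated. */
static const char *cur;

/* Skip treelang whitespace: spaces and end-of-line characters. */
static void skip_spaces(void)
{
  while (*cur == ' ' || *cur == '\n')
    cur++;
}

static int parse_equality(void);

/* primary: integer_constant | ( expression ) */
static int parse_primary(void)
{
  int v = 0;

  skip_spaces();
  if (*cur == '(')
    {
      cur++;                /* consume '(' */
      v = parse_equality();
      skip_spaces();
      cur++;                /* consume ')' (input assumed well formed) */
      return v;
    }
  while (isdigit((unsigned char) *cur))
    v = v * 10 + (*cur++ - '0');
  return v;
}

/* additive: primary { (+|-) primary }, left associative */
static int parse_additive(void)
{
  int v = parse_primary();

  for (;;)
    {
      skip_spaces();
      if (*cur == '+')
        {
          cur++;
          v += parse_primary();
        }
      else if (*cur == '-')
        {
          cur++;
          v -= parse_primary();
        }
      else
        return v;
    }
}

/* equality: additive { == additive }, binding more loosely than + and - */
static int parse_equality(void)
{
  int v = parse_additive();

  for (;;)
    {
      skip_spaces();
      if (cur[0] == '=' && cur[1] == '=')
        {
          cur += 2;
          v = (v == parse_additive());
        }
      else
        return v;
    }
}

/* Evaluate TEXT; an equality test yields 1 (true) or 0 (false). */
int eval_expression(const char *text)
{
  cur = text;
  return parse_equality();
}
```

For example, eval_expression("1+2==3") yields 1 because the sum is computed before the comparison, exactly as the same expression would be in C, and "10 - 3 - 2" yields 5 because subtraction groups from the left.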
+
+ @node Compiler Overview, TREELANG and GCC, Parsing Syntax, Top
+ @chapter Compiler Overview
+ @cindex compilers
+
+ treelang is run as part of the GCC compiler.
+
+ @itemize @bullet
+ @cindex source code
+ @cindex file, source
+ @cindex code, source
+ @cindex source file
+ @item
+ It reads a user's program, stored in a file and containing instructions
+ written in the appropriate language (Treelang, C, and so on).  This file
+ contains @dfn{source code}.
+
+ @cindex translation of user programs
+ @cindex machine code
+ @cindex code, machine
+ @cindex mistakes
+ @item
+ It translates the user's program into instructions a computer can carry
+ out more quickly than it takes to translate the instructions in the
+ first place.  These instructions are called @dfn{machine code}: code
+ designed to be efficiently translated and processed by a machine such as
+ a computer.  Humans usually aren't as good at writing machine code as
+ they are at writing Treelang or C, because it is easy to make tiny
+ mistakes writing machine code.  When writing Treelang or C, it is easy
+ to make big mistakes. But you can only make one mistake, because the
+ compiler stops after it finds any problem.
+
+ @cindex debugger
+ @cindex bugs, finding
+ @cindex @code{gdb}, command
+ @cindex commands, @code{gdb}
+ @item
+ It provides information in the generated machine code
+ that can make it easier to find bugs in the program
+ (using a debugging tool, called a @dfn{debugger},
+ such as @code{gdb}).
+
+ @cindex libraries
+ @cindex linking
+ @cindex @code{ld} command
+ @cindex commands, @code{ld}
+ @item
+ It locates and gathers machine code already generated to perform actions
+ requested by statements in the user's program.  This machine code is
+ organized into @dfn{libraries} and is located and gathered during the
+ @dfn{link} phase of the compilation process.  (Linking often is thought
+ of as a separate step, because it can be directly invoked via the
+ @code{ld} command.  However, the @code{gcc} command, as with most
+ compiler commands, automatically performs the linking step by calling on
+ @code{ld} directly, unless asked to not do so by the user.)
+
+ @cindex language, incorrect use of
+ @cindex incorrect use of language
+ @item
+ It attempts to diagnose cases where the user's program contains
+ incorrect usages of the language.  The @dfn{diagnostics} produced by the
+ compiler indicate the problem and the location in the user's source file
+ where the problem was first noticed.  The user can use this information
+ to locate and fix the problem.
+
+ The compiler stops after the first error. There are no plans to fix
+ this, ever, as it would vastly complicate the implementation of treelang
+ to little or no benefit.
+
+ @cindex diagnostics, incorrect
+ @cindex incorrect diagnostics
+ @cindex error messages, incorrect
+ @cindex incorrect error messages
+ (Sometimes an incorrect usage of the language leads to a situation where
+ the compiler cannot make any sense of what it reads, while a human
+ might be able to, and thus ends up complaining about an incorrect
+ ``problem'' it encounters that, in fact, reflects a misunderstanding of
+ the programmer's intention.)
+
+ @cindex warnings
+ @cindex questionable instructions
+ @item
+ There are no warnings in treelang. A program is either correct or in
+ error.
+ @end itemize
+
+ @cindex components of treelang
+ @cindex @code{treelang}, components of
+ @code{treelang} consists of several components:
+
+ @cindex @code{gcc}, command
+ @cindex commands, @code{gcc}
+ @itemize @bullet
+ @item
+ A modified version of the @code{gcc} command, which also might be
+ installed as the system's @code{cc} command.
+ (In many cases, @code{cc} refers to the
+ system's ``native'' C compiler, which
+ might be a non-GNU compiler, or an older version
+ of @code{GCC} considered more stable or that is
+ used to build the operating system kernel.)
+
+ @cindex @code{treelang}, command
+ @cindex commands, @code{treelang}
+ @item
+ The @code{treelang} command itself.
+
+ @item
+ The @code{libc} run-time library.  This library contains the machine
+ code needed to support capabilities of the Treelang language that are
+ not directly provided by the machine code generated by the
+ @code{treelang} compilation phase. This is the same library that the
+ main C compiler uses.
+
+ @cindex @code{tree1}, program
+ @cindex programs, @code{tree1}
+ @cindex assembler
+ @cindex @code{as} command
+ @cindex commands, @code{as}
+ @cindex assembly code
+ @cindex code, assembly
+ @item
+ The compiler itself, which is internally named @code{tree1}.
+
+ Note that @code{tree1} does not generate machine code directly---it
+ generates @dfn{assembly code} that is a more readable form
+ of machine code, leaving the conversion to actual machine code
+ to an @dfn{assembler}, usually named @code{as}.
+ @end itemize
+
+ @code{GCC} is often thought of as ``the C compiler'' only,
+ but it does more than that.
+ Based on command-line options and the names given for files
+ on the command line, @code{gcc} determines which actions to perform,
+ including preprocessing, compiling (in a variety of possible languages),
+ assembling, and linking.
+
+ @cindex driver, gcc command as
+ @cindex @code{gcc}, command as driver
+ @cindex executable file
+ @cindex files, executable
+ @cindex cc1 program
+ @cindex programs, cc1
+ @cindex preprocessor
+ @cindex cpp program
+ @cindex programs, cpp
+ For example, the command @samp{gcc foo.c} @dfn{drives} the file
+ @file{foo.c} through the preprocessor @code{cpp}, then
+ the C compiler (internally named
+ @code{cc1}), then the assembler (usually @code{as}), then the linker
+ (@code{ld}), producing an executable program named @file{a.out} (on
+ UNIX systems).
+
+ @cindex treelang program
+ @cindex programs, treelang
+ As another example, the command @samp{gcc foo.tree} would do much the
+ same as @samp{gcc foo.c}, but instead of using the C compiler named
+ @code{cc1}, @code{gcc} would use the treelang compiler (named
+ @code{tree1}). However, there is no preprocessor for treelang.
+
+ @cindex @code{tree1}, program
+ @cindex programs, @code{tree1}
+ In a GNU Treelang installation, @code{gcc} recognizes Treelang source
+ files by name just like it does C and C++ source files.  It knows to use
+ the Treelang compiler named @code{tree1}, instead of @code{cc1} or
+ @code{cc1plus}, to compile Treelang files. If a file's name ends in
+ @file{.tree} then GCC knows that the program is written in treelang. You
+ can also override the language manually with the @option{-x} option.
+
+ @cindex @code{gcc}, not recognizing Treelang source
+ @cindex unrecognized file format
+ @cindex file format not recognized
+ Non-Treelang-related operation of @code{gcc} is generally
+ unaffected by installing the GNU Treelang version of @code{gcc}.
+ However, without the installed version of @code{gcc} being the
+ GNU Treelang version, @code{gcc} will not be able to compile
+ and link Treelang programs.
+
+ @cindex printing version information
+ @cindex version information, printing
+ The command @samp{gcc -v x.tree}, where @file{x.tree} is a file which
+ must exist but whose contents are ignored, is a quick way to display
+ version information for the various programs used to compile a typical
+ Treelang source file.
+
+ The @code{tree1} program represents most of what is unique to GNU
+ Treelang; @code{tree1} is a combination of two rather large chunks of
+ code.
+
+ @cindex GCC Back End (GBE)
+ @cindex GBE
+ @cindex @code{GCC}, back end
+ @cindex back end, GCC
+ @cindex code generator
+ One chunk is the so-called @dfn{GNU Back End}, or GBE,
+ which knows how to generate fast code for a wide variety of processors.
+ The same GBE is used by the C, C++, and Treelang compiler programs
+ @code{cc1}, @code{cc1plus}, and @code{tree1}, plus others.
+ Often the GBE is referred to as the ``GCC back end'' or
+ even just ``GCC''; in this manual, the term GBE is used
+ whenever the distinction is important.
+
+ @cindex GNU Treelang Front End (TFE)
+ @cindex tree1
+ @cindex @code{treelang}, front end
+ @cindex front end, @code{treelang}
+ The other chunk of @code{tree1} is the majority of what is unique about
+ GNU Treelang: the code that knows how to interpret Treelang programs to
+ determine what they are intending to do, and then communicate that
+ knowledge to the GBE for actual compilation of those programs.  This
+ chunk is called the @dfn{Treelang Front End} (TFE).  The @code{cc1} and
+ @code{cc1plus} programs have their own front ends, for the C and C++
+ languages, respectively.  These front ends are responsible for
+ diagnosing incorrect usage of their respective languages by the programs
+ they process, and are responsible for most of the warnings about
+ questionable constructs as well.  (The GBE in principle handles
+ producing some warnings, like those concerning possible references to
+ undefined variables, but these warnings should not occur in treelang
+ programs as the front end is meant to pick them up first.)
+
+ Because so much is shared among the compilers for various languages,
+ much of the behavior and many of the user-selectable options for these
+ compilers are similar.
+ For example, diagnostics (error messages and
+ warnings) are similar in appearance; command-line
+ options like @samp{-Wall} have generally similar effects; and the quality
+ of generated code (in terms of speed and size) is roughly similar
+ (since that work is done by the shared GBE).
+
+ @node TREELANG and GCC, Compiler, Compiler Overview, Top
+ @chapter Compile Treelang, C, or Other Programs
+ @cindex compiling programs
+ @cindex programs, compiling
+
+ @cindex @code{gcc}, command
+ @cindex commands, @code{gcc}
+ A GNU Treelang installation includes a modified version of the @code{gcc}
+ command.
+
+ In a non-Treelang installation, @code{gcc} recognizes C, C++,
+ and Objective-C source files.
+
+ In a GNU Treelang installation, @code{gcc} also recognizes Treelang source
+ files and accepts Treelang-specific command-line options, plus some
+ command-line options that are designed to cater to Treelang users
+ but apply to other languages as well.
+
+ @xref{G++ and GCC,,Compile C; C++; or Objective-C,GCC,Using and Porting GCC},
+ for information on the way different languages are handled
+ by the GCC compiler (@code{gcc}).
+
+ You can use this, combined with the output of the @samp{gcc -v x.tree}
+ command, to get the options applicable to treelang. Treelang source file
+ names must end with the suffix @samp{.tree}.
+
+ @cindex preprocessor
+
+ Treelang programs are not run through the C preprocessor by @code{gcc}
+ by default. There is no reason why they cannot be run through the
+ preprocessor manually, but you would need to prevent the preprocessor
+ from generating @samp{#line} directives, using the @samp{-P} option;
+ otherwise @code{tree1} will not accept the input.
+
+ @node Compiler, Other Languages, TREELANG and GCC, Top
+ @chapter The GNU Treelang Compiler
+
+ The GNU Treelang compiler, @code{treelang}, supports programs written
+ in the GNU Treelang language.
+
+ @node Other Languages, treelang internals, Compiler, Top
+ @chapter Other Languages
+
+ @menu
+ * Interoperating with C and C++::
+ @end menu
+
+ @node Interoperating with C and C++,  , Other Languages, Other Languages
+ @section Tools and advice for interoperating with C and C++
+
+ The output of treelang programs looks like C program code to the linker
+ and everybody else, so you should be able to freely mix treelang and C
+ (and C++) code, with one proviso.
+
+ C promotes small integer types to @code{int} when used as function
+ parameters and return values. The treelang compiler does not do this, so
+ if you want to interface to C, you need to specify the promoted type, not
+ the nominal type.
+
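As a sketch of the proviso, suppose a treelang file provides a function that conceptually works on characters. Declaring its parameter and result as int, the promoted type, on the treelang side keeps the two languages in agreement. The C fragment below shows the matching C side; next_char is a hypothetical name, and its C body is only a stand-in so that the fragment can be compiled and exercised without a treelang object file.

```c
/* C-side prototype for a hypothetical treelang function declared as
     external_definition int next_char(int ch);
   Both sides use int, the promoted type, rather than char. */
int next_char(int ch);

/* Stand-in definition so this file compiles on its own; in a mixed
   build this body would come from the treelang object file instead. */
int next_char(int ch)
{
  return ch + 1;
}

/* A C caller may pass a char; the prototype converts it to int. */
int next_after_a(void)
{
  char c = 'a';
  return next_char(c);
}
```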
+ @ifset INTERNALS
+ @node treelang internals, Open Questions, Other Languages, Top
+ @chapter treelang internals
+
+ @menu
+ * treelang files::
+ * treelang compiler interfaces::
+ * Hints and tips::
+ @end menu
+
+ @node treelang files, treelang compiler interfaces, treelang internals, treelang internals
+ @section treelang files
+
+ To create a compiler that integrates into GCC, you need to create many
+ files. Some of the files are integrated into the main GCC makefile, to
+ build the various parts of the compiler and to run the test
+ suite. Others are incorporated into various GCC programs such as
+ @file{gcc.c}. Finally you must provide the actual programs comprising
+ your compiler.
+
+ @cindex files
+
+ The files are:
+
+ @enumerate 1
+
+ @item
+ COPYING. This is the copyright file, assuming you are going to use the
+ GNU General Public Licence. You probably need to use the GPL because if
+ you use the GCC back end your program and the back end are one program,
+ and the back end is GPLed.
+
+ This need not be present if the language is incorporated into the main
+ GCC tree, as the main GCC directory has this file.
+
+ @item
+ COPYING.LIB. This is the copyright file for those parts of your program
+ that are not to be covered by the GPL, but are instead to be covered by
+ the LGPL (Library or Lesser GPL). This licence may be appropriate for
+ the library routines associated with your compiler. These are the
+ routines that are linked with the @emph{output} of the compiler. Using
+ the LGPL for these programs allows programs written using your compiler
+ to be closed source. For example, the GNU C library is under the LGPL.
+
+ This need not be present if the language is incorporated into the main
+ GCC tree, as the main GCC directory has this file.
+
+ @item
+ ChangeLog. Record all the changes to your compiler. Use the same format
+ as used in treelang, as it is supported by an Emacs editing mode and is
+ part of the FSF coding standard. Normally each directory has its own
+ changelog. The FSF standard allows but does not require a meaningful
+ comment on why the changes were made, above and beyond @emph{what} was
+ changed. In the author's opinion it is useful to provide this
+ information.
+
+ @item
+ treelang.texi. The manual, written in texinfo. Your manual would have a
+ different file name. You need not write it in texinfo if you don't want
+ to, but a lot of GNU software does use texinfo.
+
+ @cindex Make-lang.in
+ @item
+ Make-lang.in. This file is part of the make file which is incorporated
+ with the GCC make file skeleton (Makefile.in in the GCC directory) to
+ make Makefile, as part of the configuration process.
+
+ Makefile in turn is the main instruction to actually build
+ everything. The build instructions are held in the main GCC manual and
+ web site so they are not repeated here.
+
+ There are some comments at the top which will help you understand what
+ you need to do.
+
+ There are make commands to build things, remove generated files with
+ various degrees of thoroughness, count the lines of code (so you know
+ how much progress you are making), build info and html files from the
+ texinfo source, run the tests etc.
+
+ @item
+ README. Just a brief informative text file saying what is in this
+ directory.
+
+ @cindex config-lang.in
+ @item
+ config-lang.in. This file is read by the configuration process and must
+ be present. It specifies the name of your language, the name(s) of the
+ compiler(s), including any preprocessors, that you are going to build,
+ and any files (usually generated ones) that should be excluded from
+ diffs, i.e. when making diff files to send in patches. Whether the
+ variable @samp{stagestuff} is used is unknown (???).
+
+ @cindex lang-options
+ @item
+ lang-options. This file is included into @file{gcc.c}, the main GCC
+ driver, and tells it what options your language supports. This is only
+ used to display help (is this true ???).
+
+ @cindex lang-specs
+ @item
+ lang-specs. This file is also included in @file{gcc.c}. It tells
+ @file{gcc.c} when to call your programs and what options to send
+ them. The mini-language 'specs' is documented in the source of
+ @file{gcc.c}. Do not attempt to write a specs file from scratch; use an
+ existing one as the base and enhance it.
+
+ @item
+ Your texi files. Texinfo can be used to build documentation in HTML,
+ info, dvi and postscript formats. It is a tagged language, is documented
+ in its own manual, and has its own Emacs mode.
+
+ @item
+ Your programs. The relationships between all the programs are explained
+ in the next section. You need to write or use the following programs:
+
+ @itemize @bullet
+
+ @item
+ lexer. This breaks the input into words and passes these to the
+ parser. This is lex.l in treelang, which is passed through flex, a lex
+ variant, to produce C code lex.c. Note there is a school of thought that
+ says real men hand code their own lexers; however, you may prefer to
+ write far less code and use flex, as was done with treelang.
+
+ @item
+ parser. This breaks the program into recognizable constructs such as
+ expressions, statements etc. This is parse.y in treelang, which is
+ passed through bison, which is a yacc variant, to produce C code parse.c.
+
+ @item
+ back end interface. This interfaces to the code generation back end. In
+ treelang, this is tree1.c which mainly interfaces to toplev.c and
+ treetree.c which mainly interfaces to everything else. Many languages
+ mix up the back end interface with the parser, as in the C compiler for
+ example. It is a matter of taste which way to do it, but with treelang
+ it is separated out to make the back end interface cleaner and easier to
+ understand.
+
+ @item
+ header files. For function prototypes and common data items. One point
+ to note here is that bison can generate a header file with all the
+ numbers it has assigned to the keywords and symbols, and you can include
+ the same header in your lexer. This technique is demonstrated in
+ treelang.
+
+ @item
+ compiler main file. GCC comes with a program toplev.c which is a
+ perfectly serviceable main program for your compiler. treelang uses
+ toplev.c but other languages have been known to replace it with their
+ own main program. Again this is a matter of taste and how much code you
+ want to write.
+
+ @end itemize
+
+ @end enumerate
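The shared-header technique mentioned in the list above can be sketched as follows. In a real build bison writes the token numbers into a generated header that both the parser and the lexer include; here the enum, its TOK_ names and classify_word are hypothetical stand-ins written by hand so the example is self-contained, and only a few keywords are listed.

```c
#include <string.h>

/* Hand-written stand-in for a bison-generated token header. Named
   tokens are numbered above 255 so they cannot clash with
   single-character tokens such as ';' or '+', which use their own
   character codes; 258 mirrors the number bison typically gives the
   first declared token. */
enum token_code
{
  TOK_NAME = 258,
  TOK_IF,
  TOK_ELSE,
  TOK_RETURN,
  TOK_INT
};

/* One reserved word and the token code the lexer reports for it. */
struct keyword_entry
{
  const char *text;
  enum token_code code;
};

static const struct keyword_entry keyword_table[] = {
  { "if", TOK_IF },
  { "else", TOK_ELSE },
  { "return", TOK_RETURN },
  { "int", TOK_INT }
};

/* Return the token code for WORD: its keyword code if WORD is reserved
   (matching is case sensitive), otherwise TOK_NAME. */
int classify_word(const char *word)
{
  size_t i;

  for (i = 0; i < sizeof keyword_table / sizeof keyword_table[0]; i++)
    if (strcmp(word, keyword_table[i].text) == 0)
      return keyword_table[i].code;
  return TOK_NAME;
}
```

Because the same enumeration is visible to both the lexer and the parser, the two can never disagree about which number stands for which keyword.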
+
+ @node treelang compiler interfaces, Hints and tips, treelang files, treelang internals
+ @section treelang compiler interfaces
+
+ @cindex driver
+ @cindex toplev.c
+
+ @menu
+ * treelang driver::
+ * treelang main compiler::
+ @end menu
+
+ @node treelang driver, treelang main compiler, treelang compiler interfaces, treelang compiler interfaces
+ @subsection treelang driver
+
+ The GCC compiler consists of a driver, which then executes the various
+ compiler phases based on the instructions in the specs files.
+
+ Typically a program's language will be identified from its suffix (for
+ example, @file{.tree} for treelang programs).
+
+ The driver (gcc.c) will then drive (exec) in turn a preprocessor, the main
+ compiler, the assembler and the link editor. Options to GCC allow you to
+ override all of this. In the case of treelang programs there is no
+ preprocessor, and mostly these days the C preprocessor is run within the
+ main C compiler rather than as a separate process, apparently for reasons
+ of speed.
+
+ You will be using the standard assembler and link editor, so these are
+ ignored from now on.
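+
+ The suffix-based language selection described above can be pictured as
+ a table lookup. The following is a hypothetical and much simplified
+ sketch. The real driver takes this information from its specs rather
+ than from a table like suffix_table, and the names used here were
+ invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified suffix-to-language table; GCC itself
   derives this from its specs.  */
struct suffix_map { const char *suffix; const char *language; };

static const struct suffix_map suffix_table[] = {
    { ".c", "c" },
    { ".tree", "treelang" },
    { ".s", "assembler" },
};

/* Return the language for FILENAME based on its last suffix, or NULL
   if the suffix is missing or unknown.  */
static const char *language_for_file(const char *filename)
{
    const char *dot = strrchr(filename, '.');
    if (dot == NULL)
        return NULL;
    for (size_t i = 0; i < sizeof suffix_table / sizeof suffix_table[0]; i++)
        if (strcmp(dot, suffix_table[i].suffix) == 0)
            return suffix_table[i].language;
    return NULL;
}
```

+ For example, language_for_file("hello.tree") returns "treelang". A
+ command line option, such as GCC's -x, is how a user would override the
+ guess.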
+
+ You have to write your own preprocessor if you want one. This is usually
+ entirely language specific. The main thing to be aware of is that you
+ need some way to pass file name and line number information through to
+ the main compiler, so that it can pass this information on to the back
+ end and the debugger can find the right source line for each piece of
+ code. Beyond that, the preprocessor will probably not be the slowest
+ part of the compiler and will probably not use the most memory, so don't
+ waste much time tuning it until you know you need to.
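+
+ One possible way to pass file name and line number information through
+ is for a hypothetical treelang preprocessor to emit the same kind of
+ line markers that the C preprocessor writes, such as # 12 "foo.tree". The
+ main compiler's lexer would then recognise those markers. The sketch
+ below assumes that convention. The names format_line_marker and
+ parse_line_marker are invented and are not part of GCC.

```c
#include <stdio.h>
#include <string.h>

/* Write a cpp-style line marker such as "# 12 \"foo.tree\"\n" into BUF.
   A preprocessor would emit one whenever its output stops matching the
   original source line for line.  Returns snprintf's result.  */
static int format_line_marker(char *buf, size_t size,
                              unsigned line, const char *file)
{
    return snprintf(buf, size, "# %u \"%s\"\n", line, file);
}

/* In the main compiler's lexer: if TEXT is a line marker, store its
   line number and file name (truncated to SIZE - 1 chars) and return 1;
   otherwise return 0.  */
static int parse_line_marker(const char *text, unsigned *line,
                             char *file, size_t size)
{
    char fmt[32];
    if (size < 2)
        return 0;
    /* Build e.g. "# %u \"%31[^\"]\"" so FILE cannot overflow.  */
    snprintf(fmt, sizeof fmt, "# %%u \"%%%zu[^\"]\"", size - 1);
    return sscanf(text, fmt, line, file) == 2;
}
```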
+
+ @node treelang main compiler,  , treelang driver, treelang compiler interfaces
+ @subsection treelang main compiler
+
+ The main compiler for treelang consists of toplev.c from the main GCC
+ compiler, the parser, lexer and back end interface routines, and the
+ back end routines themselves, of which there are many.
+
+ toplev.c does a lot of work for you and you should almost certainly use it.
+
+ Writing the back end interface code is the hard part of creating a
+ compiler using GCC. The back end interface documentation is incomplete
+ and the interface is complex.
+
+ There are three main aspects to interfacing to the other GCC code.
+
+ @menu
+ * Interfacing to toplev.c::
+ * Interfacing to the garbage collection::
+ * Interfacing to the code generation code. ::
+ @end menu
+
+ @node Interfacing to toplev.c, Interfacing to the garbage collection, treelang main compiler, treelang main compiler
+ @subsubsection Interfacing to toplev.c
+
+ In treelang this is handled mainly in tree1.c and partly in
+ treetree.c. Peruse toplev.c for details of what you need to do.
+
+ @node Interfacing to the garbage collection, Interfacing to the code generation code. , Interfacing to toplev.c, treelang main compiler
+ @subsubsection Interfacing to the garbage collection
+
+ In treelang this is done mainly in tree1.c.
+
+ Memory allocation in the compiler should be done using the ggc_alloc and
+ kindred routines in ggc*.*. At the end of every 'function' in your
+ language, toplev.c calls the garbage collector several times. The
+ garbage collector calls mark routines, which go through the memory that
+ is still in use, telling the garbage collector not to free it. Then all
+ the memory that was not marked is freed.
+
+ What this means is that you need a way to hook into this marking
+ process. You do this by calling ggc_add_root, passing it the address of
+ a callback routine. That routine is called during garbage collection
+ and can call ggc_mark to preserve the storage. If storage is only used
+ while parsing a single function, you do not need to provide a way to
+ mark it.
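+
+ A toy mark-and-sweep collector can illustrate the root-callback idea.
+ This is not GCC's ggc code. The names toy_alloc, toy_add_root,
+ toy_collect and the rest are hypothetical. The real ggc_add_root and
+ ggc_mark have their own signatures, which you should check in ggc.h.

```c
#include <stdlib.h>

/* Toy model of root-callback marking; NOT GCC's ggc API.  */
struct obj {
    struct obj *next_alloc; /* chain linking every allocated object */
    int marked;
};

typedef void (*root_marker)(void *root);

#define MAX_ROOTS 16
static struct { void *root; root_marker mark; } roots[MAX_ROOTS];
static int nroots;
static struct obj *all_objects;

static struct obj *toy_alloc(void)
{
    struct obj *o = calloc(1, sizeof *o);
    if (o) {
        o->next_alloc = all_objects;
        all_objects = o;
    }
    return o;
}

static void toy_mark(struct obj *o)
{
    if (o)
        o->marked = 1;
}

/* Register ROOT; MARK is called with ROOT at every collection.  */
static void toy_add_root(void *root, root_marker mark)
{
    if (nroots < MAX_ROOTS) {
        roots[nroots].root = root;
        roots[nroots].mark = mark;
        nroots++;
    }
}

/* Run every root callback, then free unmarked objects and clear the
   marks on survivors.  Returns the number of objects freed.  */
static int toy_collect(void)
{
    int freed = 0;
    for (int i = 0; i < nroots; i++)
        roots[i].mark(roots[i].root);
    for (struct obj **link = &all_objects; *link != NULL; ) {
        struct obj *o = *link;
        if (o->marked) {
            o->marked = 0;
            link = &o->next_alloc;
        } else {
            *link = o->next_alloc;
            free(o);
            freed++;
        }
    }
    return freed;
}

/* Front end data that must survive across functions, plus its marker.  */
static struct obj *global_decl;
static void mark_global_decl(void *root) { toy_mark(*(struct obj **)root); }
```

+ After toy_add_root(&global_decl, mark_global_decl), each toy_collect
+ keeps global_decl alive and frees any other object that nothing marked.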
+
+ Note that you can also call ggc_mark_tree to mark any of the back end
+ internal 'tree' nodes. This routine follows the branches of the trees
+ and marks all the subordinate structures. This is useful, for example,
+ when you have created a variable declaration that will be used across
+ multiple functions, or a function declaration (from a prototype) that
+ may be used later on. See the next section for more on the tree nodes.
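+
+ The recursive marking described above can be sketched on a toy node
+ type. struct toy_tree is a hypothetical stand-in that is far simpler
+ than GCC's real 'tree'. This sketch stops at nodes that are already
+ marked, so a shared subtree is visited only once.

```c
#include <stddef.h>

/* Hypothetical, much simplified stand-in for a back end tree node.  */
struct toy_tree {
    int marked;
    int n_operands;                 /* at most 3 in this sketch */
    struct toy_tree *operands[3];
};

/* Mark T and everything reachable from it.  Returns the number of
   nodes newly marked; already-marked nodes are skipped.  */
static int toy_mark_tree(struct toy_tree *t)
{
    if (t == NULL || t->marked)
        return 0;
    t->marked = 1;
    int count = 1;
    for (int i = 0; i < t->n_operands; i++)
        count += toy_mark_tree(t->operands[i]);
    return count;
}
```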
+
+ @node Interfacing to the code generation code. ,  , Interfacing to the garbage collection, treelang main compiler
+ @subsubsection Interfacing to the code generation code.
+
+ In treelang this is done in treetree.c. A typedef called 'tree',
+ defined in tree.h and tree.def in the GCC directory and largely
+ implemented in tree.c and stmt.c, forms the basic interface to the
+ compiler back end.
+
+ In general you call various tree routines to generate code, either
+ directly or through toplev.c. You build up data structures and
+ expressions in similar ways.
+
+ You can find some documentation on this via the GCC main web page. In
+ particular, the documentation produced by Joachim Nadler and translated
+ by Tim Josling can be quite useful. The C compiler also has
+ documentation in the main GCC manual (particularly the current CVS
+ version), which is useful on a lot of the details.
+
+ In time it is hoped to enhance this document to provide a more
+ comprehensive overview of this topic. The main gap is in explaining how
+ it all works together.
+
+ @node Hints and tips,  , treelang compiler interfaces, treelang internals
+ @section Hints and tips
+
+ @itemize @bullet
+
+ @item
+ TAGS: Use the make ETAGS commands to create TAGS files, which can be
+ used in Emacs to jump to any symbol quickly.
+
+ @item



