Quickstart Guide to Hacking Gfortran
Starting to help in developing an existing compiler can be a daunting task. This document aims to give a new developer a foothold in the many lines of code. It is still very preliminary. Feel free to ask any of the gfortran regulars (directly or on the gfortran mailing list) for advice or help.
Gfortran is a front end to gcc. Its task is to parse Fortran code and convert it to an intermediate form, which is then handed off to the rest of gcc: the so-called "middle end", which performs further optimization, and the "back end", which finally translates the result into assembly language.
This document covers only the gfortran front end; the other parts of gcc have their own documentation. It assumes you know how to build gcc, including gfortran.
What does the front end do?
The front end's work can be grouped into four phases:
- Parsing (parse.c, scanner.c and primary.c): This converts source code into a stream of tokens which describe the language. Because Fortran has no reserved keywords, gfortran runs a series of matchers against the code, trying to find one that matches a statement. When a match fails, an error message may be queued and another matcher tried. If all attempts at matching fail, the queued errors are reported to the user.
- Resolution (resolve.c, iresolve.c for intrinsics, expr.c, array.c, and simplify.c): This resolves things left over from the parsing phase, such as the types of expressions, and simplifies constant expressions at compile time. Many errors are issued in this phase. At the end of this phase, the abstract syntax tree is complete.
- Front-end optimization (frontend-passes.c): This performs some optimizations that rely on information in the Fortran language which cannot easily be handled by the later stages.
- Translation (trans-*.c): This translates the Fortran abstract syntax tree into a tree structure suitable for the middle end.
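The matcher-based parsing described above can be illustrated with a deliberately tiny model. The sketch below is not gfortran code: toy_match, match_assignment, match_print and classify are hypothetical names, the matchers only understand lowercase input, and errors are plain strings. It only shows the control flow: matchers are tried in order, each failure queues a message, a success discards the queue, and the queue is printed only if every matcher fails. Trying assignment first reflects the fact that, without reserved keywords, "print = 3" is a valid assignment to a variable named print.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Toy two-valued match result (hypothetical, not gfortran's type). */
typedef enum { TOY_NO, TOY_YES } toy_match;

/* Queue of error messages from matchers that failed. */
#define MAX_QUEUED 8
static const char *queued[MAX_QUEUED];
static int n_queued;

static void queue_error (const char *msg)
{
  if (n_queued < MAX_QUEUED)
    queued[n_queued++] = msg;
}

/* NAME = expr.  Tried first, because a word such as PRINT may also be
   an ordinary variable name. */
static toy_match match_assignment (const char *s)
{
  size_t i = 0;
  if (isalpha ((unsigned char) s[0]))
    {
      while (isalnum ((unsigned char) s[i]) || s[i] == '_')
        i++;
      while (s[i] == ' ')
        i++;
      if (s[i] == '=' && s[i + 1] != '=')
        return TOY_YES;
    }
  queue_error ("not an assignment");
  return TOY_NO;
}

/* PRINT followed by a format, e.g. "print *, x". */
static toy_match match_print (const char *s)
{
  if (strncmp (s, "print", 5) == 0 && (s[5] == ' ' || s[5] == '*'))
    return TOY_YES;
  queue_error ("not a PRINT statement");
  return TOY_NO;
}

struct matcher
{
  const char *kind;
  toy_match (*fn) (const char *);
};

static const struct matcher matchers[] = {
  { "assignment", match_assignment },
  { "print", match_print },
};

/* Try each matcher in order.  Returns the statement kind, or NULL
   after printing every queued error if nothing matched. */
static const char *classify (const char *stmt)
{
  n_queued = 0;
  for (size_t k = 0; k < sizeof matchers / sizeof matchers[0]; k++)
    if (matchers[k].fn (stmt) == TOY_YES)
      {
        n_queued = 0;  /* success: drop errors from earlier attempts */
        return matchers[k].kind;
      }
  for (int e = 0; e < n_queued; e++)
    fprintf (stderr, "Error: %s\n", queued[e]);
  return NULL;
}
```

With this ordering, classify ("print = 3") yields "assignment", classify ("print *, x") yields "print", and classify ("goto 10") prints both queued errors and returns NULL.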
Examining gfortran data structures
There are a few useful options to look at gfortran internals. Compiling a file with gfortran -fdump-fortran-original foo.f90 dumps the internal representation of the Fortran abstract syntax tree to standard output. The code which generates output for this option can be found in dump-parse-tree.c, which can serve as a good starting point for examining gfortran's data structures.
Using gfortran -fdump-tree-original foo.f90 generates a file with a name like foo.f90.004t.original (the exact name varies between GCC versions and options), which contains a C-like representation of what the compiler hands off to the middle end. Most wrong-code bugs can be tracked down by examining this file.
Some documentation on the data structures can be found in the GNU Fortran Compiler Internals document.
Using a debugger on the gfortran compiler
You need to run the debugger (usually gdb) on the f951 executable. This can be found in your gcc build directory. Assuming that this is ~/gcc-bin, the executable is in ~/gcc-bin/gcc/f951.
A good starting point is to run gdb with
$ gdb ~/gcc-bin/gcc/f951
(gdb) break show_expr
(gdb) run -fdump-fortran-original hello.f90
and then examine the expressions there. You can find some documentation on the gfc_code and gfc_expr structures you will encounter in the gfc-internals.texi file in the gfortran source directory.
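To get a feel for what a gfc_expr looks like, it may help to picture a much smaller structure. The sketch below is a hypothetical miniature (toy_expr, fold and the T_* tags are invented names). The real gfc_expr has an expr_type field with values such as EXPR_CONSTANT, EXPR_VARIABLE and EXPR_OP and keeps operator operands under value.op.op1 and value.op.op2, but it has many more members. fold only illustrates the idea behind the compile-time simplification of constants done during resolution: constant operands are combined bottom-up and the operator node is rewritten in place as a constant.

```c
#include <stddef.h>

/* Hypothetical miniature of a gfc_expr: a constant, a variable, or a
   binary operation. */
typedef enum { T_CONSTANT, T_VARIABLE, T_OP } toy_expr_type;

typedef struct toy_expr
{
  toy_expr_type type;
  char op;                     /* '+' or '*' when type == T_OP */
  long value;                  /* valid when type == T_CONSTANT */
  const char *name;            /* valid when type == T_VARIABLE */
  struct toy_expr *op1, *op2;  /* operands when type == T_OP */
} toy_expr;

/* Fold constant subtrees bottom-up.  Both operands are always visited,
   so constant parts of a partly variable expression still get folded.
   Returns 1 if E is a constant afterwards. */
static int fold (toy_expr *e)
{
  if (e->type != T_OP)
    return e->type == T_CONSTANT;

  int c1 = fold (e->op1);
  int c2 = fold (e->op2);
  if (!c1 || !c2)
    return 0;

  long v = (e->op == '+') ? e->op1->value + e->op2->value
                          : e->op1->value * e->op2->value;
  e->type = T_CONSTANT;
  e->value = v;
  e->op1 = e->op2 = NULL;  /* operands are owned by the caller here */
  return 1;
}
```

For x + 2 * 3, fold leaves the addition in place (x is not a constant) but turns the multiplication node into the constant 6.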
Another interesting variable to look at is gfc_current_ns, the current namespace. Its code is found under gfc_current_ns->code, and its symbols (i.e. variable names, functions, etc.) under gfc_current_ns->sym_root. The latter is a gfc_symtree pointer, so to look at the first symbol you need to examine *(gfc_current_ns->sym_root->n.sym).
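Because sym_root is only the root of a binary tree, the other symbols hang off its children. The sketch below models that with hypothetical, heavily simplified types (toy_symbol, toy_symtree, walk and collect are invented names); the left/right/name/n.sym layout is an assumption meant to resemble gfc_symtree, so check gfortran.h before relying on it. In real front-end code you would normally use gfortran's own traversal helpers, such as gfc_traverse_symtree, rather than walking the tree by hand.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for gfc_symbol and
   gfc_symtree.  Only the fields needed for the walk are modelled. */
typedef struct toy_symbol
{
  const char *name;
} toy_symbol;

typedef struct toy_symtree
{
  struct toy_symtree *left, *right;  /* children in the binary tree */
  const char *name;                  /* key the tree is ordered by */
  union { toy_symbol *sym; } n;      /* mirrors the n.sym access path */
} toy_symtree;

/* In-order walk: left subtree, this node, right subtree.  For a tree
   ordered by name this visits the symbols in sorted order. */
static void walk (const toy_symtree *st, void (*visit) (const toy_symbol *))
{
  if (st == NULL)
    return;
  walk (st->left, visit);
  visit (st->n.sym);
  walk (st->right, visit);
}

/* Example visitor: append each symbol name to SEEN, comma-separated. */
static char seen[128];

static void collect (const toy_symbol *sym)
{
  if (seen[0] != '\0')
    strncat (seen, ",", sizeof seen - strlen (seen) - 1);
  strncat (seen, sym->name, sizeof seen - strlen (seen) - 1);
}
```

Built from symbols a, n and x with n at the root, walk (&root, collect) leaves "a,n,x" in seen, while root.n.sym is the only symbol you would reach directly in gdb through *(gfc_current_ns->sym_root->n.sym).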
If you are looking for the source of a particular error, you can set a breakpoint in gfc_error. Be prepared for a large number of false positives, because the parser calls gfc_error frequently for constructs that it may recognize later. It may be better to grep the gfortran source files for the error message and then set a breakpoint there.
If you want to inspect a particular internal data structure that is pointed to by a pointer p, a good first try is to use
(gdb) call debug(p)
on it. There are also special functions such as gfc_debug_code and gfc_debug_expr that you can call; these are also defined in dump-parse-tree.c.
Examining tree structures
Let's say you are looking at the middle end code generated in a tree variable named stmt somewhere in trans-*.c. The best way to look at this is to try
(gdb) call debug(stmt)
which will dump the code using the same internal representation as the -fdump-tree-original option. If the tree you are looking at contains a declaration, this will print an empty line. In this case, you can use
(gdb) call debug_tree(stmt)
which will print a complete, but somewhat hard to read, representation of the declaration.
Breaking on an internal error
If you want to look at an internal compiler error, try setting a breakpoint in fancy_abort. Going up the call stack from there (for example with gdb's up command) will lead you to the gcc_unreachable () or gcc_assert () call where something went wrong.
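A single breakpoint on fancy_abort works because GCC's gcc_unreachable macro, and gcc_assert when assertion checking is enabled, both call fancy_abort with the caller's file, line and function. The sketch below imitates that pattern with hypothetical names (toy_fancy_abort, toy_assert, toy_unreachable, toy_kind_size). Unlike the real fancy_abort, which reports an internal compiler error and terminates, this version only records and prints the location so the example can keep running.

```c
#include <stdio.h>

/* Location of the most recent failure, recorded by toy_fancy_abort. */
static const char *last_file;
static const char *last_func;
static int last_line;
static int abort_count;

/* Hypothetical stand-in for fancy_abort.  The real function reports an
   internal compiler error and exits; this one returns instead. */
static void toy_fancy_abort (const char *file, int line, const char *func)
{
  last_file = file;
  last_line = line;
  last_func = func;
  abort_count++;
  fprintf (stderr, "internal compiler error: in %s, at %s:%d\n",
           func, file, line);
}

/* Both macros funnel into toy_fancy_abort with the caller's location,
   so one breakpoint on it catches every failure. */
#define toy_assert(EXPR) \
  ((void) (!(EXPR) ? toy_fancy_abort (__FILE__, __LINE__, __func__), 0 : 0))
#define toy_unreachable() (toy_fancy_abort (__FILE__, __LINE__, __func__))

/* Hypothetical front-end helper: byte size of an integer kind. */
static int toy_kind_size (int kind)
{
  toy_assert (kind > 0);
  switch (kind)
    {
    case 1: case 2: case 4: case 8:
      return kind;
    default:
      toy_unreachable ();
      return -1;  /* reached only because toy_fancy_abort returns */
    }
}
```

In gdb, a breakpoint on the real fancy_abort stops inside it; the frame one level up is the function containing the failed check, which is what toy_kind_size plays here.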
What kind of PR to start with
All currently open bug reports (called PRs) can be found in the gcc bug tracker, Bugzilla, by setting the product to fortran.
Traditionally, internal compiler errors on invalid code (gcc Bugzilla keyword ice-on-invalid-code) have been considered relatively easy, but you may still run into a hard one.