Why Two Passes - Using and Porting GNU Fortran

21.4.2 Why Two Passes

The need for two passes was not immediately evident during the design and implementation of the code in the FFE that was to produce GBEL. Only after a few kludges had been implemented, to handle things like an incorrectly guessed ASSIGN label nature, did enough evidence pile up to make it clear that std.c had to be introduced to intercept and save the digested contents of a program unit, then revisit them as part of a second pass.

Other such missteps have occurred during the evolution of the FFE, because of the different goals of the FFE and the GBE.

Because the GBE's original, and still primary, goal was to directly support the GNU C language, the GBEL, and the GBE itself, require more complexity on the part of most front ends than they require of gcc's.

For example, the GBEL offers an interface that permits the gcc front end to implement most, or all, of the language features it supports, without the front end having to make use of non-user-defined variables. (It's almost certainly the case that all of K&R C, and probably ANSI C as well, is handled by the gcc front end without declaring such variables.)

The FFE, on the other hand, must resort to a variety of “tricks” to achieve its goals.

Consider the following C code:

     int
     foo (int a, int b)
     {
       int c = 0;
     
       if ((c = bar (c)) == 0)
         goto done;
     
       quux (c << 1);
     
     done:
       return c;
     }

Note what kinds of objects are declared, or defined, before their use, and before any actual code generation involving them would normally take place:

Return type of function
Entry point(s) of function
Dummy arguments
Variables
Initial values for variables

By contrast, the following items can, and do, suddenly appear “out of the blue” in C:

Label references
Function references

Not surprisingly, the GBE faithfully permits the latter set of items to be “discovered” partway through GBEL “programs”, just as they are permitted to be in C.

Yet, the GBE has tended, at least in the past, to be reluctant to fully support similar “late” discovery of items in the former set.

This makes Fortran a poor fit for the “safe” subset of GBEL. Consider:

           FUNCTION X (A, ARRAY, ID1)
           CHARACTER*(*) A
           DOUBLE PRECISION X, Y, Z, TMP, EE, PI
           REAL ARRAY(ID1*ID2)
           COMMON ID2
           EXTERNAL FRED
     
           ASSIGN 100 TO J
           CALL FOO (I)
           IF (I .EQ. 0) PRINT *, A(0)
           GOTO 200
     
           ENTRY Y (Z)
           ASSIGN 101 TO J
     200   PRINT *, A(1)
           READ *, TMP
           GOTO J
     100   X = TMP * EE
           RETURN
     101   Y = TMP * PI
           CALL FRED
           DATA EE, PI /2.71D0, 3.14D0/
           END

Here are some observations about the above code, which, while somewhat contrived, conforms to the FORTRAN 77 and Fortran 90 standards:

The return type of function X is not known until the DOUBLE PRECISION line has been parsed.
Whether A is a function or a variable is not known until the PRINT *, A(0) statement has been parsed.
The bounds of the dummy array ARRAY depend on a computation involving the subsequent argument ID1 and the blank-common member ID2.
Whether Y and Z are local variables, additional function entry points, or dummy arguments to additional entry points is not known until the ENTRY statement is parsed.
Similarly, whether TMP is a local variable is not known until the READ *, TMP statement is parsed.
The initial values for EE and PI are not known until after the DATA statement is parsed.
Whether FRED is a function returning type REAL or a subroutine (which can be thought of as returning type void or, to support alternate returns in a simple way, type int) is not known until the CALL FRED statement is parsed.
Whether 100 is a FORMAT label or the label of an executable statement is not known until the X = statement is parsed. (These two types of labels get very different treatment, especially when ASSIGN'ed.)
That J is a local variable is not known until the first ASSIGN statement is parsed. (This happens after executable code has been seen.)
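
The remark above about FRED, that a subroutine can be thought of as returning type int to support alternate returns, can be pictured in C. What follows is only a sketch under stated assumptions: the names check and call_check, and the Fortran fragments quoted in the comments, are hypothetical and are not taken from g77 or from the program unit above. It shows one plausible mapping, in which the callee returns the alternate-return index and the caller branches on it.

```c
#include <assert.h>

/* Hypothetical stand-in for a Fortran subroutine declared with two
   alternate-return specifiers, e.g. SUBROUTINE CHECK (V, *, *).
   Returns 0 for a plain RETURN and n for RETURN n.  */
static int
check (int v)
{
  if (v < 0)
    return 1;                   /* like RETURN 1 */
  if (v == 0)
    return 2;                   /* like RETURN 2 */
  return 0;                     /* like a plain RETURN */
}

/* Hypothetical caller, standing in for
       CALL CHECK (V, *10, *20)
   The result tells which label was reached: 10, 20, or 0 when
   execution simply continues after the CALL.  */
static int
call_check (int v)
{
  int status = check (v);

  switch (status)
    {
    case 1:
      goto label_10;
    case 2:
      goto label_20;
    default:
      assert (status == 0);     /* only 0, 1, and 2 are possible here */
      break;
    }
  return 0;

label_10:
  return 10;
label_20:
  return 20;
}
```

Under this mapping, a procedure that turns out to be a subroutine needs an int return type, while one that turns out to be a REAL function needs a floating-point one, which is why the front end would like to know which it is before declaring FRED to the back end.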

Very few of these “discoveries” can be accommodated by the GBE as it has evolved over the years. The GBEL doesn't support several of them, and those it might appear to support don't always work properly, especially in combination with other GBEL and GBE features, as implemented in the GBE.

(Had the GBE and its GBEL originally evolved to support g77, the shoe would be on the other foot, so to speak: most, if not all, of the above would be directly supported by the GBEL, and a few C constructs would probably not be supported, as they are in reality. Both this mythical GBE and today's real one cater to their GBEL by sometimes scrambling around to clean up after themselves upon discovering that assumptions made earlier during code generation were incorrect. That's not a great design, since it implies significant code paths that might be rarely tested yet are used in some key production environments.)

So, the FFE handles these discrepancies—between the order in which it discovers facts about the code it is compiling, and the order in which the GBEL and GBE support such discoveries—by performing what amounts to two passes over each program unit.

(A few ambiguities can remain at that point, such as whether, given EXTERNAL BAZ and no other reference to BAZ in the program unit, it is a subroutine, a function, or a block-data—which, in C-speak, governs its declared return type. Fortunately, these distinctions are easily finessed for the procedure, library, and object-file interfaces supported by g77.)
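
The two-pass arrangement described above can be sketched in C for one of the listed cases, the nature of an ASSIGN'ed label. This is a minimal illustration only: every type and function name below is hypothetical, and none of it reflects the actual interfaces of std.c or the FFE. The first function walks the saved statements of a program unit and records what each label turns out to label. The second revisits the same statements and, for each ASSIGN, reports the now-known kind, standing in for the point at which code would be produced.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical, chosen for this sketch only.  */

enum label_kind { LABEL_UNKNOWN, LABEL_FORMAT, LABEL_EXECUTABLE };
enum stmt_kind { STMT_ASSIGN, STMT_FORMAT, STMT_OTHER };

struct stmt
{
  enum stmt_kind kind;
  int label;                    /* label defined on this statement, 0 if none */
  int target;                   /* label named by an ASSIGN, 0 otherwise */
};

#define MAX_LABEL 1000          /* small bound, enough for the sketch */

/* First pass: no code is produced.  Record, for every label defined in
   the saved statements, whether it labels a FORMAT statement or an
   executable statement.  */
static void
collect_label_kinds (const struct stmt *unit, size_t n,
                     enum label_kind *kinds)
{
  size_t i;

  for (i = 0; i < MAX_LABEL; i++)
    kinds[i] = LABEL_UNKNOWN;

  for (i = 0; i < n; i++)
    if (unit[i].label > 0 && unit[i].label < MAX_LABEL)
      kinds[unit[i].label] = (unit[i].kind == STMT_FORMAT
                              ? LABEL_FORMAT : LABEL_EXECUTABLE);
}

/* Second pass: revisit the saved statements.  Each ASSIGN can now be
   handled with its target's kind already known, so no guess is needed.
   Writes one entry per ASSIGN into OUT and returns how many were written.  */
static size_t
emit_assigns (const struct stmt *unit, size_t n,
              const enum label_kind *kinds,
              enum label_kind *out, size_t out_cap)
{
  size_t i, emitted = 0;

  for (i = 0; i < n && emitted < out_cap; i++)
    if (unit[i].kind == STMT_ASSIGN)
      {
        int t = unit[i].target;

        assert (t > 0 && t < MAX_LABEL);
        out[emitted++] = kinds[t];
      }
  return emitted;
}
```

A single-pass design would have to decide how to treat ASSIGN 100 TO J when it is first seen, before label 100 has been defined, and then repair the result if the guess proved wrong. Saving the statements first removes the guess at the cost of holding the whole program unit in memory.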