This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
[tree-ssa][rfc] plans for nested functions
- From: Richard Henderson <rth at redhat dot com>
- To: gcc at gcc dot gnu dot org
- Date: Fri, 26 Sep 2003 14:36:51 -0700
- Subject: [tree-ssa][rfc] plans for nested functions
This is stuff that was running around my head this morning.
I mostly wanted to write this down so that I don't forget...
The problem I'm trying to solve is that, at present, we have
ordering issues wrt compilation of nested functions.
(1) We depend heavily on rtl-level state in order to compile
nested functions.
A couple of weeks ago I tried to un-nest compilation of nested
functions with cgraph. It turns out there is so much rtl state
needed that one can not compile the nested function either before
or after the parent -- one *must* compile it *during* the parent.
(2) We cannot inline the parent function, because that would leave
the nested function without a well-defined static-chain frame
to access outer variables.
The solution would be to generate a "virtual frame" directly early
in the translation of the parent/nested function pair. Consider:
int f1(int p1)
{
int a1 = 0;
auto int f2(void)
{
int a2 = 0;
auto int f3(void)
{
return a2++ + a1++;
}
bar(f3);
return a2 + p1;
}
bar(f2);
return a1;
}
=>
int f1(int p1)
{
struct f1_frame
{
int p1;
int a1;
} frame;
auto int f2(void)
{
struct f2_frame
{
struct f1_frame * const chain;
int a2;
} frame;
auto int f3(void)
{
struct f2_chain * const chain = <<<incomming_static_chain>>>;
return chain->a2++ + chain->chain->a1++;
}
frame.chain = <<<incomming_static_chain>>>;
frame.a2 = 0;
bar(__builtin_construct_trampoline(f3, &frame));
return frame.a2 + frame.chain->p1;
}
frame.p1 = p1;
frame.a1 = 0;
bar(__builtin_construct_trampoline(f2, &frame));
return frame.a1;
}
I envision the order of operations as follows:
(1) Nested functions are initially translated, then ignored until
they are actually referenced. (Indeed, all inline functions
ought to be treated this way.)
(2) Before gimplification, f1 undergoes a remove_useless pass.
During this pass, we notice if any references to nested
functions exist.
(3) Process all referenced nested functions:
(a) Recurse to step 2.
(b) For each decl that comes from an outer scope, enter it into
a hash-table that maps var_decl/parm_decl -> field_decl,
where the field_decl is going to be an element of the frame
structures above. This includes synthetic decls such as the
static chain itself.
(c) Replace the direct decl references with the appropriate
chain->field reference.
(4) Sort the fields for nice packing and layout the structure type.
Construct the struct as with an anonymous struct, such that
debugging turns out ok. (Not sure exactly how to handle
debugging for the variables accessed from the inner scope, but
since I'm certain that doesn't work at the moment, there's no
loss of functionality.)
(5) Replace references to decls in the parent with field accesses.
For all parm_decls (and the static chain), insert code to copy
the value from the incomming location to the structure.
(6) Replace references to pointers to nested functions with a
trampoline constructed via a builtin. This we still want to
defer to target-specific ugliness.
Now we've got something completely independent from rtl that suffers
from neither of the discussed problems.
r~