This is the mail archive of the
mailing list for the GCC project.
Re: needless deep recursion in gt-c-decl.h
- From: Mike Stump <mrs at apple dot com>
- To: Geoff Keating <geoffk at redhat dot com>
- Cc: per at bothner dot com, gcc at gcc dot gnu dot org
- Date: Thu, 1 Aug 2002 14:00:23 -0700
- Subject: Re: needless deep recursion in gt-c-decl.h
On Thursday, August 1, 2002, at 12:32 PM, Geoff Keating wrote:
I have two ideas. The first one is a piece of code that resets the
stack limit to the maximum, should the driver find that the default is
still in effect:
I now have some numbers!
I looked at expr.o, which needs (on the branch) a recursion depth of
24024, the third-highest (and not typical, most files are < 10000);
that corresponds to perhaps 1M of stack.
I tried both algorithms on the tree marker routine for C. They both
had about the same effect, reducing the recursion depth to about
16000, with no significant change in CPU time. (This is with
gcac checking, so any effect should be greatly magnified.)
The 16000 turns out to be mostly RTL marking. So, I applied the
one-loop method to the RTL marker routine, and it reduced the maximum
stack depth to about 13000. Better, but not really acceptable, since
this still corresponds to perhaps 600k of stack, more than the limit
I then generated a coredump when the stack depth was 13000, and looked
at it in gdb. I was very pleased to find that the gdb folks have
improved the performance (since gdb 4.x) when looking at deep stacks,
it now handles this level of stack depth with no problems at all.
I'd manually applied the one-loop method to the NEXT_INSN field of
insns, and otherwise to fld if it was a RTX. The stack consisted,
by my unscientific survey, almost entirely of RTX marking, and most of
it (more than half but not 90%) was recursing on PREV_INSN fields.
So, I looked at a two-loop method, looking at fld and fld. This
worked better yet, reducing the maximum stack depth to under 10000,
but that's still ~400k of stack.
Nearly all of the remainder is still recursion on PREV_INSN fields.
This happens because the DECL_RTL of a LABEL_REF will refer to a
CODE_LABEL in the middle of the insn chain, and it happens that this
is the first place the insn chain is encountered when marking. The
label involved is "plus_expr:" in expand_expr, line 7632.
expand_expr starts at line 6183, so there's a long way to go back.
My current best idea is to rearrange the order that
globals are marked so that the insn chain gets seen before the trees,
if this is possible, but I'm not at all sure that this will work
Plan B is to implement a stack using malloc, only on stack-constrained
systems because I expect this will be slower on Linux (it was when I
Any other ideas?
On the bright side, my "typical" file, flow.o, is reduced from ~6000
maximum depth to ~500 maximum depth. If I didn't have to worry about
making stack usage small for every case, the two-loop method would be
quite good enough.
/* APPLE LOCAL setrlimit */
/* Get rid of any avoidable limit on stack size. */
struct rlimit rlim;
/* Set the stack limit huge. (Compiles normally work within
a megabyte of stack, but the normal limit on OSX is 512K for
some reason.) */
getrlimit (RLIMIT_STACK, &rlim);
rlim.rlim_cur = rlim.rlim_max;
setrlimit (RLIMIT_STACK, &rlim);
#endif /* RLIMIT_STACK defined */
and the other is the idea that we can count how long the insn chain is
(PREV_INSN and NEXT_INSN as long as the nodes are unmarked), allocate
the many slots, move them in, mark just those nodes all iteratively,
unmark the first, then really mark it, loop. This way, all the long
insn chains will be in the vec, very storage efficient.
I don't know if this is a good idea with all the extra memory bandwidth
it causes. :-( Maybe the saving in recursion will make up for the
extra memory hits.