This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [LTO] patch: new CALL_EXPR abstractions in builtins.c
- From: Roger Sayle <roger at eyesopen dot com>
- To: Paolo Bonzini <paolo dot bonzini at lu dot unisi dot ch>
- Cc: gcc-patches at gcc dot gnu dot org, <sandra at codesourcery dot com>
- Date: Wed, 28 Jun 2006 06:44:06 -0600 (MDT)
- Subject: Re: [LTO] patch: new CALL_EXPR abstractions in builtins.c
On Wed, 28 Jun 2006, Paolo Bonzini wrote:
> I agree with Sandra. Assuming she doesn't have to rely on backup plans
> (*), which would be quite a pity, the heaviness of TREE_LISTs (24 bytes)
> more than offsets the problem of having to construct a CALL_EXPR.
I do agree that its a great idea to reduce the footprint of CALL_EXPR.
I do agree with Sandra's response that we're currently already leaking
memory constructing tree lists in calls to fold_builtins, so that the
overhead of using an efficient CALL_EXPR is only a minor increase.
My main point is that we already need to reduce this "leakage" on
folding both mainline and not let it get any worse on the LTO branch.
I suspect that people fail to appreciate how much the new
tree-ssa passes make use of the middle-end's folding and tree
simplification routines.
Consider the simple test case below:
double sin(double);
double foo()
{
double x = 0.5;
return sin(x);
}
Rhetorical question: How many times would you expect that
fold_builtin_sin gets called during the compilation of the
above code at -O2? Notice that there's only a single basic
block, so we shouldn't have to worry about DOM-like passes,
or CSE-like path following. Notice the built-in function only
has a single argument so the number of times we propagate to it
is low. And remembering that we currently don't have a tree
combiner, and don't even fold coming out of SSA, so RTL expansion
still gets handled non-folded/non-canonical trees.
The answer, with current mainline, is that fold_builtin_sin
gets called no less than eight times!
Hence your analysis of break-even points has to factor the
memory we save on the single CALL_EXPR, which is all thats
need to represent the above call to "sin" in SSA form,
vs. the overhead we'll leak from the eight calls to fold!
As I mentioned above, I'd also expect that as more passes
are added, such as a tree-combiner, or canonicalization as
we come out of SSA, this number will go up. I also suspect
that in real code containing many basic-blocks, and builtins
with multiple arguments, the ratio of calls to fold per
CALL_EXPR (on current mainline) is currently greater than
eight.
Just like symbol table lookup, we need to balance the cost
of construction (rare), with the cost of lookup (frequent).
Now I appreciate the not every call to fold will/must leak
memory using Sandra's scheme or even on mainline, but I'd hope
the above statistic is surprising enough to most people, to
induce a more accurate sense of respect about what's being
traded-off. You'll also understand why many middle-end folks
strongly believe that trees should be folded at construction
time, and when modified, rather than repeatedly calling fold
on the same expression.
Please, nobody misinterpret the above analysis: calling "fold"
is a good thing, which is why we do it so often. It's because
its so effective that it needs to be efficient.
Roger
--