This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [LTO] patch: new CALL_EXPR abstractions in builtins.c


On Wed, 28 Jun 2006, Paolo Bonzini wrote:
> I agree with Sandra.  Assuming she doesn't have to rely on backup plans
> (*), which would be quite a pity, the heaviness of TREE_LISTs (24 bytes)
> more than offsets the problem of having to construct a CALL_EXPR.

I do agree that its a great idea to reduce the footprint of CALL_EXPR.
I do agree with Sandra's response that we're currently already leaking
memory constructing tree lists in calls to fold_builtins, so that the
overhead of using an efficient CALL_EXPR is only a minor increase.
My main point is that we already need to reduce this "leakage" on
folding both mainline and not let it get any worse on the LTO branch.

I suspect that people fail to appreciate how much the new
tree-ssa passes make use of the middle-end's folding and tree
simplification routines.

Consider the simple test case below:

double sin(double);
double foo()
{
  double x = 0.5;
  return sin(x);
}

Rhetorical question: How many times would you expect that
fold_builtin_sin gets called during the compilation of the
above code at -O2?  Notice that there's only a single basic
block, so we shouldn't have to worry about DOM-like passes,
or CSE-like path following.  Notice the built-in function only
has a single argument so the number of times we propagate to it
is low.  And remembering that we currently don't have a tree
combiner, and don't even fold coming out of SSA, so RTL expansion
still gets handled non-folded/non-canonical trees.

The answer, with current mainline, is that fold_builtin_sin
gets called no less than eight times!

Hence your analysis of break-even points has to factor the
memory we save on the single CALL_EXPR, which is all thats
need to represent the above call to "sin" in SSA form,
vs. the overhead we'll leak from the eight calls to fold!

As I mentioned above, I'd also expect that as more passes
are added, such as a tree-combiner, or canonicalization as
we come out of SSA, this number will go up.  I also suspect
that in real code containing many basic-blocks, and builtins
with multiple arguments, the ratio of calls to fold per
CALL_EXPR (on current mainline) is currently greater than
eight.


Just like symbol table lookup, we need to balance the cost
of construction (rare), with the cost of lookup (frequent).


Now I appreciate the not every call to fold will/must leak
memory using Sandra's scheme or even on mainline, but I'd hope
the above statistic is surprising enough to most people, to
induce a more accurate sense of respect about what's being
traded-off.  You'll also understand why many middle-end folks
strongly believe that trees should be folded at construction
time, and when modified, rather than repeatedly calling fold
on the same expression.

Please, nobody misinterpret the above analysis: calling "fold"
is a good thing, which is why we do it so often.  It's because
its so effective that it needs to be efficient.

Roger
--


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]