This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Partial inlining


> The above numbers are all with the max-inline-insns-auto reduction?
> I'd like to have that adjustment split out as a separate followup patch.

Yes, without the change we generally tend to do more inlining (as we produce
more inline candidates).  Not exactly in all cases: when we split at point
that the call to split function is cold, we reduce size even for unchanges
max-inlie-insns-auto.

I've removed this change from patch and will commit it sequentially.
> 
> The partial-inlining-entry-probability default looks quite high.  I would
> have expected that guessed probabilities of say
> 
>   if (flag)
>     return;
> 
> make the entry taken 50% of the time?  So, did you experiment with
> other values than 70 or is that something that is suggested by literature?

Well, not really.  It depends on what is inside if conditional.
If it is if (flag < 0), lets say, we will hit non-negative heuristics
and will predict it with 79%.

The value of 70 seemed high enough to take care of most of random fuzz from the
heruistics, but it is something I plan to experiment with in the future.
In general the predictors on tests of this type are quite ineffective.
> 
> You run ipa-split late during early optimizations.  Will the new clone
> you create go through the early opts again?  Thus, is the placement

New clone is not considered by early optimizers. It is added as new
function just afterwards.  We might want to change this in future, but
that is how cgraph_add_new_function works for now (it queues the function
to be inserted once IPA pass is done and early optimizations are one
IPA pass).

> at the very end of early opts critical?  If so please add a comment

Well, it will work elsewhere too, just you miss optimizations on the split part
of function then.  In general I see little value in moving it earlier, but I've
added coment.

> before the NEXT_PASS.  I suppose we don't re-run early opts on the
> clone as otherwise we could end up splitting it again (and so for
> a very large function with cascaded if (x) cheap; if (x) cheap; ....
> we will take a very long time to compile because we split of
> each if (x) cheap; one after the other?  Please add a testcase
> like this.)

Splitting is considered just once.  But while working on pass, just
for fun I tried to split at every possible split point and quite surprisingly
it did not introduce that much of compile time penalty for such a testcase.
(it sped up insn-attrtab compilation that is pretty much like this)

So multiple split points is something I plan to add, just want to do that
incrementally so the benefits can be more easilly tracked.

What should the testcase test?
> 
> +  for (parm = DECL_ARGUMENTS (current_function_decl); parm;
> +       parm = TREE_CHAIN (parm))
> +    {
> +      if (!is_gimple_reg (parm))
> +       {
> +         if (bitmap_bit_p (non_ssa_vars, DECL_UID (parm)))
> +           {
> +             if (dump_file)
> +               fprintf (dump_file,
> +                        "  Refused: need to pass non-ssa param 
> values\n");
> 
> I'm confused by the non_ssa_vars thing.  This is just tracking
> whether a non-SSA parameter is being used?

Yes, to handle non-ssa arguments, we will need to pass them by reference
that means that we will need to change function prototype in less trivial
way than by removing its arguments.

arg adjustments used in ipa-sra can do this, all we need is to hook them into
clonning machinery that is something I plan to do relatively soon too (it will
imply support in virutal clones that needs way to stream them into cgraphopt
section for WHOPR so it is not completely trivial, but the adjustment code was
meant for this from beggining).

There are two places we check them.  First is for SRA arguments, the other
place is checking that if non-ssa variable is used, it is either used only by
header or by split part, but not by both.

(for same reason I also do not support yet sharing values computed by header
and used by split part that is something I can do by passing them as argument
or recomputing them in split part if they are cheap.  I think especially
for casts in C++ we will want to do the recomputation in trivial cases)
> 
> +  if (current->split_size <= call_overhead)
> +    {
> +      if (dump_file)
> +       fprintf (dump_file,
> +                "  Refused: split size is smaller than call overhead\n");
> +      return;
> +    }
> 
> also refuse to split if the size of the header is bigger than that
> of the tail?  Otherwise we're just going to inline it back, no?

Not necesarily - especially when the split point is cold and thus we end
up inlining for size.

This is something I want to experiment with too.  At the moment we produce
more splitting than needed (i.e. we often inline things back) that has
negative effect on compile time and debug info. In practice I didn't found
this bad.

Keeping the function splitting spearate from inliner heuristics, we can also
play safe and split only when the function becomes inlinable after splitting
or the split part is cold and its size
> +               if (is_gimple_debug (gsi_stmt (bsi)))
> +                 continue;
> 
> I wonder if debug gets confused if you reference vars that are not 
> there...

tree-inline will remove the debug statement (and thus we lose some debug info
when split part is inlined back), but in general it works.
> +/* Return basic block containing RETURN statement, or EXIT_BLOCK_PTR if 
> none
> +   found.  */
> 
> What about BUILT_IN_RETURN?  Or returns via EH or abnormally returning
> fns like longjmp?

EH or abnormal jumps are not problem.  Functions returning both via
normal return and BUILT_IN_RETURN (or functions having two return statement
one with and one without return value) are currently outlawed.

We can support multiple return BBs if needed, I added TODO for that reason.
> +  /* At present we can't pass non-SSA arguments to split function.
> +     FIXME: this can be relaxed by passing references to arguments.  */
> +  if (TREE_CODE (t) == PARM_DECL)
> +    {
> +      if (dump_file)
> +       fprintf (dump_file, "Can not split use of non-ssa function 
> parameter.\n");
> +      return true;
> +    }
> 
> I think this will do excessive dumping.  At least make it
> dump_flags & TDF_DETAILS.

Done.
> +  if ((!node->callers || !node->callers->next_caller)
> +      && !node->address_taken
> +      && ((!flag_lto && !flag_whopr) || !node->local.externally_visible))
> +    {
> +      if (dump_file)
> +       fprintf (dump_file, "Not splitting: not called directly or called 
> once.\n");
> +      return 0;
> +    }
> 
> Why the check for node->address_taken?  Why the check
> for external visibility?

We need to decide if function can possibly end up calling more than once when it reach
inliner.  When it has address taken, we can devirtualize the call.
When doing LTO, it might be called directly from other unit.
> 
> Ok with the above changes (please go over all dumping and make sure
> that without -details we only say that we are splitting and where
> and dump the split function).

You seem to be consistently on less dumping by default than me.  I find dumping
of reasons why split points are not accepted quite useful, but I guess using
-details is not problem.

Honza


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]