Partial inlining

H.J. Lu hjl.tools@gmail.com
Mon Jun 28 07:21:00 GMT 2010


On Fri, Jun 25, 2010 at 11:48 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Jun 25, 2010 at 1:33 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> The above numbers are all with the max-inline-insns-auto reduction?
>>> I'd like to have that adjustment split out as a separate followup patch.
>>
>> Yes, without the change we generally tend to do more inlining (as we produce
>> more inline candidates).  Not exactly in all cases: when we split at point
>> that the call to split function is cold, we reduce size even for unchanges
>> max-inlie-insns-auto.
>>
>> I've removed this change from patch and will commit it sequentially.
>>>
>>> The partial-inlining-entry-probability default looks quite high.  I would
>>> have expected that guessed probabilities of say
>>>
>>>   if (flag)
>>>     return;
>>>
>>> make the entry taken 50% of the time?  So, did you experiment with
>>> other values than 70 or is that something that is suggested by literature?
>>
>> Well, not really.  It depends on what is inside if conditional.
>> If it is if (flag < 0), lets say, we will hit non-negative heuristics
>> and will predict it with 79%.
>>
>> The value of 70 seemed high enough to take care of most of random fuzz from the
>> heruistics, but it is something I plan to experiment with in the future.
>> In general the predictors on tests of this type are quite ineffective.
>>>
>>> You run ipa-split late during early optimizations.  Will the new clone
>>> you create go through the early opts again?  Thus, is the placement
>>
>> New clone is not considered by early optimizers. It is added as new
>> function just afterwards.  We might want to change this in future, but
>> that is how cgraph_add_new_function works for now (it queues the function
>> to be inserted once IPA pass is done and early optimizations are one
>> IPA pass).
>>
>>> at the very end of early opts critical?  If so please add a comment
>>
>> Well, it will work elsewhere too, just you miss optimizations on the split part
>> of function then.  In general I see little value in moving it earlier, but I've
>> added coment.
>>
>>> before the NEXT_PASS.  I suppose we don't re-run early opts on the
>>> clone as otherwise we could end up splitting it again (and so for
>>> a very large function with cascaded if (x) cheap; if (x) cheap; ....
>>> we will take a very long time to compile because we split of
>>> each if (x) cheap; one after the other?  Please add a testcase
>>> like this.)
>>
>> Splitting is considered just once.  But while working on pass, just
>> for fun I tried to split at every possible split point and quite surprisingly
>> it did not introduce that much of compile time penalty for such a testcase.
>> (it sped up insn-attrtab compilation that is pretty much like this)
>>
>> So multiple split points is something I plan to add, just want to do that
>> incrementally so the benefits can be more easilly tracked.
>>
>> What should the testcase test?
>>>
>>> +  for (parm = DECL_ARGUMENTS (current_function_decl); parm;
>>> +       parm = TREE_CHAIN (parm))
>>> +    {
>>> +      if (!is_gimple_reg (parm))
>>> +       {
>>> +         if (bitmap_bit_p (non_ssa_vars, DECL_UID (parm)))
>>> +           {
>>> +             if (dump_file)
>>> +               fprintf (dump_file,
>>> +                        "  Refused: need to pass non-ssa param
>>> values\n");
>>>
>>> I'm confused by the non_ssa_vars thing.  This is just tracking
>>> whether a non-SSA parameter is being used?
>>
>> Yes, to handle non-ssa arguments, we will need to pass them by reference
>> that means that we will need to change function prototype in less trivial
>> way than by removing its arguments.
>>
>> arg adjustments used in ipa-sra can do this, all we need is to hook them into
>> clonning machinery that is something I plan to do relatively soon too (it will
>> imply support in virutal clones that needs way to stream them into cgraphopt
>> section for WHOPR so it is not completely trivial, but the adjustment code was
>> meant for this from beggining).
>>
>> There are two places we check them.  First is for SRA arguments, the other
>> place is checking that if non-ssa variable is used, it is either used only by
>> header or by split part, but not by both.
>>
>> (for same reason I also do not support yet sharing values computed by header
>> and used by split part that is something I can do by passing them as argument
>> or recomputing them in split part if they are cheap.  I think especially
>> for casts in C++ we will want to do the recomputation in trivial cases)
>>>
>>> +  if (current->split_size <= call_overhead)
>>> +    {
>>> +      if (dump_file)
>>> +       fprintf (dump_file,
>>> +                "  Refused: split size is smaller than call overhead\n");
>>> +      return;
>>> +    }
>>>
>>> also refuse to split if the size of the header is bigger than that
>>> of the tail?  Otherwise we're just going to inline it back, no?
>>
>> Not necesarily - especially when the split point is cold and thus we end
>> up inlining for size.
>>
>> This is something I want to experiment with too.  At the moment we produce
>> more splitting than needed (i.e. we often inline things back) that has
>> negative effect on compile time and debug info. In practice I didn't found
>> this bad.
>>
>> Keeping the function splitting spearate from inliner heuristics, we can also
>> play safe and split only when the function becomes inlinable after splitting
>> or the split part is cold and its size
>>> +               if (is_gimple_debug (gsi_stmt (bsi)))
>>> +                 continue;
>>>
>>> I wonder if debug gets confused if you reference vars that are not
>>> there...
>>
>> tree-inline will remove the debug statement (and thus we lose some debug info
>> when split part is inlined back), but in general it works.
>>> +/* Return basic block containing RETURN statement, or EXIT_BLOCK_PTR if
>>> none
>>> +   found.  */
>>>
>>> What about BUILT_IN_RETURN?  Or returns via EH or abnormally returning
>>> fns like longjmp?
>>
>> EH or abnormal jumps are not problem.  Functions returning both via
>> normal return and BUILT_IN_RETURN (or functions having two return statement
>> one with and one without return value) are currently outlawed.
>>
>> We can support multiple return BBs if needed, I added TODO for that reason.
>>> +  /* At present we can't pass non-SSA arguments to split function.
>>> +     FIXME: this can be relaxed by passing references to arguments.  */
>>> +  if (TREE_CODE (t) == PARM_DECL)
>>> +    {
>>> +      if (dump_file)
>>> +       fprintf (dump_file, "Can not split use of non-ssa function
>>> parameter.\n");
>>> +      return true;
>>> +    }
>>>
>>> I think this will do excessive dumping.  At least make it
>>> dump_flags & TDF_DETAILS.
>>
>> Done.
>>> +  if ((!node->callers || !node->callers->next_caller)
>>> +      && !node->address_taken
>>> +      && ((!flag_lto && !flag_whopr) || !node->local.externally_visible))
>>> +    {
>>> +      if (dump_file)
>>> +       fprintf (dump_file, "Not splitting: not called directly or called
>>> once.\n");
>>> +      return 0;
>>> +    }
>>>
>>> Why the check for node->address_taken?  Why the check
>>> for external visibility?
>>
>> We need to decide if function can possibly end up calling more than once when it reach
>> inliner.  When it has address taken, we can devirtualize the call.
>> When doing LTO, it might be called directly from other unit.
>>>
>>> Ok with the above changes (please go over all dumping and make sure
>>> that without -details we only say that we are splitting and where
>>> and dump the split function).
>>
>> You seem to be consistently on less dumping by default than me.  I find dumping
>> of reasons why split points are not accepted quite useful, but I guess using
>> -details is not problem.
>>
>
> This breaks C++:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44671

Also caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44687

-- 
H.J.



More information about the Gcc-patches mailing list