Partial inlining
H.J. Lu
hjl.tools@gmail.com
Mon Jun 28 07:21:00 GMT 2010
On Fri, Jun 25, 2010 at 11:48 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Jun 25, 2010 at 1:33 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> The above numbers are all with the max-inline-insns-auto reduction?
>>> I'd like to have that adjustment split out as a separate followup patch.
>>
>> Yes, without the change we generally tend to do more inlining (as we produce
>> more inline candidates). Not exactly in all cases: when we split at point
>> that the call to split function is cold, we reduce size even for unchanges
>> max-inlie-insns-auto.
>>
>> I've removed this change from patch and will commit it sequentially.
>>>
>>> The partial-inlining-entry-probability default looks quite high. I would
>>> have expected that guessed probabilities of say
>>>
>>> if (flag)
>>> return;
>>>
>>> make the entry taken 50% of the time? So, did you experiment with
>>> other values than 70 or is that something that is suggested by literature?
>>
>> Well, not really. It depends on what is inside if conditional.
>> If it is if (flag < 0), lets say, we will hit non-negative heuristics
>> and will predict it with 79%.
>>
>> The value of 70 seemed high enough to take care of most of random fuzz from the
>> heruistics, but it is something I plan to experiment with in the future.
>> In general the predictors on tests of this type are quite ineffective.
>>>
>>> You run ipa-split late during early optimizations. Will the new clone
>>> you create go through the early opts again? Thus, is the placement
>>
>> New clone is not considered by early optimizers. It is added as new
>> function just afterwards. We might want to change this in future, but
>> that is how cgraph_add_new_function works for now (it queues the function
>> to be inserted once IPA pass is done and early optimizations are one
>> IPA pass).
>>
>>> at the very end of early opts critical? If so please add a comment
>>
>> Well, it will work elsewhere too, just you miss optimizations on the split part
>> of function then. In general I see little value in moving it earlier, but I've
>> added coment.
>>
>>> before the NEXT_PASS. I suppose we don't re-run early opts on the
>>> clone as otherwise we could end up splitting it again (and so for
>>> a very large function with cascaded if (x) cheap; if (x) cheap; ....
>>> we will take a very long time to compile because we split of
>>> each if (x) cheap; one after the other? Please add a testcase
>>> like this.)
>>
>> Splitting is considered just once. But while working on pass, just
>> for fun I tried to split at every possible split point and quite surprisingly
>> it did not introduce that much of compile time penalty for such a testcase.
>> (it sped up insn-attrtab compilation that is pretty much like this)
>>
>> So multiple split points is something I plan to add, just want to do that
>> incrementally so the benefits can be more easilly tracked.
>>
>> What should the testcase test?
>>>
>>> + for (parm = DECL_ARGUMENTS (current_function_decl); parm;
>>> + parm = TREE_CHAIN (parm))
>>> + {
>>> + if (!is_gimple_reg (parm))
>>> + {
>>> + if (bitmap_bit_p (non_ssa_vars, DECL_UID (parm)))
>>> + {
>>> + if (dump_file)
>>> + fprintf (dump_file,
>>> + " Refused: need to pass non-ssa param
>>> values\n");
>>>
>>> I'm confused by the non_ssa_vars thing. This is just tracking
>>> whether a non-SSA parameter is being used?
>>
>> Yes, to handle non-ssa arguments, we will need to pass them by reference
>> that means that we will need to change function prototype in less trivial
>> way than by removing its arguments.
>>
>> arg adjustments used in ipa-sra can do this, all we need is to hook them into
>> clonning machinery that is something I plan to do relatively soon too (it will
>> imply support in virutal clones that needs way to stream them into cgraphopt
>> section for WHOPR so it is not completely trivial, but the adjustment code was
>> meant for this from beggining).
>>
>> There are two places we check them. First is for SRA arguments, the other
>> place is checking that if non-ssa variable is used, it is either used only by
>> header or by split part, but not by both.
>>
>> (for same reason I also do not support yet sharing values computed by header
>> and used by split part that is something I can do by passing them as argument
>> or recomputing them in split part if they are cheap. I think especially
>> for casts in C++ we will want to do the recomputation in trivial cases)
>>>
>>> + if (current->split_size <= call_overhead)
>>> + {
>>> + if (dump_file)
>>> + fprintf (dump_file,
>>> + " Refused: split size is smaller than call overhead\n");
>>> + return;
>>> + }
>>>
>>> also refuse to split if the size of the header is bigger than that
>>> of the tail? Otherwise we're just going to inline it back, no?
>>
>> Not necesarily - especially when the split point is cold and thus we end
>> up inlining for size.
>>
>> This is something I want to experiment with too. At the moment we produce
>> more splitting than needed (i.e. we often inline things back) that has
>> negative effect on compile time and debug info. In practice I didn't found
>> this bad.
>>
>> Keeping the function splitting spearate from inliner heuristics, we can also
>> play safe and split only when the function becomes inlinable after splitting
>> or the split part is cold and its size
>>> + if (is_gimple_debug (gsi_stmt (bsi)))
>>> + continue;
>>>
>>> I wonder if debug gets confused if you reference vars that are not
>>> there...
>>
>> tree-inline will remove the debug statement (and thus we lose some debug info
>> when split part is inlined back), but in general it works.
>>> +/* Return basic block containing RETURN statement, or EXIT_BLOCK_PTR if
>>> none
>>> + found. */
>>>
>>> What about BUILT_IN_RETURN? Or returns via EH or abnormally returning
>>> fns like longjmp?
>>
>> EH or abnormal jumps are not problem. Functions returning both via
>> normal return and BUILT_IN_RETURN (or functions having two return statement
>> one with and one without return value) are currently outlawed.
>>
>> We can support multiple return BBs if needed, I added TODO for that reason.
>>> + /* At present we can't pass non-SSA arguments to split function.
>>> + FIXME: this can be relaxed by passing references to arguments. */
>>> + if (TREE_CODE (t) == PARM_DECL)
>>> + {
>>> + if (dump_file)
>>> + fprintf (dump_file, "Can not split use of non-ssa function
>>> parameter.\n");
>>> + return true;
>>> + }
>>>
>>> I think this will do excessive dumping. At least make it
>>> dump_flags & TDF_DETAILS.
>>
>> Done.
>>> + if ((!node->callers || !node->callers->next_caller)
>>> + && !node->address_taken
>>> + && ((!flag_lto && !flag_whopr) || !node->local.externally_visible))
>>> + {
>>> + if (dump_file)
>>> + fprintf (dump_file, "Not splitting: not called directly or called
>>> once.\n");
>>> + return 0;
>>> + }
>>>
>>> Why the check for node->address_taken? Why the check
>>> for external visibility?
>>
>> We need to decide if function can possibly end up calling more than once when it reach
>> inliner. When it has address taken, we can devirtualize the call.
>> When doing LTO, it might be called directly from other unit.
>>>
>>> Ok with the above changes (please go over all dumping and make sure
>>> that without -details we only say that we are splitting and where
>>> and dump the split function).
>>
>> You seem to be consistently on less dumping by default than me. I find dumping
>> of reasons why split points are not accepted quite useful, but I guess using
>> -details is not problem.
>>
>
> This breaks C++:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44671
Also caused:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44687
--
H.J.
More information about the Gcc-patches
mailing list