This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Partial inlining


On Fri, Jun 25, 2010 at 11:48 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Jun 25, 2010 at 1:33 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> The above numbers are all with the max-inline-insns-auto reduction?
>>> I'd like to have that adjustment split out as a separate followup patch.
>>
>> Yes, without the change we generally tend to do more inlining (as we produce
>> more inline candidates). ?Not exactly in all cases: when we split at point
>> that the call to split function is cold, we reduce size even for unchanges
>> max-inlie-insns-auto.
>>
>> I've removed this change from patch and will commit it sequentially.
>>>
>>> The partial-inlining-entry-probability default looks quite high. ?I would
>>> have expected that guessed probabilities of say
>>>
>>> ? if (flag)
>>> ? ? return;
>>>
>>> make the entry taken 50% of the time? ?So, did you experiment with
>>> other values than 70 or is that something that is suggested by literature?
>>
>> Well, not really. ?It depends on what is inside if conditional.
>> If it is if (flag < 0), lets say, we will hit non-negative heuristics
>> and will predict it with 79%.
>>
>> The value of 70 seemed high enough to take care of most of random fuzz from the
>> heruistics, but it is something I plan to experiment with in the future.
>> In general the predictors on tests of this type are quite ineffective.
>>>
>>> You run ipa-split late during early optimizations. ?Will the new clone
>>> you create go through the early opts again? ?Thus, is the placement
>>
>> New clone is not considered by early optimizers. It is added as new
>> function just afterwards. ?We might want to change this in future, but
>> that is how cgraph_add_new_function works for now (it queues the function
>> to be inserted once IPA pass is done and early optimizations are one
>> IPA pass).
>>
>>> at the very end of early opts critical? ?If so please add a comment
>>
>> Well, it will work elsewhere too, just you miss optimizations on the split part
>> of function then. ?In general I see little value in moving it earlier, but I've
>> added coment.
>>
>>> before the NEXT_PASS. ?I suppose we don't re-run early opts on the
>>> clone as otherwise we could end up splitting it again (and so for
>>> a very large function with cascaded if (x) cheap; if (x) cheap; ....
>>> we will take a very long time to compile because we split of
>>> each if (x) cheap; one after the other? ?Please add a testcase
>>> like this.)
>>
>> Splitting is considered just once. ?But while working on pass, just
>> for fun I tried to split at every possible split point and quite surprisingly
>> it did not introduce that much of compile time penalty for such a testcase.
>> (it sped up insn-attrtab compilation that is pretty much like this)
>>
>> So multiple split points is something I plan to add, just want to do that
>> incrementally so the benefits can be more easilly tracked.
>>
>> What should the testcase test?
>>>
>>> + ?for (parm = DECL_ARGUMENTS (current_function_decl); parm;
>>> + ? ? ? parm = TREE_CHAIN (parm))
>>> + ? ?{
>>> + ? ? ?if (!is_gimple_reg (parm))
>>> + ? ? ? {
>>> + ? ? ? ? if (bitmap_bit_p (non_ssa_vars, DECL_UID (parm)))
>>> + ? ? ? ? ? {
>>> + ? ? ? ? ? ? if (dump_file)
>>> + ? ? ? ? ? ? ? fprintf (dump_file,
>>> + ? ? ? ? ? ? ? ? ? ? ? ?" ?Refused: need to pass non-ssa param
>>> values\n");
>>>
>>> I'm confused by the non_ssa_vars thing. ?This is just tracking
>>> whether a non-SSA parameter is being used?
>>
>> Yes, to handle non-ssa arguments, we will need to pass them by reference
>> that means that we will need to change function prototype in less trivial
>> way than by removing its arguments.
>>
>> arg adjustments used in ipa-sra can do this, all we need is to hook them into
>> clonning machinery that is something I plan to do relatively soon too (it will
>> imply support in virutal clones that needs way to stream them into cgraphopt
>> section for WHOPR so it is not completely trivial, but the adjustment code was
>> meant for this from beggining).
>>
>> There are two places we check them. ?First is for SRA arguments, the other
>> place is checking that if non-ssa variable is used, it is either used only by
>> header or by split part, but not by both.
>>
>> (for same reason I also do not support yet sharing values computed by header
>> and used by split part that is something I can do by passing them as argument
>> or recomputing them in split part if they are cheap. ?I think especially
>> for casts in C++ we will want to do the recomputation in trivial cases)
>>>
>>> + ?if (current->split_size <= call_overhead)
>>> + ? ?{
>>> + ? ? ?if (dump_file)
>>> + ? ? ? fprintf (dump_file,
>>> + ? ? ? ? ? ? ? ?" ?Refused: split size is smaller than call overhead\n");
>>> + ? ? ?return;
>>> + ? ?}
>>>
>>> also refuse to split if the size of the header is bigger than that
>>> of the tail? ?Otherwise we're just going to inline it back, no?
>>
>> Not necesarily - especially when the split point is cold and thus we end
>> up inlining for size.
>>
>> This is something I want to experiment with too. ?At the moment we produce
>> more splitting than needed (i.e. we often inline things back) that has
>> negative effect on compile time and debug info. In practice I didn't found
>> this bad.
>>
>> Keeping the function splitting spearate from inliner heuristics, we can also
>> play safe and split only when the function becomes inlinable after splitting
>> or the split part is cold and its size
>>> + ? ? ? ? ? ? ? if (is_gimple_debug (gsi_stmt (bsi)))
>>> + ? ? ? ? ? ? ? ? continue;
>>>
>>> I wonder if debug gets confused if you reference vars that are not
>>> there...
>>
>> tree-inline will remove the debug statement (and thus we lose some debug info
>> when split part is inlined back), but in general it works.
>>> +/* Return basic block containing RETURN statement, or EXIT_BLOCK_PTR if
>>> none
>>> + ? found. ?*/
>>>
>>> What about BUILT_IN_RETURN? ?Or returns via EH or abnormally returning
>>> fns like longjmp?
>>
>> EH or abnormal jumps are not problem. ?Functions returning both via
>> normal return and BUILT_IN_RETURN (or functions having two return statement
>> one with and one without return value) are currently outlawed.
>>
>> We can support multiple return BBs if needed, I added TODO for that reason.
>>> + ?/* At present we can't pass non-SSA arguments to split function.
>>> + ? ? FIXME: this can be relaxed by passing references to arguments. ?*/
>>> + ?if (TREE_CODE (t) == PARM_DECL)
>>> + ? ?{
>>> + ? ? ?if (dump_file)
>>> + ? ? ? fprintf (dump_file, "Can not split use of non-ssa function
>>> parameter.\n");
>>> + ? ? ?return true;
>>> + ? ?}
>>>
>>> I think this will do excessive dumping. ?At least make it
>>> dump_flags & TDF_DETAILS.
>>
>> Done.
>>> + ?if ((!node->callers || !node->callers->next_caller)
>>> + ? ? ?&& !node->address_taken
>>> + ? ? ?&& ((!flag_lto && !flag_whopr) || !node->local.externally_visible))
>>> + ? ?{
>>> + ? ? ?if (dump_file)
>>> + ? ? ? fprintf (dump_file, "Not splitting: not called directly or called
>>> once.\n");
>>> + ? ? ?return 0;
>>> + ? ?}
>>>
>>> Why the check for node->address_taken? ?Why the check
>>> for external visibility?
>>
>> We need to decide if function can possibly end up calling more than once when it reach
>> inliner. ?When it has address taken, we can devirtualize the call.
>> When doing LTO, it might be called directly from other unit.
>>>
>>> Ok with the above changes (please go over all dumping and make sure
>>> that without -details we only say that we are splitting and where
>>> and dump the split function).
>>
>> You seem to be consistently on less dumping by default than me. ?I find dumping
>> of reasons why split points are not accepted quite useful, but I guess using
>> -details is not problem.
>>
>
> This breaks C++:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44671

Also caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44687

-- 
H.J.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]