This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Partial inlining
On Fri, Jun 25, 2010 at 1:33 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> The above numbers are all with the max-inline-insns-auto reduction?
>> I'd like to have that adjustment split out as a separate followup patch.
>
> Yes, without the change we generally tend to do more inlining (as we produce
> more inline candidates). ?Not exactly in all cases: when we split at point
> that the call to split function is cold, we reduce size even for unchanges
> max-inlie-insns-auto.
>
> I've removed this change from patch and will commit it sequentially.
>>
>> The partial-inlining-entry-probability default looks quite high. ?I would
>> have expected that guessed probabilities of say
>>
>> ? if (flag)
>> ? ? return;
>>
>> make the entry taken 50% of the time? ?So, did you experiment with
>> other values than 70 or is that something that is suggested by literature?
>
> Well, not really. ?It depends on what is inside if conditional.
> If it is if (flag < 0), lets say, we will hit non-negative heuristics
> and will predict it with 79%.
>
> The value of 70 seemed high enough to take care of most of random fuzz from the
> heruistics, but it is something I plan to experiment with in the future.
> In general the predictors on tests of this type are quite ineffective.
>>
>> You run ipa-split late during early optimizations. ?Will the new clone
>> you create go through the early opts again? ?Thus, is the placement
>
> New clone is not considered by early optimizers. It is added as new
> function just afterwards. ?We might want to change this in future, but
> that is how cgraph_add_new_function works for now (it queues the function
> to be inserted once IPA pass is done and early optimizations are one
> IPA pass).
>
>> at the very end of early opts critical? ?If so please add a comment
>
> Well, it will work elsewhere too, just you miss optimizations on the split part
> of function then. ?In general I see little value in moving it earlier, but I've
> added coment.
>
>> before the NEXT_PASS. ?I suppose we don't re-run early opts on the
>> clone as otherwise we could end up splitting it again (and so for
>> a very large function with cascaded if (x) cheap; if (x) cheap; ....
>> we will take a very long time to compile because we split of
>> each if (x) cheap; one after the other? ?Please add a testcase
>> like this.)
>
> Splitting is considered just once. ?But while working on pass, just
> for fun I tried to split at every possible split point and quite surprisingly
> it did not introduce that much of compile time penalty for such a testcase.
> (it sped up insn-attrtab compilation that is pretty much like this)
>
> So multiple split points is something I plan to add, just want to do that
> incrementally so the benefits can be more easilly tracked.
>
> What should the testcase test?
>>
>> + ?for (parm = DECL_ARGUMENTS (current_function_decl); parm;
>> + ? ? ? parm = TREE_CHAIN (parm))
>> + ? ?{
>> + ? ? ?if (!is_gimple_reg (parm))
>> + ? ? ? {
>> + ? ? ? ? if (bitmap_bit_p (non_ssa_vars, DECL_UID (parm)))
>> + ? ? ? ? ? {
>> + ? ? ? ? ? ? if (dump_file)
>> + ? ? ? ? ? ? ? fprintf (dump_file,
>> + ? ? ? ? ? ? ? ? ? ? ? ?" ?Refused: need to pass non-ssa param
>> values\n");
>>
>> I'm confused by the non_ssa_vars thing. ?This is just tracking
>> whether a non-SSA parameter is being used?
>
> Yes, to handle non-ssa arguments, we will need to pass them by reference
> that means that we will need to change function prototype in less trivial
> way than by removing its arguments.
>
> arg adjustments used in ipa-sra can do this, all we need is to hook them into
> clonning machinery that is something I plan to do relatively soon too (it will
> imply support in virutal clones that needs way to stream them into cgraphopt
> section for WHOPR so it is not completely trivial, but the adjustment code was
> meant for this from beggining).
>
> There are two places we check them. ?First is for SRA arguments, the other
> place is checking that if non-ssa variable is used, it is either used only by
> header or by split part, but not by both.
>
> (for same reason I also do not support yet sharing values computed by header
> and used by split part that is something I can do by passing them as argument
> or recomputing them in split part if they are cheap. ?I think especially
> for casts in C++ we will want to do the recomputation in trivial cases)
>>
>> + ?if (current->split_size <= call_overhead)
>> + ? ?{
>> + ? ? ?if (dump_file)
>> + ? ? ? fprintf (dump_file,
>> + ? ? ? ? ? ? ? ?" ?Refused: split size is smaller than call overhead\n");
>> + ? ? ?return;
>> + ? ?}
>>
>> also refuse to split if the size of the header is bigger than that
>> of the tail? ?Otherwise we're just going to inline it back, no?
>
> Not necesarily - especially when the split point is cold and thus we end
> up inlining for size.
>
> This is something I want to experiment with too. ?At the moment we produce
> more splitting than needed (i.e. we often inline things back) that has
> negative effect on compile time and debug info. In practice I didn't found
> this bad.
>
> Keeping the function splitting spearate from inliner heuristics, we can also
> play safe and split only when the function becomes inlinable after splitting
> or the split part is cold and its size
>> + ? ? ? ? ? ? ? if (is_gimple_debug (gsi_stmt (bsi)))
>> + ? ? ? ? ? ? ? ? continue;
>>
>> I wonder if debug gets confused if you reference vars that are not
>> there...
>
> tree-inline will remove the debug statement (and thus we lose some debug info
> when split part is inlined back), but in general it works.
>> +/* Return basic block containing RETURN statement, or EXIT_BLOCK_PTR if
>> none
>> + ? found. ?*/
>>
>> What about BUILT_IN_RETURN? ?Or returns via EH or abnormally returning
>> fns like longjmp?
>
> EH or abnormal jumps are not problem. ?Functions returning both via
> normal return and BUILT_IN_RETURN (or functions having two return statement
> one with and one without return value) are currently outlawed.
>
> We can support multiple return BBs if needed, I added TODO for that reason.
>> + ?/* At present we can't pass non-SSA arguments to split function.
>> + ? ? FIXME: this can be relaxed by passing references to arguments. ?*/
>> + ?if (TREE_CODE (t) == PARM_DECL)
>> + ? ?{
>> + ? ? ?if (dump_file)
>> + ? ? ? fprintf (dump_file, "Can not split use of non-ssa function
>> parameter.\n");
>> + ? ? ?return true;
>> + ? ?}
>>
>> I think this will do excessive dumping. ?At least make it
>> dump_flags & TDF_DETAILS.
>
> Done.
>> + ?if ((!node->callers || !node->callers->next_caller)
>> + ? ? ?&& !node->address_taken
>> + ? ? ?&& ((!flag_lto && !flag_whopr) || !node->local.externally_visible))
>> + ? ?{
>> + ? ? ?if (dump_file)
>> + ? ? ? fprintf (dump_file, "Not splitting: not called directly or called
>> once.\n");
>> + ? ? ?return 0;
>> + ? ?}
>>
>> Why the check for node->address_taken? ?Why the check
>> for external visibility?
>
> We need to decide if function can possibly end up calling more than once when it reach
> inliner. ?When it has address taken, we can devirtualize the call.
> When doing LTO, it might be called directly from other unit.
>>
>> Ok with the above changes (please go over all dumping and make sure
>> that without -details we only say that we are splitting and where
>> and dump the split function).
>
> You seem to be consistently on less dumping by default than me. ?I find dumping
> of reasons why split points are not accepted quite useful, but I guess using
> -details is not problem.
>
This breaks C++:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44671
--
H.J.