[PATCH, 2/2] shrink wrap a function with a single loop: split live_edge

Fri Sep 19 16:19:00 GMT 2014

On 09/19/14 10:02, Jiong Wang wrote:
>
> On 19/09/14 16:49, Jeff Law wrote:
>> On 09/19/14 05:27, Jiong Wang wrote:
>>> On 19/09/14 06:51, Jeff Law wrote:
>>>> On 09/16/14 00:55, Zhenqiang Chen wrote:
>>>>>> -----Original Message-----
>>>>>> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-
>>>>>> owner@gcc.gnu.org] On Behalf Of Jiong Wang
>>>>>> Sent: Monday, September 15, 2014 11:28 PM
>>>>>> To: Zhenqiang Chen
>>>>>> Cc: gcc-patches@gcc.gnu.org; Jeff Law
>>>>>> Subject: Re: [PATCH, 2/2] shrink wrap a function with a single loop:
>>>>>> split
>>>>>> live_edge
>>>>>>
>>>>>> On 08/05/14 09:07, Zhenqiang Chen wrote:
>>>>>>>      static bool
>>>>>>>      move_insn_for_shrink_wrap (basic_block bb, rtx insn,
>>>>>>>                                const HARD_REG_SET uses,
>>>>>>>                                const HARD_REG_SET defs,
>>>>>>> -                          HARD_REG_SET *last_uses)
>>>>>>> +                          HARD_REG_SET *last_uses,
>>>>>>> +                          bool *split_p)
>>>>>>> {
>>>>>>>        rtx set, src, dest;
>>>>>>>        bitmap live_out, live_in, bb_uses, bb_defs;
>>>>>>>        unsigned int i, dregno, end_dregno, sregno, end_sregno;
>>>>>>>        basic_block next_block;
>>>>>>> +  edge live_edge;
>>>>>>>
>>>>>>>        /* Look for a simple register copy.  */
>>>>>>>        set = single_set (insn);
>>>>>>> @@ -5582,17 +5589,31 @@ move_insn_for_shrink_wrap (basic_block bb,
>>>>>> rtx insn,
>>>>>>>            || overlaps_hard_reg_set_p (defs, GET_MODE (dest),
>>>>>>> dregno))
>>>>>>>          return false;
>>>>>>>
>>>>>>> -  /* See whether there is a successor block to which we could move
>>>>>>> INSN.  */
>>>>>>> -  next_block = next_block_for_reg (bb, dregno, end_dregno);
>>>>>>> -  if (!next_block)
>>>>>>> +  live_edge = next_block_for_reg (bb, dregno, end_dregno);  if
>>>>>>> + (!live_edge)
>>>>>>>           return false;
>>>>>>>
>>>>>>> +  next_block = live_edge->dest;
>>>>>>> +
>>>>>>>        /* If the destination register is referred in later insn,
>>>>>>>           try to forward it.  */
>>>>>>>        if (overlaps_hard_reg_set_p (*last_uses, GET_MODE (dest),
>>>>>>> dregno)
>>>>>>>            && !try_copy_prop (bb, insn, src, dest, last_uses))
>>>>>>>          return false;
>>>>>>>
>>>>>>> +  /* Create a new basic block on the edge.  */  if (EDGE_COUNT
>>>>>>> + (next_block->preds) == 2)
>>>>>>> +    {
>>>>>>> +      next_block = split_edge (live_edge);
>>>>>>> +
>>>>>>> +      bitmap_copy (df_get_live_in (next_block), df_get_live_out
>>>>>>> + (bb));
>>>>>> (re-sent, looks like the first send not received by the server...)
>>>>>>
>>>>>>      for the function "_IO_wdefault_xsputn" in glibc wgenops.c (.dot
>>>>>> file
>>>>>> attached)
>>>>>>
>>>>>>      174: NOTE_INSN_BASIC_BLOCK 21 are the new created basic block
>>>>>> because
>>>>>> of the sink of move instruction "ax:DI = 0" and split edge.
>>>>>>
>>>>>>      but the live_in of this BB is copied from live_out of BB 2 which
>>>>>> is too big, and
>>>>>> actually prevent the later sink of "16: r12:DI=dx:DI".
>>>>>>
>>>>>>      Should it be better to copy live_in from "next_block->next_bb"
>>>>>> instead of
>>>>>> live_out from "bb", as it will model what's needed more accurately?
>>>>> According to the algorithm, next_block->next_bb (which is
>>>>> live_edge->dest) should have two predecessors. Live_in of
>>>>> next_block->next_bb would include live_out of the other edge,  which
>>>>> is not necessary accurately. To be more accurate, you may need an
>>>>> intersection of df_get_live_out (bb) and df_get_live_in
>>>>> (next_block->next_bb).
>>>>>
>>>> Presumably we can't recompute the liveness info here as it's too
>>>> expensive to do that each time we split an edge to sink a statement?
>>>> ie, we need the more accurate liveness within this pass so that we can
>>>> sink further instructions?
>>> for a single case, x86 bootstrap, looks like these code are not compile
>>> time critical
>>> as for all 4762 live edges which could sink instructions
>>>
>>> 4139: EDGE_COUNT (next_block->preds) == 1
>>> 270: EDGE_COUNT (next_block->preds) == 2
>>> 353: EDGE_COUNT (next_block->preds) > 2
>>>
>>> and if we make the live info more accurate here, just very few
>>> additional functions (< 10)
>>> shrink-wrapped from glibc build & gcc bootstrap test.
>> Perhaps I wasn't clear.
>>
>> The whole point behind the proposed changes is to enable additional
>> sinking that is currently blocked because of the overly large set of
>> live objects.
>>
>> So my question was an attempt to understand why we didn't just a normal
>> liveness update -- I speculated that we aren't doing a normal liveness
>> update because of the compile-time cost.
>>
>>
>>
>>> So, is it OK we simply change "bitmap_copy  (df_get_live_in
>>> (next_block),  df_get_live_out  (bb));"
>>> into "bitmap_and  (df_get_live_in  (next_block),  df_get_live_out
>>> (bb),  df_get_live_in  (next_block->next_bb));" ?
>> Probably.  Though I'd be a bit concerned with next_block->next_bb.
>> Wouldn't it be safer to stash away the relevant basic block prior to the
>> call to split_edge, then use that saved copy.  Something like this
>> (untested):
>>
>> basic_block old_dest = live_edge->dest;
>> next_block = split_edge (live_edge);
>>
>> /* We create a new basic block.  Call df_grow_bb_info to make sure
>>      all data structures are allocated.  */
>> df_grow_bb_info (df_live);
>> bitmap_and (df_get_live_in (next_block),
>>               df_get_live_out (bb),
>>               df_get_live_in (old_dest));
>>
>>
>> The idea being we don't want to depend on the precise ordering blocks in
>> the block chain.
>>
>> Could you try that and see if it does what you need?
> Jeff,
>
>    Thanks, verified, it works.
Great.  Can you send an updated patchkit for review.

Thanks,

jeff