[PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting

Fri Apr 24 17:05:00 GMT 2015

Jeff Law writes:

> On 04/21/2015 08:24 AM, Jiong Wang wrote:
>>
>> Jiong Wang writes:
>>
>>> 2015-04-14 18:24 GMT+01:00 Jeff Law <law@redhat.com>:
>>>> On 04/14/2015 10:48 AM, Steven Bosscher wrote:
>>>>>>
>>>>>> So I think this stage2/3 binary difference is acceptable?
>>>>>
>>>>>
>>>>> No, they should be identical. If there's a difference, then there's a
>>>>> bug - which, it seems, you've already found, too.
>>>>
>>>> RIght.  And so the natural question is how to fix.
>>>>
>>>> At first glance it would seem like having this new code ignore dependencies
>>>> rising from debug insns would work.
>>>>
>>>> Which then begs the question, what happens to the debug insn -- it's
>>>> certainly not going to be correct anymore if the transformation is made.
>>>
>>> Exactly.
>>>
>>> The debug_insn 2776 in my example is to record the base address of a
>>> local array. the new code is doing correctly here by not shuffling the
>>> operands of insn 2556 and 2557 as there is additional reference of
>>> reg:1473 from debug insn, although the code will still execute correctly
>>> if we do the transformation.
>>>
>>> my understanding to fix this:
>>>
>>>    * delete the out-of-date mismatch debug_insn? as there is no guarantee
>>>      to generate accurate debug info under -O2.
>>>
>>>      IMO, this debug_insn may affect "DW_AT_location" field for variable
>>>      descrption of "classes" in .debug_info section, but it's omitted in
>>>      the final output already.
>>>
>>>      <3><38a4d>: Abbrev Number: 137 (DW_TAG_variable)
>>>      <38a4f>   DW_AT_name : (indirect string, offset: 0x18db): classes
>>>      <38a53>   DW_AT_decl_file   : 1
>>>      <38a54>   DW_AT_decl_line   : 548
>>>      <38a56>   DW_AT_type        : <0x38cb4>
>>>
>>>    * update the debug_insn? if the following change is OK with dwarf standard
>>>
>>>     from
>>>
>>>       insn0: reg0 = fp + reg1
>>>       debug_insn: var_loc = reg0 + const_off
>>>       insn1: reg2 = reg0 + const_off
>>>
>>>     to
>>>
>>>       insn0: reg0 = fp + const_off
>>>       debug_insn: var_loc = reg0 + reg1
>>>       insn1: reg2 = reg0 + reg1
>>>
>>> Thanks,
>>>
>>
>> And attachment is the new patch which will update debug_insn as
>> described in the second solution above.
>>
>> Now the stage2/3 binary differences on AArch64 gone away. Bootstrap OK.
>>
>> On AArch64, this patch give 600+ new rtl loop invariants found across
>> spec2k6 float. +4.5% perf improvement on 436.cactusADM because four new
>> invariants found in the critical function "regex_compile".
>>
>> The similar improvements may be achieved on other RISC backends like
>> powerpc/mips I guess.
>>
>> One thing to mention, for AArch64, one minor glitch in
>> aarch64_legitimize_address needs to be fixed to let this patch take
>> effect, I will send out that patch later as it's a seperate issue.
>> Powerpc/Mips don't have this glitch in LEGITIMIZE_ADDRESS hook, so
>> should be OK, and I verified the base address of local array in the
>> testcase given by Seb on pr62173 do hoisted on ppc64 now. I think
>> pr62173 is fixed on those 64bit arch by this patch.
>>
>> Thoughts?
>>
>> Thanks.
>>
>> 2015-04-21  Jiong Wang  <jiong.wang@arm.com>
>>
>> gcc/
>>    * loop-invariant.c (find_defs): Enable DF_DU_CHAIN build.
>>    (vfp_const_iv): New hash table.
>>    (expensive_addr_check_p): New boolean.
>>    (init_inv_motion_data): Initialize new variables.>
>>    (free_inv_motion_data): Release hash table.
>>    (create_new_invariant): Set cheap_address to false for iv in
>>    vfp_const_iv table.
>>    (find_invariant_insn): Skip dependencies check for iv in vfp_const_iv
>>    table.
>>    (use_for_single_du): New function.
>>    (reshuffle_insn_with_vfp): Likewise.
>>    (find_invariants_bb): Call reshuffle_insn_with_vfp.
>>
>> gcc/testsuite/
>>     * gcc.dg/pr62173.c: New testcase.
> So ultimately isn't this just a specialized version of reassociation of 
> pointer arithmetic?  And if so, don't we have all the usual problems 
> around introducing overflow?
>
>
> ISTM if this is going to go forward (ie, we somehow convince ourselves 
> that this kind of reassociation is OK), then should it be made to apply 
> on pointers in general rather than restricting to stack, frame, 
> virtual-frame?
>

Jeff,

  Thanks for the review.

  This transformation is not reassociation of pointer arithmetic.

  The idea of this patch is, given two instructions with variable value,
  we may get new instruction sequences with fixed value by reassociating
  their operands. And currently GCC only generate such instruction
  sequences for local array accessing as far as I known.

  for the statement "D.2707 = A[i]", gcc generate the following
  instruction sequence:

  (insn 92 91 93 6 (set (reg/f:DI 148)
        (plus:DI (reg/f:DI 64 sfp)
            (reg:DI 147 [ i ])))
     (expr_list:REG_DEAD (reg:DI 147 [ i ])
        (nil)))

  (insn 93 92 94 6 (set (reg:SI 149 [ D.2707 ])
        (zero_extend:SI (mem/j:QI (plus:DI (reg/f:DI 148)
                    (const_int -16 [0xfffffffffffffff0])) [0 A S1 A8])))
     (expr_list:REG_DEAD (reg/f:DI 148)
        (nil)))

  both insn 92 and insn 93 are with variable value, but
  "(reg/f:DI 64 sfp)" in insn 92 and "const_int -16" in insn 93 are
  fixed value.

  So my patch will transform above sequence into the following if
  "apply_change_group" succeed. Then insn 92 is with fixed value and
  thus loop invariant.

  (insn 92 88 97 5 (set (reg/f:DI 148)
        (plus:DI (reg/f:DI 64 sfp)
            (const_int -16 [0xfffffffffffffff0])))
     (nil))

  (insn 93 131 94 6 (set (reg:SI 149 [ D.2707 ])
        (zero_extend:SI (mem/j:QI (plus:DI (reg/f:DI 148)
                    (reg:DI 147 [ i ])) [0 A S1 A8])))
     (expr_list:REG_DEAD (reg:DI 147 [ i ])
        (expr_list:REG_DEAD (reg/f:DI 148)
            (nil))))

  This tranformation will only be done if the def of r148 in insn 92 is the single
  def for the use of it in insn 93, and the use in insn 93 is also
  single use.

  And if there is overflow, then it will happen in the original insn 93
  also, then it should not be generated in the first place. So I think
  there is no need to do overflow check.

  Does this explanation make sense? 

-- 
Regards,
Jiong