[PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting

Jiong Wang wong.kwongyuan.tools@gmail.com
Tue Apr 14 15:06:00 GMT 2015


2015-02-11 18:18 GMT+00:00 Jiong Wang <jiong.wang@arm.com>:
> On 11/02/15 14:21, Kenneth Zadeck wrote:
>>
>> On 02/11/2015 06:20 AM, Jiong Wang wrote:
>>>
>>> 2014-12-19 15:21 GMT+00:00 Kenneth Zadeck <zadeck@naturalbridge.com>:
>>>>
>>>> however, since i am a nice person ....
>>>>
>>>> loop-invariant solves the DF_UD_CHAIN which builds a data structure that
>>>> connects each use with all of the defs that reach it.   I believe that
>>>> this
>>>> is the opposite of what you want here.
>>>>
>>>> if you really need this, you need to also turn on the DF_DU_CHAIN which
>>>> builds the opposite structure.    Both structures can be space hogs, but
>>>> they are both turned on in other places in the compiler so it is
>>>> unlikely to
>>>> be an issue.
>>>
>>> Exactly, Thanks, Kenneth.
>>>
>>> This approach works from my experiment and look much better than
>>> previous REG_NOTE approach.
>>> while it do have one problem. We need the UD/DU chain built before we
>>> do insn re-shuffling.
>>> While after re-shuffling, UD chain needs update, otherwise, the later
>>> "check_dependecies" in find_invariant_insn may fail.
>>>
>>> although we have re-shuffle instruction 1 into 2, the later check
>>> still using old UD info, thus
>>> decide instruction 2 is not iv.
>>>
>>> 1: regA <- vfp + regB
>>> 2: regA <- vfp + const
>>>
>>> my current fix is to insert those re-shuffled insn into a table named
>>> "vfp_const_iv", then skip those
>>> dependencies check  for them as they don't have any dependencies.
>>
>> You now are beginning to understand why Mark Wegman and I invented SSA
>> form almost 30 years ago!!!!
>
>
> Indeed, thanks.
>
> attachment is the new draft patch, pass x86-84 bootstrap
> (actually it will not affect x86-64, because of addressing mode),
>  while failed on aarch64 bootstrap during stage2/3 binary comparision,
> there must be some glitches needs to be fixed.
>
> Will clean up later after I finished my upcoming long flight and what I
> am seeking now is whether the general thoughts, code logic & framework, in
> this
> patch is acceptable to the community?
>
> personally, I think this version is much better than previous version.
> new added code integrated with existed code in a more natural way. no use
> of unsafe REG_NOTE info which is not updated in this pass.
>
> and from AArch64 gcc bootstrap, 239 new loop invariant found by this
> re-shuffling. so, this small improvement on rtl loop invariant pass do have
> some value.
>
> please review and give comments.
>
> Thanks.

Ping ~

And for the bootstrap stage2/3 binary comparision failure,  the
different is in ira-costs.o.

because for stage2, we actually add one extra compilation option
"-gtoggle" while not for stage3, as
we want to make sure debug info generation will not affect any real
instruction generation.

but, after some investigation I found gcc actually generate difference
code when debug info enabled because
DEBUG_INSN will affect data flow analysis.

(insn 2556 2555 2776 257 (set (reg/f:DI 1473)
        (plus:DI (reg/f:DI 64 sfp)
            (reg:DI 1474))) ../../gcc/gcc/ira-costs.c:628 87 {*adddi3_aarch64}
     (expr_list:REG_DEAD (reg:DI 1474)
        (nil)))
(debug_insn 2776 2556 2557 257 (var_location:SI D#105 (mem:SI (plus:DI
(reg/f:DI 1473)
            (const_int -240 [0xffffffffffffff10])) [23 classes S4 A32])) -1
     (nil))
(insn 2557 2776 2558 257 (set (reg:SI 591 [ D.69930 ])
        (mem:SI (plus:DI (reg/f:DI 1473)
                (const_int -240 [0xffffffffffffff10])) [23 classes S4
A32])) ../../gcc/gcc/ira-costs.c:628 39 {*movsi_aarch64}
     (expr_list:REG_DEAD (reg/f:DI 1473)
        (nil)))

without "debug_insn 2776", operands of insn 2556 and 2557 can be
shuffled, so that insn 2556 becomes "reg 1473 = sfp  - 240",
thus hoisted out of the loop as invariant.  while with debug_insn
2776,   reg 1473 are used in both insn 2776 and insn 2557, thus
the single def-use analysis returns false, thus we won't do any
transformation on this which is correct.

So I think this stage2/3 binary difference is acceptable?

for compile time, I haven't found difference from -ftime-report. and
ree.c and enabled "DF_UD_CHAIN + DF_DU_CHAIN" also.

>
> 2015-02-11  Jiong Wang  <jiong.wang@arm.com>
>
> gcc/
>   * loop-invariant.c (find_defs): Enable DF_DU_CHAIN build.
>   (vfp_const_iv): New hash table.
>   (expensive_addr_check_p): New boolean.
>   (init_inv_motion_data): Initialize new variables.
>   (free_inv_motion_data): Release hash table.
>   (create_new_invariant): Set cheap_address to false for iv in
>   vfp_const_iv table.
>   (find_invariant_insn): Skip dependencies check for iv in vfp_const_iv
> table.
>   (use_for_single_du): New function.
>   (reshuffle_insn_with_vfp): Likewise.
>   (find_invariants_bb): Call reshuffle_insn_with_vfp.
>  gcc/testsuite/
>   * gcc.dg/pr62173.c: New testcase.



More information about the Gcc-patches mailing list