This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting
- From: Jiong Wang <jiong dot wang at arm dot com>
- To: Jeff Law <law at redhat dot com>
- Cc: Steven Bosscher <stevenb dot gcc at gmail dot com>, "gcc-patches\ at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Kenneth Zadeck <zadeck at naturalbridge dot com>
- Date: Fri, 24 Apr 2015 17:18:24 +0100
- Subject: Re: [PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting
- Authentication-results: sourceware.org; auth=none
- References: <54803EBE dot 2060607 at arm dot com> <CAFiYyc0gEQt_Ci1TyCfYys=JnZMr8FmYW7dFtq+mBmqKjeuttw at mail dot gmail dot com> <5480B6D6 dot 2020201 at arm dot com> <548EFE0D dot 1070808 at arm dot com> <548EFE55 dot 6090901 at arm dot com> <CAFiYyc3oYRsYkQwivE+T4A4mysDBe0gjZqjroQ8B2p1J6sakQg at mail dot gmail dot com> <54930811 dot 1020003 at arm dot com> <20141218220908 dot GA20720 at gate dot crashing dot org> <CAHFci28ajc8KqKEvyYYvQHbhYkZ-ExV8ixJ+SNuqV8bg3n7JJQ at mail dot gmail dot com> <CAAfDdZ0EZ6EVN_wYFFuh81ptL2c_Em-Ub-2s4GO7Vp0QKjd-=Q at mail dot gmail dot com> <CAFiYyc32CJJTjakxMLjkCQAJLrv1u0PSjifTs=A4V4q4nOFTKg at mail dot gmail dot com> <5494426A dot 9010209 at naturalbridge dot com> <CAAfDdZ2xrfRYoD8eO1L+8StWh53OhFNBy4ZMRt-K4xSj6r64eA at mail dot gmail dot com> <54DB6587 dot 1020207 at naturalbridge dot com> <54DB9CDB dot 5090304 at arm dot com> <CAAfDdZ29jHnFFGCpi8Adgf4hXk80QQH-vCrV=m0wdZNkT0x84A at mail dot gmail dot com> <CABu31nPMZ5ZCx+frisV+AT9pmC+DumN+Sjt=UscZ48kze4_3YQ at mail dot gmail dot com> <552D4D61 dot 9040100 at redhat dot com> <CAAfDdZ1G_0A0k0RRF2XO_5W6xN13BoA_18+s-Z68P1YTS! 32mMA at mail dot gmail dot com> <n99vbgp35kl dot fsf at arm dot com> <5539A2A8 dot 6050505 at redhat dot com>
Jeff Law writes:
> On 04/21/2015 08:24 AM, Jiong Wang wrote:
>>
>> Jiong Wang writes:
>>
>>> 2015-04-14 18:24 GMT+01:00 Jeff Law <law@redhat.com>:
>>>> On 04/14/2015 10:48 AM, Steven Bosscher wrote:
>>>>>>
>>>>>> So I think this stage2/3 binary difference is acceptable?
>>>>>
>>>>>
>>>>> No, they should be identical. If there's a difference, then there's a
>>>>> bug - which, it seems, you've already found, too.
>>>>
>>>> RIght. And so the natural question is how to fix.
>>>>
>>>> At first glance it would seem like having this new code ignore dependencies
>>>> rising from debug insns would work.
>>>>
>>>> Which then begs the question, what happens to the debug insn -- it's
>>>> certainly not going to be correct anymore if the transformation is made.
>>>
>>> Exactly.
>>>
>>> The debug_insn 2776 in my example is to record the base address of a
>>> local array. the new code is doing correctly here by not shuffling the
>>> operands of insn 2556 and 2557 as there is additional reference of
>>> reg:1473 from debug insn, although the code will still execute correctly
>>> if we do the transformation.
>>>
>>> my understanding to fix this:
>>>
>>> * delete the out-of-date mismatch debug_insn? as there is no guarantee
>>> to generate accurate debug info under -O2.
>>>
>>> IMO, this debug_insn may affect "DW_AT_location" field for variable
>>> descrption of "classes" in .debug_info section, but it's omitted in
>>> the final output already.
>>>
>>> <3><38a4d>: Abbrev Number: 137 (DW_TAG_variable)
>>> <38a4f> DW_AT_name : (indirect string, offset: 0x18db): classes
>>> <38a53> DW_AT_decl_file : 1
>>> <38a54> DW_AT_decl_line : 548
>>> <38a56> DW_AT_type : <0x38cb4>
>>>
>>> * update the debug_insn? if the following change is OK with dwarf standard
>>>
>>> from
>>>
>>> insn0: reg0 = fp + reg1
>>> debug_insn: var_loc = reg0 + const_off
>>> insn1: reg2 = reg0 + const_off
>>>
>>> to
>>>
>>> insn0: reg0 = fp + const_off
>>> debug_insn: var_loc = reg0 + reg1
>>> insn1: reg2 = reg0 + reg1
>>>
>>> Thanks,
>>>
>>
>> And attachment is the new patch which will update debug_insn as
>> described in the second solution above.
>>
>> Now the stage2/3 binary differences on AArch64 gone away. Bootstrap OK.
>>
>> On AArch64, this patch give 600+ new rtl loop invariants found across
>> spec2k6 float. +4.5% perf improvement on 436.cactusADM because four new
>> invariants found in the critical function "regex_compile".
>>
>> The similar improvements may be achieved on other RISC backends like
>> powerpc/mips I guess.
>>
>> One thing to mention, for AArch64, one minor glitch in
>> aarch64_legitimize_address needs to be fixed to let this patch take
>> effect, I will send out that patch later as it's a seperate issue.
>> Powerpc/Mips don't have this glitch in LEGITIMIZE_ADDRESS hook, so
>> should be OK, and I verified the base address of local array in the
>> testcase given by Seb on pr62173 do hoisted on ppc64 now. I think
>> pr62173 is fixed on those 64bit arch by this patch.
>>
>> Thoughts?
>>
>> Thanks.
>>
>> 2015-04-21 Jiong Wang <jiong.wang@arm.com>
>>
>> gcc/
>> * loop-invariant.c (find_defs): Enable DF_DU_CHAIN build.
>> (vfp_const_iv): New hash table.
>> (expensive_addr_check_p): New boolean.
>> (init_inv_motion_data): Initialize new variables.>
>> (free_inv_motion_data): Release hash table.
>> (create_new_invariant): Set cheap_address to false for iv in
>> vfp_const_iv table.
>> (find_invariant_insn): Skip dependencies check for iv in vfp_const_iv
>> table.
>> (use_for_single_du): New function.
>> (reshuffle_insn_with_vfp): Likewise.
>> (find_invariants_bb): Call reshuffle_insn_with_vfp.
>>
>> gcc/testsuite/
>> * gcc.dg/pr62173.c: New testcase.
> So ultimately isn't this just a specialized version of reassociation of
> pointer arithmetic? And if so, don't we have all the usual problems
> around introducing overflow?
>
>
> ISTM if this is going to go forward (ie, we somehow convince ourselves
> that this kind of reassociation is OK), then should it be made to apply
> on pointers in general rather than restricting to stack, frame,
> virtual-frame?
>
Jeff,
Thanks for the review.
This transformation is not reassociation of pointer arithmetic.
The idea of this patch is, given two instructions with variable value,
we may get new instruction sequences with fixed value by reassociating
their operands. And currently GCC only generate such instruction
sequences for local array accessing as far as I known.
for the statement "D.2707 = A[i]", gcc generate the following
instruction sequence:
(insn 92 91 93 6 (set (reg/f:DI 148)
(plus:DI (reg/f:DI 64 sfp)
(reg:DI 147 [ i ])))
(expr_list:REG_DEAD (reg:DI 147 [ i ])
(nil)))
(insn 93 92 94 6 (set (reg:SI 149 [ D.2707 ])
(zero_extend:SI (mem/j:QI (plus:DI (reg/f:DI 148)
(const_int -16 [0xfffffffffffffff0])) [0 A S1 A8])))
(expr_list:REG_DEAD (reg/f:DI 148)
(nil)))
both insn 92 and insn 93 are with variable value, but
"(reg/f:DI 64 sfp)" in insn 92 and "const_int -16" in insn 93 are
fixed value.
So my patch will transform above sequence into the following if
"apply_change_group" succeed. Then insn 92 is with fixed value and
thus loop invariant.
(insn 92 88 97 5 (set (reg/f:DI 148)
(plus:DI (reg/f:DI 64 sfp)
(const_int -16 [0xfffffffffffffff0])))
(nil))
(insn 93 131 94 6 (set (reg:SI 149 [ D.2707 ])
(zero_extend:SI (mem/j:QI (plus:DI (reg/f:DI 148)
(reg:DI 147 [ i ])) [0 A S1 A8])))
(expr_list:REG_DEAD (reg:DI 147 [ i ])
(expr_list:REG_DEAD (reg/f:DI 148)
(nil))))
This tranformation will only be done if the def of r148 in insn 92 is the single
def for the use of it in insn 93, and the use in insn 93 is also
single use.
And if there is overflow, then it will happen in the original insn 93
also, then it should not be generated in the first place. So I think
there is no need to do overflow check.
Does this explanation make sense?
--
Regards,
Jiong