This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Jiong Wang <jiong dot wang at arm dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 4 Dec 2014 12:07:03 +0100
- Subject: Re: [PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting
- References: <54803EBE dot 2060607 at arm dot com>
On Thu, Dec 4, 2014 at 12:00 PM, Jiong Wang <jiong.wang@arm.com> wrote:
> For PR62173, the ideal solution is to resolve the problem in the tree-level
> ivopts pass.
>
> Apart from the tree-level issue, though, PR 62173 also exposed two
> RTL-level issues.
> One of them is that we could improve RTL-level loop invariant hoisting
> by re-shuffling insns.
>
> For Seb's testcase:
>
> void bar(int i) {
>   char A[10];
>   int d = 0;
>   while (i > 0)
>     A[d++] = i--;
>
>   while (d > 0)
>     foo(A[d--]);
> }
>
> the insn sequence that calculates the address of A[i] looks like:
>
> (insn 76 75 77 22 (set (reg/f:DI 109)
>         (plus:DI (reg/f:DI 64 sfp)
>             (reg:DI 108 [ i ]))) seb-pop.c:8 84 {*adddi3_aarch64}
>      (expr_list:REG_DEAD (reg:DI 108 [ i ])
>         (nil)))
> (insn 77 76 78 22 (set (reg:SI 110 [ D.2633 ])
>         (zero_extend:SI (mem/j:QI (plus:DI (reg/f:DI 109)
>                     (const_int -16 [0xfffffffffffffff0])) [0 A S1 A8]))) seb-pop.c:8 76 {*zero_extendqisi2_aarch64}
>      (expr_list:REG_DEAD (reg/f:DI 109)
>         (nil)))
>
> Since reg + reg addressing is typical for most RISC archs, we could
> re-shuffle the instruction sequence into the following:
>
> (insn 96 94 97 22 (set (reg/f:DI 129)
>         (plus:DI (reg/f:DI 64 sfp)
>             (const_int -16 [0xfffffffffffffff0]))) seb-pop.c:8 84 {*adddi3_aarch64}
>      (nil))
> (insn 97 96 98 22 (set (reg:DI 130 [ i ])
>         (sign_extend:DI (reg/v:SI 97 [ i ]))) seb-pop.c:8 70 {*extendsidi2_aarch64}
>      (expr_list:REG_DEAD (reg/v:SI 97 [ i ])
>         (nil)))
> (insn 98 97 99 22 (set (reg:SI 131 [ D.2633 ])
>         (zero_extend:SI (mem/j:QI (plus:DI (reg/f:DI 129)
>                     (reg:DI 130 [ i ])) [0 A S1 A8]))) seb-pop.c:8 76 {*zero_extendqisi2_aarch64}
>      (expr_list:REG_DEAD (reg:DI 130 [ i ])
>         (expr_list:REG_DEAD (reg/f:DI 129)
>             (nil))))
>
> That is, we re-associate the constant immediate with the virtual frame pointer.
>
> Transform:
>
> RA <- fixed_reg + RC
> RD <- MEM (RA + const_offset)
>
> into:
>
> RA <- fixed_reg + const_offset
> RD <- MEM (RA + RC)
>
> Then RA <- fixed_reg + const_offset is actually loop invariant, so the later
> RTL GCSE PRE pass can catch it and do the hoisting, and thus make up for
> what tree-level ivopts could not sort out.
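
For illustration only, here is a source-level sketch of what the re-association buys. It is not taken from the patch: the names sum_original, sum_reshuffled and reshuffle_selfcheck and the 32-byte buffer standing in for the stack frame are all made up for this example. The first loop forms the address as (fixed_reg + RC) + const_offset, so nothing in it is invariant; the second forms fixed_reg + const_offset once and indexes it with RC inside the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Original shape: the loop-variant index is added to the frame base first,
   so no part of the address computation is loop invariant. */
static unsigned sum_original(const unsigned char *frame,
                             ptrdiff_t const_offset, int n)
{
    unsigned sum = 0;
    for (int rc = 0; rc < n; rc++) {
        const unsigned char *ra = frame + rc;  /* RA <- fixed_reg + RC */
        sum += ra[const_offset];               /* RD <- MEM (RA + const_offset) */
    }
    return sum;
}

/* Re-shuffled shape: the constant is associated with the frame base, which
   makes RA loop invariant, so it can be computed once outside the loop. */
static unsigned sum_reshuffled(const unsigned char *frame,
                               ptrdiff_t const_offset, int n)
{
    const unsigned char *ra = frame + const_offset;  /* RA <- fixed_reg + const_offset */
    unsigned sum = 0;
    for (int rc = 0; rc < n; rc++)
        sum += ra[rc];                               /* RD <- MEM (RA + RC) */
    return sum;
}

/* Hypothetical self-check: both shapes must load the same bytes.
   The buffer and the offset of -16 only mimic the frame layout in the dump. */
static void reshuffle_selfcheck(void)
{
    unsigned char buf[32];
    for (int i = 0; i < 32; i++)
        buf[i] = (unsigned char)(i * 7 + 1);

    const unsigned char *frame = buf + 16;  /* stand-in for the sfp register */
    assert(sum_original(frame, -16, 10) == sum_reshuffled(frame, -16, 10));
}
```

Calling reshuffle_selfcheck() from a test driver checks that the two forms agree. The point of the second form is only that the address add no longer depends on the loop counter, which is what lets a later invariant-motion or PRE pass move it out of the loop.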
There is a LIM pass after gimple ivopts - if the invariantness is already
visible there, why not handle it there, similarly to the special cases in
rewrite_bittest and rewrite_reciprocal?
And of course similar tricks could be applied on the RTL level, in
RTL invariant motion?
Thanks,
Richard.
> This patch only tries to re-shuffle instructions within a single basic
> block that is an inner loop, since that is where performance is critical.
>
> I am reusing the loop info in fwprop because loop info is available there
> and fwprop runs before GCSE.
>
> Verified on aarch64 and mips64: the array base address is hoisted out of
> the loop.
>
> Bootstrap OK on x86-64 and aarch64.
>
> comments?
>
> thanks.
>
> gcc/
> PR62173
> fwprop.c (prepare_for_gcse_pre): New function.
> (fwprop_done): Call it.