This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: kyrylo dot tkachov at foss dot arm dot com
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Marcus Shawcroft <marcus dot shawcroft at arm dot com>, Richard Earnshaw <richard dot earnshaw at arm dot com>, James Greenhalgh <james dot greenhalgh at arm dot com>, Richard Sandiford <richard dot sandiford at arm dot com>
- Date: Fri, 9 Nov 2018 13:18:54 +0100
- Subject: Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64
- References: <5BE565CE.5000709@foss.arm.com>
On Fri, Nov 9, 2018 at 11:47 AM Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
> Hi all,
>
> In this testcase the codegen for VLA SVE is worse than it could be due to unrolling:
>
> fully_peel_me:
>         mov     x1, 5
>         ptrue   p1.d, all
>         whilelo p0.d, xzr, x1
>         ld1d    z0.d, p0/z, [x0]
>         fadd    z0.d, z0.d, z0.d
>         st1d    z0.d, p0, [x0]
>         cntd    x2
>         addvl   x3, x0, #1
>         whilelo p0.d, x2, x1
>         beq     .L1
>         ld1d    z0.d, p0/z, [x0, #1, mul vl]
>         fadd    z0.d, z0.d, z0.d
>         st1d    z0.d, p0, [x3]
>         cntw    x2
>         incb    x0, all, mul #2
>         whilelo p0.d, x2, x1
>         beq     .L1
>         ld1d    z0.d, p0/z, [x0]
>         fadd    z0.d, z0.d, z0.d
>         st1d    z0.d, p0, [x0]
> .L1:
>         ret
>
> In this case, due to the vector-length-agnostic nature of SVE, the compiler doesn't know the loop iteration count.
> For such loops we don't want to unroll if we don't end up eliminating branches as this just bloats code size
> and hurts icache performance.
>
> This patch introduces a new unroll-known-loop-iterations-only param that disables cunroll when the loop iteration
> count is unknown (SCEV_NOT_KNOWN). This case occurs much more often for SVE VLA code, but it does help some
> Advanced SIMD cases as well where loops with an unknown iteration count are not unrolled when it doesn't eliminate
> the branches.
>
> So for the above testcase we now generate:
> fully_peel_me:
>         mov     x2, 5
>         mov     x3, x2
>         mov     x1, 0
>         whilelo p0.d, xzr, x2
>         ptrue   p1.d, all
> .L2:
>         ld1d    z0.d, p0/z, [x0, x1, lsl 3]
>         fadd    z0.d, z0.d, z0.d
>         st1d    z0.d, p0, [x0, x1, lsl 3]
>         incd    x1
>         whilelo p0.d, x1, x3
>         bne     .L2
>         ret
>
> Still not perfect, but it's preferable to the original code.
> The new param is enabled by default on aarch64 but disabled for other targets, leaving their behaviour unchanged
> (until other target people experiment with it and set it, if appropriate).
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences in performance.
>
> Ok for trunk?
Hum.  Why introduce a new --param and not simply key on flag_peel_loops
instead?  That is enabled by default at -O3 and with FDO, but you can of
course control that in your target's post-option-processing hook.

It might also make sense to have more fine-grained control for this and
allow a target to say whether it wants to peel a specific loop or not when
the middle-end thinks that would be profitable.
Richard.
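
[Editorial sketch, not part of the original message.]  The two options
under discussion can be compared with a toy model.  Everything below is a
hypothetical stand-in written for illustration: the struct and function
names are invented and are not GCC's data structures or the code in
tree-ssa-loop-ivcanon.c.  It also assumes one particular reading of the
suggestion above, namely that complete unrolling of a loop with an unknown
iteration count would be allowed only when flag_peel_loops is set.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; not GCC internals.  */
struct unroll_policy
{
  bool known_iterations_only; /* models the proposed --param */
  bool peel_loops;            /* models flag_peel_loops */
};

struct loop_info
{
  bool niter_known; /* false models an SCEV_NOT_KNOWN iteration count */
};

/* Patch as posted: with the param set, a loop whose iteration count is
   unknown is not completely unrolled.  */
bool
allow_unroll_with_param (const struct unroll_policy *p,
                         const struct loop_info *l)
{
  return l->niter_known || !p->known_iterations_only;
}

/* Assumed reading of the alternative: the same decision keyed on the
   peeling flag, which a target could clear while processing options.  */
bool
allow_unroll_with_peel_flag (const struct unroll_policy *p,
                             const struct loop_info *l)
{
  return l->niter_known || p->peel_loops;
}
```

Under this model a target that sets the param and a target that clears the
peeling flag get the same answer for loops with an unknown count, which is
the gist of the question; loops with a known count are unaffected in both.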
> Thanks,
> Kyrill
>
>
> 2018-11-09 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> * params.def (PARAM_UNROLL_KNOWN_LOOP_ITERATIONS_ONLY): Define.
> * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Use above to
> disable unrolling on unknown iteration count.
> * config/aarch64/aarch64.c (aarch64_override_options_internal): Set
> PARAM_UNROLL_KNOWN_LOOP_ITERATIONS_ONLY to 1.
> * doc/invoke.texi (--param unroll-known-loop-iterations-only):
> Document.
>
> 2018-11-09 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> * gcc.target/aarch64/sve/unroll-1.c: New test.
>
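
[Editorial sketch, not part of the original message.]  The new test named
in the ChangeLog is not reproduced in this message.  Judging only from the
assembly quoted above (a trip count of 5, double-precision loads and
stores, and an fadd of a register with itself), a loop of roughly the
following shape would be expected to produce it.  The body and the build
flags mentioned in the comment are assumptions, not the contents of
unroll-1.c.

```c
/* Assumed reconstruction: five doubles, each doubled in place.
   Flags such as -O3 -march=armv8.2-a+sve are likewise an assumption.  */
void
fully_peel_me (double *x)
{
  for (int i = 0; i < 5; i++)
    x[i] = x[i] + x[i];
}
```

The source-level trip count is a constant, but once the loop is vectorized
for a vector length that is only known at run time, the number of vector
iterations is no longer a compile-time constant, which is the situation
the patch description refers to.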