This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64


On Fri, Nov 9, 2018 at 11:47 AM Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
> Hi all,
>
> In this testcase the codegen for VLA SVE is worse than it could be due to unrolling:
>
> fully_peel_me:
>          mov     x1, 5
>          ptrue   p1.d, all
>          whilelo p0.d, xzr, x1
>          ld1d    z0.d, p0/z, [x0]
>          fadd    z0.d, z0.d, z0.d
>          st1d    z0.d, p0, [x0]
>          cntd    x2
>          addvl   x3, x0, #1
>          whilelo p0.d, x2, x1
>          beq     .L1
>          ld1d    z0.d, p0/z, [x0, #1, mul vl]
>          fadd    z0.d, z0.d, z0.d
>          st1d    z0.d, p0, [x3]
>          cntw    x2
>          incb    x0, all, mul #2
>          whilelo p0.d, x2, x1
>          beq     .L1
>          ld1d    z0.d, p0/z, [x0]
>          fadd    z0.d, z0.d, z0.d
>          st1d    z0.d, p0, [x0]
> .L1:
>          ret
>
> In this case, due to the vector-length-agnostic nature of SVE the compiler doesn't know the loop iteration count.
> For such loops we don't want to unroll if we don't end up eliminating branches as this just bloats code size
> and hurts icache performance.
>
> This patch introduces a new unroll-known-loop-iterations-only param that disables cunroll when the loop iteration
> count is unknown (SCEV_NOT_KNOWN). This case occurs much more often for SVE VLA code, but it does help some
> Advanced SIMD cases as well where loops with an unknown iteration count are not unrolled when it doesn't eliminate
> the branches.
>
> So for the above testcase we generate now:
> fully_peel_me:
>          mov     x2, 5
>          mov     x3, x2
>          mov     x1, 0
>          whilelo p0.d, xzr, x2
>          ptrue   p1.d, all
> .L2:
>          ld1d    z0.d, p0/z, [x0, x1, lsl 3]
>          fadd    z0.d, z0.d, z0.d
>          st1d    z0.d, p0, [x0, x1, lsl 3]
>          incd    x1
>          whilelo p0.d, x1, x3
>          bne     .L2
>          ret
>
> Not perfect still, but it's preferable to the original code.
> The new param is enabled by default on aarch64 but disabled for other targets, leaving their behaviour unchanged
> (until other target people experiment with it and set it, if appropriate).
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences in performance.
>
> Ok for trunk?

Hum.  Why introduce a new --param and not simply key on
flag_peel_loops instead?  That is
enabled by default at -O3 and with FDO but you of course can control
that in your targets
post-option-processing hook.

It might also make sense to have more fine-grained control for this
and allow a target
to say whether it wants to peel a specific loop or not when the
middle-end thinks that
would be profitable.

Richard.

> Thanks,
> Kyrill
>
>
> 2018-11-09  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>         * params.def (PARAM_UNROLL_KNOWN_LOOP_ITERATIONS_ONLY): Define.
>         * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Use above to
>         disable unrolling on unknown iteration count.
>         * config/aarch64/aarch64.c (aarch64_override_options_internal): Set
>         PARAM_UNROLL_KNOWN_LOOP_ITERATIONS_ONLY to 1.
>         * doc/invoke.texi (--param unroll-known-loop-iterations-only):
>         Document.
>
> 2018-11-09  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>         * gcc.target/aarch64/sve/unroll-1.c: New test.
>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]