This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH, GCC, AArch64] Fix PR88398 for AArch64


Hi Richard
​
I apologise I should have given more explanation on my cover letter.​
Although the bug was filed for vectorization, the conversation on it talked​
about loops with two exits not being supported in the vectorizer and being not​
being possible without lto and peeling causing more harm than benefit.​
​
There was also no clear consensus among the discussion about the best way to do​
unrolling. So I looked at Wilco's suggestion of unrolling here​
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398#c8​;
Although unroll_stupid does not exactly unroll it as he shows but​
it gets closer than unroll_runtime_iterations. So I ran an experiment​
to see if unrolliong the loop with unroll_stupid gets any benefit. The code​
size benefit was easy to see with the small example but it also gave performance​
benefit on Spec2017. The benefit comes because unroll_runtime_iteration adds​
a switch case at the beginning for iteration check. This is less efficient​
because it creates too many branch close together specially for a loop which​
has more than 1 exit.​
        beq     .L70​
        cmp     x12, 1​
        beq     .L55​
        cmp     x12, 2​
        beq     .L57​
        cmp     x12, 3​
        beq     .L59​
        cmp     x12, 4​
        beq     .L61​
        cmp     x12, 5​
        beq     .L63​
        cmp     x12, 6​
        bne     .L72​
​
Finally I agree that unroll_stupid by default did not touch loops with multiple​
exists but that was marked as a "TODO" to change later so I assumed that check​
was not a hard requirement for the unrolling alghorithm.​
/* Do not unroll loops with branches inside -- it increases number​
     of mispredicts. ​
     TODO: this heuristic needs tunning; call inside the loop body​
     is also relatively good reason to not unroll.  */​
unroll_stupid is also not touched unless there is -funroll-all-loops or a​ loop
pragma incidcating that maybe this could be potentially harmful on certain targets.​
Since my experiments on AArch64 showed otherwise, I thought the easiest starting​
point would be to do this in a target hook and only for a specific case​
(multiple exits).​
​
Thanks​
Sudi







From: Richard Biener <richard.guenther@gmail.com>

Sent: Friday, November 15, 2019 9:32 AM

To: Sudakshina Das <Sudi.Das@arm.com>

Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>; James Greenhalgh <James.Greenhalgh@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>; bin.cheng@linux.alibaba.com <bin.cheng@linux.alibaba.com>;
 ook@ucw.cz <ook@ucw.cz>

Subject: Re: [PATCH, GCC, AArch64] Fix PR88398 for AArch64
 

On Thu, Nov 14, 2019 at 4:41 PM Sudakshina Das <Sudi.Das@arm.com> wrote:

>

> Hi

>

> This patch is trying to fix PR88398 for AArch64. As discussed in the PR,

> loop unrolling is probably what we can do here. As an easy fix, the

> existing unroll_stupid is unrolling the given example better than the

> unroll_runtime_iterations since the the loop contains a break inside it.


Hm, the bug reference doesn't help me at all in reviewing this - the bug

is about vectorization.


So why is unroll_stupid better than unroll_runtime_iterations for a loop

with a break (or as your implementation, with multiple exists)?


I don't like this target hook, it seems like general heuristics can be

improved here, but it seems unroll-stupid doesn't consider loops

with multiple exits at all?


Richard.


> So all I have done here is:

> 1) Add a target hook so that this is AArch64 specific.

> 2) We are not unrolling the loops that decide_unroll_runtime_iterations

> would reject.

> 3) Out of the ones that decide_unroll_runtime_iterations would accept,

> check if the loop has more than 1 exit (this is done in the new target

> hook) and if it does, try to unroll using unroll_stupid.

>

> Regression tested on AArch64 and added the test from the PR. This gives

> an overall code size reduction of 2.35% and performance gain of 0.498%

> on Spec2017 Intrate.

>

> Is this ok for trunk?

>

> Thanks

> Sudi

>

> gcc/ChangeLog:

>

> 2019-xx-xx  Sudakshina Das  <sudi.das@arm.com>

>

>         PR88398

>         * cfgloop.h: Include target.h.

>         (lpt_dec): Move to...

>         * target.h (lpt_dec): ... Here.

>         * target.def: Define TARGET_LOOP_DECISION_ADJUST.

>         * loop-unroll.c (decide_unroll_runtime_iterations): Use new target hook.

>         (decide_unroll_stupid): Likewise.

>         * config/aarch64/aarch64.c (aarch64_loop_decision_adjust): New function.

>         (TARGET_LOOP_DECISION_ADJUST): Define for AArch64.

>         * doc/tm.texi: Regenerated.

>         * doc/tm.texi.in: Document TARGET_LOOP_DECISION_ADJUST.

>

> gcc/testsuite/ChangeLog:

>

> 2019-xx-xx  Sudakshina Das  <sudi.das@arm.com>

>

>         PR88398

>         * gcc.target/aarch64/pr88398.c: New test.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]