This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH, GCC, AArch64] Fix PR88398 for AArch64

From: Sudakshina Das <Sudi dot Das at arm dot com>
To: Richard Biener <richard dot guenther at gmail dot com>
Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Kyrill Tkachov <kyrylo dot tkachov at foss dot arm dot com>, James Greenhalgh <James dot Greenhalgh at arm dot com>, Richard Earnshaw <Richard dot Earnshaw at arm dot com>, "bin dot cheng at linux dot alibaba dot com" <bin dot cheng at linux dot alibaba dot com>, "ook at ucw dot cz" <ook at ucw dot cz>
Date: Fri, 15 Nov 2019 12:07:46 +0000
Subject: Re: [PATCH, GCC, AArch64] Fix PR88398 for AArch64
Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none
Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=JD59nChWJCHlD1otmVa476CKQMRhPLpk1FESjnxJOIc=; b=frOvV2276yELrGNdjZLjJIRmZaLQuU6OE5mg4tdALiAQfSO/u6og2HddLEX0B0HOkwDoqJNiGlBoU8hulNFrT/G7tr4sfe9KmkKeiTgVtUFzcXcL58tCQ2Yw97kn9C8cZScdYwvoevfgiOYfrBEji/fXH5nvoVpJU4yYyGYuA45EKxYJuwAciVGfa3BOCch8b9YlC6JNkQZFbN2sPO0dy1Wfey59c6jQ+cEWsMpMlkKhcxLewoiyRhKirkaPtYO8WWRjnNg0Z4HnfjtG7k0bdSW4zR5ea+5JDE+ijbbumhLMzMYqgKKoioJYOymzbz3QN0GYYl9lNrQzsvp+GmzNpg==
Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=SwvUVVd1K0c0GDWtrAW9gtj1qeAyabkyV+vgoPwYNja24uFAbcRDLGNhN0SS7Ybb/tYpLofGr93PBu/5Bz9yBJCW58EqLitMGh6VuBnbQhUcx2t9LoTcy0QrCpG+bUC+bw1ellNQPV8JOUdDV/zMePdIxzXwdFxyUhOefVa25hrreY8/+Z54ZHZ1Ant0ho/6uGXz3D6IQM9pvqRUfgaYDZ9UlwgKgd4dBU1fUGDpH1HqFx8G6LMH761k1wjA8y76ohmxWATxwqSkxShVB1sirDB9S1IM75NbWv7QCfYIms8B/zW1ohYenZcy+2TMOe2P2N91fA74emUrOyv9NOjHIw==
Original-authentication-results: spf=none (sender IP is ) smtp.mailfrom=Sudi dot Das at arm dot com;
References: <09fd5ee6-33cd-82ce-6d01-89c4b4a116a0@arm.com>,<CAFiYyc13z0AucQDFUC6o7FR3kpCPcPp4vvxjePzWewn-oX59Og@mail.gmail.com>

Hi Richard

I apologise I should have given more explanation on my cover letter.
Although the bug was filed for vectorization, the conversation on it talked
about loops with two exits not being supported in the vectorizer and being not
being possible without lto and peeling causing more harm than benefit.

There was also no clear consensus among the discussion about the best way to do
unrolling. So I looked at Wilco's suggestion of unrolling here
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398#c8;
Although unroll_stupid does not exactly unroll it as he shows but
it gets closer than unroll_runtime_iterations. So I ran an experiment
to see if unrolliong the loop with unroll_stupid gets any benefit. The code
size benefit was easy to see with the small example but it also gave performance
benefit on Spec2017. The benefit comes because unroll_runtime_iteration adds
a switch case at the beginning for iteration check. This is less efficient
because it creates too many branch close together specially for a loop which
has more than 1 exit.
        beq     .L70
        cmp     x12, 1
        beq     .L55
        cmp     x12, 2
        beq     .L57
        cmp     x12, 3
        beq     .L59
        cmp     x12, 4
        beq     .L61
        cmp     x12, 5
        beq     .L63
        cmp     x12, 6
        bne     .L72

Finally I agree that unroll_stupid by default did not touch loops with multiple
exists but that was marked as a "TODO" to change later so I assumed that check
was not a hard requirement for the unrolling alghorithm.
/* Do not unroll loops with branches inside -- it increases number
     of mispredicts. 
     TODO: this heuristic needs tunning; call inside the loop body
     is also relatively good reason to not unroll.  */
unroll_stupid is also not touched unless there is -funroll-all-loops or a loop
pragma incidcating that maybe this could be potentially harmful on certain targets.
Since my experiments on AArch64 showed otherwise, I thought the easiest starting
point would be to do this in a target hook and only for a specific case
(multiple exits).

Thanks
Sudi

From: Richard Biener <richard.guenther@gmail.com>

Sent: Friday, November 15, 2019 9:32 AM

To: Sudakshina Das <Sudi.Das@arm.com>

Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>; James Greenhalgh <James.Greenhalgh@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>; bin.cheng@linux.alibaba.com <bin.cheng@linux.alibaba.com>;
 ook@ucw.cz <ook@ucw.cz>

Subject: Re: [PATCH, GCC, AArch64] Fix PR88398 for AArch64

On Thu, Nov 14, 2019 at 4:41 PM Sudakshina Das <Sudi.Das@arm.com> wrote:

>

> Hi

>

> This patch is trying to fix PR88398 for AArch64. As discussed in the PR,

> loop unrolling is probably what we can do here. As an easy fix, the

> existing unroll_stupid is unrolling the given example better than the

> unroll_runtime_iterations since the the loop contains a break inside it.

Hm, the bug reference doesn't help me at all in reviewing this - the bug

is about vectorization.

So why is unroll_stupid better than unroll_runtime_iterations for a loop

with a break (or as your implementation, with multiple exists)?

I don't like this target hook, it seems like general heuristics can be

improved here, but it seems unroll-stupid doesn't consider loops

with multiple exits at all?

Richard.

> So all I have done here is:

> 1) Add a target hook so that this is AArch64 specific.

> 2) We are not unrolling the loops that decide_unroll_runtime_iterations

> would reject.

> 3) Out of the ones that decide_unroll_runtime_iterations would accept,

> check if the loop has more than 1 exit (this is done in the new target

> hook) and if it does, try to unroll using unroll_stupid.

>

> Regression tested on AArch64 and added the test from the PR. This gives

> an overall code size reduction of 2.35% and performance gain of 0.498%

> on Spec2017 Intrate.

>

> Is this ok for trunk?

>

> Thanks

> Sudi

>

> gcc/ChangeLog:

>

> 2019-xx-xx  Sudakshina Das  <sudi.das@arm.com>

>

>         PR88398

>         * cfgloop.h: Include target.h.

>         (lpt_dec): Move to...

>         * target.h (lpt_dec): ... Here.

>         * target.def: Define TARGET_LOOP_DECISION_ADJUST.

>         * loop-unroll.c (decide_unroll_runtime_iterations): Use new target hook.

>         (decide_unroll_stupid): Likewise.

>         * config/aarch64/aarch64.c (aarch64_loop_decision_adjust): New function.

>         (TARGET_LOOP_DECISION_ADJUST): Define for AArch64.

>         * doc/tm.texi: Regenerated.

>         * doc/tm.texi.in: Document TARGET_LOOP_DECISION_ADJUST.

>

> gcc/testsuite/ChangeLog:

>

> 2019-xx-xx  Sudakshina Das  <sudi.das@arm.com>

>

>         PR88398

>         * gcc.target/aarch64/pr88398.c: New test.

Follow-Ups:
- Re: [PATCH, GCC, AArch64] Fix PR88398 for AArch64
  - From: Richard Biener

References:
- [PATCH, GCC, AArch64] Fix PR88398 for AArch64
  - From: Sudakshina Das
- Re: [PATCH, GCC, AArch64] Fix PR88398 for AArch64
  - From: Richard Biener

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]