This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: Loop fusion.

From: Richard Biener <richard dot guenther at gmail dot com>
To: Janne Blomqvist <blomqvist dot janne at gmail dot com>
Cc: "Bin.Cheng" <amker dot cheng at gmail dot com>, Toon Moene <toon at moene dot org>, gcc mailing list <gcc at gcc dot gnu dot org>
Date: Mon, 23 Apr 2018 14:47:06 +0200
Subject: Re: Loop fusion.
References: <71dd38d0-2dbc-0100-7419-6f1ca1d5e077@moene.org> <CAHFci2--Z+=foKyXWMp9PqAmufuDvDE=HLk9pm8Bbu+5b2M91w@mail.gmail.com> <CAFiYyc3V18L0K=5C4hrUx44a=7rR56n9g2i8D9RitHO9pxu5zw@mail.gmail.com> <CAO9iq9EW5O0n52OGqSbbp2=ZWniy=LEMXMR-4t1yDALHwfhgiA@mail.gmail.com>

On Mon, Apr 23, 2018 at 2:31 PM, Janne Blomqvist
<blomqvist.janne@gmail.com> wrote:
> On Mon, Apr 23, 2018 at 2:02 PM, Richard Biener <richard.guenther@gmail.com>
> wrote:
>>
>> On Mon, Apr 23, 2018 at 12:59 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> > On Sun, Apr 22, 2018 at 3:27 PM, Toon Moene <toon@moene.org> wrote:
>> >> A few days ago there was a rant on the Fortran Standardization
>> >> Committee's e-mail list about Fortran's "whole array arithmetic"
>> >> being unoptimizable.
>> >>
>> >> An example picked at random from our weather forecasting code:
>> >>
>> >>     ZQICE(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YI%MP)
>> >>     ZQLI(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YL%MP)
>> >>     ZQRAIN(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YR%MP)
>> >>     ZQSNOW(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YS%MP)
>> >>
>> >> The reaction from one of the members of the committee (about "their"
>> >> compiler):
>> >>
>> >> 'And multiple consecutive array statements with the same shape are
>> >> “fused” exactly so that the compiler can generate good cache use.
>> >> This sort of optimization is pretty low hanging fruit.'
>> >>
>> >> As far as I can see loop fusion as a stand-alone optimization is not
>> >> supported as yet, although some mention is made in the context of
>> >> graphite.
>> >>
>> >> Is this something that should be pursued?
>> > Hi,
>> > I don't know the current status of fusion in graphite.  As for the
>> > traditional fusion transformation, I think it's not very difficult to
>> > implement along with the existing distribution; actually, quite a lot
>> > of code should be shared.  What we do need is more motivating cases
>> > and a good/conservative cost model.
>>
>> Yes, I guess before distribution you want to do maximum fusion and then
>> apply (re-)distribution on the fused loop.  The cost model should be the
>> very same for distribution/fusion.
>>
>> Richard.
>
> I recall Fujitsu bragging that the key to them getting good application
> performance (read: outside linpack) on the K computer is extensive use of
> loop FISSION + software pipelining. Though I guess sw-pipelining is only
> useful if you have lots of architectural registers, which disqualifies
> x86-64.

FISSION we can do quite well (though we lack a cost model here); that's
what loop distribution does.

Richard.

>
> --
> Janne Blomqvist
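The fusion Toon's example invites can be sketched in C. This is a minimal illustration under stated assumptions, not how gfortran actually lowers these statements. The sizes NPROMA and NFLEVG, the field count NFIELDS, the 0-based field indices YI, YL, YR, YS (standing in for YI%MP and friends), and the helper names copy_unfused, copy_fused and fused_matches_unfused are all hypothetical. copy_unfused gives each array statement its own loop nest. copy_fused merges the four nests into one. That merge is legal here because the nests share an iteration space and the destinations are assumed not to overlap the source or each other, which holds for distinct named Fortran arrays.

```c
#include <string.h>

/* Hypothetical compile-time sizes; in the real code NPROMA and NFLEVG
   are run-time values and PGFL holds many more fields. */
#define NPROMA 8
#define NFLEVG 4
#define NFIELDS 4

/* Hypothetical 0-based field indices standing in for YI%MP, YL%MP,
   YR%MP and YS%MP. */
enum { YI = 0, YL = 1, YR = 2, YS = 3 };

/* Fortran arrays are column-major, so PGFL(i,k,f) corresponds to
   pgfl[f][k][i] here and i is the contiguous (fastest) index. */
typedef double plane[NFLEVG][NPROMA];

/* Unfused: one loop nest per array statement, i.e. four sweeps that
   each read one field of pgfl and write one destination array. */
void copy_unfused(plane *pgfl, plane qice, plane qli, plane qrain, plane qsnow)
{
    for (int k = 0; k < NFLEVG; k++)
        for (int i = 0; i < NPROMA; i++)
            qice[k][i] = pgfl[YI][k][i];
    for (int k = 0; k < NFLEVG; k++)
        for (int i = 0; i < NPROMA; i++)
            qli[k][i] = pgfl[YL][k][i];
    for (int k = 0; k < NFLEVG; k++)
        for (int i = 0; i < NPROMA; i++)
            qrain[k][i] = pgfl[YR][k][i];
    for (int k = 0; k < NFLEVG; k++)
        for (int i = 0; i < NPROMA; i++)
            qsnow[k][i] = pgfl[YS][k][i];
}

/* Fused: same iteration space, no dependences between the statements
   (assuming no aliasing), so all four bodies share a single nest. */
void copy_fused(plane *pgfl, plane qice, plane qli, plane qrain, plane qsnow)
{
    for (int k = 0; k < NFLEVG; k++)
        for (int i = 0; i < NPROMA; i++) {
            qice[k][i]  = pgfl[YI][k][i];
            qli[k][i]   = pgfl[YL][k][i];
            qrain[k][i] = pgfl[YR][k][i];
            qsnow[k][i] = pgfl[YS][k][i];
        }
}

/* Fills sample data, runs both versions and returns 1 if their
   results are bit-for-bit identical, 0 otherwise. */
int fused_matches_unfused(void)
{
    static plane pgfl[NFIELDS];
    static plane a[4], b[4];

    for (int f = 0; f < NFIELDS; f++)
        for (int k = 0; k < NFLEVG; k++)
            for (int i = 0; i < NPROMA; i++)
                pgfl[f][k][i] = f * 1000.0 + k * 100.0 + i;

    copy_unfused(pgfl, a[0], a[1], a[2], a[3]);
    copy_fused(pgfl, b[0], b[1], b[2], b[3]);
    return memcmp(a, b, sizeof a) == 0;
}
```

A driver can call fused_matches_unfused() to check that both versions agree. For pure copies like these, fusion mainly saves three of the four passes of loop control. The cache benefit the committee member mentions is larger when the fused statements reuse the same data. The fused body also keeps eight memory streams live at once instead of two, which is the kind of trade-off the cost model Bin and Richard mention would have to weigh. Loop distribution (fission) is the reverse step, splitting copy_fused back into copy_unfused.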

