This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Loop fusion.
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Janne Blomqvist <blomqvist dot janne at gmail dot com>
- Cc: "Bin.Cheng" <amker dot cheng at gmail dot com>, Toon Moene <toon at moene dot org>, gcc mailing list <gcc at gcc dot gnu dot org>
- Date: Mon, 23 Apr 2018 14:47:06 +0200
- Subject: Re: Loop fusion.
- References: <71dd38d0-2dbc-0100-7419-6f1ca1d5e077@moene.org> <CAHFci2--Z+=foKyXWMp9PqAmufuDvDE=HLk9pm8Bbu+5b2M91w@mail.gmail.com> <CAFiYyc3V18L0K=5C4hrUx44a=7rR56n9g2i8D9RitHO9pxu5zw@mail.gmail.com> <CAO9iq9EW5O0n52OGqSbbp2=ZWniy=LEMXMR-4t1yDALHwfhgiA@mail.gmail.com>
On Mon, Apr 23, 2018 at 2:31 PM, Janne Blomqvist
<blomqvist.janne@gmail.com> wrote:
> On Mon, Apr 23, 2018 at 2:02 PM, Richard Biener <richard.guenther@gmail.com>
> wrote:
>>
>> On Mon, Apr 23, 2018 at 12:59 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> > On Sun, Apr 22, 2018 at 3:27 PM, Toon Moene <toon@moene.org> wrote:
>> >> A few days ago there was a rant on the Fortran Standardization
>> >> Committee's
>> >> e-mail list about Fortran's "whole array arithmetic" being
>> >> unoptimizable.
>> >>
>> >> An example picked at random from our weather forecasting code:
>> >>
>> >> ZQICE(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YI%MP)
>> >> ZQLI(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YL%MP)
>> >> ZQRAIN(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YR%MP)
>> >> ZQSNOW(1:NPROMA,1:NFLEVG) = PGFL(1:NPROMA,1:NFLEVG,YS%MP)
>> >>
>> >> The reaction from one of the members of the committee (about "their"
>> >> compiler):
>> >>
>> >> 'And multiple consecutive array statements with the same shape are
>> >> “fused”
>> >> exactly so that the compiler can generate good cache use. This sort of
>> >> optimization is pretty low hanging fruit.'
>> >>
>> >> As far as I can see loop fusion as a stand-alone optimization is not
>> >> supported as yet, although some mention is made in the context of
>> >> graphite.
>> >>
>> >> Is this something that should be pursued ?
>> > Hi,
>> > I don't know the current status of fusion in graphite. As for the
>> > traditional fusion transformation, I think it would not be very
>> > difficult to implement alongside the existing distribution pass; in
>> > fact, quite a lot of code could be shared. What we do need is more
>> > motivating cases and a good, conservative cost model.
>>
>> Yes, I guess before distribution you want to do maximum fusion and then
>> apply (re-)distribution on the fused loop. The cost model should be the
>> very same for distribution/fusion.
>>
>> Richard.
>
>
>
> I recall Fujitsu bragging that the key to them getting good application
> performance (read: outside linpack) on the K computer is extensive use of
> loop FISSION + software pipelining. Though I guess sw-pipelining is only
> useful if you have lots of architectural registers, which disqualifies
> x86-64.
FISSION we can do quite well (though we lack a cost model for it); that's
what loop distribution does.
Richard.
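
To make the terminology in this thread concrete, here is a minimal sketch
in C of the two loop shapes being discussed. It is illustrative only and
is not GCC's implementation: the function names copy_unfused and
copy_fused, and the reduction of the weather-code example to two flat
arrays, are assumptions made for the example. Fusion turns the first form
into the second; fission (loop distribution) goes the other way. The
restrict qualifiers encode the assumption that the arrays do not overlap,
which is what makes either direction safe here.

```c
#include <stddef.h>

/* Hypothetical example: one loop per array statement, roughly what a
 * straightforward lowering of consecutive whole-array assignments
 * produces. This is also the shape loop distribution (fission) yields. */
void copy_unfused(size_t n,
                  const double *restrict src_i,
                  const double *restrict src_l,
                  double *restrict qice,
                  double *restrict qli)
{
    for (size_t k = 0; k < n; k++)
        qice[k] = src_i[k];
    for (size_t k = 0; k < n; k++)
        qli[k] = src_l[k];
}

/* Hypothetical example: the same work after loop fusion. Both
 * assignments share one loop, so the iteration space is walked once. */
void copy_fused(size_t n,
                const double *restrict src_i,
                const double *restrict src_l,
                double *restrict qice,
                double *restrict qli)
{
    for (size_t k = 0; k < n; k++) {
        qice[k] = src_i[k];
        qli[k] = src_l[k];
    }
}
```

Both functions compute the same result; which shape runs faster depends
on cache behaviour and register pressure, which is the cost-model question
raised above. In GCC, loop distribution is controlled by
-ftree-loop-distribution.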
>
> --
> Janne Blomqvist