This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: SPEC 456.hmmer vectorization question

From: Richard Biener <richard dot guenther at gmail dot com>
To: Steve Ellcey <sellcey at caviumnetworks dot com>
Cc: Michael Matz <matz at suse dot de>, GCC Development <gcc at gcc dot gnu dot org>, Jeff Law <law at redhat dot com>
Date: Thu, 9 Mar 2017 09:02:38 +0100
Subject: Re: SPEC 456.hmmer vectorization question
Authentication-results: sourceware.org; auth=none
References: <201703062237.v26MbW5e008866@sellcey-dt.caveonetworks.com> <alpine.LSU.2.20.1703071423440.13579@wotan.suse.de> <1489002090.22552.19.camel@caviumnetworks.com>

On Wed, Mar 8, 2017 at 8:41 PM, Steve Ellcey <sellcey@caviumnetworks.com> wrote:
> On Tue, 2017-03-07 at 14:45 +0100, Michael Matz wrote:
>> Hi Steve,
>>
>> On Mon, 6 Mar 2017, Steve Ellcey wrote:
>>
>> >
>> > I was looking at the SPEC 456.hmmer benchmark and this email thread
>> > from Jeff Law and Michael Matz:
>> >
>> >   https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01970.html
>> >
>> > and was wondering if anyone was looking at what more it would take
>> > for GCC to vectorize the loop in P7Viterbi.
>
>> It takes what I wrote in there.  There are two important things that need
>> to happen to get the best performance (at least from an analysis I did in
>> 2011, but nothing material should have changed since then):
>
> I guess I was hoping that some progress had been made since then, but
> it sounds like it hasn't.
>
>> (1) loop distribution to make some memory streams vectorizable (and leave
>>     the others in non-vectorized form).
>> (1a) loop splitting based on conditional (to remove the k
>> (2) a predictive commoning (or loop carried store reuse) on the dc[]
>>     stream
>>
>> None of these is valid if the loop streams can't be disambiguated, and as
>> this is C only adding explicit restrict qualifiers would give you that, or
>> runtime disambiguation, like ICC is doing, that's part (0).
>
> So it sounds like the loop would have to be split up using runtime
> disambiguation before we could do any of the optimizations.  Would that
> check and split be something that could or should be done using the
> graphite framework or would it be a separate pass done before the
> graphite phase is called?  I am not sure how one would determine what
> loops would be worth splitting and which ones would not during such a
> phase.

It would need to be done before graphite, and yes, the question is when
to do this (given the non-trivial text size and runtime cost).  One option is
to do something similar to what we do with IFN_LOOP_VECTORIZED, that is,
decide after the follow-up transforms whether the specialized version
received any important optimization.  Another option is to add value profile
counters for aliasing and only do this with FDO when the profile shows that
there is no aliasing at runtime.

Richard.
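
For illustration, the runtime disambiguation discussed above amounts to
versioning the loop on an overlap check.  The following is only a minimal
hand-written sketch of that idea in C, not what GCC or ICC actually emit.
The names ranges_disjoint and add_streams are hypothetical, and the loop body
is a stand-in rather than the P7Viterbi code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the n-element int ranges starting at
   a and b do not overlap.  Comparing pointers as integers assumes a flat
   address space, which holds on common targets. */
static int ranges_disjoint(const int *a, const int *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    uintptr_t bytes = (uintptr_t)(n * sizeof(int));
    return pa + bytes <= pb || pb + bytes <= pa;
}

/* Hypothetical kernel: dst[k] = src[k] + t[k], versioned on aliasing. */
void add_streams(int *dst, const int *src, const int *t, size_t n)
{
    if (ranges_disjoint(dst, src, n) && ranges_disjoint(dst, t, n)) {
        /* Specialized version: the check proved the stored stream does
           not overlap the loaded ones, so restrict is valid here and
           the loop is free to be vectorized. */
        int *restrict d = dst;
        const int *restrict s = src;
        const int *restrict w = t;
        for (size_t k = 0; k < n; k++)
            d[k] = s[k] + w[k];
    } else {
        /* Fallback version: keeps the original sequential semantics
           when the streams may overlap. */
        for (size_t k = 0; k < n; k++)
            dst[k] = src[k] + t[k];
    }
}
```

The second copy of the loop and the check itself are the text size and
runtime cost mentioned above, which is why the specialized version is only
worth keeping if it actually gets optimized.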

> Steve Ellcey
> sellcey@cavium.com
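
As a rough illustration of steps (1) and (2) quoted above, the sketch below
applies loop distribution and predictive commoning by hand to a simplified
loop.  It is an assumption-laden stand-in, not the actual P7Viterbi source:
only dc[] and k come from the thread, while mc, mprev, tm, td, tmd and the
function names are hypothetical.  The restrict qualifiers play the role of
part (0).

```c
#include <stddef.h>

static int max_int(int a, int b) { return a > b ? a : b; }

/* Fused form: the dc[] recurrence sits in the same loop as the mc[]
   stream.  mc and dc hold m + 1 elements, the inputs hold m. */
void fused(int *restrict mc, int *restrict dc,
           const int *restrict mprev, const int *restrict tm,
           const int *restrict td, const int *restrict tmd, size_t m)
{
    for (size_t k = 1; k <= m; k++) {
        mc[k] = mprev[k - 1] + tm[k - 1];
        dc[k] = max_int(dc[k - 1] + td[k - 1], mc[k - 1] + tmd[k - 1]);
    }
}

/* Same computation after distribution and commoning, done by hand. */
void distributed(int *restrict mc, int *restrict dc,
                 const int *restrict mprev, const int *restrict tm,
                 const int *restrict td, const int *restrict tmd, size_t m)
{
    if (m == 0)
        return;

    /* Loop 1: the mc[] stream has no loop-carried dependence, so it is
       a candidate for vectorization. */
    for (size_t k = 1; k <= m; k++)
        mc[k] = mprev[k - 1] + tm[k - 1];

    /* Loop 2: the dc[] recurrence stays scalar.  The previous value is
       carried in a local instead of being reloaded from dc[k - 1]. */
    int prev = dc[0];
    for (size_t k = 1; k <= m; k++) {
        prev = max_int(prev + td[k - 1], mc[k - 1] + tmd[k - 1]);
        dc[k] = prev;
    }
}
```

The split is only valid because loop 1 never reads dc[] and the arrays are
known not to overlap; without that guarantee the two forms could produce
different results.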

Follow-Ups:
- Re: SPEC 456.hmmer vectorization question
  - From: Jakub Jelinek

References:
- SPEC 456.hmmer vectorization question
  - From: Steve Ellcey
- Re: SPEC 456.hmmer vectorization question
  - From: Michael Matz
- Re: SPEC 456.hmmer vectorization question
  - From: Steve Ellcey
