This is the mail archive of the
mailing list for the GCC project.
Re: SPEC 456.hmmer vectorization question
On Thu, Mar 9, 2017 at 9:12 AM, Jakub Jelinek <firstname.lastname@example.org> wrote:
> On Thu, Mar 09, 2017 at 09:02:38AM +0100, Richard Biener wrote:
>> It would need to be done before graphite, and yes, the question is when
>> to do this (given the non-trivial text size and runtime cost). One option is
>> to do something similar to what we do with IFN_LOOP_VECTORIZED, that is, after
>> followup transforms decide whether the specialized version received any
>> important optimization. Another option is to add value profile counters
>> for aliasing and only do this with FDO when we know at runtime there
>> is no aliasing.
> It doesn't have to be either/or. If we have FDO, we can do it
> unconditionally if we have gathered info that there is likely no aliasing,
> and optimize the other loop (for the case of aliasing) for size.
> If we don't have FDO, we could do the IFN_LOOP_VERSIONED way.
> For IFN_LOOP_VERSIONED, if we check all aliasing cases we could then either
> use the OpenMP/Cilk/ivdep pragma loop properties (loop->safelen etc.),
> or even have something stronger (that would say that there aren't
> any inter-iteration memory dependencies).
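For readers following the thread, the versioning idea discussed above can be sketched at the source level roughly as follows. This is only an illustration under stated assumptions, not what GCC emits internally: `ranges_disjoint` and `add_const` are hypothetical names, and the runtime check stands in for whatever alias test the compiler would generate. The GCC-only `#pragma GCC ivdep` marks the specialized copy as free of loop-carried dependences, in the spirit of the pragma loop properties mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a GCC API): returns nonzero when the n-element
   ranges starting at a and b do not overlap.  Address overflow is ignored
   for the sake of the sketch. */
static int ranges_disjoint(const int *a, const int *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    uintptr_t bytes = (uintptr_t)(n * sizeof(int));
    return pa + bytes <= pb || pb + bytes <= pa;
}

/* Hypothetical example loop: dst[i] = src[i] + k for i in [0, n). */
void add_const(int *dst, const int *src, size_t n, int k)
{
    if (ranges_disjoint(dst, src, n)) {
        /* Specialized version: the runtime check has shown there is no
           aliasing, so it is safe to promise the absence of loop-carried
           dependences to the compiler. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC ivdep
#endif
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] + k;
    } else {
        /* Fallback version: same loop, original sequential semantics.
           With profile data this copy could be optimized for size. */
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] + k;
    }
}
```

Calling `add_const(buf + 1, buf, n, k)` takes the fallback path, because each iteration reads the element written by the previous one.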
We can use MR_DEPENDENCE_* to partition the dependences properly.
For loop distribution we can also check profitability before adding any
dependence related edges and version according to them. Of course
that needs a meaningful cost model...
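A source-level sketch of what versioned loop distribution could look like, assuming a made-up loop body: `fused_loop`, `distributed_loop` and `ranges_disjoint` are hypothetical names, and no cost model is shown. The loop is split into an independent partition and a recurrence partition only when the runtime check rules out the cross-partition dependences that aliasing would otherwise introduce.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a GCC API): nonzero when the n-element ranges
   starting at a and b do not overlap.  Address overflow is ignored. */
static int ranges_disjoint(const int *a, const int *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    uintptr_t bytes = (uintptr_t)(n * sizeof(int));
    return pa + bytes <= pb || pb + bytes <= pa;
}

/* Original loop: one independent statement and one recurrence. */
void fused_loop(int *a, int *b, const int *c, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        a[i] = c[i] * 2;          /* no loop-carried dependence */
        b[i] = b[i - 1] + c[i];   /* loop-carried dependence on b */
    }
}

/* Versioned form: distribute only when the arrays are pairwise disjoint,
   so no statement in one partition touches memory used by the other. */
void distributed_loop(int *a, int *b, const int *c, size_t n)
{
    if (ranges_disjoint(a, b, n) && ranges_disjoint(a, c, n)
        && ranges_disjoint(b, c, n)) {
        /* Partition 1: independent iterations, a vectorization candidate. */
        for (size_t i = 1; i < n; i++)
            a[i] = c[i] * 2;
        /* Partition 2: the recurrence stays a scalar loop. */
        for (size_t i = 1; i < n; i++)
            b[i] = b[i - 1] + c[i];
    } else {
        /* Possible aliasing: keep the original statement order. */
        fused_loop(a, b, c, n);
    }
}
```

Whether the split pays for the runtime check and the extra loop is exactly the cost-model question raised above; the sketch always distributes when the check passes.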
Similarly you can run the ISL optimizer as if there were no dependences
and compare the resulting code to the original one with a cost model.
This is what the vectorizer does before doing the versioning. For enabling
transforms, cost modeling is of course hard unless you can chain the analysis
parts of multiple passes (basically integrate the loop passes into "one").
Of course this breaks down in complexity once you consider disambiguating
only a few of the unknown dependences rather than all of them (in case the
transform can still handle some of those cases; the vectorizer, for example,
cannot deal with any unknown dependences).