This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFC] Combine vectorized loops with its scalar remainder.


On Mon, Nov 23, 2015 at 4:52 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Hi Richard,
>
> Did you have a chance to look at this?

It's on my list - I'm still swamped with patches to review.

Richard.

> Thanks.
> Yuri.
>
> 2015-11-13 13:35 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>> Hi Richard,
>>
>> Here is an updated version of the patch which (1) is in sync with the trunk
>> compiler and (2) contains a simple cost model to estimate the profitability
>> of scalar epilogue elimination. The part related to vectorization of
>> loops with a small trip count is still under development. Note that the
>> implemented cost model was not tuned well for HASWELL and KNL, but we
>> got a ~6% speed-up on 436.cactusADM from the SPEC 2006 suite for HASWELL.
>>
>> 2015-11-10 17:52 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Tue, Nov 10, 2015 at 2:02 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>> 2015-11-10 15:30 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Tue, Nov 3, 2015 at 1:08 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> This looks like a misunderstanding - we assume that for GCC 6 the simple
>>>>>> remainder scheme will be used, by introducing a new IV:
>>>>>> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01435.html
>>>>>>
>>>>>> Is that true, or did we miss something?
>>>>>
>>>>> <quote>
>>>>>> > Do you have an idea how "masking" should best be organized to be usable
>>>>>> > for both 4b and 4c?
>>>>>>
>>>>>> Do 2a ...
>>>>> Okay.
>>>>> </quote>
>>>>
>>>> 2a was 'transform the already vectorized loop as a separate
>>>> post-processing step'. Isn't that what this prototype patch implements?
>>>> The current version only masks the loop body, which in practice is
>>>> applicable only to AVX-512 in most cases.  With AVX-512 it's easier to see
>>>> how profitable masking might be, and it is the main target for the first
>>>> masking version.  Extending it to prologues/epilogues, and thus making
>>>> it more profitable for other targets, is the next step and is out of
>>>> the scope of this patch.
>>>
>>> Ok, technically the prototype transforms the already vectorized loop.
>>> Of course I meant the vectorized loop be copied, masked and that
>>> result used as epilogue...
>>>
>>> I'll queue a more detailed look into the patch for this week.
>>>
>>> Did you perform any measurements with this patch, such as the number of
>>> masked epilogues in SPEC 2006 FP (and any speedup)?
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> Thanks,
>>>> Ilya
>>>>
>>>>>
>>>>> Richard.
>>>>>
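
For readers following the thread, the two schemes being contrasted (a
vector loop followed by a scalar epilogue, versus a single loop whose
body is masked so that the last, partial iteration needs no epilogue)
can be sketched in plain C. This is only an illustration under stated
assumptions, not the code from the patch: the vector width VF, the
function names and the lane-by-lane emulation of vector operations are
all hypothetical, and a real target such as AVX-512 would use masked
vector loads and stores instead of the inner lane loops.

```c
#include <assert.h>
#include <stddef.h>

#define VF    8   /* hypothetical vectorization factor (lanes per vector) */
#define MAX_N 64  /* hypothetical buffer size used by the self-check below */

/* Classic scheme: full vector iterations, then a scalar epilogue. */
void add_with_scalar_epilogue(float *dst, const float *a, const float *b,
                              size_t n)
{
    size_t i = 0;
    for (; i + VF <= n; i += VF)                 /* "vector" main loop */
        for (size_t lane = 0; lane < VF; lane++) /* emulates one vector op */
            dst[i + lane] = a[i + lane] + b[i + lane];
    for (; i < n; i++)                           /* scalar remainder */
        dst[i] = a[i] + b[i];
}

/* Masked scheme: one loop; the final iteration runs with a partial mask,
   so inactive lanes are neither loaded nor stored. */
void add_with_masked_body(float *dst, const float *a, const float *b,
                          size_t n)
{
    for (size_t i = 0; i < n; i += VF) {
        size_t remaining = n - i;
        /* All lanes active for a full vector, else only the low lanes. */
        unsigned mask = remaining >= VF ? (1u << VF) - 1u
                                        : (1u << remaining) - 1u;
        for (size_t lane = 0; lane < VF; lane++)
            if (mask & (1u << lane))
                dst[i + lane] = a[i + lane] + b[i + lane];
    }
}

/* Returns 1 if both schemes produce identical buffers for trip count n,
   including leaving every element at index >= n untouched. */
int schemes_agree(size_t n)
{
    float a[MAX_N], b[MAX_N], r1[MAX_N], r2[MAX_N];
    assert(n <= MAX_N);
    for (size_t i = 0; i < MAX_N; i++) {
        a[i] = (float)i;
        b[i] = 2.0f * (float)i;
        r1[i] = r2[i] = -1.0f;   /* sentinel to detect stray stores */
    }
    add_with_scalar_epilogue(r1, a, b, n);
    add_with_masked_body(r2, a, b, n);
    for (size_t i = 0; i < MAX_N; i++)
        if (r1[i] != r2[i])
            return 0;
    return 1;
}
```

Calling schemes_agree with a trip count that is not a multiple of VF
(for example 13) exercises the partial mask in the last iteration. The
masked version runs every iteration, including full ones, under a mask,
which is one reason a cost model is needed to decide when eliminating
the scalar epilogue actually pays off.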

