This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH, PR52252] Vectorization for load/store groups of size 3.
- From: Evgeny Stupachenko <evstupac at gmail dot com>
- To: Richard Biener <rguenther at suse dot de>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Jakub Jelinek <jakub at redhat dot com>, Uros Bizjak <ubizjak at gmail dot com>
- Date: Wed, 30 Apr 2014 18:31:45 +0400
- Subject: Re: [PATCH, PR52252] Vectorization for load/store groups of size 3.
Ping.
On Fri, Apr 18, 2014 at 2:05 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> Hi,
>
> Merged with current master, the patch passes bootstrap and gives the
> expected gains.
> Patch and new tests are attached.
>
> ChangeLog:
>
> 2014-04-18 Evgeny Stupachenko <evstupac@gmail.com>
>
> * tree-vect-data-refs.c (vect_grouped_store_supported): New
> check for store groups of length 3.
> (vect_permute_store_chain): New permutations for store groups of
> length 3.
> (vect_grouped_load_supported): New check for load groups of length 3.
> (vect_permute_load_chain): New permutations for load groups of length 3.
> * tree-vect-stmts.c (vect_model_store_cost): Change cost
> of vec_perm_shuffle for the new permutations.
> (vect_model_load_cost): Ditto.
>
> ChangeLog for testsuite:
>
> 2014-04-18 Evgeny Stupachenko <evstupac@gmail.com>
>
> PR tree-optimization/52252
> * gcc.dg/vect/pr52252-ld.c: Test for load groups of size 3.
> * gcc.dg/vect/pr52252-st.c: Test for store groups of size 3.
>
> Evgeny
>
> On Thu, Mar 6, 2014 at 6:44 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> Missed attachment.
>>
>> On Thu, Mar 6, 2014 at 6:42 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>>> I've split the patch in two: cost model tuning and vectorization of
>>> load/store groups.
>>> Silvermont (SLM) tuning was partially introduced in this patch:
>>> http://gcc.gnu.org/ml/gcc-patches/2014-03/msg00226.html
>>> The patch introducing vectorization for load/store groups of size 3 is attached.
>>>
>>> Is it OK for stage 1?
>>>
>>> ChangeLog:
>>>
>>> 2014-03-06 Evgeny Stupachenko <evstupac@gmail.com>
>>>
>>> * tree-vect-data-refs.c (vect_grouped_store_supported): New
>>> check for store groups of length 3.
>>> (vect_permute_store_chain): New permutations for store groups of
>>> length 3.
>>> (vect_grouped_load_supported): New check for load groups of length 3.
>>> (vect_permute_load_chain): New permutations for load groups of length 3.
>>> * tree-vect-stmts.c (vect_model_store_cost): Change cost
>>> of vec_perm_shuffle for the new permutations.
>>> (vect_model_load_cost): Ditto.
>>>
>>>
>>>
>>> On Tue, Feb 11, 2014 at 7:19 PM, Richard Biener <rguenther@suse.de> wrote:
>>>> On Tue, 11 Feb 2014, Evgeny Stupachenko wrote:
>>>>
>>>>> Attaching the missing patch, in plain text.
>>>>>
>>>>> I have copyright assignment on file with the FSF covering work on GCC.
>>>>>
>>>>> Load/store groups of length 3 are the most frequent non-power-of-2
>>>>> case. They occur in RGB image processing (like the test case in PR52252).
>>>>> We can certainly extend the patch to length 5 and beyond. However, this
>>>>> could affect performance on some other architectures and requires
>>>>> more extensive testing, so length 3 is just the first step. The
>>>>> algorithm in the patch could be generalized in several steps.
>>>>>
>>>>> I understand that the patch should wait for stage 1; however, since
>>>>> it's ready, we can discuss it now and make some changes (like
>>>>> supporting a general group size).
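To make the size-3 case concrete: in an RGB loop that reads `in[3*i]`, `in[3*i+1]` and `in[3*i+2]`, each vector load returns interleaved R, G and B elements, and the vectorizer has to separate them with permutations. The C sketch below models one way to do that using only two-input constant permutes, with semantics modeled on GCC's VEC_PERM_EXPR (a selector index below NELT picks from the first input, otherwise from the second). The names `NELT`, `vec_t`, `vec_perm` and `extract_field3` are hypothetical, and the masks are illustrative; they are not claimed to be the ones the patch generates.

```c
#include <assert.h>

#define NELT 8  /* hypothetical vector width in elements */

/* Model of a vector register holding NELT byte elements.  */
typedef struct { unsigned char e[NELT]; } vec_t;

/* Model of a two-input constant permute: sel[i] in [0, 2*NELT)
   indexes the concatenation of A and B.  */
static vec_t
vec_perm (vec_t a, vec_t b, const unsigned sel[NELT])
{
  vec_t r;
  for (unsigned i = 0; i < NELT; i++)
    {
      assert (sel[i] < 2 * NELT);
      r.e[i] = sel[i] < NELT ? a.e[sel[i]] : b.e[sel[i] - NELT];
    }
  return r;
}

/* Extract field K (0 = R, 1 = G, 2 = B) from three consecutive vectors
   V0, V1, V2 holding interleaved triples, using two permutes.  */
static vec_t
extract_field3 (vec_t v0, vec_t v1, vec_t v2, unsigned k)
{
  unsigned sel1[NELT], sel2[NELT];
  for (unsigned i = 0; i < NELT; i++)
    {
      unsigned idx = 3 * i + k;  /* stream position of element I of field K */
      if (idx < 2 * NELT)
        {
          sel1[i] = idx;         /* element lives in V0 or V1 */
          sel2[i] = i;           /* keep what the first permute produced */
        }
      else
        {
          sel1[i] = 0;                         /* don't care, replaced below */
          sel2[i] = NELT + (idx - 2 * NELT);   /* take it from V2 */
        }
    }
  vec_t tmp = vec_perm (v0, v1, sel1);
  return vec_perm (tmp, v2, sel2);
}
```

Each field costs two permutes, so three input vectors need six permutes in total; whether that beats the scalar loop is what the target cost model has to decide.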
>>>>
>>>> Other than that, I'd like to see a vectorizer hook that queries the cost
>>>> of a vec_perm_const expansion instead of adding vec_perm_shuffle
>>>> (this requires passing the constant shuffle mask as well as the
>>>> vector type). That would be more useful for other uses that
>>>> require (arbitrary) shuffles.
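A hook of the kind suggested here would receive the constant mask and price it, rather than charging one flat vec_perm cost. The sketch below is a hypothetical illustration of that idea, not an existing GCC hook: `perm_const_cost` and the `COST_*` values are invented, and it assumes a target where a single-input shuffle is one instruction while a true two-input permute needs a shuffle of each input plus a blend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-instruction costs; a real target would supply its own.  */
enum { COST_SHUFFLE = 1, COST_BLEND = 1 };

/* Hypothetical hook: estimate the cost of a constant two-input permute
   with selector SEL of length NELT (entries in [0, 2*NELT)).  */
static int
perm_const_cost (const unsigned *sel, size_t nelt)
{
  bool uses_a = false, uses_b = false;
  bool ident_a = true, ident_b = true;
  for (size_t i = 0; i < nelt; i++)
    {
      assert (sel[i] < 2 * nelt);
      if (sel[i] < nelt)
        uses_a = true;
      else
        uses_b = true;
      if (sel[i] != i)
        ident_a = false;
      if (sel[i] != nelt + i)
        ident_b = false;
    }
  if (ident_a || ident_b)
    return 0;                              /* result is one input unchanged */
  if (!(uses_a && uses_b))
    return COST_SHUFFLE;                   /* single-input shuffle */
  return 2 * COST_SHUFFLE + COST_BLEND;    /* shuffle each input, then blend */
}
```

With such a query, masks that mix two inputs (as the size-3 masks do) would be priced higher than a single-input reversal, which a flat per-permute cost cannot express.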
>>>>
>>>> I haven't looked at the rest of the patch yet; it's queued in my review
>>>> pipeline.
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>> Thanks,
>>>>> Evgeny
>>>>>
>>>>> On Tue, Feb 11, 2014 at 5:00 PM, Richard Biener <rguenther@suse.de> wrote:
>>>>> >
>>>>> > On Tue, 11 Feb 2014, Evgeny Stupachenko wrote:
>>>>> >
>>>>> > > Hi,
>>>>> > >
>>>>> > > The patch gives the expected 3x speedup for the test case in PR52252
>>>>> > > (and even 6x with AVX2).
>>>>> > > It passes make check and bootstrap on x86.
>>>>> > > SPEC2000/SPEC2006 show no regressions or gains on x86.
>>>>> > >
>>>>> > > Is this patch ok?
>>>>> >
>>>>> > I've worked on generalizing the permutation support in light
>>>>> > of the generic shuffle support now available in the IL,
>>>>> > but hit some roadblocks in the way code generation works for
>>>>> > group loads with permutations (I don't remember whether I posted all the patches).
>>>>> >
>>>>> > This patch seems to go in a slightly different direction, but it again
>>>>> > special-cases a specific permutation. Why is that? Why can't we
>>>>> > support groups of size 7, for example? So, can this be generalized
>>>>> > to support arbitrary non-power-of-two load/store groups?
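One way to see what arbitrary non-power-of-two groups would involve is a naive generalization of the store-side interleave to any group size G: each output vector is built by folding the G field vectors in one at a time with two-input permutes. The C sketch below assumes VEC_PERM_EXPR-like semantics for `vec_perm`; `NELT`, `vec_t` and `interleave_group` are hypothetical names, and this is not the patch's algorithm.

```c
#include <assert.h>

#define NELT 8  /* hypothetical vector width in elements */

/* Model of a vector register holding NELT byte elements.  */
typedef struct { unsigned char e[NELT]; } vec_t;

/* Model of a two-input constant permute: sel[i] in [0, 2*NELT)
   indexes the concatenation of A and B.  */
static vec_t
vec_perm (vec_t a, vec_t b, const unsigned sel[NELT])
{
  vec_t r;
  for (unsigned i = 0; i < NELT; i++)
    {
      assert (sel[i] < 2 * NELT);
      r.e[i] = sel[i] < NELT ? a.e[sel[i]] : b.e[sel[i] - NELT];
    }
  return r;
}

/* Interleave G field vectors F[0..G-1] into G output vectors OUT[0..G-1],
   so the concatenated output is F[0][0], F[1][0], ..., F[G-1][0], F[0][1], ...
   Uses G-1 two-input permutes per output vector.  */
static void
interleave_group (const vec_t *f, vec_t *out, unsigned g)
{
  assert (g >= 2);
  for (unsigned q = 0; q < g; q++)
    {
      unsigned sel[NELT];
      /* First permute places the lanes that come from fields 0 and 1.  */
      for (unsigned p = 0; p < NELT; p++)
        {
          unsigned s = q * NELT + p;          /* position in output stream */
          unsigned k = s % g, i = s / g;      /* source field and element */
          sel[p] = k == 0 ? i : k == 1 ? NELT + i : 0;  /* 0 = don't care */
        }
      vec_t acc = vec_perm (f[0], f[1], sel);
      /* Each further permute fills in the lanes that belong to field M.  */
      for (unsigned m = 2; m < g; m++)
        {
          for (unsigned p = 0; p < NELT; p++)
            {
              unsigned s = q * NELT + p;
              sel[p] = s % g == m ? NELT + s / g : p;
            }
          acc = vec_perm (acc, f[m], sel);
        }
      out[q] = acc;
    }
}
```

This chain needs G-1 permutes per output vector, G*(G-1) for the whole group, so its cost grows quadratically with G. That growth is one plausible reason a target would want specialized sequences, or a precise permute cost query, before enabling larger groups.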
>>>>> >
>>>>> > Other than that, the patch has to wait for stage 1 to open again,
>>>>> > of course. And it lacks a testcase.
>>>>> >
>>>>> > Btw, do you have a copyright assignment on file with the FSF covering
>>>>> > work on GCC?
>>>>> >
>>>>> > Thanks,
>>>>> > Richard.
>>>>> >
>>>>> > > ChangeLog:
>>>>> > >
>>>>> > > 2014-02-11 Evgeny Stupachenko <evstupac@gmail.com>
>>>>> > >
>>>>> > > * target.h (vect_cost_for_stmt): Define new cost vec_perm_shuffle.
>>>>> > > * tree-vect-data-refs.c (vect_grouped_store_supported): New
>>>>> > > check for store groups of length 3.
>>>>> > > (vect_permute_store_chain): New permutations for store groups of
>>>>> > > length 3.
>>>>> > > (vect_grouped_load_supported): New check for load groups of length 3.
>>>>> > > (vect_permute_load_chain): New permutations for load groups of
>>>>> > > length 3.
>>>>> > > * tree-vect-stmts.c (vect_model_store_cost): New cost
>>>>> > > vec_perm_shuffle for the new permutations.
>>>>> > > (vect_model_load_cost): Ditto.
>>>>> > > * config/aarch64/aarch64.c (builtin_vectorization_cost): Add
>>>>> > > vec_perm_shuffle cost as equivalent of vec_perm cost.
>>>>> > > * config/arm/arm.c: Ditto.
>>>>> > > * config/rs6000/rs6000.c: Ditto.
>>>>> > > * config/spu/spu.c: Ditto.
>>>>> > > * config/i386/x86-tune.def (TARGET_SLOW_PHUFFB): Target for slow
>>>>> > > byte shuffle on some x86 architectures.
>>>>> > > * config/i386/i386.h (processor_costs): Define pshuffb cost.
>>>>> > > * config/i386/i386.c (processor_costs): Add pshuffb cost.
>>>>> > > (ix86_builtin_vectorization_cost): Add cost for the new
>>>>> > > permutations.
>>>>> > > Fix cost for other permutations.
>>>>> > > (expand_vec_perm_even_odd_1): Avoid byte shuffles when they are
>>>>> > > slow (TARGET_SLOW_PHUFFB).
>>>>> > > (ix86_add_stmt_cost): Add cost when STMT is WIDEN_MULTIPLY.
>>>>> > > Add new shuffle cost only when byte shuffle is expected.
>>>>> > > Fix cost model for Silvermont.
>>>>> > >
>>>>> > > Thanks,
>>>>> > > Evgeny
>>>>> > >
>>>>> >
>>>>> > --
>>>>> > Richard Biener <rguenther@suse.de>
>>>>> > SUSE / SUSE Labs
>>>>> > SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
>>>>> > GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer
>>>>>
>>>>