This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Hi,
Merged with current master, the patch passes bootstrap and gives the
expected gains.
The patch and new tests are attached.
ChangeLog:
2014-04-18 Evgeny Stupachenko <evstupac@gmail.com>
* tree-vect-data-refs.c (vect_grouped_store_supported): New
check for stores group of length 3.
(vect_permute_store_chain): New permutations for stores group of
length 3.
(vect_grouped_load_supported): New check for loads group of length 3.
(vect_permute_load_chain): New permutations for loads group of length 3.
* tree-vect-stmts.c (vect_model_store_cost): Change cost
of vec_perm_shuffle for the new permutations.
(vect_model_load_cost): Ditto.
ChangeLog for testsuite:
2014-04-18 Evgeny Stupachenko <evstupac@gmail.com>
PR tree-optimization/52252
* gcc.dg/vect/pr52252-ld.c: Test on loads group of size 3.
* gcc.dg/vect/pr52252-st.c: Test on stores group of size 3.
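For readers without the attachments, the kind of loop a load or store group of size 3 refers to can be sketched as below. This is only an illustration under my own assumptions, not the contents of pr52252-ld.c or pr52252-st.c: the function names, the luma weights and the element type are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Load group of size 3: each iteration reads three adjacent bytes
   (r, g, b), so the three loads together cover a stride-3 access.  */
void
rgb_to_gray (const uint8_t *restrict rgb, uint8_t *restrict gray, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      unsigned r = rgb[3 * i];
      unsigned g = rgb[3 * i + 1];
      unsigned b = rgb[3 * i + 2];
      /* Hypothetical integer luma weights; they sum to 256.  */
      gray[i] = (uint8_t) ((77 * r + 150 * g + 29 * b) >> 8);
    }
}

/* Store group of size 3: each iteration writes three adjacent bytes,
   interleaving three separate planes into one RGB buffer.  */
void
planes_to_rgb (const uint8_t *restrict r, const uint8_t *restrict g,
               const uint8_t *restrict b, uint8_t *restrict rgb, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      rgb[3 * i] = r[i];
      rgb[3 * i + 1] = g[i];
      rgb[3 * i + 2] = b[i];
    }
}
```

Compiling such a file with something like `gcc -O3 -fdump-tree-vect-details` should show in the vect dump whether the loops were vectorized on a given target.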
Evgeny
On Thu, Mar 6, 2014 at 6:44 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> Missed attachment.
>
> On Thu, Mar 6, 2014 at 6:42 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> I've separated the patch into 2: cost model tuning and load/store
>> groups parallelism.
>> SLM tuning was partially introduced in the patch:
>> http://gcc.gnu.org/ml/gcc-patches/2014-03/msg00226.html
>> The patch introducing vectorization for load/store groups of size 3 is attached.
>>
>> Is it ok for stage1?
>>
>> ChangeLog:
>>
>> 2014-03-06 Evgeny Stupachenko <evstupac@gmail.com>
>>
>> * tree-vect-data-refs.c (vect_grouped_store_supported): New
>> check for stores group of length 3.
>> (vect_permute_store_chain): New permutations for stores group of
>> length 3.
>> (vect_grouped_load_supported): New check for loads group of length 3.
>> (vect_permute_load_chain): New permutations for loads group of length 3.
>> * tree-vect-stmts.c (vect_model_store_cost): Change cost
>> of vec_perm_shuffle for the new permutations.
>> (vect_model_load_cost): Ditto.
>>
>>
>>
>> On Tue, Feb 11, 2014 at 7:19 PM, Richard Biener <rguenther@suse.de> wrote:
>>> On Tue, 11 Feb 2014, Evgeny Stupachenko wrote:
>>>
>>>> Missed patch attached in plain-text.
>>>>
>>>> I have copyright assignment on file with the FSF covering work on GCC.
>>>>
>>>> Load/store groups of length 3 are the most frequent non-power-of-2
>>>> case. They are used in RGB image processing (like the test case in PR52252).
>>>> We could certainly extend the patch to length 5 and more. However, this
>>>> could affect performance on some other architectures and
>>>> requires more extensive testing. So length 3 is just a first step. The
>>>> algorithm in the patch could be modified for the general case in several
>>>> steps.
>>>>
>>>> I understand that the patch should wait for stage 1; however, since
>>>> it's ready we can discuss it right now and make some changes (like
>>>> a general group size).
>>>
>>> Other than that I'd like to see a vectorizer hook querying the cost of a
>>> vec_perm_const expansion instead of adding vec_perm_shuffle
>>> (thus requires the constant shuffle mask to be passed as well
>>> as the vector type). That's more useful for other uses that
>>> would require (arbitrary) shuffles.
>>>
>>> Didn't look at the rest of the patch yet - queued in my review
>>> pipeline.
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> Thanks,
>>>> Evgeny
>>>>
>>>> On Tue, Feb 11, 2014 at 5:00 PM, Richard Biener <rguenther@suse.de> wrote:
>>>> >
>>>> > On Tue, 11 Feb 2014, Evgeny Stupachenko wrote:
>>>> >
>>>> > > Hi,
>>>> > >
>>>> > > The patch gives an expected 3 times gain for the test case in the PR52252
>>>> > > (and even 6 times for AVX2).
>>>> > > It passes make check and bootstrap on x86.
>>>> > > spec2000/spec2006 got no regressions/gains on x86.
>>>> > >
>>>> > > Is this patch ok?
>>>> >
>>>> > I've worked on generalizing the permutation support in the light
>>>> > of the availability of the generic shuffle support in the IL
>>>> > but hit some road-blocks in the way code-generation works for
>>>> > group loads with permutations (I don't remember if I posted all patches).
>>>> >
>>>> > This patch seems to target a slightly different place but it again
>>>> > special-cases a specific permutation. Why's that? Why can't we
>>>> > support groups of size 7 for example? So - can this be generalized
>>>> > to support arbitrary non-power-of-two load/store groups?
>>>> >
>>>> > Other than that the patch has to wait for stage1 to open again,
>>>> > of course. And it misses a testcase.
>>>> >
>>>> > Btw, do you have a copyright assignment on file with the FSF covering
>>>> > work on GCC?
>>>> >
>>>> > Thanks,
>>>> > Richard.
>>>> >
>>>> > > ChangeLog:
>>>> > >
>>>> > > 2014-02-11 Evgeny Stupachenko <evstupac@gmail.com>
>>>> > >
>>>> > > * target.h (vect_cost_for_stmt): Defining new cost vec_perm_shuffle.
>>>> > > * tree-vect-data-refs.c (vect_grouped_store_supported): New
>>>> > > check for stores group of length 3.
>>>> > > (vect_permute_store_chain): New permutations for stores group of
>>>> > > length 3.
>>>> > > (vect_grouped_load_supported): New check for loads group of length
>>>> > > 3.
>>>> > > (vect_permute_load_chain): New permutations for loads group of
>>>> > > length 3.
>>>> > > * tree-vect-stmts.c (vect_model_store_cost): New cost
>>>> > > vec_perm_shuffle
>>>> > > for the new permutations.
>>>> > > (vect_model_load_cost): Ditto.
>>>> > > * config/aarch64/aarch64.c (builtin_vectorization_cost): Adding
>>>> > > vec_perm_shuffle cost as equivalent of vec_perm cost.
>>>> > > * config/arm/arm.c: Ditto.
>>>> > > * config/rs6000/rs6000.c: Ditto.
>>>> > > * config/spu/spu.c: Ditto.
>>>> > > * config/i386/x86-tune.def (TARGET_SLOW_PHUFFB): Target for slow
>>>> > > byte
>>>> > > shuffle on some x86 architectures.
>>>> > > * config/i386/i386.h (processor_costs): Defining pshuffb cost.
>>>> > > * config/i386/i386.c (processor_costs): Adding pshuffb cost.
>>>> > > (ix86_builtin_vectorization_cost): Adding cost for the new
>>>> > > permutations.
>>>> > > Fixing cost for other permutations.
>>>> > > (expand_vec_perm_even_odd_1): Avoid byte shuffles when they are
>>>> > > slow (TARGET_SLOW_PHUFFB).
>>>> > > (ix86_add_stmt_cost): Adding cost when STMT is WIDEN_MULTIPLY.
>>>> > > Adding new shuffle cost only when byte shuffle is expected.
>>>> > > Fixing cost model for Silvermont.
>>>> > >
>>>> > > Thanks,
>>>> > > Evgeny
>>>> > >
>>>> >
>>>> > --
>>>> > Richard Biener <rguenther@suse.de>
>>>> > SUSE / SUSE Labs
>>>> > SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
>>>> > GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
>>>>
>>>
>>> --
>>> Richard Biener <rguenther@suse.de>
>>> SUSE / SUSE Labs
>>> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
>>> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
Attachment:
vect3.patch
Description: Binary data
Attachment:
vect3_tests.patch
Description: Binary data