This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.
- From: Ramana Radhakrishnan <ramana dot radhakrishnan at arm dot com>
- To: Evgeny Stupachenko <evstupac at gmail dot com>
- Cc: Richard Biener <rguenther at suse dot de>, Uros Bizjak <ubizjak at gmail dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, Jakub Jelinek <jakub at redhat dot com>
- Date: Thu, 05 Jun 2014 12:54:35 +0100
- Subject: Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.
- References: <CAOvf_xz4y6u9-YZCdTM8j3Awm7pdARvyb-58=obT+U9Tkt0HNg at mail dot gmail dot com> <CAJA7tRb4qV7PCbYSQzkFRnP4TkqqvZiA4nmCmopCzCCvDs-THw at mail dot gmail dot com> <CAOvf_xzj6=MkCPnLvVuQbRh1B_7LaHuNaSuZAHgAZQrX=+h59Q at mail dot gmail dot com>
On 06/05/14 12:43, Evgeny Stupachenko wrote:
The new hook relates to vector instructions only. Vector instructions
could be sequential in the pipeline while scalar ones are parallel. For x86
architectures TARGET_SCHED_REASSOC_WIDTH does not give required
General hooks could be potentially reused in other algorithms/by other
It already takes a "mode" argument. Couldn't you use a vector mode to
work this out?
If that is not enough, then please be more specific in the documentation
of this hook about where it is useful, so that people reading the
documentation can understand at a glance what purpose it serves.
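
As a rough illustration of the suggestion above (this sketch is not part of the original message): a width query that takes an (opcode, mode) pair can return a different answer for vector modes than for scalar modes. Everything below is a hypothetical stand-in. The `sketch_` names, the mode enum and the returned widths are assumptions made for illustration, not GCC's real machine modes or any target's actual tuning.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for machine modes; NOT GCC's real enum. */
enum sketch_mode { SKETCH_SI, SKETCH_DI, SKETCH_V4SI, SKETCH_V2DI };

/* True for the (made-up) vector modes above. */
static bool sketch_vector_mode_p(enum sketch_mode mode)
{
    return mode == SKETCH_V4SI || mode == SKETCH_V2DI;
}

/* Same shape as a width query keyed on (opcode, mode): the caller asks
   how many operations of this kind may usefully run in parallel.
   The values 1 and 2 are invented for this sketch only. */
int sketch_reassoc_width(unsigned int opc, enum sketch_mode mode)
{
    (void) opc;  /* the opcode is ignored in this simplified sketch */
    return sketch_vector_mode_p(mode) ? 1 : 2;
}
```

With these assumed values, `sketch_reassoc_width(0, SKETCH_V4SI)` reports a width of 1 (sequential vector execution) and `sketch_reassoc_width(0, SKETCH_SI)` reports 2, which is the kind of distinction the mode argument makes possible.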
On Thu, Jun 5, 2014 at 2:04 PM, Ramana Radhakrishnan
On Wed, May 28, 2014 at 2:09 PM, Evgeny Stupachenko <email@example.com> wrote:
The patch introduces an alternative way of permutations for load groups
of size 2 and 3 which should be faster on architectures with low
The patch gives a 2x gain on Silvermont on the test from PR52252
(in addition to the already committed 3x gain).
Patch passes bootstrap on x86. Make check is in progress.
Why do we need a new hook? Can't you derive this information from
something which is equally badly named TARGET_SCHED_REASSOC_WIDTH
though used in the reassociation logic but also serves a similar
Also the documentation of this hook is incomplete at best and wrong at
worst as this is not applied everywhere in the vectorizer but just for
this special case for load store permuting. Implying this is useful
everywhere in the vectorizer does not appear to be correct.
2014-05-28 Evgeny Stupachenko <firstname.lastname@example.org>
* config/i386/i386.c (ix86_have_vector_parallel_execution): New.
* config/i386/i386.h (TARGET_VECTOR_PARALLEL_EXECUTION): New.
* config/i386/x86-tune.def (X86_TUNE_VECTOR_PARALLEL_EXECUTION): New.
* target.def (have_vector_parallel_execution): New.
* doc/tm.texi.in (have_vector_parallel_execution): New.
* doc/tm.texi: Regenerate.
* targhooks.c (default_have_vector_parallel_execution): New.
* tree-vect-data-refs.c (vect_shift_permute_load_chain): New.
Introduces alternative way of load group permutations.
(vect_transform_grouped_load): Try alternative way of permutations.
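
For readers unfamiliar with the term, a load group of size 2 is a pair of interleaved loads such as a[2*i] and a[2*i+1]. The scalar sketch below (not part of the original message) shows only the result any permutation scheme for such a group has to produce: the even-indexed and odd-indexed elements separated into two outputs. It does not reproduce the shift-based sequence from the patch. The function name and `SKETCH_VF` are hypothetical.

```c
#include <stddef.h>

#define SKETCH_VF 4  /* hypothetical vector length, in elements */

/* Split 2*SKETCH_VF interleaved elements {x0, y0, x1, y1, ...} into
   even = {x0, x1, ...} and odd = {y0, y1, ...}.  A vectorizer would do
   this with permute instructions; this loop only models the result. */
void sketch_deinterleave2(const int *in, int *even, int *odd)
{
    for (size_t i = 0; i < SKETCH_VF; i++) {
        even[i] = in[2 * i];
        odd[i]  = in[2 * i + 1];
    }
}
```

A group of size 3 is analogous, with three output streams taken from elements 3*i, 3*i+1 and 3*i+2.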