This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)


Hi Alan,

The cost modeling of the epilogue code seems pretty target-specific ("An
EQ stmt and an AND stmt, reduction of the max index and a reduction of
the found values, a broadcast of the max value," resulting in two
vector_stmts, one vec_to_scalar, and two scalar_to_vecs).  On powerpc,
this will not represent the cost accurately, and the cost will indeed be
quite different depending on the mode (logarithmic in the number of
elements).  I think that you need to create a new entry in
vect_cost_for_stmt to represent the cost of a COND_REDUCTION, and allow
each target to calculate the cost appropriately.
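
To make the suggestion concrete, here is a rough sketch of the difference between a fixed epilogue cost and a per-target cost for a condition reduction. Everything in it is an assumption for illustration: `sketch_cost_kind`, `SK_COND_REDUCTION`, `fixed_epilogue_cost` and `sketch_target_cost` are hypothetical names, not GCC's actual enum or hooks, and the unit costs are made up. The hypothetical target has no max-reduction instruction, so each of the two reductions is modeled as log2(nunits) steps of one permute plus one max.

```c
#include <assert.h>

/* Hypothetical stand-in for a few statement cost kinds; not GCC's enum. */
enum sketch_cost_kind {
  SK_VECTOR_STMT,
  SK_VEC_TO_SCALAR,
  SK_SCALAR_TO_VEC,
  SK_COND_REDUCTION   /* the proposed new entry */
};

/* Number of halving steps for a power-of-two element count. */
static int log2_exact(int nunits)
{
  assert(nunits > 0 && (nunits & (nunits - 1)) == 0);
  int steps = 0;
  while (nunits > 1) {
    nunits >>= 1;
    steps++;
  }
  return steps;
}

/* Fixed model from the patch: two vector_stmts, one vec_to_scalar and
   two scalar_to_vecs, each assumed here to cost one unit.  */
int fixed_epilogue_cost(void)
{
  return 2 + 1 + 2;
}

/* Hypothetical per-target cost.  For a condition reduction it charges
   the EQ and the AND (2 units), plus two reductions (index and data),
   each taking log2(nunits) steps of one permute and one max.  No
   broadcast is charged.  Every other kind costs one unit.  */
int sketch_target_cost(enum sketch_cost_kind kind, int nunits)
{
  if (kind == SK_COND_REDUCTION)
    return 2 + 2 * 2 * log2_exact(nunits);
  return 1;
}
```

With these made-up numbers the fixed model always reports 5, while the per-target figure grows with the element count, which is the mode dependence the fixed model cannot express.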

(Powerpc doesn't have a max-reduction hardware instruction, but because
the reduction will be only in the epilogue code, it may still be
profitable for us to generate the somewhat expensive reduction sequence
in order to vectorize the loop.  But we definitely want to model it as
costly in and of itself.  Also, the sequence will produce the maximum
value in all positions without a separate broadcast.)
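
As an illustration of a log-step reduction that leaves the maximum in every position, here is a scalar emulation of a butterfly max reduction. It is only a sketch of the general technique and makes no claim about the exact instruction sequence a powerpc backend would emit; `butterfly_max` and `MAX_LANES` are names invented for this example.

```c
#include <assert.h>

#define MAX_LANES 16   /* assumed upper bound on elements per vector */

/* Reduce v[0..lanes-1] to its maximum in log2(lanes) steps, in place.
   Each step models one vector permute (lane i reads lane i ^ stride)
   followed by one vector max.  On return every lane holds the maximum.
   lanes must be a power of two no larger than MAX_LANES.
   Returns the number of permute+max steps performed.  */
int butterfly_max(int *v, int lanes)
{
  assert(lanes > 0 && lanes <= MAX_LANES && (lanes & (lanes - 1)) == 0);
  int steps = 0;
  for (int stride = lanes / 2; stride >= 1; stride /= 2) {
    int permuted[MAX_LANES];
    for (int i = 0; i < lanes; i++)
      permuted[i] = v[i ^ stride];      /* models the permute */
    for (int i = 0; i < lanes; i++)
      if (permuted[i] > v[i])
        v[i] = permuted[i];             /* models the element-wise max */
    steps++;
  }
  return steps;
}
```

For four elements {3, 9, 1, 4} this takes two steps and leaves 9 in all four lanes, so no separate broadcast is needed afterwards.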

Thanks,
Bill

On Thu, 2015-09-10 at 15:51 +0100, Alan Hayward wrote:
> Hi,
> This patch (attached) adds support for vectorizing conditional expressions
> (PR 65947), for example:
> 
> int condition_reduction (int *a, int min_v)
> {
>   int last = 0;
>   for (int i = 0; i < N; i++)
>     if (a[i] < min_v)
>       last = a[i];
>   return last;
> }
> 
> To do this the loop is vectorised to create a vector of data results (i.e.
> the matching a[i] values). Using an induction variable, an additional
> vector is added containing the indexes where the matches occurred. In the
> loop epilogue this is reduced to a single max value and then used to
> index into the vector of data results.
> When no values are matched in the loop, the indexes vector will contain
> all zeroes, eventually matching the first entry in the data results vector.
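
The scheme described above can be emulated in plain scalar C, with small arrays standing in for vector registers. This is a sketch under stated assumptions, not the code the patch generates: `VF`, `condition_reduction_scalar` and `condition_reduction_lanes` are names chosen for this example, N is passed as a parameter `n` that is assumed to be a multiple of `VF`, and stored indexes are assumed to be 1-based so that 0 can mean "no match in this lane".

```c
#include <assert.h>

#define VF 4   /* assumed vectorization factor (lanes per vector) */

/* Scalar reference: the loop from the patch description.  */
int condition_reduction_scalar(const int *a, int n, int min_v)
{
  int last = 0;
  for (int i = 0; i < n; i++)
    if (a[i] < min_v)
      last = a[i];
  return last;
}

/* Lane-wise emulation of the vectorized loop and its epilogue.  */
int condition_reduction_lanes(const int *a, int n, int min_v)
{
  assert(n % VF == 0);
  int data[VF] = {0};   /* data results; lanes start at the initial `last` */
  int idx[VF] = {0};    /* match indexes; 0 means no match in this lane */

  /* Loop body: each lane remembers its most recent match and where
     that match occurred.  */
  for (int i = 0; i < n; i += VF)
    for (int l = 0; l < VF; l++)
      if (a[i + l] < min_v) {
        data[l] = a[i + l];
        idx[l] = i + l + 1;   /* 1-based index (assumption) */
      }

  /* Epilogue, step 1: reduce the indexes to a single max value.  */
  int max_idx = 0;
  for (int l = 0; l < VF; l++)
    if (idx[l] > max_idx)
      max_idx = idx[l];

  /* Epilogue, step 2: pick the data result from the lane whose index
     equals the max.  With no matches every index is 0 and every lane
     still holds the initial value, so any lane gives the same answer.  */
  int result = 0;
  for (int l = 0; l < VF; l++)
    if (idx[l] == max_idx)
      result = data[l];
  return result;
}
```

For a = {5, 1, 7, 2, 9, 3, 8, 6} and min_v = 4, the last match is a[5] = 3; it lands in lane 1 with stored index 6, which is the largest index, so both functions return 3.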
> 
> To vectorize successfully, support is required for REDUC_MAX_EXPR. This is
> supported by aarch64 and arm. On x86 and powerpc, gcc will complain that
> REDUC_MAX_EXPR is not supported for the required modes, failing the
> vectorization. On mips it complains that the required vcond expression is
> not supported. It is suggested the relevant backend experts add the
> required backend support.
> 
> Using a simple testcase with a large value of N, run on an aarch64 juno
> board, the runtime with the patch in use was reduced to 0.8 of its
> original time.
> 
> This patch caused binary differences in three spec2006 binaries on aarch64
> - 416.gamess, 435.gromacs and 456.hmmer. Running them on a juno board
> showed no improvement or degradation in runtime.
> 
> 
> In the near future I hope to submit a further patch (as PR 66558) which
> optimises the case where the result is simply the index of the loop, for
> example:
> int condition_reduction (int *a, int min_v)
> {
>   int last = 0;
>   for (int i = 0; i < N; i++)
>     if (a[i] < min_v)
>       last = i;
>   return last;
> }
> In this case a lot of the new code can be optimized away.
> 
> I have run check for aarch64, arm and x86 and have seen no regressions.
> 
> 
> Changelog:
> 
>     2015-08-28  Alan Hayward <alan.hayward@arm.com>
> 
>         PR tree-optimization/65947
>         * tree-vect-loop.c
>         (vect_is_simple_reduction_1): Find condition reductions.
>         (vect_model_reduction_cost): Add condition reduction costs.
>         (get_initial_def_for_reduction): Add condition reduction initial
>         var.
>         (vect_create_epilog_for_reduction): Add condition reduction epilog.
>         (vectorizable_reduction): Condition reduction support.
>         * tree-vect-stmts.c
>         (vectorizable_condition): Add vect reduction arg.
>         * doc/sourcebuild.texi (Vector-specific attributes): Document
>         vect_max_reduc.
> 
>     testsuite/Changelog:
> 
>         PR tree-optimization/65947
>         * lib/target-supports.exp
>         (check_effective_target_vect_max_reduc): Add.
>         * gcc.dg/vect/pr65947-1.c: New test.
>         * gcc.dg/vect/pr65947-2.c: New test.
>         * gcc.dg/vect/pr65947-3.c: New test.
>         * gcc.dg/vect/pr65947-4.c: New test.
>         * gcc.dg/vect/pr65947-5.c: New test.
>         * gcc.dg/vect/pr65947-6.c: New test.
>         * gcc.dg/vect/pr65947-7.c: New test.
>         * gcc.dg/vect/pr65947-8.c: New test.
>         * gcc.dg/vect/pr65947-9.c: New test.
>         * gcc.dg/vect/pr65947-10.c: New test.
>         * gcc.dg/vect/pr65947-11.c: New test.
> 
> 
> 
> Thanks,
> Alan
> 
> 


