[PATCH] Fix PR90332 by extending half size vector mode
Richard Biener
richard.guenther@gmail.com
Wed Mar 18 10:39:03 GMT 2020
On Wed, Mar 18, 2020 at 11:06 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> Hi,
>
> As PR90332 shows, the current elimination of scalar epilogue peeling
> for gaps requires a vec_init optab that builds the vector from two
> half-size vector modes. Power doesn't support vector modes like V8QI,
> so it can't provide an optab like vec_initv16qiv8qi. Instead, we want
> to use an existing scalar mode such as DI to initialize the desired
> vector mode. This patch extends the existing support to cover that
> for Power; evaluated on Power9, it gives the expected 1.9% speedup
> on SPEC2017 525.x264_r.
>
> Bootstrapped/regtested on powerpc64le-linux-gnu (LE) P8 and P9.
>
> Is it ok for trunk?
There's already code exercising such a case in vectorizable_load
(VMAT_STRIDED_SLP) which you could have factored out.
vectype, bool slp,
than the alignment boundary B. Every vector access will
be a multiple of B and so we are guaranteed to access a
non-gap element in the same B-sized block. */
+ machine_mode half_mode;
if (overrun_p
&& gap < (vect_known_alignment_in_bytes (first_dr_info)
/ vect_get_scalar_dr_size (first_dr_info)))
- overrun_p = false;
-
+ {
+ overrun_p = false;
+ if (known_eq (nunits, (group_size - gap) * 2)
+ && known_eq (nunits, group_size)
+ && get_half_mode_for_vector (vectype, &half_mode))
+ DR_GROUP_HALF_MODE (first_stmt_info) = half_mode;
+ }
why do you need to amend this case?
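For readers following along, the condition guarding this case can be read as the sketch below. It is only an illustration of the comment quoted above, not code from the patch: the function name and its plain unsigned parameters are made up here, standing in for vect_known_alignment_in_bytes and vect_get_scalar_dr_size.

```cpp
// Hypothetical stand-alone model of the overrun check in the quoted hunk.
// gap_elems:         trailing gap of the group, in elements
// known_align_bytes: alignment boundary B known for the first access, in bytes
// elem_size_bytes:   size of one scalar element, in bytes
//
// If the gap is smaller than one B-sized block (measured in elements), a
// full-vector access that runs into the gap still stays inside a B-sized
// block that contains a non-gap element.
bool overrun_is_harmless(unsigned gap_elems,
                         unsigned known_align_bytes,
                         unsigned elem_size_bytes)
{
  // Guard against a zero element size so the division is well defined.
  if (elem_size_bytes == 0)
    return false;
  return gap_elems < known_align_bytes / elem_size_bytes;
}
```

With 4-byte elements and a known 16-byte alignment, a gap of up to 3 elements passes this check and a gap of 4 does not.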
I don't like storing DR_GROUP_HALF_MODE very much. Later
you need a vector type anyway, and it looks cheap enough to
recompute it where you need it? If it is stored at all, it doesn't
belong to DR_GROUP but to the stmt-info.
I realize the original optimization was kind of a hack (and I was too
lazy to implement the integer mode construction path ...).
So, can you factor out the existing code into a function that,
given a vector type and a pieces size, returns the vector type
to use for construction? So for V16QI and a pieces-size of 4 we'd
get either V16QI back (then construction from V4QI pieces
should work) or V4SI (then construction from SImode pieces
should work)? Possibly provide that piece type (SI / V4QI)
as a secondary output.
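To make the shape of such a helper concrete, here is a rough stand-alone sketch. It does not use GCC's real types or optab queries: VecType, Target, Composition and composition_type are hypothetical names for illustration, and the two Target callbacks are assumed stand-ins for "the target can build this vector from sub-vector pieces" and "the target has an integer mode of this width plus a vector of such integers".

```cpp
// Toy stand-in for a vector type: nunits elements of elem_bits bits each.
// A scalar piece is modelled as nunits == 1.
struct VecType {
  unsigned elem_bits;
  unsigned nunits;
};

// Hypothetical target description (not a GCC interface).
struct Target {
  // Can `vec` be constructed from sub-vector pieces of `piece_nunits` elements?
  bool (*has_subvector_init)(VecType vec, unsigned piece_nunits);
  // Is there an integer mode of `bits` bits and a vector of `nunits` of them?
  bool (*has_int_vector)(unsigned bits, unsigned nunits);
};

// Result: the vector type to construct, plus the piece type as secondary output.
struct Composition {
  VecType vector;
  VecType piece;
};

// Returns true and fills *out if some way of composing `vec` from pieces of
// `piece_nunits` elements is available, false otherwise.
bool composition_type(const Target &t, VecType vec, unsigned piece_nunits,
                      Composition *out)
{
  if (piece_nunits == 0 || vec.nunits % piece_nunits != 0)
    return false;

  // First choice: keep the vector type and build it from sub-vector pieces,
  // e.g. V16QI from V4QI pieces.
  if (t.has_subvector_init(vec, piece_nunits)) {
    *out = Composition{vec, VecType{vec.elem_bits, piece_nunits}};
    return true;
  }

  // Fallback: treat each piece as one integer of the same total width and
  // build a vector of those integers, e.g. V4SI from SImode pieces.
  unsigned piece_bits = vec.elem_bits * piece_nunits;
  unsigned lanes = vec.nunits / piece_nunits;
  if (t.has_int_vector(piece_bits, lanes)) {
    *out = Composition{VecType{piece_bits, lanes}, VecType{piece_bits, 1}};
    return true;
  }
  return false;
}
```

For a target without sub-vector pieces but with 32-bit integers and a four-lane vector of them, asking for V16QI with a pieces size of 4 yields V4SI with SI pieces in this model. The caller would then presumably view-convert the constructed vector back to the original vector type.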
Thanks,
Richard.
> BR,
> Kewen
> -----------
>
> gcc/ChangeLog
>
> 2020-MM-DD Kewen Lin <linkw@gcc.gnu.org>
>
> PR tree-optimization/90332
> * gcc/tree-vectorizer.h (struct _stmt_vec_info): Add half_mode field.
> (DR_GROUP_HALF_MODE): New macro.
> * gcc/tree-vect-stmts.c (get_half_mode_for_vector): New function.
> (get_group_load_store_type): Call get_half_mode_for_vector to query
> whether the target supports the half size mode, and update
> DR_GROUP_HALF_MODE if so.
> (vectorizable_load): Build appropriate vector type based on
> DR_GROUP_HALF_MODE.