[Vectorizer] Add SLP support for masked loads

Alejandro Martinez Vicente Alejandro.MartinezVicente@arm.com
Thu Jan 17 09:14:00 GMT 2019


> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: 17 January 2019 07:53
> To: Alejandro Martinez Vicente <Alejandro.MartinezVicente@arm.com>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Richard
> Sandiford <Richard.Sandiford@arm.com>
> Subject: Re: [Vectorizer] Add SLP support for masked loads
> 
> On Wed, Jan 16, 2019 at 2:37 PM Alejandro Martinez Vicente
> <Alejandro.MartinezVicente@arm.com> wrote:
> >
> > Hi,
> >
> > The vectorizer currently doesn't support masked loads for SLP. We should
> > add that, to allow loops like:
> >
> > void
> > f (int *restrict x, int *restrict y, int *restrict z, int n) {
> >   for (int i = 0; i < n; i += 2)
> >     {
> >       x[i] = y[i] ? z[i] : 1;
> >       x[i + 1] = y[i + 1] ? z[i + 1] : 2;
> >     }
> > }
> >
> > to be vectorized using contiguous loads rather than LD2 and ST2.
> >
> > This patch was motivated by SVE, but it is completely generic and
> > should apply to any architecture with masked loads.
> >
> > After the patch is applied, the above code generates this output
> > (-march=armv8.2-a+sve -O2 -ftree-vectorize):
> >
> > 0000000000000000 <f>:
> >    0:   7100007f        cmp     w3, #0x0
> >    4:   540002cd        b.le    5c <f+0x5c>
> >    8:   51000464        sub     w4, w3, #0x1
> >    c:   d2800003        mov     x3, #0x0                        // #0
> >   10:   90000005        adrp    x5, 0 <f>
> >   14:   25d8e3e0        ptrue   p0.d
> >   18:   53017c84        lsr     w4, w4, #1
> >   1c:   910000a5        add     x5, x5, #0x0
> >   20:   11000484        add     w4, w4, #0x1
> >   24:   85c0e0a1        ld1rd   {z1.d}, p0/z, [x5]
> >   28:   2598e3e3        ptrue   p3.s
> >   2c:   d37ff884        lsl     x4, x4, #1
> >   30:   25a41fe2        whilelo p2.s, xzr, x4
> >   34:   d503201f        nop
> >   38:   a5434820        ld1w    {z0.s}, p2/z, [x1, x3, lsl #2]
> >   3c:   25808c11        cmpne   p1.s, p3/z, z0.s, #0
> >   40:   25808810        cmpne   p0.s, p2/z, z0.s, #0
> >   44:   a5434040        ld1w    {z0.s}, p0/z, [x2, x3, lsl #2]
> >   48:   05a1c400        sel     z0.s, p1, z0.s, z1.s
> >   4c:   e5434800        st1w    {z0.s}, p2, [x0, x3, lsl #2]
> >   50:   04b0e3e3        incw    x3
> >   54:   25a41c62        whilelo p2.s, x3, x4
> >   58:   54ffff01        b.ne    38 <f+0x38>  // b.any
> >   5c:   d65f03c0        ret
> >
> >
> > I tested this patch on an aarch64 machine by bootstrapping the compiler
> > and running the checks.
> 
> Thanks for implementing this - note this is stage1 material and I will have a
> look when time allows unless Richard beats me to it.
> 


I agree, this is for GCC 10. I'll ping you guys when we're at stage1.

> It might be interesting to note that "non-SLP" code paths are likely to go
> away in GCC 10 to streamline the vectorizer and make further changes easier
> (so you'll see group_size == 1 SLP instances).
> 

Cool, thanks for the heads up.

Alejandro

> There are quite a few other cases missing SLP handling.
> 
> Richard.
> 
> > Alejandro
> >
> > gcc/Changelog:
> >
> > 2019-01-16  Alejandro Martinez  <alejandro.martinezvicente@arm.com>
> >
> >         * config/aarch64/aarch64-sve.md (copysign<mode>3): New
> >         define_expand.
> >         (xorsign<mode>3): Likewise.
> >         internal-fn.c: Marked mask_load_direct and mask_store_direct as
> >         vectorizable.
> >         tree-data-ref.c (data_ref_compare_tree): Fixed comment typo.
> >         tree-vect-data-refs.c (can_group_stmts_p): Allow masked loads to be
> >         combined even if their masks are different.
> >         (slp_vect_only_p): New function to detect masked loads that are only
> >         vectorizable using SLP.
> >         (vect_analyze_data_ref_accesses): Mark SLP only vectorizable groups.
> >         tree-vect-loop.c (vect_dissolve_slp_only_groups): New function to
> >         dissolve SLP-only vectorizable groups when SLP has been discarded.
> >         (vect_analyze_loop_2): Call vect_dissolve_slp_only_groups when
> >         needed.
> >         tree-vect-slp.c (vect_get_and_check_slp_defs): Check masked loads
> >         masks.
> >         (vect_build_slp_tree_1): Fixed comment typo.
> >         (vect_build_slp_tree_2): Include masks from masked loads in SLP tree.
> >         tree-vect-stmts.c (vect_get_vec_defs_for_operand): New function to get
> >         vec_defs for operand with optional SLP and vectype.
> >         (vectorizable_load): Allow vectorization of masked loads for SLP only.
> >         tree-vectorizer.h (_stmt_vec_info): Added flag for SLP-only
> >         vectorizable.
> >         tree-vectorizer.c (vec_info::new_stmt_vec_info): Likewise.
> >
> > gcc/testsuite/Changelog:
> >
> > 2019-01-16  Alejandro Martinez  <alejandro.martinezvicente@arm.com>
> >
> >         * gcc.target/aarch64/sve/mask_load_slp_1.c: New test for SLP
> >         vectorized masked loads.
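
As a reading aid for the assembly listing quoted above, here is a minimal
scalar sketch of what one iteration of the vectorized loop (offsets 38 to 58)
appears to compute, lane by lane. It is an illustration under stated
assumptions, not code from the patch: the names f_ref, f_model, models_agree,
VL and MAX_N are hypothetical, and VL stands in for the runtime SVE vector
length. The point it tries to show is that y, z and x are all accessed
contiguously, that z is read only in lanes where y is nonzero (the masked
load), and that the "else" values come from a repeating {1, 2} vector.

```c
#include <assert.h>
#include <string.h>

#define VL 4      /* hypothetical number of 32-bit lanes per vector; must be even */
#define MAX_N 64  /* capacity of the buffers used by models_agree */

/* The loop from the original message, used as the scalar reference. */
static void f_ref(int *x, const int *y, const int *z, int n)
{
  for (int i = 0; i < n; i += 2)
    {
      x[i] = y[i] ? z[i] : 1;
      x[i + 1] = y[i + 1] ? z[i + 1] : 2;
    }
}

/* Lane-by-lane model of the vectorized loop (an assumption-based sketch). */
static void f_model(int *x, const int *y, const int *z, int n)
{
  /* Elements touched by the scalar loop: two per iteration. */
  int total = n > 0 ? 2 * ((n + 1) / 2) : 0;

  for (int base = 0; base < total; base += VL)
    for (int lane = 0; lane < VL; lane++)
      {
        int i = base + lane;
        int active = i < total;               /* loop predicate (whilelo) */
        int yv = active ? y[i] : 0;           /* contiguous load of y */
        int load = active && yv != 0;         /* load mask (cmpne) */
        int zv = load ? z[i] : 0;             /* masked contiguous load of z */
        int other = (lane % 2 == 0) ? 1 : 2;  /* repeating {1, 2} constants */
        if (active)
          x[i] = load ? zv : other;           /* select, then predicated store */
      }
}

/* Returns 1 if the reference and the model write identical results. */
static int models_agree(int n)
{
  int x_ref[MAX_N], x_mod[MAX_N], y[MAX_N], z[MAX_N];

  assert(n >= 0 && n + 1 <= MAX_N);
  for (int i = 0; i < MAX_N; i++)
    {
      y[i] = (i % 3 == 0) ? 0 : i;  /* mix of zero and nonzero conditions */
      z[i] = 100 + i;
      x_ref[i] = x_mod[i] = -1;     /* sentinel for untouched elements */
    }
  f_ref(x_ref, y, z, n);
  f_model(x_mod, y, z, n);
  return memcmp(x_ref, x_mod, sizeof x_ref) == 0;
}
```

Calling models_agree (10), for example, runs both versions on the same inputs
and compares every output element, including the ones neither version should
touch.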

