This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC] Try vector<bool> as a new representation for vector masks


On Tue, Sep 1, 2015 at 3:08 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> On 27 Aug 09:55, Richard Biener wrote:
>> On Wed, Aug 26, 2015 at 5:51 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>> >
>> > Yes, I want to try it. But getting rid of bool patterns would mean
>> > support for all targets currently supporting vec_cond. Would it be OK
>> > to have vector<bool> mask co-exist with bool patterns for some time?
>>
>> No, I'd like to remove the bool patterns anyhow - the vectorizer should be able
>> to figure out the correct vector type (or mask type) from the uses.  Currently
>> it simply looks at the stmts LHS type but as all stmt operands already have
>> vector types it can as well compute the result type from those.  We'd want to
>> have a helper function that does this result type computation as I figure it
>> will be needed in multiple places.
>>
>> This is now on my personal TODO list (but that's already quite long for GCC 6),
>> so if you manage to get to that...  see
>> tree-vect-loop.c:vect_determine_vectorization_factor
>> which computes STMT_VINFO_VECTYPE for all stmts but loads (loads get their
>> vector type set from data-ref analysis already - there 'bool' loads
>> correctly get
>> VNQImode).  There is a basic-block / SLP part as well that would need to use
>> the helper function (eventually with some SLP discovery order issue).
>>
>> > Thus first step would be to require vector<bool> for MASK_LOAD and
>> > MASK_STORE and support it for i386 (the only user of MASK_LOAD and
>> > MASK_STORE).
>>
>> You can certainly try that first, but as soon as you hit complications with
>> needing to adjust bool patterns then I'd rather get rid of them.
>>
>> >
>> > I can directly build a vector type with specified mode to avoid it. Smth. like:
>> >
>> > mask_mode = targetm.vectorize.get_mask_mode (nunits, current_vector_size);
>> > mask_type = make_vector_type (bool_type_node, nunits, mask_mode);
>>
>> Hmm, indeed, that might be a (good) solution.  Btw, in this case
>> target attribute
>> boundaries would be "ignored" (that is, TYPE_MODE wouldn't change depending
>> on the active target).  There would also be no way for the user to
>> declare vector<bool>
>> in source (which is good because of that target attribute issue...).
>>
>> So yeah.  Adding a tree.c:build_truth_vector_type (unsigned nunits)
>> and adjusting
>> truth_type_for is the way to go.
>>
>> I suggest you try modifying those parts first according to this scheme
>> that will most
>> likely uncover issues we missed.
>>
>> Thanks,
>> Richard.
>>
>
> I tried to implement this scheme and apply it for MASK_LOAD and MASK_STORE.  There were no major issues (for now).
>
> build_truth_vector_type and get_mask_type_for_scalar_type were added to build a mask type.  It is always a vector of bools but its mode is determined by a target using number of units and currently used vector length.
>
> As previously I fixed if-conversion to apply boolean masks for loads and stores which automatically disables bool patterns for them and flow goes by a mask path.  Vectorization factor computation is fixed to have a separate computation for mask types.  Comparison is now handled separately by vectorizer and is vectorized into vector comparison.
>
> Optabs for masked loads and stores were transformed into convert optabs.  Now it is checked using both value and mask modes.
>
> Optabs for comparison were added.  These are also convert optabs checking value and result type.
>
> I had to introduce significant number of new patterns in i386 target to support new optabs.  The reason was vector compare was never expanded separately and always was a part of a vec_cond expansion.

Indeed.

> As a result it's possible to use the sage GIMPLE representation for both vector and scalar masks target type.  Here is an example I used as a simple test:
>
>   for (i=0; i<N; i++)
>   {
>     float t = a[i];
>     if (t > 0.0f && t < 1.0e+2f)
>       if (c[i] != 0)
>         c[i] = 1;
>   }
>
> Produced vector GIMPLE (before expand):
>
>   vect_t_5.22_105 = MEM[base: _256, offset: 0B];
>   mask__6.23_107 = vect_t_5.22_105 > { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 };
>   mask__7.25_109 = vect_t_5.22_105 < { 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2 };
>   mask__8.27_110 = mask__6.23_107 & mask__7.25_109;
>   vect__9.29_116 = MASK_LOAD (vectp_c.30_114, 0B, mask__8.27_110);
>   mask__36.33_119 = vect__9.29_116 != { 0, 0, 0, 0, 0, 0, 0, 0 };
>   mask__37.35_120 = mask__8.27_110 & mask__36.33_119;
>   MASK_STORE (vectp_c.38_125, 0B, mask__37.35_120, { 1, 1, 1, 1, 1, 1, 1, 1 });

Looks good to me.

> Produced assembler on AVX-512:
>
>         vmovups (%rdi), %zmm0
>         vcmpps  $25, %zmm5, %zmm0, %k1
>         vcmpps  $22, %zmm3, %zmm0, %k1{%k1}
>         vmovdqa32       -64(%rdx), %zmm2{%k1}
>         vpcmpd  $4, %zmm1, %zmm2, %k1{%k1}
>         vmovdqa32       %zmm4, (%rcx){%k1}
>
> Produced assembler on AVX-2:
>
>         vmovups (%rdx), %xmm1
>         vinsertf128     $0x1, -16(%rdx), %ymm1, %ymm1
>         vcmpltps        %ymm1, %ymm3, %ymm0
>         vcmpltps        %ymm5, %ymm1, %ymm1
>         vpand   %ymm0, %ymm1, %ymm0
>         vpmaskmovd      -32(%rcx), %ymm0, %ymm1
>         vpcmpeqd        %ymm2, %ymm1, %ymm1
>         vpcmpeqd        %ymm2, %ymm1, %ymm1
>         vpand   %ymm0, %ymm1, %ymm0
>         vpmaskmovd      %ymm4, %ymm0, (%rax)
>
> BTW AVX-2 code produced by trunk compiler is 4 insns longer:
>
>         vmovups (%rdx), %xmm0
>         vinsertf128     $0x1, -16(%rdx), %ymm0, %ymm0
>         vcmpltps        %ymm0, %ymm6, %ymm1
>         vcmpltps        %ymm7, %ymm0, %ymm0
>         vpand   %ymm1, %ymm5, %ymm2
>         vpand   %ymm0, %ymm2, %ymm1
>         vpcmpeqd        %ymm3, %ymm1, %ymm0
>         vpandn  %ymm4, %ymm0, %ymm0
>         vpmaskmovd      -32(%rcx), %ymm0, %ymm0
>         vpcmpeqd        %ymm3, %ymm0, %ymm0
>         vpandn  %ymm1, %ymm0, %ymm0
>         vpcmpeqd        %ymm3, %ymm0, %ymm0
>         vpandn  %ymm4, %ymm0, %ymm0
>         vpmaskmovd      %ymm5, %ymm0, (%rax)
>
>
> For now I still don't disable bool patterns, thus new masks apply to masked loads and stores only.  Patch is also not tested and tried on several small tests only.  Could you please look at what I currently have and say if it's in sync with your view on vector masking?

So apart from bool patterns and maybe implementation details (didn't
look too closely at the patch yet, maybe tomorrow), there is

+  /* Or a boolean vector type with the same element count
+     as the comparison operand types.  */
+  else if (TREE_CODE (type) == VECTOR_TYPE
+          && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
+    {

so we now allow both, integer typed and boolean typed comparison
results?  I was hoping that on GIMPLE
we can canonicalize to a single form, the boolean one and for the
"old" style force the use of VEC_COND exprs
(which we did anyway, AFAIK).  The comparison in the VEC_COND would
still have vector bool result type.

I expect the vectorization factor changes to "vanish" if we remove
bool patterns and re-org vector type deduction

Richard.

> Thanks,
> Ilya
> --
> gcc/
>
> 2015-09-01  Ilya Enkovich  <enkovich.gnu@gmail.com>
>
>         * config/i386/i386-protos.h (ix86_expand_mask_vec_cmp): New.
>         (ix86_expand_int_vec_cmp): New.
>         (ix86_expand_fp_vec_cmp): New.
>         * config/i386/i386.c (ix86_expand_sse_cmp): Allow NULL for
>         op_true and op_false.
>         (ix86_int_cmp_code_to_pcmp_immediate): New.
>         (ix86_fp_cmp_code_to_pcmp_immediate): New.
>         (ix86_cmp_code_to_pcmp_immediate): New.
>         (ix86_expand_mask_vec_cmp): New.
>         (ix86_expand_fp_vec_cmp): New.
>         (ix86_expand_int_sse_cmp): New.
>         (ix86_expand_int_vcond): Use ix86_expand_int_sse_cmp.
>         (ix86_expand_int_vec_cmp): New.
>         (ix86_get_mask_mode): New.
>         (TARGET_VECTORIZE_GET_MASK_MODE): New.
>         * config/i386/sse.md (avx512fmaskmodelower): New.
>         (vec_cmp<mode><avx512fmaskmodelower>): New.
>         (vec_cmp<mode><sseintvecmodelower>): New.
>         (vec_cmpv2div2di): New.
>         (vec_cmpu<mode><avx512fmaskmodelower>): New.
>         (vec_cmpu<mode><sseintvecmodelower>): New.
>         (vec_cmpuv2div2di): New.
>         (maskload<mode>): Rename to ...
>         (maskload<mode><sseintvecmodelower>): ... this.
>         (maskstore<mode>): Rename to ...
>         (maskstore<mode><sseintvecmodelower>): ... this.
>         (maskload<mode><avx512fmaskmodelower>): New.
>         (maskstore<mode><avx512fmaskmodelower>): New.
>         * doc/tm.texi: Regenerated.
>         * doc/tm.texi.in (TARGET_VECTORIZE_GET_MASK_MODE): New.
>         * expr.c (do_store_flag): Use expand_vec_cmp_expr for mask results.
>         * internal-fn.c (expand_MASK_LOAD): Adjust to optab changes.
>         (expand_MASK_STORE): Likewise.
>         * optabs.c (vector_compare_rtx): Add OPNO arg.
>         (expand_vec_cond_expr): Adjust to vector_compare_rtx change.
>         (get_vec_cmp_icode): New.
>         (expand_vec_cmp_expr_p): New.
>         (expand_vec_cmp_expr): New.
>         (can_vec_mask_load_store_p): Add MASK_MODE arg.
>         * optabs.def (vec_cmp_optab): New.
>         (vec_cmpu_optab): New.
>         (maskload_optab): Transform into convert optab.
>         (maskstore_optab): Likewise.
>         * optabs.h (expand_vec_cmp_expr_p): New.
>         (expand_vec_cmp_expr): New.
>         (can_vec_mask_load_store_p): Add MASK_MODE arg.
>         * target.def (get_mask_mode): New.
>         * targhooks.c (default_vector_alignment): Use mode alignment
>         for vector masks.
>         (default_get_mask_mode): New.
>         * targhooks.h (default_get_mask_mode): New.
>         * tree-cfg.c (verify_gimple_comparison): Support vector mask.
>         * tree-if-conv.c (ifcvt_can_use_mask_load_store): Adjust to
>         can_vec_mask_load_store_p signature change.
>         (predicate_mem_writes): Use boolean mask.
>         * tree-vect-data-refs.c (vect_get_new_vect_var): Support vect_mask_var.
>         (vect_create_destination_var): Likewise.
>         * tree-vect-generic.c (expand_vector_comparison): Use
>         expand_vec_cmp_expr_p for comparison availability.
>         (expand_vector_operations_1): Ignore statements with scalar mode.
>         * tree-vect-loop.c (vect_determine_vectorization_factor): Ignore mask
>         operations for VF.  Add mask type computation.
>         * tree-vect-stmts.c (vect_get_vec_def_for_operand): Support mask
>         constant.
>         (vectorizable_mask_load_store): Adjust to can_vec_mask_load_store_p
>         signature change.
>         (vectorizable_comparison): New.
>         (vect_analyze_stmt): Add vectorizable_comparison.
>         (vect_transform_stmt): Likewise.
>         (get_mask_type_for_scalar_type): New.
>         * tree-vectorizer.h (enum stmt_vec_info_type): Add vect_mask_var
>         (enum stmt_vec_info_type): Add comparison_vec_info_type.
>         (get_mask_type_for_scalar_type): New.
>         * tree.c (build_truth_vector_type): New.
>         (truth_type_for): Use build_truth_vector_type for vectors.
>         * tree.h (build_truth_vector_type): New.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]