[RFC] Try vector<bool> as a new representation for vector masks

Thu Sep 3 12:42:00 GMT 2015

On Thu, Sep 3, 2015 at 2:03 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> Adding CCs.
>
> 2015-09-03 15:03 GMT+03:00 Ilya Enkovich <enkovich.gnu@gmail.com>:
>> 2015-09-01 17:25 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Tue, Sep 1, 2015 at 3:08 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>> On 27 Aug 09:55, Richard Biener wrote:
>>>>> On Wed, Aug 26, 2015 at 5:51 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>>> >
>>>>> > Yes, I want to try it. But getting rid of bool patterns would mean
>>>>> > support for all targets currently supporting vec_cond. Would it be OK
>>>>> > to have vector<bool> mask co-exist with bool patterns for some time?
>>>>>
>>>>> No, I'd like to remove the bool patterns anyhow - the vectorizer should be able
>>>>> to figure out the correct vector type (or mask type) from the uses.  Currently
>>>>> it simply looks at the stmts LHS type but as all stmt operands already have
>>>>> vector types it can as well compute the result type from those.  We'd want to
>>>>> have a helper function that does this result type computation as I figure it
>>>>> will be needed in multiple places.
>>>>>
>>>>> This is now on my personal TODO list (but that's already quite long for GCC 6),
>>>>> so if you manage to get to that...  see
>>>>> tree-vect-loop.c:vect_determine_vectorization_factor
>>>>> which computes STMT_VINFO_VECTYPE for all stmts but loads (loads get their
>>>>> vector type set from data-ref analysis already - there 'bool' loads
>>>>> correctly get
>>>>> VNQImode).  There is a basic-block / SLP part as well that would need to use
>>>>> the helper function (eventually with some SLP discovery order issue).
>>>>>
>>>>> > Thus first step would be to require vector<bool> for MASK_LOAD and
>>>>> > MASK_STORE and support it for i386 (the only user of MASK_LOAD and
>>>>> > MASK_STORE).
>>>>>
>>>>> You can certainly try that first, but as soon as you hit complications with
>>>>> needing to adjust bool patterns then I'd rather get rid of them.
>>>>>
>>>>> >
>>>>> > I can directly build a vector type with specified mode to avoid it. Smth. like:
>>>>> >
>>>>> > mask_mode = targetm.vectorize.get_mask_mode (nunits, current_vector_size);
>>>>> > mask_type = make_vector_type (bool_type_node, nunits, mask_mode);
>>>>>
>>>>> Hmm, indeed, that might be a (good) solution.  Btw, in this case
>>>>> target attribute
>>>>> boundaries would be "ignored" (that is, TYPE_MODE wouldn't change depending
>>>>> on the active target).  There would also be no way for the user to
>>>>> declare vector<bool>
>>>>> in source (which is good because of that target attribute issue...).
>>>>>
>>>>> So yeah.  Adding a tree.c:build_truth_vector_type (unsigned nunits)
>>>>> and adjusting
>>>>> truth_type_for is the way to go.
>>>>>
>>>>> I suggest you try modifying those parts first according to this scheme
>>>>> that will most
>>>>> likely uncover issues we missed.
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>
>>>> I tried to implement this scheme and apply it for MASK_LOAD and MASK_STORE.  There were no major issues (for now).
>>>>
>>>> build_truth_vector_type and get_mask_type_for_scalar_type were added to build a mask type.  It is always a vector of bools but its mode is determined by a target using number of units and currently used vector length.
>>>>
>>>> As previously I fixed if-conversion to apply boolean masks for loads and stores which automatically disables bool patterns for them and flow goes by a mask path.  Vectorization factor computation is fixed to have a separate computation for mask types.  Comparison is now handled separately by vectorizer and is vectorized into vector comparison.
>>>>
>>>> Optabs for masked loads and stores were transformed into convert optabs.  Now it is checked using both value and mask modes.
>>>>
>>>> Optabs for comparison were added.  These are also convert optabs checking value and result type.
>>>>
>>>> I had to introduce significant number of new patterns in i386 target to support new optabs.  The reason was vector compare was never expanded separately and always was a part of a vec_cond expansion.
>>>
>>> Indeed.
>>>
>>>> As a result it's possible to use the sage GIMPLE representation for both vector and scalar masks target type.  Here is an example I used as a simple test:
>>>>
>>>>   for (i=0; i<N; i++)
>>>>   {
>>>>     float t = a[i];
>>>>     if (t > 0.0f && t < 1.0e+2f)
>>>>       if (c[i] != 0)
>>>>         c[i] = 1;
>>>>   }
>>>>
>>>> Produced vector GIMPLE (before expand):
>>>>
>>>>   vect_t_5.22_105 = MEM[base: _256, offset: 0B];
>>>>   mask__6.23_107 = vect_t_5.22_105 > { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 };
>>>>   mask__7.25_109 = vect_t_5.22_105 < { 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2 };
>>>>   mask__8.27_110 = mask__6.23_107 & mask__7.25_109;
>>>>   vect__9.29_116 = MASK_LOAD (vectp_c.30_114, 0B, mask__8.27_110);
>>>>   mask__36.33_119 = vect__9.29_116 != { 0, 0, 0, 0, 0, 0, 0, 0 };
>>>>   mask__37.35_120 = mask__8.27_110 & mask__36.33_119;
>>>>   MASK_STORE (vectp_c.38_125, 0B, mask__37.35_120, { 1, 1, 1, 1, 1, 1, 1, 1 });
>>>
>>> Looks good to me.
>>>
>>>> Produced assembler on AVX-512:
>>>>
>>>>         vmovups (%rdi), %zmm0
>>>>         vcmpps  $25, %zmm5, %zmm0, %k1
>>>>         vcmpps  $22, %zmm3, %zmm0, %k1{%k1}
>>>>         vmovdqa32       -64(%rdx), %zmm2{%k1}
>>>>         vpcmpd  $4, %zmm1, %zmm2, %k1{%k1}
>>>>         vmovdqa32       %zmm4, (%rcx){%k1}
>>>>
>>>> Produced assembler on AVX-2:
>>>>
>>>>         vmovups (%rdx), %xmm1
>>>>         vinsertf128     $0x1, -16(%rdx), %ymm1, %ymm1
>>>>         vcmpltps        %ymm1, %ymm3, %ymm0
>>>>         vcmpltps        %ymm5, %ymm1, %ymm1
>>>>         vpand   %ymm0, %ymm1, %ymm0
>>>>         vpmaskmovd      -32(%rcx), %ymm0, %ymm1
>>>>         vpcmpeqd        %ymm2, %ymm1, %ymm1
>>>>         vpcmpeqd        %ymm2, %ymm1, %ymm1
>>>>         vpand   %ymm0, %ymm1, %ymm0
>>>>         vpmaskmovd      %ymm4, %ymm0, (%rax)
>>>>
>>>> BTW AVX-2 code produced by trunk compiler is 4 insns longer:
>>>>
>>>>         vmovups (%rdx), %xmm0
>>>>         vinsertf128     $0x1, -16(%rdx), %ymm0, %ymm0
>>>>         vcmpltps        %ymm0, %ymm6, %ymm1
>>>>         vcmpltps        %ymm7, %ymm0, %ymm0
>>>>         vpand   %ymm1, %ymm5, %ymm2
>>>>         vpand   %ymm0, %ymm2, %ymm1
>>>>         vpcmpeqd        %ymm3, %ymm1, %ymm0
>>>>         vpandn  %ymm4, %ymm0, %ymm0
>>>>         vpmaskmovd      -32(%rcx), %ymm0, %ymm0
>>>>         vpcmpeqd        %ymm3, %ymm0, %ymm0
>>>>         vpandn  %ymm1, %ymm0, %ymm0
>>>>         vpcmpeqd        %ymm3, %ymm0, %ymm0
>>>>         vpandn  %ymm4, %ymm0, %ymm0
>>>>         vpmaskmovd      %ymm5, %ymm0, (%rax)
>>>>
>>>>
>>>> For now I still don't disable bool patterns, thus new masks apply to masked loads and stores only.  Patch is also not tested and tried on several small tests only.  Could you please look at what I currently have and say if it's in sync with your view on vector masking?
>>>
>>> So apart from bool patterns and maybe implementation details (didn't
>>> look too closely at the patch yet, maybe tomorrow), there is
>>>
>>> +  /* Or a boolean vector type with the same element count
>>> +     as the comparison operand types.  */
>>> +  else if (TREE_CODE (type) == VECTOR_TYPE
>>> +          && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
>>> +    {
>>>
>>> so we now allow both, integer typed and boolean typed comparison
>>> results?  I was hoping that on GIMPLE
>>> we can canonicalize to a single form, the boolean one and for the
>>> "old" style force the use of VEC_COND exprs
>>> (which we did anyway, AFAIK).  The comparison in the VEC_COND would
>>> still have vector bool result type.
>>>
>>> I expect the vectorization factor changes to "vanish" if we remove
>>> bool patterns and re-org vector type deduction
>>>
>>> Richard.
>>>
>>
>> Totally disabling old style vector comparison and bool pattern is a
>> goal but doing hat would mean a lot of regressions for many targets.
>> Do you want to it to be tried to estimate amount of changes required
>> and reveal possible issues? What would be integration plan for these
>> changes? Do you want to just introduce new vector<bool> in GIMPLE
>> disabling bool patterns and then resolving vectorization regression on
>> all targets or allow them live together with following target switch
>> one by one from bool patterns with finally removing them? Not all
>> targets are likely to be adopted fast I suppose.

Well, the frontends already create vec_cond exprs I believe.  So for
bool patterns the vectorizer would have to do the same, but the
comparison result in there would still use vec<bool>.  Thus the scalar

 _Bool a = b < c;
 _Bool c = a || d;
 if (c)

would become

 vec<int> a = VEC_COND <a < b ? -1 : 0>;
 vec<int> c = a | d;

when the target does not have vec<bool>s directly and otherwise
vec<boo> directly (dropping the VEC_COND).

Just the vector comparison inside the VEC_COND would always
have vec<bool> type.

And the "bool patterns" I am talking about are those in
tree-vect-patterns.c, not any targets instruction patterns.

Richard.

>>
>> Ilya