RFC: Variable-length VECTOR_CSTs

Wed Nov 29 18:24:00 GMT 2017

Thanks for the quick feedback!

Richard Biener <richard.guenther@gmail.com> writes:
> On Wed, Nov 29, 2017 at 12:57 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> It was clear from the SVE reviews that people were unhappy with how
>> "special" the variable-length case was.  One particular concern was
>> the use of VEC_DUPLICATE_CST and VEC_SERIES_CST, and the way that
>> that would in turn lead to different representations of VEC_PERM_EXPRs
>> with constant permute vectors, and difficulties in invoking
>> vec_perm_const_ok.
>>
>> This is an RFC for a VECTOR_CST representation that treats each
>> specific constant as one instance of an arbitrary-length sequence.
>> The reprensentation then extends to variable-length vectors in a
>> natural way.
>>
>> As discussed on IRC, if a vector contains X*N elements for some
>> constant N and integer X>0, the main features we need are:
>>
>> 1) the ability to represent a sequence that duplicates N values
>>
>>    This is useful for SLP invariants.
>>
>> 2) the ability to represent a sequence that starts with N values and
>>    is followed by zeros
>>
>>    This is useful for the initial value in a double or SLP reduction
>>
>> 3) the ability to represent N interleaved series
>>
>>    This is useful for SLP inductions and for VEC_PERM_EXPRs.
>>
>> For (2), zero isn't necessarily special, since vectors used in an AND
>> reduction might need to fill with ones.  Also, we might need up to N
>> different fill values with mixed SLP operations; it isn't necessarily
>> safe to assume that a single fill value will always be enough.
>>
>> The same goes for (3): there's no reason in principle why the
>> steps in an SLP induction should all be the same (although they
>> do need to be at the moment).  E.g. once we support SLP on:
>>
>>   for (unsigned int i = 0; i < n; i += 2)
>>     {
>>       x[i] += 4 + i;
>>       x[i + 1] += 11 + i * 3;
>>     }
>>
>> we'll need {[4, 14], +, [2, 6]}.
>>
>> So the idea is to represent vectors as P interleaved patterns of the form:
>>
>>   [BASE0, BASE1, BASE1 + STEP, BASE1 + STEP*2, ...]
>>
>> where the STEP is always zero (actually null) for non-integer vectors.
>> This is effectively projecting a "foreground" value of P elements
>> onto an arbitrary-length "background" sequenece, where the background
>> sequence contains P parallel linear series.
>>
>> E.g. to pick an extreme and unlikely example,
>>
>>   [42, 99, 2, 20, 3, 30, 4, 40, ...]
>>
>> has 2 patterns:
>>
>>   BASE0 = 42, BASE1 = 2, STEP = 1
>>   BASE0 = 99, BASE1 = 20, STEP = 10
>>
>> The more useful cases are degenerate versions of this general case.
>>
>> As far as memory consumption goes: the number of patterns needed for a
>> fixed-length vector with 2*N elements is always at most N; in the worst
>> case, we simply interleave the first N elements with the second N elements.
>> The worst-case increase in footprint is therefore N trees for the steps.
>> In practice the footprint is usually smaller than it was before, since
>> most constants do have a pattern.
>>
>> The patch below implements this for trees.  I have patches to use the
>> same style of encoding for CONST_VECTOR and vec_perm_indices, but the
>> tree one is probably easiest to read.
>>
>> The patch only adds the representation.  Follow-on patches make more
>> use of it (and usually make things simpler; e.g. integer_zerop is no
>> longer a looping operation).
>>
>> Does this look better?
>
> Yes, the overall design looks good.  I wonder why you chose to have
> the number of patterns being a power of two?  I suppose this is
> to have the same number of elements from all patterns in the final
> vector (which is power-of-two sized)?

Right.  The rtl and vec_perm_indices parts don't have this restriction,
since some ports do define non-power-of-2 vectors for internal use.
The problem is that, since VECTOR_CSTs are used by the FE, we need
to support all valid vector lengths without blowing the 16-bit field.
Using the same style of representation as TYPE_VECTOR_SUBPARTS seemed
like the safest way of doing that.

> I wonder if there exists a vector where say a three-pattern
> interleaving would be smaller than a four-pattern one?

Only in the non-power-of-2 case.

> Given you add flags for various purposes would it make sense to
> overload 'step' with a regular element to avoid the storage increase
> in case step is unnecessary?  This makes it have three elements
> which is of course awkward :/

I wondered about keeping it as an array of trees and tacking the
steps onto the end as an optional addition.  But the idea is that
tree_vector_pattern becomes the preferred way of handling constant
vectors, if it can be used, so it seemed neater to use in the tree
node too.

> Few more comments below.
>
> Otherwise looks good to go!
>
> Thanks for doing this.
>
> [...]
>> ! /* PATTERNS represents a constant of type TYPE.  Compress it into the
>> !    canonical form, returning the new vector of patterns.  Use CUR and NEXT
>> !    as temporary workspace.  */
>> !
>> ! static vec<tree_vector_pattern>
>> ! compress_vector_patterns (vec<tree_vector_pattern> *cur,
>> !                         vec<tree_vector_pattern> *next,
>> !                         tree type, vec<tree_vector_pattern> patterns)
>> ! {
>> !   while (patterns.length () > 1)
>> !     {
>> !       unsigned int npatterns = patterns.length ();
>> !       gcc_assert ((npatterns & 1) == 0);
>> !       unsigned int step = npatterns / 2;
>> !       cur->truncate (0);
>> !       cur->reserve (step);
>> !       bool first_p = npatterns * 2 == TYPE_VECTOR_SUBPARTS (type);
>> !       for (unsigned int i = 0; i < step; ++i)
>> !       if (!combine_patterns (cur, &patterns[i], &patterns[i + step],
>> !                              first_p))
>> !         return patterns;
>> !       patterns = *cur;
>> !       std::swap (cur, next);
>> !     }
>> !   return patterns;
>> ! }
>
> Wow, that looks complicated ;)  Which means it needs some more comments.

OK :-)

> I wonder if avoiding all this for the special cases we like to handle by
> providing vector_cst constructors for one and two pattern cases would
> be best.  Or even more specialized, not exposing "patterns".

We still have build_vector_from_val, and there'll be one for building
normal series.  But code working on general vectors should usually
operate on patterns where possible, so that it extends to variable-length
vectors.  Such code can't make any assumptions about the shape of an
existing vector.

We also need the complication above for building vectors from vec<tree>s,
so that there's only one canonical representation of a given vector,
however it was built.

>> ! /* Return true if PATTERNS has an overflow bit set.  */
>> !
>> ! static bool
>> ! vector_overflow_p (vec<tree_vector_pattern> patterns)
>> ! {
>> !   unsigned int i;
>> !   tree_vector_pattern *pattern;
>> !   FOR_EACH_VEC_ELT (patterns, i, pattern)
>> !     for (unsigned int j = 0; j < 2; ++j)
>> !       if (CONSTANT_CLASS_P (pattern->base[j])
>> !         && TREE_OVERFLOW (pattern->base[j]))
>> !       return true;
>> !   return false;
>> ! }
>> !
>> ! /* Return true if PATTERNS represents a duplicate operation.  */
>>
>> ! static bool
>> ! vector_duplicate_p (vec<tree_vector_pattern> patterns)
>> ! {
>> !   unsigned int i;
>> !   tree_vector_pattern *pattern;
>> !   FOR_EACH_VEC_ELT (patterns, i, pattern)
>> !     {
>> !       if (pattern->step && wi::to_wide (pattern->step) != 0)
>
> I wonder if step == 0 should be canonicalized to NULL_TREE?

Kept wondering about that, but couldn't make my mind up which way was
better.  I'll see how it looks with that change.

>> !       return false;
>> !       if (!operand_equal_p (pattern->base[0], pattern->base[1], 0))
>> !       return false;
>>       }
>> +   return true;
>> + }
>> +
>> + /* Return true if PATTERNS represents a vector series.  */
>> +
>> + static bool
>> + vector_series_p (vec<tree_vector_pattern> patterns)
>> + {
>> +   unsigned int i;
>> +   tree_vector_pattern *pattern;
>> +   FOR_EACH_VEC_ELT (patterns, i, pattern)
>> +     if (!pattern->step
>> +       || (wi::to_wide (pattern->base[1]) - wi::to_wide (pattern->base[0])
>> +           != wi::to_wide (pattern->step)))
>> +       return false;
>> +   return true;
>> + }
>> +
>> + /* Build a VECTOR_CST of type TYPE using the patterns in PATTERNS,
>> +    canonicalizing it first if necessary.  */
>> +
>> + tree
>> + build_vector (tree type, vec<tree_vector_pattern> patterns MEM_STAT_DECL)
>> + {
>> +   auto_vec<tree_vector_pattern, 16> tmp1, tmp2;
>> +   patterns = compress_vector_patterns (&tmp1, &tmp2, type, patterns);
>> +   unsigned int npatterns = patterns.length ();
>> +
>> +   gcc_assert (pow2p_hwi (npatterns));
>> +   bool overflow_p = vector_overflow_p (patterns);
>> +   bool duplicate_p = vector_duplicate_p (patterns);
>> +   bool series_p = vector_series_p (patterns);
>> +   tree v = make_vector (exact_log2 (npatterns), duplicate_p, series_p);
>> +   TREE_TYPE (v) = type;
>> +   TREE_OVERFLOW (v) = overflow_p;
>>
>> !   memcpy (v->vector.patterns, patterns.address (),
>> !         npatterns * sizeof (tree_vector_pattern));
>>     return v;
>>   }
>>
>>   /* Return a new VECTOR_CST node whose type is TYPE and whose values
>> +    are given by ELTS.  */
>> +
>> + tree
>> + build_vector (tree type, vec<tree> elts MEM_STAT_DECL)
>> + {
>> +   unsigned int nelts = elts.length ();
>> +   gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
>> +
>> +   gcc_assert (pow2p_hwi (nelts));
>> +   bool single_p = nelts == 1;
>> +   unsigned int npatterns = single_p ? 1 : nelts / 2;
>> +   auto_vec<tree_vector_pattern, 16> patterns (npatterns);
>> +   for (unsigned int i = 0; i < npatterns; ++i)
>> +     {
>> +       tree_vector_pattern pattern;
>> +       pattern.base[0] = elts[i];
>> +       pattern.base[1] = elts[single_p ? i : i + npatterns];
>> +       if (INTEGRAL_TYPE_P (TREE_TYPE (type))
>> +         && TREE_CODE (pattern.base[0]) == INTEGER_CST
>> +         && TREE_CODE (pattern.base[1]) == INTEGER_CST)
>> +       pattern.step = wide_int_to_tree (TREE_TYPE (type),
>> +                                        wi::to_wide (pattern.base[1])
>> +                                        - wi::to_wide (pattern.base[0]));
>> +       else
>> +       pattern.step = NULL_TREE;
>
> I wonder why given the simple representation with n/2 patterns
> you don't use step == NULL_TREE unconditionally here.

The idea is to make constants with exactly 2*npatterns elements less
special.  When a constant ends after exactly 2 patterns, we could
choose to extend it with any step.  But since the most common use of a
series is to have a+b*x (with no special case for the first element),
using the difference seemed like the best default choice.

E.g. if we want to ask "is element x equal to 0+x*2?", the answer
will be the same for {0, 2} and {0, 2, 4, 6}.

Of course, this means that "is this equal to { 1, 0, 0, ... }?"
does need to treat the 2-pattern case as special.  So yeah,
using a null step and always treating the 2-pattern case as
special would be another alternative.

> Is there a target macro specifying the maximum target supported
> vector length?  It might be nice to use that in the auto_vec <> size.

Not that I know of.  But this code is laso used for generic vectors
that don't have target support, so it might make sense to have a
target-independent choice anyway.  I can use a typedef to avoid
hard-coding the number everywhere.

> OTOH sometimes I really like
>
>   tree_vector_pattern *p = XALLOCAVEC (npatterns);
>
> more... (ok, __attribute__((vector_size(1048576))) might blow the stack...)

Yeah :-)  Plus it gets awkward in loops.

Thanks,
Richard