RFC: Variable-length VECTOR_CSTs
Richard Sandiford
richard.sandiford@linaro.org
Wed Nov 29 18:24:00 GMT 2017
Thanks for the quick feedback!
Richard Biener <richard.guenther@gmail.com> writes:
> On Wed, Nov 29, 2017 at 12:57 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> It was clear from the SVE reviews that people were unhappy with how
>> "special" the variable-length case was. One particular concern was
>> the use of VEC_DUPLICATE_CST and VEC_SERIES_CST, and the way that
>> that would in turn lead to different representations of VEC_PERM_EXPRs
>> with constant permute vectors, and difficulties in invoking
>> vec_perm_const_ok.
>>
>> This is an RFC for a VECTOR_CST representation that treats each
>> specific constant as one instance of an arbitrary-length sequence.
>> The reprensentation then extends to variable-length vectors in a
>> natural way.
>>
>> As discussed on IRC, if a vector contains X*N elements for some
>> constant N and integer X>0, the main features we need are:
>>
>> 1) the ability to represent a sequence that duplicates N values
>>
>> This is useful for SLP invariants.
>>
>> 2) the ability to represent a sequence that starts with N values and
>> is followed by zeros
>>
>> This is useful for the initial value in a double or SLP reduction
>>
>> 3) the ability to represent N interleaved series
>>
>> This is useful for SLP inductions and for VEC_PERM_EXPRs.
>>
>> For (2), zero isn't necessarily special, since vectors used in an AND
>> reduction might need to fill with ones. Also, we might need up to N
>> different fill values with mixed SLP operations; it isn't necessarily
>> safe to assume that a single fill value will always be enough.
>>
>> The same goes for (3): there's no reason in principle why the
>> steps in an SLP induction should all be the same (although they
>> do need to be at the moment). E.g. once we support SLP on:
>>
>> for (unsigned int i = 0; i < n; i += 2)
>> {
>> x[i] += 4 + i;
>> x[i + 1] += 11 + i * 3;
>> }
>>
>> we'll need {[4, 14], +, [2, 6]}.
>>
>> So the idea is to represent vectors as P interleaved patterns of the form:
>>
>> [BASE0, BASE1, BASE1 + STEP, BASE1 + STEP*2, ...]
>>
>> where the STEP is always zero (actually null) for non-integer vectors.
>> This is effectively projecting a "foreground" value of P elements
>> onto an arbitrary-length "background" sequenece, where the background
>> sequence contains P parallel linear series.
>>
>> E.g. to pick an extreme and unlikely example,
>>
>> [42, 99, 2, 20, 3, 30, 4, 40, ...]
>>
>> has 2 patterns:
>>
>> BASE0 = 42, BASE1 = 2, STEP = 1
>> BASE0 = 99, BASE1 = 20, STEP = 10
>>
>> The more useful cases are degenerate versions of this general case.
>>
>> As far as memory consumption goes: the number of patterns needed for a
>> fixed-length vector with 2*N elements is always at most N; in the worst
>> case, we simply interleave the first N elements with the second N elements.
>> The worst-case increase in footprint is therefore N trees for the steps.
>> In practice the footprint is usually smaller than it was before, since
>> most constants do have a pattern.
>>
>> The patch below implements this for trees. I have patches to use the
>> same style of encoding for CONST_VECTOR and vec_perm_indices, but the
>> tree one is probably easiest to read.
>>
>> The patch only adds the representation. Follow-on patches make more
>> use of it (and usually make things simpler; e.g. integer_zerop is no
>> longer a looping operation).
>>
>> Does this look better?
>
> Yes, the overall design looks good. I wonder why you chose to have
> the number of patterns being a power of two? I suppose this is
> to have the same number of elements from all patterns in the final
> vector (which is power-of-two sized)?
Right. The rtl and vec_perm_indices parts don't have this restriction,
since some ports do define non-power-of-2 vectors for internal use.
The problem is that, since VECTOR_CSTs are used by the FE, we need
to support all valid vector lengths without blowing the 16-bit field.
Using the same style of representation as TYPE_VECTOR_SUBPARTS seemed
like the safest way of doing that.
> I wonder if there exists a vector where say a three-pattern
> interleaving would be smaller than a four-pattern one?
Only in the non-power-of-2 case.
> Given you add flags for various purposes would it make sense to
> overload 'step' with a regular element to avoid the storage increase
> in case step is unnecessary? This makes it have three elements
> which is of course awkward :/
I wondered about keeping it as an array of trees and tacking the
steps onto the end as an optional addition. But the idea is that
tree_vector_pattern becomes the preferred way of handling constant
vectors, if it can be used, so it seemed neater to use in the tree
node too.
> Few more comments below.
>
> Otherwise looks good to go!
>
> Thanks for doing this.
>
> [...]
>> ! /* PATTERNS represents a constant of type TYPE. Compress it into the
>> ! canonical form, returning the new vector of patterns. Use CUR and NEXT
>> ! as temporary workspace. */
>> !
>> ! static vec<tree_vector_pattern>
>> ! compress_vector_patterns (vec<tree_vector_pattern> *cur,
>> ! vec<tree_vector_pattern> *next,
>> ! tree type, vec<tree_vector_pattern> patterns)
>> ! {
>> ! while (patterns.length () > 1)
>> ! {
>> ! unsigned int npatterns = patterns.length ();
>> ! gcc_assert ((npatterns & 1) == 0);
>> ! unsigned int step = npatterns / 2;
>> ! cur->truncate (0);
>> ! cur->reserve (step);
>> ! bool first_p = npatterns * 2 == TYPE_VECTOR_SUBPARTS (type);
>> ! for (unsigned int i = 0; i < step; ++i)
>> ! if (!combine_patterns (cur, &patterns[i], &patterns[i + step],
>> ! first_p))
>> ! return patterns;
>> ! patterns = *cur;
>> ! std::swap (cur, next);
>> ! }
>> ! return patterns;
>> ! }
>
> Wow, that looks complicated ;) Which means it needs some more comments.
OK :-)
> I wonder if avoiding all this for the special cases we like to handle by
> providing vector_cst constructors for one and two pattern cases would
> be best. Or even more specialized, not exposing "patterns".
We still have build_vector_from_val, and there'll be one for building
normal series. But code working on general vectors should usually
operate on patterns where possible, so that it extends to variable-length
vectors. Such code can't make any assumptions about the shape of an
existing vector.
We also need the complication above for building vectors from vec<tree>s,
so that there's only one canonical representation of a given vector,
however it was built.
>> ! /* Return true if PATTERNS has an overflow bit set. */
>> !
>> ! static bool
>> ! vector_overflow_p (vec<tree_vector_pattern> patterns)
>> ! {
>> ! unsigned int i;
>> ! tree_vector_pattern *pattern;
>> ! FOR_EACH_VEC_ELT (patterns, i, pattern)
>> ! for (unsigned int j = 0; j < 2; ++j)
>> ! if (CONSTANT_CLASS_P (pattern->base[j])
>> ! && TREE_OVERFLOW (pattern->base[j]))
>> ! return true;
>> ! return false;
>> ! }
>> !
>> ! /* Return true if PATTERNS represents a duplicate operation. */
>>
>> ! static bool
>> ! vector_duplicate_p (vec<tree_vector_pattern> patterns)
>> ! {
>> ! unsigned int i;
>> ! tree_vector_pattern *pattern;
>> ! FOR_EACH_VEC_ELT (patterns, i, pattern)
>> ! {
>> ! if (pattern->step && wi::to_wide (pattern->step) != 0)
>
> I wonder if step == 0 should be canonicalized to NULL_TREE?
Kept wondering about that, but couldn't make my mind up which way was
better. I'll see how it looks with that change.
>> ! return false;
>> ! if (!operand_equal_p (pattern->base[0], pattern->base[1], 0))
>> ! return false;
>> }
>> + return true;
>> + }
>> +
>> + /* Return true if PATTERNS represents a vector series. */
>> +
>> + static bool
>> + vector_series_p (vec<tree_vector_pattern> patterns)
>> + {
>> + unsigned int i;
>> + tree_vector_pattern *pattern;
>> + FOR_EACH_VEC_ELT (patterns, i, pattern)
>> + if (!pattern->step
>> + || (wi::to_wide (pattern->base[1]) - wi::to_wide (pattern->base[0])
>> + != wi::to_wide (pattern->step)))
>> + return false;
>> + return true;
>> + }
>> +
>> + /* Build a VECTOR_CST of type TYPE using the patterns in PATTERNS,
>> + canonicalizing it first if necessary. */
>> +
>> + tree
>> + build_vector (tree type, vec<tree_vector_pattern> patterns MEM_STAT_DECL)
>> + {
>> + auto_vec<tree_vector_pattern, 16> tmp1, tmp2;
>> + patterns = compress_vector_patterns (&tmp1, &tmp2, type, patterns);
>> + unsigned int npatterns = patterns.length ();
>> +
>> + gcc_assert (pow2p_hwi (npatterns));
>> + bool overflow_p = vector_overflow_p (patterns);
>> + bool duplicate_p = vector_duplicate_p (patterns);
>> + bool series_p = vector_series_p (patterns);
>> + tree v = make_vector (exact_log2 (npatterns), duplicate_p, series_p);
>> + TREE_TYPE (v) = type;
>> + TREE_OVERFLOW (v) = overflow_p;
>>
>> ! memcpy (v->vector.patterns, patterns.address (),
>> ! npatterns * sizeof (tree_vector_pattern));
>> return v;
>> }
>>
>> /* Return a new VECTOR_CST node whose type is TYPE and whose values
>> + are given by ELTS. */
>> +
>> + tree
>> + build_vector (tree type, vec<tree> elts MEM_STAT_DECL)
>> + {
>> + unsigned int nelts = elts.length ();
>> + gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
>> +
>> + gcc_assert (pow2p_hwi (nelts));
>> + bool single_p = nelts == 1;
>> + unsigned int npatterns = single_p ? 1 : nelts / 2;
>> + auto_vec<tree_vector_pattern, 16> patterns (npatterns);
>> + for (unsigned int i = 0; i < npatterns; ++i)
>> + {
>> + tree_vector_pattern pattern;
>> + pattern.base[0] = elts[i];
>> + pattern.base[1] = elts[single_p ? i : i + npatterns];
>> + if (INTEGRAL_TYPE_P (TREE_TYPE (type))
>> + && TREE_CODE (pattern.base[0]) == INTEGER_CST
>> + && TREE_CODE (pattern.base[1]) == INTEGER_CST)
>> + pattern.step = wide_int_to_tree (TREE_TYPE (type),
>> + wi::to_wide (pattern.base[1])
>> + - wi::to_wide (pattern.base[0]));
>> + else
>> + pattern.step = NULL_TREE;
>
> I wonder why given the simple representation with n/2 patterns
> you don't use step == NULL_TREE unconditionally here.
The idea is to make constants with exactly 2*npatterns elements less
special. When a constant ends after exactly 2 patterns, we could
choose to extend it with any step. But since the most common use of a
series is to have a+b*x (with no special case for the first element),
using the difference seemed like the best default choice.
E.g. if we want to ask "is element x equal to 0+x*2?", the answer
will be the same for {0, 2} and {0, 2, 4, 6}.
Of course, this means that "is this equal to { 1, 0, 0, ... }?"
does need to treat the 2-pattern case as special. So yeah,
using a null step and always treating the 2-pattern case as
special would be another alternative.
> Is there a target macro specifying the maximum target supported
> vector length? It might be nice to use that in the auto_vec <> size.
Not that I know of. But this code is laso used for generic vectors
that don't have target support, so it might make sense to have a
target-independent choice anyway. I can use a typedef to avoid
hard-coding the number everywhere.
> OTOH sometimes I really like
>
> tree_vector_pattern *p = XALLOCAVEC (npatterns);
>
> more... (ok, __attribute__((vector_size(1048576))) might blow the stack...)
Yeah :-) Plus it gets awkward in loops.
Thanks,
Richard
More information about the Gcc-patches
mailing list