[103/nnn] poly_int: TYPE_VECTOR_SUBPARTS

Tue Oct 24 11:20:00 GMT 2017

Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Oct 24, 2017 at 11:40 AM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Mon, Oct 23, 2017 at 7:41 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> This patch changes TYPE_VECTOR_SUBPARTS to a poly_uint64.  The value is
>>>> encoded in the 10-bit precision field and was previously always stored
>>>> as a simple log2 value.  The challenge was to use these 10 bits to
>>>> encode the number of elements in variable-length vectors, so that
>>>> we didn't need to increase the size of the tree.
>>>>
>>>> In practice the number of vector elements should always have the form
>>>> N + N * X (where X is the runtime value), and as for constant-length
>>>> vectors, N must be a power of 2 (even though X itself might not be).
>>>> The patch therefore uses the low bit to select between constant-length
>>>> and variable-length and uses the upper 9 bits to encode log2(N).
>>>> Targets without variable-length vectors continue to use the old scheme.
>>>>
>>>> A new valid_vector_subparts_p function tests whether a given number
>>>> of elements can be encoded.  This is false for the vector modes that
>>>> represent an LD3 or ST3 vector triple (which we want to treat as arrays
>>>> of vectors rather than single vectors).
>>>>
>>>> Most of the patch is mechanical; previous patches handled the changes
>>>> that weren't entirely straightforward.
>>>
>>> One comment, w/o actually reviewing may/must stuff (will comment on that
>>> elsewhere).
>>>
>>> You split 10 bits into 9 and 1, wouldn't it be more efficient to use the
>>> lower 8 bits for the log2 value of N and either of the two remaining bits
>>> for the flag?  That way the 8 bits for the shift amount can be eventually
>>> accessed in a more efficient way.
>>>
>>> Guess you'd need to compare code-generation of the TYPE_VECTOR_SUBPARTS
>>> accessor on aarch64 / x86_64.
>>
>> Ah, yeah.  I'll give that a go.
>>
>>> Am I correct that NUM_POLY_INT_COEFFS is 1 for targets that do not
>>> have variable length vector modes?
>>
>> Right.  1 is the default and only AArch64 defines it to anything else (2).
>
> Going to be interesting (bitrot) times then?  I wonder if it makes sense
> to initially define it to 2 globally and only change it to 1 later?

Well, the target-independent code doesn't have the implicit conversion
from poly_int<1, C> to C, so it can't e.g. do:

  poly_int64 x = ...;
  HOST_WIDE_INT y = x;

even when NUM_POLY_INT_COEFFS==1.  Only target-specific code (identified
by IN_TARGET_CODE) can do that.
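
To make the distinction concrete, here is a rough, self-contained sketch.
It is not the real poly_int class: toy_poly_int, is_constant, to_constant
and generic_use are names made up for illustration, and the way the
conversion is gated on IN_TARGET_CODE is only an assumption about how
such a restriction could be expressed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for poly_int<N, C>; NOT the real GCC class.
// Represents coeffs[0] + coeffs[1] * X + ... for a runtime value X.
template <unsigned N, typename C>
struct toy_poly_int {
  C coeffs[N];

  // True if every coefficient after the first is zero, i.e. the value
  // does not depend on X.
  bool is_constant() const {
    for (unsigned i = 1; i < N; ++i)
      if (coeffs[i] != 0)
        return false;
    return true;
  }

  // Explicit accessor: available to all code, but only valid once the
  // caller knows the value is constant.
  C to_constant() const {
    assert(is_constant());
    return coeffs[0];
  }

#ifdef IN_TARGET_CODE
  // Assumed gating: only code compiled with IN_TARGET_CODE defined sees
  // the implicit conversion, and only for the single-coefficient case.
  operator C() const {
    static_assert(N == 1, "implicit conversion needs exactly one coefficient");
    return coeffs[0];
  }
#endif
};

// Target-independent style: no implicit conversion, so the constant has
// to be requested explicitly, whatever N is.
inline std::int64_t generic_use(const toy_poly_int<1, std::int64_t> &x) {
  return x.to_constant();
}
```

With IN_TARGET_CODE undefined, "std::int64_t y = x;" fails to compile for
this toy class, so such code is written the same way whether it is built
with one coefficient or two.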

So to target-independent code it doesn't really matter what
NUM_POLY_INT_COEFFS is.  Even if we bumped it to 2, the extra coefficient
would always be zero.

FWIW, the poly_int tests in [001/nnn] cover N == 1, 2 and (as far as
supported) 3 for all targets, so that part isn't sensitive to
NUM_POLY_INT_COEFFS.

> Do you have any numbers on the effect of poly-int on compile-times?
> Esp. for example on stage2 build times when stage1 is -O0 -g "optimized"?

I've just tried that for an x86_64 -j24 build and got:

real: +7%
user: +8.6%

I don't know how noisy the results are though.

The poly_int changes are compile-time neutral when running a gcc built
with --enable-checking=release, to within a margin of about [-0.1%, 0.1%].
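
For reference, the encoding from the patch description quoted above (low
bit as the constant/variable flag, upper 9 bits as log2(N)) could be
sketched roughly as below.  This is only an illustration: subparts,
encode_subparts and decode_subparts are hypothetical names, and the choice
that a set low bit means "variable-length" is an assumption, since the
description doesn't say which way round the flag goes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoded form: the element count is
// constant_part + x_coeff * X, where X is the runtime value.
struct subparts {
  std::uint64_t constant_part;
  std::uint64_t x_coeff;
};

// Pack log2(N) and the flag into a 10-bit field.
// Assumption: low bit set means variable-length (N + N * X).
inline unsigned encode_subparts(unsigned log2_n, bool variable_length) {
  assert(log2_n < (1u << 9));  // must fit in the upper 9 bits
  return (log2_n << 1) | (variable_length ? 1u : 0u);
}

// Unpack a 10-bit field into the two coefficients.
inline subparts decode_subparts(unsigned field) {
  assert(field < (1u << 10));
  unsigned log2_n = field >> 1;
  assert(log2_n < 64);  // keep the shift below well defined in this sketch
  std::uint64_t n = std::uint64_t(1) << log2_n;
  bool variable_length = (field & 1u) != 0;
  return subparts{n, variable_length ? n : 0};
}
```

For example, a constant 4-element vector would encode as 4 and a 4 + 4 * X
vector as 5 under these assumptions.  The alternative layout suggested
earlier in the thread (log2(N) in the low 8 bits, flag in one of the other
two) would only change the shift and mask.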

Thanks,
Richard