[103/nnn] poly_int: TYPE_VECTOR_SUBPARTS
Richard Sandiford
richard.sandiford@linaro.org
Tue Oct 24 11:20:00 GMT 2017
Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Oct 24, 2017 at 11:40 AM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Mon, Oct 23, 2017 at 7:41 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> This patch changes TYPE_VECTOR_SUBPARTS to a poly_uint64. The value is
>>>> encoded in the 10-bit precision field and was previously always stored
>>>> as a simple log2 value. The challenge was to use these 10 bits to
>>>> encode the number of elements in variable-length vectors, so that
>>>> we didn't need to increase the size of the tree.
>>>>
>>>> In practice the number of vector elements should always have the form
>>>> N + N * X (where X is the runtime value), and as for constant-length
>>>> vectors, N must be a power of 2 (even though X itself might not be).
>>>> The patch therefore uses the low bit to select between constant-length
>>>> and variable-length and uses the upper 9 bits to encode log2(N).
>>>> Targets without variable-length vectors continue to use the old scheme.
>>>>
>>>> A new valid_vector_subparts_p function tests whether a given number
>>>> of elements can be encoded. This is false for the vector modes that
>>>> represent an LD3 or ST3 vector triple (which we want to treat as arrays
>>>> of vectors rather than single vectors).
>>>>
>>>> Most of the patch is mechanical; previous patches handled the changes
>>>> that weren't entirely straightforward.
>>>
>>> One comment, w/o actually reviewing may/must stuff (will comment on that
>>> elsewhere).
>>>
>>> You split 10 bits into 9 and 1, wouldn't it be more efficient to use the
>>> lower 8 bits for the log2 value of N and either of the two remaining bits
>>> for the flag? That way the 8 bits for the shift amount can be eventually
>>> accessed in a more efficient way.
>>>
>>> Guess you'd need to compare code-generation of the TYPE_VECTOR_SUBPARTS
>>> accessor on aarch64 / x86_64.
>>
>> Ah, yeah. I'll give that a go.
>>
>>> Am I correct that NUM_POLY_INT_COEFFS is 1 for targets that do not
>>> have variable length vector modes?
>>
>> Right. 1 is the default and only AArch64 defines it to anything else (2).
>
> Going to be interesting (bitrot) times then? I wonder if it makes sense
> to initially define it to 2 globally and only change it to 1 later?

Well, the target-independent code doesn't have the implicit conversion
from poly_int<1, C> to C, so it can't e.g. do:

  poly_int64 x = ...;
  HOST_WIDE_INT y = x;

even when NUM_POLY_INT_COEFFS==1. Only target-specific code (identified
by IN_TARGET_CODE) can do that.

So to target-independent code it doesn't really matter what
NUM_POLY_INT_COEFFS is. Even if we bumped it to 2, the extra coefficient
would always be zero.
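
To make the conversion rule above concrete, here is a minimal standalone
sketch. It is not the real poly-int.h code: poly_sketch,
SKETCH_IN_TARGET_CODE and sketch_target_use are made-up names standing in
for poly_int and IN_TARGET_CODE, and the is_constant/to_constant members
are simplified stand-ins for the explicit accessors. It only assumes that
the implicit conversion exists for the one-coefficient case when the
"target code" macro is defined.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for IN_TARGET_CODE.  Remove this line to model
// target-independent code, where the implicit conversion is unavailable.
#define SKETCH_IN_TARGET_CODE 1

// Hypothetical, much-simplified model of poly_int<N, C>.  The value is
// coeffs[0] + coeffs[1] * X + ..., where X is only known at run time.
template <unsigned int N, typename C>
struct poly_sketch
{
  C coeffs[N];

  // True if every coefficient other than the first is zero.
  bool is_constant () const
  {
    for (unsigned int i = 1; i < N; ++i)
      if (coeffs[i] != 0)
        return false;
    return true;
  }

  // Explicit accessor: the caller must have checked is_constant first.
  C to_constant () const
  {
    assert (is_constant ());
    return coeffs[0];
  }
};

#ifdef SKETCH_IN_TARGET_CODE
// With a single coefficient the value is always constant, so "target
// code" additionally gets an implicit conversion to C.
template <typename C>
struct poly_sketch<1, C>
{
  C coeffs[1];

  bool is_constant () const { return true; }
  C to_constant () const { return coeffs[0]; }
  operator C () const { return coeffs[0]; }
};

// Only compiles when the implicit conversion is available.
inline int64_t sketch_target_use ()
{
  poly_sketch<1, int64_t> x = { { 16 } };
  int64_t y = x;
  return y;
}
#endif
```

With the macro removed, poly_sketch<1, int64_t> falls back to the primary
template and "int64_t y = x;" no longer compiles, so the code has to go
through is_constant/to_constant whatever N is. That is the sense in which
the number of coefficients doesn't matter to target-independent code.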

FWIW, the poly_int tests in [001/nnn] cover N == 1, 2 and (as far as
supported) 3 for all targets, so that part isn't sensitive to
NUM_POLY_INT_COEFFS.

> Do you have any numbers on the effect of poly-int on compile-times?
> Esp. for example on stage2 build times when stage1 is -O0 -g "optimized"?

I've just tried that for an x86_64 -j24 build and got:

  real: +7%
  user: +8.6%

I don't know how noisy the results are though.

It's compile-time neutral in terms of running a gcc built with
--enable-checking=release, within a margin of about [-0.1%, 0.1%].

Thanks,
Richard