[Patch, RTL] Eliminate redundant vec_select moves.
Tejas Belagod
tbelagod@arm.com
Thu Nov 7 14:37:00 GMT 2013
Richard Sandiford wrote:
> Tejas Belagod <tbelagod@arm.com> writes:
>> Richard Sandiford wrote:
>>> Tejas Belagod <tbelagod@arm.com> writes:
>>>> Richard Sandiford wrote:
>>>>> Tejas Belagod <tbelagod@arm.com> writes:
>>>>>> + /* This is big-endian-safe because the elements are kept in target
>>>>>> + memory order. So, for eg. PARALLEL element value of 2 is the same in
>>>>>> + either endian-ness. */
>>>>>> + if (GET_CODE (src) == VEC_SELECT
>>>>>> + && REG_P (XEXP (src, 0)) && REG_P (dst)
>>>>>> + && REGNO (XEXP (src, 0)) == REGNO (dst))
>>>>>> + {
>>>>>> + rtx par = XEXP (src, 1);
>>>>>> + int i;
>>>>>> +
>>>>>> + for (i = 0; i < XVECLEN (par, 0); i++)
>>>>>> + {
>>>>>> + rtx tem = XVECEXP (par, 0, i);
>>>>>> + if (!CONST_INT_P (tem) || INTVAL (tem) != i)
>>>>>> + return 0;
>>>>>> + }
>>>>>> + return 1;
>>>>>> + }
>>>>>> +
>>>>> I think for big endian it needs to be:
>>>>>
>>>>> INTVAL (tem) != i + base
>>>>>
>>>>> where base is something like:
>>>>>
>>>>> int base = GET_MODE_NUNITS (GET_MODE (XEXP (src, 0))) - XVECLEN (par, 0);
>>>>>
>>>>> E.g. a big-endian V4HI looks like:
>>>>>
>>>>> msb lsb
>>>>> 0000111122223333
>>>>>
>>>>> and shortening it to say V2HI only gives the low 32 bits:
>>>>>
>>>>> msb lsb
>>>>> 22223333
>>>> But, in this case we want
>>>>
>>>> msb lsb
>>>> 00001111
>>> It depends on whether the result occupies a full register or not.
>>> I was thinking of the case where it didn't, but I realise now you were
>>> thinking of the case where it did. And yeah, my suggestion doesn't
>>> cope with that...
>>>
>>>> I was under the impression that the const vector parallel for vec_select
>>>> represents the element indexes of the array in memory order.
>>>>
>>>> Therefore, in bigendian,
>>>>
>>>> msb lsb
>>>> 0000 1111 2222 3333
>>>> element a[0] a[1] a[2] a[3]
>>>>
>>>> and in littleendian
>>>>
>>>> msb lsb
>>>> 3333 2222 1111 0000
>>>> element a[3] a[2] a[1] a[0]
>>> Right. But if an N-bit value is stored in a register, it's assumed to
>>> occupy the lsb of the register and the N-1 bits above that. The other
>>> bits in the register are don't-care.
>>>
>>> E.g., leaving vectors to one side, if you have:
>>>
>>> (set (reg:HI N) (truncate:SI (reg:SI N)))
>>>
>>> on a 32-bit !TRULY_NOOP_TRUNCATION target, it shortens like this:
>>>
>>> msb lsb
>>> 01234567
>>> VVVV
>>> xxxx4567
>>>
>>> rather than:
>>>
>>> msb lsb
>>> 01234567
>>> VVVV
>>> 0123xxxx
>>>
>>> for both endiannesses. The same principle applies to vectors.
>>> The lsb of the register is always assumed to be significant.
>>>
>>> So maybe the original patch was correct for partial-register and
>>> full-register results on little-endian, but only for full-register
>>> results on big-endian.
>> Ah, ok! I think I get it. By eliminating
>> set( (reg:DI n) vec_select:DI ( (reg:V2DI n) (parallel [const 0]))))
>>
>> using the check INTVAL (tem) != i, I'm essentially making subsequent operations
>> use (reg:V2DI n) in DI mode which is a partial register result and this
>> gives me
>> the wrong set of lanes in bigendian. So, if I want to use (reg n) in partial
>> register mode, I have to make sure the correct elements coincide with
>> the lsb in
>> big-endian...
>>
>> Thanks for your input, I'll apply the offset correction for big-endian you
>> suggested. I'll respin the patch.
>
> Thanks. Just for avoidance of doubt, the result might be a full or
> partial register, depending on the mode and target. I was trying to
> correct myself by agreeing that your original was right and mine was
> wrong for big-endian if the result is a full register.
>
What I had in mind when I implemented this was a partial-reg result, but
obviously it was wrong.
Sorry, I'm going to take a step back - I'm trying to figure out what a full
register result would look like. Looking at the pattern,
set( (reg:DI n) vec_select:DI ( (reg:V2DI n) (parallel [const 0]))))
the result is always a mode smaller than the src if the vec_select selects a
subset of lanes. Plus if the src and dst are the same reg, because we're
re-writing the src reg, wouldn't it always end up being a partial-reg?
In this case, wouldn't (reg:DI n) always represent
msb lsb
22223333
Thanks,
Tejas.
> I don't know if there are existing helper functions for this kind of thing.
>
> Richard
>
More information about the Gcc-patches
mailing list