This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [Patch, RTL] Eliminate redundant vec_select moves.
- From: Richard Sandiford <rdsandiford at googlemail dot com>
- To: Tejas Belagod <tbelagod at arm dot com>
- Cc: "gcc-patches\ at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 07 Nov 2013 14:48:02 +0000
- Subject: Re: [Patch, RTL] Eliminate redundant vec_select moves.
- Authentication-results: sourceware.org; auth=none
- References: <527A4309 dot 70209 at arm dot com> <8738n9sj8o dot fsf at talisman dot default> <527A5EF4 dot 5090505 at arm dot com> <87y551r01p dot fsf at talisman dot default> <527A7612 dot 2080406 at arm dot com> <877gcll9ht dot fsf at talisman dot default> <527BA073 dot 30900 at arm dot com>
Tejas Belagod <tbelagod@arm.com> writes:
> Richard Sandiford wrote:
>> Tejas Belagod <tbelagod@arm.com> writes:
>>> Richard Sandiford wrote:
>>>> Tejas Belagod <tbelagod@arm.com> writes:
>>>>> Richard Sandiford wrote:
>>>>>> Tejas Belagod <tbelagod@arm.com> writes:
>>>>>>> + /* This is big-endian-safe because the elements are kept in target
>>>>>>> + memory order. So, for eg. PARALLEL element value of 2 is the
>>>>>>> same in
>>>>>>> + either endian-ness. */
>>>>>>> + if (GET_CODE (src) == VEC_SELECT
>>>>>>> + && REG_P (XEXP (src, 0)) && REG_P (dst)
>>>>>>> + && REGNO (XEXP (src, 0)) == REGNO (dst))
>>>>>>> + {
>>>>>>> + rtx par = XEXP (src, 1);
>>>>>>> + int i;
>>>>>>> +
>>>>>>> + for (i = 0; i < XVECLEN (par, 0); i++)
>>>>>>> + {
>>>>>>> + rtx tem = XVECEXP (par, 0, i);
>>>>>>> + if (!CONST_INT_P (tem) || INTVAL (tem) != i)
>>>>>>> + return 0;
>>>>>>> + }
>>>>>>> + return 1;
>>>>>>> + }
>>>>>>> +
>>>>>> I think for big endian it needs to be:
>>>>>>
>>>>>> INTVAL (tem) != i + base
>>>>>>
>>>>>> where base is something like:
>>>>>>
>>>>>> int base = GET_MODE_NUNITS (GET_MODE (XEXP (src, 0))) - XVECLEN (par, 0);
>>>>>>
>>>>>> E.g. a big-endian V4HI looks like:
>>>>>>
>>>>>> msb lsb
>>>>>> 0000111122223333
>>>>>>
>>>>>> and shortening it to say V2HI only gives the low 32 bits:
>>>>>>
>>>>>> msb lsb
>>>>>> 22223333
>>>>> But, in this case we want
>>>>>
>>>>> msb lsb
>>>>> 00001111
>>>> It depends on whether the result occupies a full register or not.
>>>> I was thinking of the case where it didn't, but I realise now you were
>>>> thinking of the case where it did. And yeah, my suggestion doesn't
>>>> cope with that...
>>>>
>>>>> I was under the impression that the const vector parallel for vec_select
>>>>> represents the element indexes of the array in memory order.
>>>>>
>>>>> Therefore, in bigendian,
>>>>>
>>>>> msb lsb
>>>>> 0000 1111 2222 3333
>>>>> element a[0] a[1] a[2] a[3]
>>>>>
>>>>> and in littleendian
>>>>>
>>>>> msb lsb
>>>>> 3333 2222 1111 0000
>>>>> element a[3] a[2] a[1] a[0]
>>>> Right. But if an N-bit value is stored in a register, it's assumed to
>>>> occupy the lsb of the register and the N-1 bits above that. The other
>>>> bits in the register are don't-care.
>>>>
>>>> E.g., leaving vectors to one side, if you have:
>>>>
>>>> (set (reg:HI N) (truncate:SI (reg:SI N)))
>>>>
>>>> on a 32-bit !TRULY_NOOP_TRUNCATION target, it shortens like this:
>>>>
>>>> msb lsb
>>>> 01234567
>>>> VVVV
>>>> xxxx4567
>>>>
>>>> rather than:
>>>>
>>>> msb lsb
>>>> 01234567
>>>> VVVV
>>>> 0123xxxx
>>>>
>>>> for both endiannesses. The same principle applies to vectors.
>>>> The lsb of the register is always assumed to be significant.
>>>>
>>>> So maybe the original patch was correct for partial-register and
>>>> full-register results on little-endian, but only for full-register
>>>> results on big-endian.
>>> Ah, ok! I think I get it. By eliminating
>>> set( (reg:DI n) vec_select:DI ( (reg:V2DI n) (parallel [const 0]))))
>>>
>>> using the check INTVAL (tem) != i, I'm essentially making subsequent
>>> operations
>>> use (reg:V2DI n) in DI mode which is a partial register result and this
>>> gives me
>>> the wrong set of lanes in bigendian. So, if I want to use (reg n) in partial
>>> register mode, I have to make sure the correct elements coincide with
>>> the lsb in
>>> big-endian...
>>>
>>> Thanks for your input, I'll apply the offset correction for big-endian you
>>> suggested. I'll respin the patch.
>>
>> Thanks. Just for avoidance of doubt, the result might be a full or
>> partial register, depending on the mode and target. I was trying to
>> correct myself by agreeing that your original was right and mine was
>> wrong for big-endian if the result is a full register.
>>
>
> What I had in mind when I implemented this was a partial-reg result, but
> obviously it was wrong.
>
> Sorry, I'm going to take a step back - I'm trying to figure out what a full
> register result would look like. Looking at the pattern,
>
> set( (reg:DI n) vec_select:DI ( (reg:V2DI n) (parallel [const 0]))))
>
> the result is always a mode smaller than the src if the vec_select selects a
> subset of lanes. Plus if the src and dst are the same reg, because we're
> re-writing the src reg, wouldn't it always end up being a partial-reg?
> In this case, wouldn't (reg:DI n) always represent
>
> msb lsb
> 22223333
The problem is that one reg rtx can span several hard registers.
E.g. (reg:V4SI 32) might represent one 64-bit register (no. 32),
but it might instead represent two 32-bit registers (nos. 32 and 33).
Obviously the latter's not very likely for vectors this small,
but more likely for larger ones (including on NEON IIRC).
So if we had 2 32-bit registers being treated as a V4HI, it would be:
<--32--><--33-->
msb lsb
0000111122223333
VVVVVVVV
00001111
msb lsb
<--32-->
for big endian and:
<--33--><--32-->
msb lsb
3333222211110000
VVVVVVVV
11110000
msb lsb
<--32-->
for little endian.
Thanks,
Richard