[Patch, RTL] Eliminate redundant vec_select moves.

Thu Nov 7 14:37:00 GMT 2013

Richard Sandiford wrote:
> Tejas Belagod <tbelagod@arm.com> writes:
>> Richard Sandiford wrote:
>>> Tejas Belagod <tbelagod@arm.com> writes:
>>>> Richard Sandiford wrote:
>>>>> Tejas Belagod <tbelagod@arm.com> writes:
>>>>>> +  /* This is big-endian-safe because the elements are kept in target
>>>>>> +     memory order.  So, for eg. PARALLEL element value of 2 is the same in
>>>>>> +     either endian-ness.  */
>>>>>> +  if (GET_CODE (src) == VEC_SELECT
>>>>>> +      && REG_P (XEXP (src, 0)) && REG_P (dst)
>>>>>> +      && REGNO (XEXP (src, 0)) == REGNO (dst))
>>>>>> +    {
>>>>>> +      rtx par = XEXP (src, 1);
>>>>>> +      int i;
>>>>>> +
>>>>>> +      for (i = 0; i < XVECLEN (par, 0); i++)
>>>>>> +	{
>>>>>> +	  rtx tem = XVECEXP (par, 0, i);
>>>>>> +	  if (!CONST_INT_P (tem) || INTVAL (tem) != i)
>>>>>> +	    return 0;
>>>>>> +	}
>>>>>> +      return 1;
>>>>>> +    }
>>>>>> +
>>>>> I think for big endian it needs to be:
>>>>>
>>>>>     INTVAL (tem) != i + base
>>>>>
>>>>> where base is something like:
>>>>>
>>>>>     int base = GET_MODE_NUNITS (GET_MODE (XEXP (src, 0))) - XVECLEN (par, 0);
>>>>>
>>>>> E.g. a big-endian V4HI looks like:
>>>>>
>>>>>     msb          lsb
>>>>>     0000111122223333
>>>>>
>>>>> and shortening it to say V2HI only gives the low 32 bits:
>>>>>
>>>>>             msb  lsb
>>>>>             22223333
>>>> But, in this case we want
>>>>
>>>>          msb  lsb
>>>>          00001111
>>> It depends on whether the result occupies a full register or not.
>>> I was thinking of the case where it didn't, but I realise now you were
>>> thinking of the case where it did.  And yeah, my suggestion doesn't
>>> cope with that...
>>>
>>>> I was under the impression that the const vector parallel for vec_select 
>>>> represents the element indexes of the array in memory order.
>>>>
>>>> Therefore, in bigendian,
>>>>
>>>>           msb             lsb
>>>>           0000 1111 2222 3333
>>>> element  a[0] a[1] a[2] a[3]
>>>>
>>>> and in littleendian
>>>>
>>>>           msb             lsb
>>>>           3333 2222 1111 0000
>>>> element  a[3] a[2] a[1] a[0]
>>> Right.  But if an N-bit value is stored in a register, it's assumed to
>>> occupy the lsb of the register and the N-1 bits above that.  The other
>>> bits in the register are don't-care.
>>>
>>> E.g., leaving vectors to one side, if you have:
>>>
>>>    (set (reg:HI N) (truncate:SI (reg:SI N)))
>>>
>>> on a 32-bit !TRULY_NOOP_TRUNCATION target, it shortens like this:
>>>
>>>    msb  lsb
>>>    01234567
>>>        VVVV
>>>    xxxx4567
>>>
>>> rather than:
>>>
>>>    msb  lsb
>>>    01234567
>>>    VVVV
>>>    0123xxxx
>>>
>>> for both endiannesses.  The same principle applies to vectors.
>>> The lsb of the register is always assumed to be significant.
>>>
>>> So maybe the original patch was correct for partial-register and
>>> full-register results on little-endian, but only for full-register
>>> results on big-endian.
>> Ah, ok! I think I get it. By eliminating
>> 	set( (reg:DI n) vec_select:DI ( (reg:V2DI n) (parallel [const 0]))))
>>
>> using the check INTVAL (tem) != i, I'm essentially making subsequent operations 
>> use (reg:V2DI n) in DI mode which is a partial register result and this
>> gives me
>> the wrong set of lanes in bigendian. So, if I want to use (reg n) in partial 
>> register mode, I have to make sure the correct elements coincide with
>> the lsb in
>> big-endian...
>>
>> Thanks for your input, I'll apply the offset correction for big-endian you 
>> suggested. I'll respin the patch.
> 
> Thanks.  Just for avoidance of doubt, the result might be a full or
> partial register, depending on the mode and target.  I was trying to
> correct myself by agreeing that your original was right and mine was
> wrong for big-endian if the result is a full register.
> 

What I had in mind when I implemented this was a partial-reg result, but 
obviously it was wrong.

Sorry, I'm going to take a step back - I'm trying to figure out what a full 
register result would look like. Looking at the pattern,

   set( (reg:DI n) vec_select:DI ( (reg:V2DI n) (parallel [const 0]))))

the result is always a mode smaller than the src if the vec_select selects a 
subset of lanes. Plus if the src and dst are the same reg, because we're 
re-writing the src reg, wouldn't it always end up being a partial-reg?
In this case, wouldn't (reg:DI n) always represent

      msb  lsb
      22223333

Thanks,
Tejas.

> I don't know if there are existing helper functions for this kind of thing.
> 
> Richard
>