This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [Patch, RTL] Eliminate redundant vec_select moves.
- From: Bill Schmidt <wschmidt at linux dot vnet dot ibm dot com>
- To: Richard Sandiford <rdsandiford at googlemail dot com>
- Cc: Tejas Belagod <tbelagod at arm dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 06 Nov 2013 13:12:08 -0600
- Subject: Re: [Patch, RTL] Eliminate redundant vec_select moves.
- Authentication-results: sourceware.org; auth=none
- References: <527A4309 dot 70209 at arm dot com> <8738n9sj8o dot fsf at talisman dot default> <527A5EF4 dot 5090505 at arm dot com> <87y551r01p dot fsf at talisman dot default> <527A7612 dot 2080406 at arm dot com> <877gcll9ht dot fsf at talisman dot default>
On Wed, 2013-11-06 at 17:34 +0000, Richard Sandiford wrote:
> Tejas Belagod <tbelagod@arm.com> writes:
> > Richard Sandiford wrote:
> >> Tejas Belagod <tbelagod@arm.com> writes:
> >>> Richard Sandiford wrote:
> >>>> Tejas Belagod <tbelagod@arm.com> writes:
> >>>>> + /* This is big-endian-safe because the elements are kept in target
> >>>>> + memory order. So, for eg. PARALLEL element value of 2 is the same in
> >>>>> + either endian-ness. */
> >>>>> + if (GET_CODE (src) == VEC_SELECT
> >>>>> + && REG_P (XEXP (src, 0)) && REG_P (dst)
> >>>>> + && REGNO (XEXP (src, 0)) == REGNO (dst))
> >>>>> + {
> >>>>> + rtx par = XEXP (src, 1);
> >>>>> + int i;
> >>>>> +
> >>>>> + for (i = 0; i < XVECLEN (par, 0); i++)
> >>>>> + {
> >>>>> + rtx tem = XVECEXP (par, 0, i);
> >>>>> + if (!CONST_INT_P (tem) || INTVAL (tem) != i)
> >>>>> + return 0;
> >>>>> + }
> >>>>> + return 1;
> >>>>> + }
> >>>>> +
> >>>> I think for big endian it needs to be:
> >>>>
> >>>> INTVAL (tem) != i + base
> >>>>
> >>>> where base is something like:
> >>>>
> >>>> int base = GET_MODE_NUNITS (GET_MODE (XEXP (src, 0))) - XVECLEN (par, 0);
> >>>>
> >>>> E.g. a big-endian V4HI looks like:
> >>>>
> >>>> msb lsb
> >>>> 0000111122223333
> >>>>
> >>>> and shortening it to say V2HI only gives the low 32 bits:
> >>>>
> >>>> msb lsb
> >>>> 22223333
> >>> But, in this case we want
> >>>
> >>> msb lsb
> >>> 00001111
> >>
> >> It depends on whether the result occupies a full register or not.
> >> I was thinking of the case where it didn't, but I realise now you were
> >> thinking of the case where it did. And yeah, my suggestion doesn't
> >> cope with that...
> >>
> >>> I was under the impression that the const vector parallel for vec_select
> >>> represents the element indexes of the array in memory order.
> >>>
> >>> Therefore, in bigendian,
> >>>
> >>> msb lsb
> >>> 0000 1111 2222 3333
> >>> element a[0] a[1] a[2] a[3]
> >>>
> >>> and in littleendian
> >>>
> >>> msb lsb
> >>> 3333 2222 1111 0000
> >>> element a[3] a[2] a[1] a[0]
> >>
> >> Right. But if an N-bit value is stored in a register, it's assumed to
> >> occupy the lsb of the register and the N-1 bits above that. The other
> >> bits in the register are don't-care.
> >>
> >> E.g., leaving vectors to one side, if you have:
> >>
> >> (set (reg:HI N) (truncate:SI (reg:SI N)))
> >>
> >> on a 32-bit !TRULY_NOOP_TRUNCATION target, it shortens like this:
> >>
> >> msb lsb
> >> 01234567
> >> VVVV
> >> xxxx4567
> >>
> >> rather than:
> >>
> >> msb lsb
> >> 01234567
> >> VVVV
> >> 0123xxxx
> >>
> >> for both endiannesses. The same principle applies to vectors.
> >> The lsb of the register is always assumed to be significant.
> >>
> >> So maybe the original patch was correct for partial-register and
> >> full-register results on little-endian, but only for full-register
> >> results on big-endian.
> >
> > Ah, ok! I think I get it. By eliminating
> > set( (reg:DI n) vec_select:DI ( (reg:V2DI n) (parallel [const 0]))))
> >
> > using the check INTVAL (tem) != i, I'm essentially making subsequent operations
> > use (reg:V2DI n) in DI mode which is a partial register result and this
> > gives me
> > the wrong set of lanes in bigendian. So, if I want to use (reg n) in partial
> > register mode, I have to make sure the correct elements coincide with
> > the lsb in
> > big-endian...
> >
> > Thanks for your input, I'll apply the offset correction for big-endian you
> > suggested. I'll respin the patch.
>
> Thanks. Just for avoidance of doubt, the result might be a full or
> partial register, depending on the mode and target. I was trying to
> correct myself by agreeing that your original was right and mine was
> wrong for big-endian if the result is a full register.
>
> I don't know if there are existing helper functions for this kind of thing.
Tejas, can you please include test cases for both possibilities? The
existing test suite is not sufficient (your original patch does not
demonstrate regressions on powerpc64 big endian, even though we know
it's not correct). Thanks!
Bill
>
> Richard
>