[RFC PR48941 / 51980] Rewrite arm_neon.h to use __builtin_shuffle
Ramana Radhakrishnan
ramana.radhakrishnan@linaro.org
Thu Jun 14 12:18:00 GMT 2012
On 12 June 2012 10:22, Julian Brown <julian@codesourcery.com> wrote:
> On Mon, 11 Jun 2012 16:46:27 +0100
> Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org> wrote:
>
>> Hi,
>>
>> I don't like the ML bits of the patch as it stands today and before
>> committing I would like to clean up the ML bits quite a bit further
>> especially in areas where I've put FIXMEs [...]
>
> I had a go at this, see attached. Untested. Note there are some
> semantic differences in output:
>
> vzipq_p8 (poly8x16_t __a, poly8x16_t __b)
> {
> poly8x16x2_t __rv;
> - uint8x16_t __mask1 = {0, 2};
> - uint8x16_t __mask2 = {1, 3};
> - __rv.val[0] = (poly8x16_t)__builtin_shuffle (__a, __b, __mask1);
> - __rv.val[1] = (poly8x16_t)__builtin_shuffle (__a, __b, __mask2);
> + uint8x16_t __mask1 = { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 };
> + uint8x16_t __mask2 = { 8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31 };
> + __rv.val[0] = (poly8x16_t) __builtin_shuffle (__a, __b, __mask1);
> + __rv.val[1] = (poly8x16_t) __builtin_shuffle (__a, __b, __mask2);
> return __rv;
> }
>
> I wasn't quite sure which version was correct -- but your version
> doesn't seem to have enough elements for these cases?
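
For illustration only, here is a minimal sketch of the element-count
question raised above, written with GCC's generic vector extensions
rather than the arm_neon.h types. The names u8x16, zip_lo, zip_hi and
short_mask are hypothetical and are not taken from either patch; the
sketch assumes GCC 4.7 or later, where __builtin_shuffle is available
in C. In a two-input shuffle of 16-element vectors, mask values 0-15
select from the first operand and 16-31 from the second, and any mask
element left out of a short initializer is zero.

```c
/* Generic 16 x 8-bit vector type (GCC vector extension); stands in for
   uint8x16_t so the sketch does not depend on arm_neon.h.  */
typedef unsigned char u8x16 __attribute__ ((vector_size (16)));

/* Low half of an interleave: a0 b0 a1 b1 ... a7 b7.
   Mask values 0-15 pick from a, 16-31 pick from b.  */
static u8x16
zip_lo (u8x16 a, u8x16 b)
{
  const u8x16 mask = { 0, 16, 1, 17, 2, 18, 3, 19,
                       4, 20, 5, 21, 6, 22, 7, 23 };
  return __builtin_shuffle (a, b, mask);
}

/* High half of an interleave: a8 b8 a9 b9 ... a15 b15.  */
static u8x16
zip_hi (u8x16 a, u8x16 b)
{
  const u8x16 mask = { 8, 24, 9, 25, 10, 26, 11, 27,
                       12, 28, 13, 29, 14, 30, 15, 31 };
  return __builtin_shuffle (a, b, mask);
}

/* A two-element initializer on a 16-element mask: the other 14 mask
   elements are implicitly zero, so they all select a[0].  The result
   is a0 a2 a0 a0 ... a0, which is not an interleave.  */
static u8x16
short_mask (u8x16 a, u8x16 b)
{
  const u8x16 mask = { 0, 2 };
  return __builtin_shuffle (a, b, mask);
}
```

With a = {0, 1, ..., 15} and b = {100, 101, ..., 115}, zip_lo starts
0, 100, 1, 101 and zip_hi starts 8, 108, 9, 109, while short_mask
yields 0, 2 followed by fourteen zeros.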
I still have a small cleanup to do with the tests, as we now correctly
generate one instruction for all of vzip.32, vuzp.32 and vtrn.32: a
2x2 matrix transpose is the same as an interleave or a deinterleave of
two 2-element vectors. This is, however, blocked on __builtin_shuffle
making it to the C++ front end.
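
The 2-element equivalence can be checked with a small, portable
sketch. The helper names zip_masks, uzp_masks and trn_masks are
hypothetical and are not part of the patch; they only compute the
selector indices that a two-input shuffle would use, under the
__builtin_shuffle convention that indices 0..n-1 select from the first
vector and n..2n-1 from the second. n is assumed to be even.

```c
/* Interleave (vzip): lo = a0 b0 a1 b1 ..., hi continues from element n/2.  */
static void
zip_masks (unsigned n, unsigned lo[], unsigned hi[])
{
  for (unsigned i = 0; i < n / 2; i++)
    {
      lo[2 * i] = i;
      lo[2 * i + 1] = n + i;
      hi[2 * i] = n / 2 + i;
      hi[2 * i + 1] = n + n / 2 + i;
    }
}

/* Deinterleave (vuzp): even- and odd-numbered elements of the
   concatenation a:b.  */
static void
uzp_masks (unsigned n, unsigned even[], unsigned odd[])
{
  for (unsigned i = 0; i < n; i++)
    {
      even[i] = 2 * i;
      odd[i] = 2 * i + 1;
    }
}

/* Transpose (vtrn): treat each pair of lanes as a 2x2 matrix.
   first = a0 b0 a2 b2 ..., second = a1 b1 a3 b3 ...  */
static void
trn_masks (unsigned n, unsigned first[], unsigned second[])
{
  for (unsigned i = 0; i < n / 2; i++)
    {
      first[2 * i] = 2 * i;
      first[2 * i + 1] = n + 2 * i;
      second[2 * i] = 2 * i + 1;
      second[2 * i + 1] = n + 2 * i + 1;
    }
}
```

For n = 2 all three helpers produce the same pair of masks, {0, 2} and
{1, 3}, which is why a single instruction can serve all three
operations on 2-element vectors. For larger n the masks differ.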
regards,
Ramana
>
> HTH,
>
> Julian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: shuf.patch
Type: application/octet-stream
Size: 46606 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20120614/1147a312/attachment.obj>