This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC PR48941 / 51980] Rewrite arm_neon.h to use __builtin_shuffle


On Tue, Jun 12, 2012 at 1:07 PM, Ramana Radhakrishnan
<ramana.radhakrishnan@linaro.org> wrote:
> On 12 June 2012 11:46, Richard Guenther <richard.guenther@gmail.com> wrote:
>> On Tue, Jun 12, 2012 at 11:22 AM, Julian Brown <julian@codesourcery.com> wrote:
>>> On Mon, 11 Jun 2012 16:46:27 +0100
>>> Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org> wrote:
>>>
>>>> Hi,
>>>>
>>>> I don't like the ML bits of the patch as it stands today and before
>>>> committing I would like to clean up the ML bits quite a bit further
>>>> especially in areas where I've put FIXMEs [...]
>>>
>>> I had a go at this, see attached. Untested. Note there are some
>>> semantic differences in output:
>>>
>>> ?vzipq_p8 (poly8x16_t __a, poly8x16_t __b)
>>> ?{
>>> ? poly8x16x2_t __rv;
>>> - ?uint8x16_t __mask1 = {0, 2};
>>> - ?uint8x16_t __mask2 = {1, 3};
>>> - ?__rv.val[0] = (poly8x16_t)__builtin_shuffle (__a, __b, __mask1);
>>> - ?__rv.val[1] = (poly8x16_t)__builtin_shuffle (__a, __b, __mask2);
>>> + ?uint8x16_t __mask1 = { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6,
>>> ?22, 7, 23 };
>>> + ?uint8x16_t __mask2 = { 8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29,
>>> ?14, 30, 15, 31 };
>>> + ?__rv.val[0] = (poly8x16_t) __builtin_shuffle (__a, __b, __mask1);
>>> + ?__rv.val[1] = (poly8x16_t) __builtin_shuffle (__a, __b, __mask2);
>>
>> You should get better code at -O0 when not using a temporary __mask1/__mask2
>> but directly pasting the constant in the builtin call.
>
> I tried that yesterday but it didn't seem to help - from a quick peek
> at the dumps it looks like we could do with some limited const prop
> just for the vec_perm expand cases.
>
> ? ? ?D.14032 = { 0, 8, 1, 9, 2, 10, 3, 11 };
> ? ? ?D.14044 = VEC_PERM_EXPR <__a, __b, D.14032>;
>
> That's what I see from the dumps and from a quick skim of the sources
> - my suspicion is that lower_vec_perm in tree-vect-generic.c is where
> we could try doing a limited constant propagation in this case. ? Is
> that where one should attempt to fix this ?
>
> Consider the following testcase at O0 rewritten with just
> __builtin_shuffle so that you can see it on other platforms as well
> that have vec_perm_const defined for doing the interleave style
> operations. and look at what you get for O1. On arm-linux-gnueabi with
> -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a9 at O0 you'd see it use
> the generic permute operations and at O1 you'd see a vzip.32
> instruction
>
>
> typedef int v4si __attribute__ ((vector_size (16)));
>
> v4si vs (v4si a, v4si b)
> {
> ?return __builtin_shuffle (a, b, (v4si) {0, 4, 1, 5});
> }

Ok, I see the C frontend hands us this as

  return  VEC_PERM_EXPR < a , b , <<< Unknown tree: compound_literal_expr
    v4si D.1712 = { 0, 4, 1, 5 }; >>> > ;

and gimplification in some way fails to gimplify it to { 0, 4, 1, 5 }.  Yes,
tree-vect-generic.c could just lookup the SSA def stmt in this case - it
does so in most cases already.

Richard.

>
> regards
> Ramana
>
>>
>>> ? return __rv;
>>> ?}
>>>
>>> I wasn't quite sure which version was correct -- but your version
>>> doesn't seem to have enough elements for these cases?
>>>
>>> HTH,
>>>
>>> Julian


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]