[Bug target/93005] Redundant NEON loads/stores from stack are not eliminated
rearnsha at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Mon Jan 6 17:01:00 GMT 2020
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93005
--- Comment #6 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
(In reply to Joel Holdsworth from comment #5)
> I found that if I make modified versions of the intrinsics in arm_neon.h
> that are designed more along the lines of the x86_64 SSE intrinsics defined
> with a simple pointer dereference, then gcc does the right thing [1].
>
>
> #include <arm_neon.h>
>
> __extension__ extern __inline void
> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> vst1q_s32_fixed (int32_t * __a, int32x4_t __b)
> {
>   *(int32x4_t*)__a = __b;
> }
>
> __extension__ extern __inline int32x4_t
> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> vld1q_s32_fixed (const int32_t * __a)
> {
>   return *(const int32x4_t*)__a;
> }
>
> int32x4_t foo(int32x4_t a)
> {
>   int32_t temp[4];
>   vst1q_s32_fixed(temp, a);
>   return vld1q_s32_fixed(temp);
> }
>
>
>
> ...compiles to:
>
> foo(long __vector(4)):
>         bx      lr
>
>
> Is there any reason not to simply redefine vst1q_s32, vld1q_s32 and friends
> to stop using builtins?
>
Did you test it with big-endian?
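
A minimal sketch of such a check, assuming the concern is that on a big-endian target the lane layout produced by vld1q_s32 may differ from the one produced by a plain vector dereference. The names vec4, load_ref, load_deref and check_load_equivalence are hypothetical, and the non-NEON fallback exists only so the harness builds on a host without arm_neon.h. To be meaningful it would need to be built and run for a big-endian ARM target (for example with -mbig-endian).

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
typedef int32x4_t vec4;
/* Reference: the existing builtin-based intrinsic. */
static vec4 load_ref(const int32_t *p) { return vld1q_s32(p); }
#else
/* Host fallback (assumption): a GCC/Clang generic vector type, used only
   so that this harness compiles where arm_neon.h is unavailable. */
typedef int32_t vec4 __attribute__((vector_size(16), may_alias));
static vec4 load_ref(const int32_t *p)
{
    vec4 v;
    memcpy(&v, p, sizeof v);
    return v;
}
#endif

/* Candidate: plain pointer dereference, as in the quoted vld1q_s32_fixed. */
static vec4 load_deref(const int32_t *p)
{
    return *(const vec4 *)p;
}

/* Returns 0 if both loads yield the same vector value, 1 otherwise. */
int check_load_equivalence(void)
{
    /* Aligned so that the vector dereference is valid on any host. */
    _Alignas(16) int32_t data[4] = { 1, 2, 3, 4 };
    vec4 a = load_ref(data);
    vec4 b = load_deref(data);
    return memcmp(&a, &b, sizeof a) != 0;
}
```

A non-zero result on a big-endian build would suggest that the two definitions are not interchangeable there; the same idea extends to the store side.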