[Bug target/93005] Redundant NEON loads/stores from stack are not eliminated

rearnsha at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Mon Jan 6 17:01:00 GMT 2020


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93005

--- Comment #6 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
(In reply to Joel Holdsworth from comment #5)
> I found that if I make modified versions of the intrinsics in arm_neon.h
> that are designed more along the lines of the x86_64 SSE intrinsics defined
> with a simple pointer dereference, then gcc does the right thing [1].
> 
> 
> #include <arm_neon.h>
> 
> __extension__ extern __inline void
> __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
> vst1q_s32_fixed (int32_t * __a, int32x4_t __b)
> {
>   *(int32x4_t*)__a = __b;
> }
> 
> __extension__ extern __inline int32x4_t
> __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
> vld1q_s32_fixed (const int32_t * __a)
> {
>   return *(const int32x4_t*)__a;
> }
> 
> int32x4_t foo(int32x4_t a)
> {
>     int32_t temp[4];
>     vst1q_s32_fixed(temp, a);
>     return vld1q_s32_fixed(temp);
> }
> 
> 
> 
> ...compiles to:
> 
> foo(long __vector(4)):
>         bx      lr
> 
> 
> Is there any reason not to simply redefine vst1q_s32, vld1q_s32 and friends
> to stop using builtins?
> 

Did you test it with big-endian?
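
One way to answer that question is a lane-by-lane comparison, sketched below. It is an illustration only, not part of the GCC testsuite, and the helper names (load_reference, load_deref, lanes_of, count_lane_mismatches, target_is_big_endian) are hypothetical. The check loads the same four values once through vld1q_s32 and once through the pointer dereference from comment #5, then reads the lanes back with vgetq_lane_s32. The may_alias typedef and the 16-byte-aligned source array are assumptions added here so the dereference is well defined; the code in comment #5 has neither. When NEON is not available the sketch falls back to a generic GCC/Clang vector type so that it still compiles off-target, but that fallback says nothing about big-endian behaviour.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
typedef int32x4_t v4si;
#else
/* Off-target fallback (assumption): a generic GCC/Clang vector type. */
typedef int32_t v4si __attribute__((vector_size(16)));
#endif

/* Added for this sketch so the cast below does not rely on aliasing rules. */
typedef v4si v4si_alias __attribute__((__may_alias__));

/* Reference load: the existing builtin-based intrinsic where available. */
static v4si load_reference(const int32_t *p)
{
#if defined(__ARM_NEON)
    return vld1q_s32(p);
#else
    v4si v;
    memcpy(&v, p, sizeof v);
    return v;
#endif
}

/* Candidate load: plain pointer dereference, as proposed in comment #5. */
static v4si load_deref(const int32_t *p)
{
    return *(const v4si_alias *)p;
}

/* Read the four lanes out using the intrinsic lane numbering on NEON. */
static void lanes_of(v4si v, int32_t out[4])
{
#if defined(__ARM_NEON)
    out[0] = vgetq_lane_s32(v, 0);
    out[1] = vgetq_lane_s32(v, 1);
    out[2] = vgetq_lane_s32(v, 2);
    out[3] = vgetq_lane_s32(v, 3);
#else
    for (int i = 0; i < 4; i++)
        out[i] = v[i];
#endif
}

/* Returns 1 when the compiler targets a big-endian byte order. */
int target_is_big_endian(void)
{
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    return 1;
#else
    return 0;
#endif
}

/* Returns how many lanes differ between the two loads (0 means they agree). */
int count_lane_mismatches(void)
{
    _Alignas(16) int32_t src[4] = {10, 20, 30, 40};
    int32_t ref[4], cand[4];
    int bad = 0;

    assert(sizeof(v4si) == 4 * sizeof(int32_t));

    lanes_of(load_reference(src), ref);
    lanes_of(load_deref(src), cand);
    for (int i = 0; i < 4; i++)
        bad += (ref[i] != cand[i]);
    return bad;
}
```

To be meaningful, count_lane_mismatches() would have to be built and run on a big-endian ARM configuration, with target_is_big_endian() returning 1, and ideally at several optimisation levels. A result of zero on a little-endian build does not settle the question.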

