This is the mail archive of the
gcc-help@gcc.gnu.org
mailing list for the GCC project.
NEON and instruct GCC to move a lane without using a regular register?
- From: Jeffrey Walton <noloader at gmail dot com>
- To: "gcc-help at gcc dot gnu dot org" <gcc-help at gcc dot gnu dot org>
- Date: Sat, 14 May 2016 01:42:54 -0400
- Subject: NEON and instruct GCC to move a lane without using a regular register?
- Reply-to: noloader at gmail dot com
I'm having a heck of a time getting GCC to perform a direct lane-to-lane
transfer between NEON vector registers.
I have the following C-code:
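/* Copy the high 64-bit lane of m into the high lane of d.
   No trailing semicolon here; the call site supplies it. */
#define set_high_from_high(d, m) \
    d = vsetq_lane_u64(vgetq_lane_u64(m, LANE_H64), d, LANE_H64)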
uint64x2_t x, m;
...
set_high_from_high(x, m);
GCC is generating something like:
mov x0, v2.2d[0]
mov v1.2d[0], x0
Instead of:
mov v1.2d[0], v2.2d[0]
I've abandoned inline functions in favor of macros. I've also tried
with and without the 'd=' assignment in the macro.
How do I instruct GCC to perform the NEON to NEON lane transfer?
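For comparison, here is a minimal, self-contained sketch of the get/set formulation above next to an alternative built from vget_low_u64, vget_high_u64 and vcombine_u64, which are standard arm_neon.h intrinsics. Several things are assumptions for illustration only: LANE_H64 is taken to be 1 (the high lane), the function names are made up, and the #else branch is a hypothetical scalar stand-in so the file also builds on non-ARM hosts. The two forms compute the same lanes; whether GCC emits a single lane-to-lane move for either of them has not been verified here.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__aarch64__)
#include <arm_neon.h>
#else
/* Hypothetical scalar stand-ins (NOT a real header): they only model
   lane semantics so the sketch compiles and runs off ARM. */
typedef struct { uint64_t lane[1]; } uint64x1_t;
typedef struct { uint64_t lane[2]; } uint64x2_t;

static inline uint64x1_t vcreate_u64(uint64_t a)
{ uint64x1_t r = { { a } }; return r; }

static inline uint64x1_t vget_low_u64(uint64x2_t a)
{ uint64x1_t r = { { a.lane[0] } }; return r; }

static inline uint64x1_t vget_high_u64(uint64x2_t a)
{ uint64x1_t r = { { a.lane[1] } }; return r; }

static inline uint64x2_t vcombine_u64(uint64x1_t lo, uint64x1_t hi)
{ uint64x2_t r = { { lo.lane[0], hi.lane[0] } }; return r; }

static inline uint64_t vgetq_lane_u64(uint64x2_t a, int i)
{ return a.lane[i]; }

static inline uint64x2_t vsetq_lane_u64(uint64_t x, uint64x2_t a, int i)
{ a.lane[i] = x; return a; }
#endif

/* Assumption: the high 64-bit lane has index 1. */
#define LANE_H64 1

/* Illustrative helper: build a vector from two scalars (low, high). */
static inline uint64x2_t make_u64x2(uint64_t lo, uint64_t hi)
{
    return vcombine_u64(vcreate_u64(lo), vcreate_u64(hi));
}

/* The get/set formulation from the macro above. */
static inline uint64x2_t high_from_high_getset(uint64x2_t d, uint64x2_t m)
{
    return vsetq_lane_u64(vgetq_lane_u64(m, LANE_H64), d, LANE_H64);
}

/* Alternative: keep the low half of d, take the high half of m. */
static inline uint64x2_t high_from_high_combine(uint64x2_t d, uint64x2_t m)
{
    return vcombine_u64(vget_low_u64(d), vget_high_u64(m));
}

/* Self-check: both formulations must yield identical lanes. */
static inline void check_forms_agree(uint64x2_t d, uint64x2_t m)
{
    uint64x2_t a = high_from_high_getset(d, m);
    uint64x2_t b = high_from_high_combine(d, m);
    assert(vgetq_lane_u64(a, 0) == vgetq_lane_u64(b, 0));
    assert(vgetq_lane_u64(a, 1) == vgetq_lane_u64(b, 1));
}
```

Compiling each function on an AArch64 target with -O3 -S and reading the generated assembly is one way to see whether either form avoids the round trip through a general-purpose register.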
*****
I know it can be done because Clang is doing it. GCC is lagging behind
Clang by about 4 cycles per byte. Here are the relative sizes of the
disassembly for the same function (line counts of gdb output):
GCC at -O3
$ gdb -batch -ex 'disassemble BLAKE2_NEON_Compress64' ./blake2.o | wc -l
2021
Clang at -O3
$ gdb -batch -ex 'disassemble BLAKE2_NEON_Compress64' ./blake2.o | wc -l
445