NEON and instruct GCC to move a lane without using a regular register?

Jeffrey Walton noloader@gmail.com
Sat May 14 05:42:00 GMT 2016


I'm having a heck of a time getting GCC to perform a direct lane-to-lane
transfer among D registers.

I have the following C-code:

    #define set_high_from_high(d, m) \
        d=vsetq_lane_u64(vgetq_lane_u64(m,LANE_H64),d,LANE_H64);


    uint64x2_t x, m;
    ...

    set_high_from_high(x, m);

GCC is generating something like:

    mov x0, v2.2d[0]
    mov v1.2d[0], x0

Instead of:

    mov v1.2d[0], v2.2d[0]

I've abandoned inline functions in favor of defines. I've also tried
with and without the 'd=' in the define.

How do I instruct GCC to perform the NEON to NEON lane transfer?
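
One variant that may be worth trying is sketched below. It is only a sketch, not a confirmed fix: it assumes an AArch64 target whose arm_neon.h provides vcopyq_laneq_u64 (documented by Arm as mapping to a single element-to-element INS), and it assumes LANE_H64 is 1, which is not shown above. The function name set_high_lane_demo and the array-based interface are hypothetical, and the scalar branch exists only so the sketch also builds on non-ARM hosts. Whether a given GCC release actually emits the lane-to-lane move still has to be checked in the disassembly.

```c
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: copy the high 64-bit lane of m into the high
   lane of d, leaving the low lane of d untouched.
   Assumption: the high lane index (LANE_H64 in the macro) is 1. */
static void set_high_lane_demo(uint64_t d[2], const uint64_t m[2])
{
#if defined(__aarch64__)
    uint64x2_t vd = vld1q_u64(d);
    uint64x2_t vm = vld1q_u64(m);

    /* Lane-to-lane copy: lane 1 of vm into lane 1 of vd. */
    vd = vcopyq_laneq_u64(vd, 1, vm, 1);

    /* An equivalent formulation that rebuilds the vector from halves:
       vd = vcombine_u64(vget_low_u64(vd), vget_high_u64(vm)); */

    vst1q_u64(d, vd);
#else
    /* Scalar stand-in for hosts without NEON. */
    d[1] = m[1];
#endif
}
```

In the real code the vectors would stay in registers, so only the vcopyq_laneq_u64 line (or the vcombine_u64 alternative) would replace the body of the define.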

*****

I know it can be done because Clang is doing it. GCC is lagging behind
Clang by about 4 cycles per byte. Here are the relative line counts of
the disassembly:

GCC at -O3
$ gdb -batch -ex 'disassemble BLAKE2_NEON_Compress64' ./blake2.o | wc -l
2021

Clang at -O3
$ gdb -batch -ex 'disassemble BLAKE2_NEON_Compress64' ./blake2.o | wc -l
445
