[PATCH] aarch64: Emit ADD X, Y, Y instead of SHL X, Y, #1 for Advanced SIMD

Kyrylo Tkachov ktkachov@nvidia.com
Fri Aug 9 08:09:47 GMT 2024


Hi all,

On many cores, including Neoverse V2, the throughput of vector ADD
instructions is higher than that of vector shifts like SHL.  We can lean
on that to emit code like:
  add     v0.4s, v0.4s, v0.4s
instead of:
  shl     v0.4s, v0.4s, #1
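
As a rough illustration (not taken from the attached testcase; the function
name is made up), a shift by one written with Advanced SIMD intrinsics like
the one below currently assembles to the SHL form and would get the ADD
form with this change:

  #include <arm_neon.h>

  /* Hypothetical example: a vector left shift by one.  GCC represents
     this as (ashift x 1) and, with this patch, prints ADD rather than
     SHL for it.  */
  int32x4_t
  double_elems (int32x4_t x)
  {
    return vshlq_n_s32 (x, 1);
  }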

LLVM already performs this trick.
In RTL the code gets canonicalised from (plus x x) to (ashift x 1), so I
opted to do this at the final assembly printing stage instead, similar to
how we emit CMLT instead of SSHR elsewhere in the backend.
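
Very roughly, the extra alternative in the aarch64_simd_imm_shl pattern
looks something like the sketch below (new-syntax form; see the attached
patch for the exact details, this is just to show the idea of matching a
shift by one and printing an ADD):

  ;; Sketch only: new vs1 alternative matches a shift amount of one in
  ;; every lane and is printed as ADD of the operand with itself.
  (define_insn "aarch64_simd_imm_shl<mode><vczle><vczbe>"
    [(set (match_operand:VDQ_I 0 "register_operand")
          (ashift:VDQ_I (match_operand:VDQ_I 1 "register_operand")
                        (match_operand:VDQ_I 2 "aarch64_simd_lshift_imm")))]
    "TARGET_SIMD"
    {@ [ cons: =0 , 1 , 2   ]
       [ w        , w , vs1 ] add\t%0.<Vtype>, %1.<Vtype>, %1.<Vtype>
       [ w        , w , Dl  ] shl\t%0.<Vtype>, %1.<Vtype>, %2
    }
    [(set_attr "type" "neon_shift_imm<q>")]
  )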

I'd like to also do this for SVE shifts, but those will have to be
separate patches.

Bootstrapped and tested on aarch64-none-linux-gnu.
I'll leave it up for comments for a few days and commit next week if there
are no objections.
Thanks,
Kyrill

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>

gcc/ChangeLog:

        * config/aarch64/aarch64-simd.md
        (aarch64_simd_imm_shl<mode><vczle><vczbe>): Rewrite to new
        syntax.  Add =w,w,vs1 alternative.
        * config/aarch64/constraints.md (vs1): New constraint.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/advsimd_shl_add.c: New test.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-aarch64-Emit-ADD-X-Y-Y-instead-of-SHL-X-Y-1-for-Adva.patch
Type: application/octet-stream
Size: 4495 bytes
Desc: 0002-aarch64-Emit-ADD-X-Y-Y-instead-of-SHL-X-Y-1-for-Adva.patch
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20240809/34229c73/attachment.obj>

