[Bug target/95265] New: aarch64: suboptimal code generation for common neon intrinsic sequence involving shrn and mull

generictoadhuman at gmail dot com gcc-bugzilla@gcc.gnu.org
Thu May 21 22:36:47 GMT 2020


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95265

            Bug ID: 95265
           Summary: aarch64: suboptimal code generation for common neon
                    intrinsic sequence involving shrn and mull
           Product: gcc
           Version: 10.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: generictoadhuman at gmail dot com
  Target Milestone: ---

Compilable example:

#include <arm_neon.h>

int32x4_t func(int32x4_t a, int32x4_t b)
{
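    /* Widening multiply of the low and high halves (SMULL / SMULL2),
       then shift each 64-bit product right by 12 and narrow to 32 bits
       (SHRN / SHRN2), packing both halves into one int32x4_t. */
    return vshrn_high_n_s64(
        vshrn_n_s64(vmull_s32(vget_low_s32(a), vget_low_s32(b)), 12),
        vmull_high_s32(a, b), 12);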
}

With gcc -O3, the generated code contains two superfluous mov instructions and
one unnecessary dup.
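For reference, the value the intrinsic sequence computes can be written as a portable scalar C model, which is handy when checking any rewritten or better-optimized vector code against it. This is a sketch, assuming the usual SHRN semantics (shift each 64-bit lane right by 12 and keep the low 32 bits, with no rounding or saturation); the name func_scalar is hypothetical and not part of the original report.

```c
#include <stdint.h>

/* Hypothetical scalar model of func(): lanes 0-1 come from vmull_s32 on the
   low halves, lanes 2-3 from vmull_high_s32 on the high halves, so a plain
   loop over all four lanes gives the same ordering. */
static void func_scalar(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        /* Widening multiply: the 32x32 product always fits in 64 bits. */
        int64_t prod = (int64_t)a[i] * (int64_t)b[i];
        /* Shift right by 12; for a shift of 12, bits [12, 43] are the same
           whether the shift is arithmetic or logical. */
        uint64_t shifted = (uint64_t)prod >> 12;
        /* Narrow by truncation to the low 32 bits (no saturation). The
           uint32_t -> int32_t conversion is implementation-defined in C;
           GCC defines it as modulo 2^32. */
        out[i] = (int32_t)(uint32_t)shifted;
    }
}
```

Because SHRN truncates rather than saturates, a product such as INT32_MAX * INT32_MAX wraps to -1048576 in this model instead of clamping to INT32_MAX.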

Output of gcc -v:
Using built-in specs.
COLLECT_GCC=C:\msys64\opt\devkitpro\devkitA64\bin\aarch64-none-elf-gcc.exe
COLLECT_LTO_WRAPPER=c:/msys64/opt/devkitpro/devkita64/bin/../libexec/gcc/aarch64-none-elf/10.1.0/lto-wrapper.exe
Target: aarch64-none-elf
Configured with: ../../gcc-10.1.0/configure --enable-languages=c,c++,objc,lto
--with-gnu-as --with-gnu-ld --with-gcc --with-march=armv8
--enable-cxx-flags=-ffunction-sections --disable-libstdcxx-verbose
--enable-poison-system-directories --enable-interwork --enable-multilib
--enable-threads --disable-win32-registry --disable-nls --disable-debug
--disable-libmudflap --disable-libssp --disable-libgomp --disable-libstdcxx-pch
--enable-libstdcxx-time --enable-libstdcxx-filesystem-ts
--target=aarch64-none-elf --with-newlib=yes
--with-headers=../../newlib-3.3.0/newlib/libc/include
--prefix=/opt/devkitpro/x86_64-w64-mingw32/devkitA64 --enable-lto
--with-system-zlib
--with-bugurl=https://github.com/devkitPro/buildscripts/issues
--with-pkgversion='devkitA64 release 15' --build=x86_64-unknown-linux-gnu
--host=x86_64-w64-mingw32 --with-gmp=/opt/mingw64/mingw
--with-mpfr=/opt/mingw64/mingw --with-mpc=/opt/mingw64/mingw
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.1.0 (devkitA64 release 15)

