[Bug target/95265] New: aarch64: suboptimal code generation for common neon intrinsic sequence involving shrn and mull
generictoadhuman at gmail dot com
gcc-bugzilla@gcc.gnu.org
Thu May 21 22:36:47 GMT 2020
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95265
Bug ID: 95265
Summary: aarch64: suboptimal code generation for common neon
intrinsic sequence involving shrn and mull
Product: gcc
Version: 10.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: generictoadhuman at gmail dot com
Target Milestone: ---
Compilable example:
#include <arm_neon.h>

int32x4_t func(int32x4_t a, int32x4_t b)
{
    return vshrn_high_n_s64(
        vshrn_n_s64(vmull_s32(vget_low_s32(a), vget_low_s32(b)), 12),
        vmull_high_s32(a, b), 12);
}
With gcc -O3, the generated code contains two superfluous movs and one
unnecessary dup.
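For reference, here is a rough scalar sketch of what the intrinsic sequence is meant to compute per lane: a widening 32x32->64 multiply, an arithmetic shift right by 12, and a truncating narrow back to 32 bits. The names mul_shift12_lane and func_scalar are hypothetical and only for illustration. The sketch assumes GCC's behaviour for right-shifting negative values (arithmetic shift) and for out-of-range conversion to int32_t (wraps modulo 2^32).

```c
#include <stdint.h>

/* Scalar model of one lane (hypothetical helper, not part of arm_neon.h). */
static int32_t mul_shift12_lane(int32_t a, int32_t b)
{
    /* Widening multiply, as vmull_s32 / vmull_high_s32 do per lane. */
    int64_t wide = (int64_t)a * (int64_t)b;

    /* Shift right by 12 and keep the low 32 bits, as vshrn_n_s64 /
       vshrn_high_n_s64 do (truncating, no rounding, no saturation).
       Assumes arithmetic >> and modulo conversion, as GCC provides. */
    return (int32_t)(uint32_t)(wide >> 12);
}

/* Scalar model of the whole function over four lanes. */
static void func_scalar(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = mul_shift12_lane(a[i], b[i]);
}
```

Lanes 0-1 correspond to the vmull_s32/vshrn_n_s64 half and lanes 2-3 to the vmull_high_s32/vshrn_high_n_s64 half, so ideally no extra register moves should be needed between the two halves.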
Output of gcc -v:
Using built-in specs.
COLLECT_GCC=C:\msys64\opt\devkitpro\devkitA64\bin\aarch64-none-elf-gcc.exe
COLLECT_LTO_WRAPPER=c:/msys64/opt/devkitpro/devkita64/bin/../libexec/gcc/aarch64-none-elf/10.1.0/lto-wrapper.exe
Target: aarch64-none-elf
Configured with: ../../gcc-10.1.0/configure --enable-languages=c,c++,objc,lto
--with-gnu-as --with-gnu-ld --with-gcc --with-march=armv8
--enable-cxx-flags=-ffunction-sections --disable-libstdcxx-verbose
--enable-poison-system-directories --enable-interwork --enable-multilib
--enable-threads --disable-win32-registry --disable-nls --disable-debug
--disable-libmudflap --disable-libssp --disable-libgomp --disable-libstdcxx-pch
--enable-libstdcxx-time --enable-libstdcxx-filesystem-ts
--target=aarch64-none-elf --with-newlib=yes
--with-headers=../../newlib-3.3.0/newlib/libc/include
--prefix=/opt/devkitpro/x86_64-w64-mingw32/devkitA64 --enable-lto
--with-system-zlib
--with-bugurl=https://github.com/devkitPro/buildscripts/issues
--with-pkgversion='devkitA64 release 15' --build=x86_64-unknown-linux-gnu
--host=x86_64-w64-mingw32 --with-gmp=/opt/mingw64/mingw
--with-mpfr=/opt/mingw64/mingw --with-mpc=/opt/mingw64/mingw
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.1.0 (devkitA64 release 15)