[Bug rtl-optimization/92665] New: [AArch64] low lanes select not optimized out for vmlal intrinsics
spop at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Mon Nov 25 18:59:00 GMT 2019
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665
Bug ID: 92665
Summary: [AArch64] low lanes select not optimized out for vmlal intrinsics
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: spop at gcc dot gnu.org
Target Milestone: ---
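With GCC as of today, the generated code contains two dup instructions that
copy the low 64 bits of each vector operand. They could be optimized out,
because smlal with the .4h arrangement already reads only the low four lanes: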
$ cat red.c
#include "arm_neon.h"
int32x4_t fun(int32x4_t a, int16x8_t b, int16x8_t c) {
  a = vmlal_s16(a, vget_low_s16(b), vget_low_s16(c));
  a = vmlal_high_s16(a, b, c);
  return a;
}
$ gcc -O3 -S -o- red.c
fun:
        dup d3, v1.d[0]
        dup d4, v2.d[0]
        smlal v0.4s,v3.4h,v4.4h
        smlal2 v0.4s,v1.8h,v2.8h
        ret
$ clang -O3 -S -o- red.c
fun:
        smlal v0.4s, v1.4h, v2.4h
        smlal2 v0.4s, v1.8h, v2.8h
        ret
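
To illustrate why the two sequences should be equivalent, here is a rough
scalar model in portable C. It is only a sketch and not how GCC represents
these instructions internally: the types v8h and v4s and every model_*
function name are hypothetical, made up for this example. It assumes that
"dup dN, vM.d[0]" copies the low 64 bits and zeroes the upper half, that
smlal (.4h) reads only lanes 0..3, and that smlal2 (.8h) reads only lanes 4..7.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-ins for 128-bit vector registers. */
typedef struct { int16_t lane[8]; } v8h;   /* eight 16-bit lanes */
typedef struct { int32_t lane[4]; } v4s;   /* four 32-bit lanes  */

/* Model of "dup dN, vM.d[0]": keep lanes 0..3, zero lanes 4..7. */
static v8h model_dup_low(v8h x) {
    v8h r = {{0}};
    for (int i = 0; i < 4; i++)
        r.lane[i] = x.lane[i];
    return r;
}

/* Wrapping multiply-accumulate; unsigned math avoids signed-overflow UB. */
static int32_t mla_wrap(int32_t acc, int16_t n, int16_t m) {
    uint32_t prod = (uint32_t)((int32_t)n * (int32_t)m);
    return (int32_t)((uint32_t)acc + prod);
}

/* Model of "smlal vD.4s, vN.4h, vM.4h": only lanes 0..3 are read. */
static v4s model_smlal(v4s acc, v8h n, v8h m) {
    for (int i = 0; i < 4; i++)
        acc.lane[i] = mla_wrap(acc.lane[i], n.lane[i], m.lane[i]);
    return acc;
}

/* Model of "smlal2 vD.4s, vN.8h, vM.8h": only lanes 4..7 are read. */
static v4s model_smlal2(v4s acc, v8h n, v8h m) {
    for (int i = 0; i < 4; i++)
        acc.lane[i] = mla_wrap(acc.lane[i], n.lane[i + 4], m.lane[i + 4]);
    return acc;
}

/* Sequence emitted by gcc above: dup, dup, smlal, smlal2. */
static v4s gcc_sequence(v4s a, v8h b, v8h c) {
    v8h b_lo = model_dup_low(b);
    v8h c_lo = model_dup_low(c);
    a = model_smlal(a, b_lo, c_lo);
    return model_smlal2(a, b, c);
}

/* Sequence emitted by clang above: smlal, smlal2. */
static v4s clang_sequence(v4s a, v8h b, v8h c) {
    a = model_smlal(a, b, c);
    return model_smlal2(a, b, c);
}

/* Returns 1 when both sequences produce the same four lanes. */
static int sequences_agree(v4s a, v8h b, v8h c) {
    v4s x = gcc_sequence(a, b, c);
    v4s y = clang_sequence(a, b, c);
    for (int i = 0; i < 4; i++)
        if (x.lane[i] != y.lane[i])
            return 0;
    return 1;
}

/* Small check on one sample input. */
static void self_check(void) {
    v4s a = {{1, -2, 3, -4}};
    v8h b = {{1, 2, 3, 4, 5, 6, 7, 8}};
    v8h c = {{-1, 2, -3, 4, -5, 6, -7, 8}};
    assert(sequences_agree(a, b, c));
}
```

In this model the copies made by model_dup_low never change the result,
since model_smlal ignores lanes 4..7 either way. That matches the clang
output, which feeds v1 and v2 to smlal directly.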