[Bug target/98877] New: [AArch64] Inefficient code generated for tbl NEON intrinsics
spop at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Fri Jan 29 06:51:01 GMT 2021
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98877
Bug ID: 98877
Summary: [AArch64] Inefficient code generated for tbl NEON intrinsics
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: spop at gcc dot gnu.org
Target Milestone: ---
The code GCC generates for NEON intrinsics is inefficient, which leads
developers to prefer inline assembly over intrinsics.
A similar performance bug for vmlal intrinsics was reported in
https://gcc.gnu.org/PR92665
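The code generated by GCC for table lookups is also inefficient. GCC emits two
extra mov instructions to copy the table registers before the tbl, while clang
uses the argument registers directly: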
$ cat red.c
#include "arm_neon.h"
uint8x16_t fun(uint8x16_t lo, uint8x16_t hi, uint8x16_t idx) {
  uint8x16x2_t tab = { .val = {lo, hi} };
  uint8x16_t res = vqtbl2q_u8(tab, idx);
  return res;
}
$ gcc -O3 -S -o- red.c
fun:
        mov     v4.16b, v0.16b
        mov     v5.16b, v1.16b
        tbl     v0.16b, {v4.16b - v5.16b}, v2.16b
        ret
$ clang -O3 -S -o- red.c
fun:
        tbl     v0.16b, { v0.16b, v1.16b }, v2.16b
        ret
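
For reference, here is a minimal scalar sketch of what the two-register table
lookup above computes. It is only an illustration: the name tbl2_ref and the
flat 32-byte table layout (lo in bytes 0..15, hi in bytes 16..31) are
hypothetical and not part of arm_neon.h. It models the tbl behaviour of
returning 0 for any index outside the table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a two-register table lookup (not an
   arm_neon.h function). tab holds lo in bytes 0..15 and hi in bytes 16..31. */
static void tbl2_ref(const uint8_t tab[32], const uint8_t idx[16],
                     uint8_t out[16])
{
    for (size_t i = 0; i < 16; i++) {
        /* An index outside the 32-byte table selects 0, as tbl does. */
        out[i] = idx[i] < 32 ? tab[idx[i]] : 0;
    }
}
```

Since lo and hi arrive in v0 and v1, which are already consecutive registers,
the whole operation fits in the single tbl instruction that clang emits.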