[Bug target/81496] New: AVX load from adjacent memory location followed by concatenation
jakub at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Thu Jul 20 17:01:00 GMT 2017
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81496
Bug ID: 81496
Summary: AVX load from adjacent memory location followed by
concatenation
Product: gcc
Version: 7.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jakub at gcc dot gnu.org
Target Milestone: ---
With -O2 -mavx{,2,512f}, we get on the following testcase:
typedef __int128 V __attribute__((vector_size (32)));
typedef long long W __attribute__((vector_size (32)));
typedef int X __attribute__((vector_size (16)));
typedef __int128 Y __attribute__((vector_size (64)));
typedef long long Z __attribute__((vector_size (64)));
W f1 (__int128 x, __int128 y) { return (W) ((V) { x, y }); }
W f2 (__int128 x, __int128 y) { return (W) ((V) { y, x }); }
movq %rdi, -16(%rsp)
movq %rsi, -8(%rsp)
movq %rdx, -32(%rsp)
movq %rcx, -24(%rsp)
vmovdqa -32(%rsp), %xmm0
vmovdqa -16(%rsp), %xmm1
vinserti128 $0x1, %xmm0, %ymm1, %ymm0
for f1, which I'm afraid is hard to do anything about, because the register
allocator didn't see any benefit in spilling in the other order, but for f2:
movq %rdx, -32(%rsp)
movq %rcx, -24(%rsp)
vmovdqa -32(%rsp), %xmm0
movq %rdi, -16(%rsp)
movq %rsi, -8(%rsp)
vinserti128 $0x1, -16(%rsp), %ymm0, %ymm0
Before scheduling, the vmovdqa is right next to the vinserti128 that loads from
the adjacent memory; in that case it might be a win to use a single
vmovdqa -32(%rsp), %ymm0 instead. Though, the MEM has just A128 (128-bit
alignment) in the RTL dump, so maybe we need to use vmovdqu instead, unless we
can prove it is 256-bit aligned (it is in this case, but not generally).
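For f2 the merged sequence might look something like this (just a sketch of the
suggested transformation, assuming the two 16-byte spill slots remain adjacent
and using vmovdqu because only 128-bit alignment is known from the MEM):

```asm
	movq	%rdi, -16(%rsp)          # x, high 128-bit half of the result
	movq	%rsi, -8(%rsp)
	movq	%rdx, -32(%rsp)          # y, low 128-bit half of the result
	movq	%rcx, -24(%rsp)
	vmovdqu	-32(%rsp), %ymm0         # one 256-bit load covering both halves
```

This replaces the vmovdqa/vinserti128 pair with one unaligned 256-bit load; if
the stack slot could be proven 256-bit aligned, vmovdqa would work as well.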