[Bug rtl-optimization/93943] New: IRA/LRA happily rematerialize (un-CSEs) loads without register pressure
rguenth at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Wed Feb 26 10:53:00 GMT 2020
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93943
Bug ID: 93943
Summary: IRA/LRA happily rematerialize (un-CSEs) loads without
register pressure
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
long a[1024], b[512], c[512];
void foo ()
{
  for (int i = 0; i < 256; ++i)
    {
      b[2*i] = a[4*i];
      b[2*i+1] = a[4*i+2];
      c[2*i] = a[4*i+1];
      c[2*i+1] = a[4*i+3];
    }
}
At -O3 this loop is vectorized with SSE2 V2DImode vectors, doing two vector
loads, two shuffles and two vector stores per iteration. But we then emit
.L2:
        movdqa  a(%rax,%rax), %xmm0
        movdqa  %xmm0, %xmm1
        punpckhqdq      a+16(%rax,%rax), %xmm0
        punpcklqdq      a+16(%rax,%rax), %xmm1
        addq    $16, %rax
        movaps  %xmm1, b-16(%rax)
        movaps  %xmm0, c-16(%rax)
        cmpq    $4096, %rax
        jne     .L2
i.e. we took advantage of the memory-operand variant of the punpck
instructions and load a+16(%rax,%rax) twice instead of keeping it in a
register, enlarging the code and using more load uops.
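For comparison, here is a minimal sketch of the intended vector form written with SSE2 intrinsics. It assumes x86-64 with SSE2 and LP64 (64-bit long, which is what makes this V2DImode). The names foo_sse2, foo_scalar, b_ref, c_ref and vectorized_matches_scalar are hypothetical, introduced only for illustration. At the source level each iteration does two loads, two shuffles (punpcklqdq/punpckhqdq via _mm_unpacklo_epi64/_mm_unpackhi_epi64) and two stores, with the second load feeding both shuffles. Whether the compiler keeps that value in a register or folds it into each punpck as a memory operand is still up to IRA/LRA, which is exactly what this report is about.

```c
/* Sketch, assuming x86-64 with SSE2 and LP64 (64-bit long). */
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <string.h>

_Static_assert (sizeof (long) == 8, "sketch assumes 64-bit long (LP64)");

long a[1024], b[512], c[512];
static long b_ref[512], c_ref[512];   /* hypothetical reference outputs */

/* Scalar reference: the loop from the report, writing to b_ref/c_ref. */
static void foo_scalar (void)
{
  for (int i = 0; i < 256; ++i)
    {
      b_ref[2*i] = a[4*i];
      b_ref[2*i+1] = a[4*i+2];
      c_ref[2*i] = a[4*i+1];
      c_ref[2*i+1] = a[4*i+3];
    }
}

/* Hypothetical hand-vectorized form: two loads, two shuffles, two stores.
   Unaligned load/store intrinsics are used so the sketch does not depend
   on array alignment; GCC emitted movdqa/movaps above because these
   global arrays are 16-byte aligned on x86-64.  */
void foo_sse2 (void)
{
  for (int i = 0; i < 256; ++i)
    {
      /* lo = { a[4i], a[4i+1] }, hi = { a[4i+2], a[4i+3] } */
      __m128i lo = _mm_loadu_si128 ((const __m128i *) &a[4*i]);
      __m128i hi = _mm_loadu_si128 ((const __m128i *) &a[4*i + 2]);
      /* punpcklqdq: low quadwords -> { a[4i], a[4i+2] } */
      _mm_storeu_si128 ((__m128i *) &b[2*i], _mm_unpacklo_epi64 (lo, hi));
      /* punpckhqdq: high quadwords -> { a[4i+1], a[4i+3] } */
      _mm_storeu_si128 ((__m128i *) &c[2*i], _mm_unpackhi_epi64 (lo, hi));
    }
}

/* Hypothetical check: fill a[], run both versions, compare the results.
   Returns 1 if the vector form matches the scalar loop.  */
int vectorized_matches_scalar (void)
{
  for (int k = 0; k < 1024; ++k)
    a[k] = 3L * k + 1;
  foo_scalar ();
  foo_sse2 ();
  return memcmp (b, b_ref, sizeof b) == 0
         && memcmp (c, c_ref, sizeof c) == 0;
}
```

Calling vectorized_matches_scalar () should return 1. Compiling foo_sse2 at -O2 and inspecting the loop shows whether a[4*i+2..4*i+3] is loaded once or twice per iteration.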