This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/55590] New: SRA still produces unnecessarily unaligned memory accesses
- From: "jamborm at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 04 Dec 2012 10:34:46 +0000
- Subject: [Bug tree-optimization/55590] New: SRA still produces unnecessarily unaligned memory accesses
- Auto-submitted: auto-generated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55590
Bug #: 55590
Summary: SRA still produces unnecessarily unaligned memory accesses
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: jamborm@gcc.gnu.org
ReportedBy: jamborm@gcc.gnu.org
SRA can still produce unaligned memory accesses that should be aligned when it bases its new scalar access on a MEM_REF buried below COMPONENT_REFs or ARRAY_REFs.
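For intuition (a minimal sketch, not GCC internals; component_alignment is a hypothetical name): when the offset below the COMPONENT_REFs or ARRAY_REFs is a compile-time constant, the alignment that can be guaranteed for the replacement access follows from the base pointer's known alignment and the largest power of two dividing the offset.

#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not GCC's actual code: the alignment that can be
   guaranteed for base + offset when base is known to be aligned to
   base_align bytes (a power of two) and offset is constant.  */
static size_t
component_alignment (size_t base_align, size_t offset)
{
  if (offset == 0)
    return base_align;
  size_t off_align = offset & -offset;  /* largest power of two dividing offset */
  return off_align < base_align ? off_align : base_align;
}

int
main (void)
{
  /* In testcase 1 below, *p is 16-byte aligned and p->s.b is at constant
     offset 32, so the scalarized access is 16-byte aligned.  */
  printf ("%zu\n", component_alignment (16, 32));  /* prints 16 */
  return 0;
}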
Testcase 1:
/* { dg-do compile } */
/* { dg-options "-O2 -mavx" } */
#include <immintrin.h>
struct S
{
  __m128 a, b;
};

struct T
{
  int a;
  struct S s;
};

void foo (struct T *p, __m128 v)
{
  struct S s;
  s = p->s;
  s.b = _mm_add_ps (s.b, v);
  p->s = s;
}
/* { dg-final { scan-assembler-not "vmovups" } } */
On x86_64 this compiles to

        vmovups 32(%rdi), %xmm1
        vaddps  %xmm0, %xmm1, %xmm0
        vmovups %xmm0, 32(%rdi)
even though it should really be

        vaddps  32(%rdi), %xmm0, %xmm0
        vmovaps %xmm0, 32(%rdi)
        ret
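The aligned vmovaps is justified because struct T inherits 16-byte alignment from its __m128 members and s.b sits at offset 32, a multiple of 16. A self-contained check of those layout facts (assuming C11 for _Static_assert and _Alignof):

#include <stddef.h>
#include <immintrin.h>

struct S { __m128 a, b; };
struct T { int a; struct S s; };

/* A valid struct T pointer is 16-byte aligned, and s.b lands at offset
   32, so &p->s.b is always 16-byte aligned.  */
_Static_assert (_Alignof (struct T) == 16, "T is 16-byte aligned");
_Static_assert (offsetof (struct T, s.b) == 32, "s.b at offset 32");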
Testcase 2 (which shows why this needs a different fix than the recent IPA-SRA patch, because of the variable array index):
/* { dg-do compile } */
/* { dg-options "-O2 -mavx" } */
#include <immintrin.h>
struct S
{
  __m128 a, b;
};

struct T
{
  int a;
  struct S s[8];
};

void foo (struct T *p, int i, __m128 v)
{
  struct S s;
  s = p->s[i];
  s.b = _mm_add_ps (s.b, v);
  p->s[i] = s;
}
/* { dg-final { scan-assembler-not "vmovups" } } */
This compiles to

        movslq  %esi, %rsi
        salq    $5, %rsi
        leaq    16(%rdi,%rsi), %rax
        vmovups 16(%rax), %xmm1
        vaddps  %xmm0, %xmm1, %xmm0
        vmovups %xmm0, 16(%rax)
        ret
when it should produce

        movslq  %esi, %rsi
        salq    $5, %rsi
        leaq    16(%rdi,%rsi), %rax
        vaddps  16(%rax), %xmm0, %xmm0
        vmovaps %xmm0, 16(%rax)
        ret
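The variable index does not change the alignment argument: &p->s[i].b is p plus 16 + 32*i + 16, and every term is a multiple of 16, so the aligned forms stay valid for any i. A self-contained layout check (again assuming C11):

#include <stddef.h>
#include <immintrin.h>

struct S { __m128 a, b; };
struct T { int a; struct S s[8]; };

/* &p->s[i].b == (char *) p + 16 + 32 * i + 16; every term is a multiple
   of 16, so the access is 16-byte aligned for any index i.  */
_Static_assert (_Alignof (struct T) == 16, "");
_Static_assert (offsetof (struct T, s) % 16 == 0, "");
_Static_assert (sizeof (struct S) % 16 == 0, "");
_Static_assert (offsetof (struct S, b) % 16 == 0, "");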
I'm testing a patch.