This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/47059] New: compiler fails to coalesce loads/stores
- From: "rahul at icerasemi dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 24 Dec 2010 11:01:36 +0000
- Auto-submitted: auto-generated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47059
Summary: compiler fails to coalesce loads/stores
Product: gcc
Version: 4.5.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: rahul@icerasemi.com
CC: sdkteam-gnu@icerasemi.com
Host: i686-pc-linux-gnu
Target: i686-pc-linux-gnu
Build: i686-pc-linux-gnu
Consider the following test case, compiled with GCC 4.5.1 (x86) using the
command:
gcc -S -Os test.c
struct struct1
{
  void *data;
  unsigned short f1;
  unsigned short f2;
};
typedef struct struct1 S1;

struct struct2
{
  int f3;
  S1 f4;
};
typedef struct struct2 S2;

extern void foo (S1 *ptr);
extern S2 gstruct2_var;
extern S1 gstruct1_var;

static S1 bar (const S1 *ptr) __attribute__ ((always_inline));

static S1
bar (const S1 *ptr)
{
  S1 ls_var = *ptr;
  foo (&ls_var);
  return ls_var;
}

int
main ()
{
  S2 *ps_var;
  ps_var = &gstruct2_var;
  ps_var->f4 = bar (&gstruct1_var);
  return 0;
}
We get:
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $32, %esp
movl gstruct1_var, %eax
movl gstruct1_var+4, %edx
movl %eax, -16(%ebp)
leal -16(%ebp), %eax
pushl %eax
movl %edx, -12(%ebp)
call foo
movl -16(%ebp), %eax
movl -4(%ebp), %ecx
movl %eax, gstruct2_var+4
movl -12(%ebp), %eax <-- load1 [ebp - 12] @ 4 bytes
movw %ax, gstruct2_var+8 <-- store1 [gstruct2_var + 8] @ 2 bytes
movw -10(%ebp), %ax <-- load2 [ebp - 10] @ 2 bytes
movw %ax, gstruct2_var+10 <-- store2 [gstruct2_var + 10] @ 2 bytes
xorl %eax, %eax
leave
leal -4(%ecx), %esp
ret
.size main, .-main
.ident "GCC: (GNU) 4.5.1"
.section .note.GNU-stack,"",@progbits
With GCC 4.4.1 we get:
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $32, %esp
movl gstruct1_var, %eax
movl gstruct1_var+4, %edx
movl %eax, -16(%ebp)
leal -16(%ebp), %eax
movl %edx, -12(%ebp)
pushl %eax
call foo
movl -12(%ebp), %eax <-- Load1 [ebp - 12] @ 4 bytes
movl -4(%ebp), %ecx
movl %eax, gstruct2_var+8 <-- Store1 [gstruct2_var + 8] @ 4 bytes
movl -16(%ebp), %eax
movl %eax, gstruct2_var+4
xorl %eax, %eax
leave
leal -4(%ecx), %esp
ret
.size main, .-main
.ident "GCC: (GNU) 4.4.1"
.section .note.GNU-stack,"",@progbits
The extra loads and stores appear to result from a change to SRA, which now
fully scalarizes the structure members f1 and f2. With GCC 4.4.1 these fields
are accessed through a BIT_FIELD_REF, which combines the two loads and the two
stores.
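As a source-level illustration of the coalescing described above, here is a
minimal sketch that copies f1 and f2 either field by field or as one block
spanning both fields. The helper names copy_fields_separately and
copy_fields_coalesced are hypothetical and do not appear in the test case. The
sketch assumes that f1 and f2 are adjacent with no padding, which the static
assertion checks. It only shows that the two forms are equivalent at the source
level; it does not show how SRA or BIT_FIELD_REF work inside the compiler.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <string.h>   /* memcpy */

/* Same layout as struct1 in the test case. */
typedef struct struct1
{
  void *data;
  unsigned short f1;
  unsigned short f2;
} S1;

/* Assumption: a combined access is only valid when f1 and f2 are adjacent. */
static_assert (offsetof (S1, f2) == offsetof (S1, f1) + sizeof (unsigned short),
               "f1 and f2 must be adjacent with no padding");

/* Hypothetical helper: one load and one store per field, which corresponds
   to the fully scalarized form.  */
static void
copy_fields_separately (S1 *dst, const S1 *src)
{
  dst->f1 = src->f1;
  dst->f2 = src->f2;
}

/* Hypothetical helper: a single block copy that spans both fields.  A
   compiler is free to lower this to one wider load and one wider store.  */
static void
copy_fields_coalesced (S1 *dst, const S1 *src)
{
  memcpy ((char *) dst + offsetof (S1, f1),
          (const char *) src + offsetof (S1, f1),
          2 * sizeof (unsigned short));
}
```

Both helpers leave dst->f1 and dst->f2 with the same values; which form the
compiler generates for a plain structure copy depends on the optimization
passes involved.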
Talking to MartinJ on IRC, I was told that the SRA changes scalarize aggregates
aggressively. Previously there was some functionality that tried to combine
appropriate components into BIT_FIELD_REFs to reduce the number of loads and
stores. This was removed in 4.5 in favour of a simpler GIMPLE IR and as a step
towards generic MEM_REFs. The plan is to introduce new IR constructs to load
and store individual bits, and to decide in a separate GIMPLE pass how to
combine them. However, this will only be available in 4.7+.
We see exactly the same issue on our port, where it causes a significant
performance regression in our software.