[Bug tree-optimization/60577] New: inefficient FDO instrumentation code
carrot at google dot com
gcc-bugzilla@gcc.gnu.org
Wed Mar 19 03:02:00 GMT 2014
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60577
Bug ID: 60577
Summary: inefficient FDO instrumentation code
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: carrot at google dot com
This is actually a regression caused by r175916.
Compile the following code with the options -O2 -fno-strict-aliasing
-fprofile-generate:
struct thread_param
{
  long* buf;
  long iterations;
  long accesses;
} param;

void access_buf(struct thread_param* p)
{
  long i,j;
  long iterations = p->iterations;
  long accesses = p->accesses;
  for (i=0; i<iterations; i++)
    {
      long* pbuf = p->buf;
      for (j=0; j<accesses; j++)
        pbuf[j] += 1;
    }
}
Trunk GCC generates the following for the innermost loop:
.L9:
addq $1, __gcov0.access_buf(%rip)
addq $1, (%rax)
addq $8, %rax
cmpq %rdx, %rax
jne .L9
The FDO counter is incremented in memory on every iteration.
GCC at revision r175915 generates the following for the innermost loop:
movq .LPBX1(%rip), %rsi
...
.L4:
addq $1, (%rax)
addq $8, %rax
cmpq %rdx, %rax
jne .L4
leaq 1(%rsi,%r9), %rsi
...
movq %rsi, .LPBX1(%rip)
The FDO counter adds no overhead to the innermost loop: it is loaded before
the loop, updated in a register, and stored back after the loop.
GCC at revision r175916 generates the following for the innermost loop:
movq .LPBX1(%rip), %rcx
xorl %eax, %eax
leaq 1(%rcx), %r8
.p2align 4,,10
.p2align 3
.L4:
leaq (%r8,%rax), %rcx
movq %rcx, .LPBX1(%rip)
addq $1, (%rdx,%rax,8)
addq $1, %rax
cmpq %rsi, %rax
jne .L4
The FDO counter is incremented and written back to memory on every iteration.
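
For illustration only, the difference between the two code shapes can be
modelled at the source level roughly as follows. This is a hand-written
sketch, not what GCC emits: edge_counter, bump_in_loop and bump_after_loop
are hypothetical names, with edge_counter standing in for the profile counter
(__gcov0.access_buf / .LPBX1), and it assumes the buffer never aliases the
counter.

```c
/* Hypothetical stand-in for the per-edge profile counter. */
static long edge_counter;

/* Models the trunk / r175916 shape: the counter is updated in memory
   on every iteration of the inner loop. */
void bump_in_loop(long *pbuf, long accesses)
{
  long j;
  for (j = 0; j < accesses; j++)
    {
      edge_counter += 1;   /* memory read-modify-write per iteration */
      pbuf[j] += 1;
    }
}

/* Models the r175915 shape: the counter is loaded once, accumulated in
   a local, and stored back once after the loop. */
void bump_after_loop(long *pbuf, long accesses)
{
  long j;
  long count = edge_counter;   /* load before the loop */
  for (j = 0; j < accesses; j++)
    {
      count += 1;              /* local (register) update only */
      pbuf[j] += 1;
    }
  edge_counter = count;        /* single store after the loop */
}
```

Both functions leave the same final counter value as long as pbuf does not
point at the counter; only the number of memory accesses inside the loop
differs.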