This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/82732] New: malloc+zeroing other than memset not optimized to calloc, so asm output is malloc+memset
- From: "peter at cordes dot ca" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 26 Oct 2017 10:52:05 +0000
- Subject: [Bug tree-optimization/82732] New: malloc+zeroing other than memset not optimized to calloc, so asm output is malloc+memset
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82732
Bug ID: 82732
Summary: malloc+zeroing other than memset not optimized to calloc, so asm output is malloc+memset
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
#include <string.h>
#include <stdlib.h>

int *foo(unsigned size)
{
    int *p = malloc(size*sizeof(int));
    //memset(p,0, size*sizeof(int));
    for (unsigned i=0; i<size; i++) {
        p[i]=0;
    }
    return p;
}
Output from gcc -O3 -march=haswell (https://godbolt.org/g/bpGHoa):
        pushq   %rbx
        movl    %edi, %edi          # zero-extend
        movq    %rdi, %rbx          # why 64-bit operand-size here?
        salq    $2, %rdi
        call    malloc
        movq    %rax, %rcx
        testl   %ebx, %ebx          # check that size was non-zero before looping
        je      .L6
        leal    -1(%rbx), %eax
        movq    %rcx, %rdi
        xorl    %esi, %esi
        leaq    4(,%rax,4), %rdx    # redo the left-shift
        call    memset
        movq    %rax, %rcx
.L6:
        movq    %rcx, %rax          # this is dumb: either way we get here, the malloc return value is already in %rax (memset returns it)
        popq    %rbx
        ret
So gcc does figure out that this is malloc+memset, but I guess not until after
the pass that recognizes malloc+memset as calloc.
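Read back into C, the -O3 asm above corresponds roughly to the following: the zeroing loop has become a memset, still guarded by the loop's size != 0 check, but the malloc+memset pair is not combined into calloc. This is a hand translation for illustration, not GCC output, and the name foo_as_compiled is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical hand translation of the -O3 asm for foo() above. */
int *foo_as_compiled(unsigned size)
{
    /* movl %edi, %edi zero-extends size to 64 bits; salq $2 multiplies
       it by sizeof(int). */
    int *p = malloc((size_t)size * sizeof(int));

    /* The zeroing loop was turned into a memset, but the loop's
       "size != 0" guard (testl / je .L6) survives around it. */
    if (size != 0)
        memset(p, 0, (size_t)size * sizeof(int));

    return p;
}
```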
But with an explicit memset and gcc -O3, the zeroing loop is optimized away as
well, leaving just a tail call to calloc:
foo:
        movl    %edi, %edi
        movl    $1, %esi
        salq    $2, %rdi
        jmp     calloc
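For reference, the explicit-memset variant is foo() with its commented-out memset line restored, so the loop that follows it is redundant. A minimal sketch; the name foo_memset is hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* foo() from the report with the commented-out memset re-enabled.
   Per the report, gcc -O3 folds malloc+memset into calloc and drops the
   loop; the asm above amounts to: return calloc((size_t)size * 4, 1); */
int *foo_memset(unsigned size)
{
    int *p = malloc(size * sizeof(int));
    memset(p, 0, size * sizeof(int));
    for (unsigned i = 0; i < size; i++) {
        p[i] = 0;   /* redundant: the memory is already zeroed */
    }
    return p;
}
```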
Unfortunately, at -O2 (with the explicit memset) we still get a loop that
stores 4 bytes at a time *after the calloc*. I know -O2 doesn't enable all the
optimizations, but I expected it to do better than this for "manual" zeroing
loops.
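One source-level workaround (a suggestion, not something proposed in the report) is to request zeroed memory directly, so the result no longer depends on which passes run at a given -O level. A minimal sketch; the name foo_calloc is hypothetical:

```c
#include <stdlib.h>

/* Hypothetical workaround: ask for zeroed memory explicitly instead of
   relying on the optimizer to turn malloc + a zeroing loop into calloc.
   calloc(nmemb, size) returns a zero-filled block (or NULL on failure),
   so no zeroing loop is needed at any optimization level. */
int *foo_calloc(unsigned size)
{
    return calloc(size, sizeof(int));
}
```

Passing the element count and element size separately also lets common implementations such as glibc report an overflowing product by returning NULL instead of silently wrapping.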