This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug rtl-optimization/33717] New: slow code generated for 64-bit arithmetic
- From: "felix-gcc at fefe dot de" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 9 Oct 2007 16:53:35 -0000
- Subject: [Bug rtl-optimization/33717] New: slow code generated for 64-bit arithmetic
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
gcc generates very poor code for some bignum code I wrote.
I put the sample code at http://dl.fefe.de/bignum-add.c for you to look at.
The crucial loop is this (x, y and z are arrays of unsigned int, l is the
64-bit accumulator):
for (i=0; i<100; ++i) {
  l += (unsigned long long)x[i] + y[i];
  z[i]=l;
  l>>=32;
}
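For anyone who wants to compile the loop in isolation, here is a minimal self-contained sketch of it as a function. The name bignum_add, the fixed-width types and the returned carry are assumptions made for illustration; they are not taken from bignum-add.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from bignum-add.c): z = x + y over n 32-bit
   limbs, least significant limb first. Returns the carry out of the top
   limb (0 or 1). */
uint32_t bignum_add(uint32_t *z, const uint32_t *x, const uint32_t *y, size_t n)
{
    uint64_t l = 0;                  /* 64-bit accumulator, as in the loop above */

    assert(n == 0 || (z && x && y)); /* precondition: valid arrays when n > 0 */

    for (size_t i = 0; i < n; ++i) {
        l += (uint64_t)x[i] + y[i];  /* at most 2*(2^32-1) + 1, so 33 bits */
        z[i] = (uint32_t)l;          /* low 32 bits become the result limb */
        l >>= 32;                    /* remaining bit is the carry into limb i+1 */
    }
    return (uint32_t)l;
}
```

After each shift the accumulator holds only 0 or 1, so its upper half is always zero at the top of the loop. That is why a single carry bit would be enough to carry the state between iterations.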
gcc code (-O3 -march=athlon64):
movl -820(%ebp,%esi,4), %eax
movl -420(%ebp,%esi,4), %ecx
xorl %edx, %edx
xorl %ebx, %ebx
addl %ecx, %eax
adcl %ebx, %edx
addl -1224(%ebp), %eax
adcl -1220(%ebp), %edx
movl %eax, -4(%edi,%esi,4)
incl %esi
movl %edx, %eax
xorl %edx, %edx
cmpl $101, %esi
movl %eax, -1224(%ebp)
movl %edx, -1220(%ebp)
jne .L4
As you can see, gcc keeps the long long accumulator in memory. icc keeps it
in registers instead:
movl 4(%esp,%edx,4), %eax #25.30
xorl %ebx, %ebx #25.5
addl 404(%esp,%edx,4), %eax #25.5
adcl $0, %ebx #25.5
addl %esi, %eax #25.37
movl %ebx, %esi #25.37
adcl $0, %esi #25.37
movl %eax, 804(%esp,%edx,4) #26.5
addl $1, %edx #24.22
cmpl $100, %edx #24.15
jb ..B1.4 # Prob 99% #24.15
The difference is staggering: 2000 cycles for gcc, 1000 for icc.
This only happens on 32-bit x86, btw. On amd64 there are enough registers, so
gcc and icc are closer (840 vs 924, icc still generates better code here).
Still: both compilers could generate even better code. I put some inline asm
in the file for comparison, which could be improved further by loop unrolling.
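As a point of comparison in plain C, here is a sketch of the same addition with an explicit 32-bit carry instead of a 64-bit accumulator. This is not the inline asm from the file, and the name bignum_add_carry is hypothetical. Whether a given compiler turns it into an add/adc sequence is an assumption to check in the generated code, not something this sketch guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant (not from bignum-add.c): z = x + y over n 32-bit
   limbs, using only 32-bit arithmetic and an explicit carry flag.
   Returns the carry out of the top limb (0 or 1). */
uint32_t bignum_add_carry(uint32_t *z, const uint32_t *x, const uint32_t *y,
                          size_t n)
{
    uint32_t carry = 0;

    assert(n == 0 || (z && x && y)); /* precondition: valid arrays when n > 0 */

    for (size_t i = 0; i < n; ++i) {
        uint32_t s  = x[i] + y[i];   /* wraps modulo 2^32 */
        uint32_t c1 = s < x[i];      /* 1 if the limb addition wrapped */
        uint32_t t  = s + carry;     /* add the incoming carry */
        uint32_t c2 = t < s;         /* 1 if adding the carry wrapped */
        z[i] = t;
        carry = c1 | c2;             /* c1 and c2 are never both 1 */
    }
    return carry;
}
```

The two wrap checks cannot both fire: if the first addition wrapped, s is at most 2^32-2, so adding a carry of 1 cannot wrap again.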
--
Summary: slow code generated for 64-bit arithmetic
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: felix-gcc at fefe dot de
GCC build triplet: i386-pc-linux-gnu
GCC host triplet: i386-pc-linux-gnu
GCC target triplet: i386-pc-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33717