This is the mail archive of the
libstdc++@gcc.gnu.org
mailing list for the libstdc++ project.
Pessimization in compiler support for builtin __complex__
- To: Benjamin Kosnik <bkoz at redhat dot com>
- Subject: Pessimization in compiler support for builtin __complex__
- From: Gabriel Dos Reis <gdr at codesourcery dot com>
- Date: 02 Nov 2001 16:13:59 +0100
- Cc: Brad Lucier <lucier at math dot purdue dot edu>, Gabriel Dos Reis <gdr at codesourcery dot com>, gcc at gcc dot gnu dot org, hjstein at bloomberg dot com, nbecker at fred dot net, libstdc++ at gcc dot gnu dot org
- Organization: CodeSourcery, LLC
- References: <Pine.SOL.3.91.1011101181415.17604C-100000@taarna.cygnus.com>
Benjamin Kosnik <bkoz@redhat.com> writes:
[...]
| Does this impact the performance noted here:
|
| http://gcc.gnu.org/ml/libstdc++/2001-09/msg00091.html
| http://gcc.gnu.org/ml/libstdc++/2001-09/msg00110.html
No, the performance regression noted there is an orthogonal story,
which I'm trying to understand. The analysis below is conducted with
GCC-3.1 20011101 on a sparc sun4u SUNW,Ultra-2.
gcc -O2 translates the following
__complex__ double add(__complex__ double z1, __complex__ double z2)
{
    return z1 + z2;
}
into
add:
!#PROLOGUE# 0
save %sp, -120, %sp
!#PROLOGUE# 1
std %i0, [%fp-24]
ldd [%fp-24], %f2
st %i4, [%fp+84]
fmovs %f2, %f4
std %i2, [%fp-24]
st %i5, [%fp+88]
fmovs %f3, %f5
ldd [%fp-24], %f2
fmovs %f2, %f6
fmovs %f3, %f7
ld [%fp+84], %f2
ld [%fp+88], %f3
faddd %f4, %f2, %f4
ld [%fp+92], %f2
std %f4, [%fp-24]
ld [%fp+96], %f3
faddd %f6, %f2, %f6
ldd [%fp-24], %o0
mov %o0, %i0
mov %o1, %i1
std %f6, [%fp-24]
ldd [%fp-24], %o0
mov %o0, %i2
mov %o1, %i3
ret
restore
which I find quite surprising, compared to the following
struct complex { double re, im; };
complex add(complex z1, complex z2)
{
    complex w;
    w.re = z1.re + z2.re;
    w.im = z1.im + z2.im;
    return w;
}
translated by g++ -O2 into
add(complex, complex):
.LLFB2:
!#PROLOGUE# 0
!#PROLOGUE# 1
mov %o0, %o3
ldd [%o1], %f2
ldd [%o3], %f4
faddd %f4, %f2, %f4
ld [%sp+64], %o2
mov %o2, %o0
std %f4, [%o2]
ldd [%o3+8], %f2
ldd [%o1+8], %f4
faddd %f2, %f4, %f2
jmp %o7+12
std %f2, [%o2+8]
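For anyone wanting to reproduce the comparison, here is a minimal sketch that puts both variants of add in one GNU C file. It is not the exact source compiled above: the names add_builtin, add_struct, complex_pair and check_add are hypothetical, and the struct version is spelled in C rather than C++ so that a single compiler front end handles both functions. The check only confirms that the two functions agree on one input; it says nothing about the quality of the generated code.

```c
#include <assert.h>

/* Builtin GNU complex type, as in the first listing (a GNU extension). */
__complex__ double add_builtin(__complex__ double z1, __complex__ double z2)
{
    return z1 + z2;
}

/* Hand-written pair, as in the second listing.  The tag complex_pair is a
   hypothetical name; C needs the struct keyword, unlike the C++ original. */
struct complex_pair { double re, im; };

struct complex_pair add_struct(struct complex_pair z1, struct complex_pair z2)
{
    struct complex_pair w;
    w.re = z1.re + z2.re;
    w.im = z1.im + z2.im;
    return w;
}

/* Sanity check: both versions must produce the same sum. */
void check_add(void)
{
    __complex__ double a, b, s;
    struct complex_pair p = { 1.0, 2.0 }, q = { 3.0, 4.5 }, t;

    /* __real__ and __imag__ are the GNU accessors for the two parts. */
    __real__ a = p.re;  __imag__ a = p.im;
    __real__ b = q.re;  __imag__ b = q.im;

    s = add_builtin(a, b);
    t = add_struct(p, q);

    assert(__real__ s == t.re);
    assert(__imag__ s == t.im);
}
```

Compiling such a file with gcc -O2 -S writes the assembly for both functions to a .s file, where they can be compared side by side; the output will of course depend on the target and the compiler version.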
Multiplication of complex numbers is also affected in the same way.
That is something I would like any middle-end/back-end expert to
explain to me -- I know g++ implements the named return value
optimization, but that doesn't suffice to explain the difference, I
think.
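The multiplication case mentioned above can be set up the same way. The sketch below is an assumption about what such a test pair might look like, not the code that was actually compiled: mul_builtin, mul_struct, complex_pair and check_mul are hypothetical names, and the struct version uses the textbook formula (a+bi)(c+di) = (ac-bd) + (ad+bc)i. The struct definition is repeated so that the block stands alone. Recent GCC releases may apply extra handling of infinities and NaNs to the builtin product, so the check sticks to small finite inputs whose products are exact.

```c
#include <assert.h>

/* Builtin GNU complex product (a GNU extension). */
__complex__ double mul_builtin(__complex__ double z1, __complex__ double z2)
{
    return z1 * z2;
}

/* Hypothetical hand-written pair, mirroring the struct used for add. */
struct complex_pair { double re, im; };

/* Textbook complex product: (a+bi)(c+di) = (ac-bd) + (ad+bc)i. */
struct complex_pair mul_struct(struct complex_pair z1, struct complex_pair z2)
{
    struct complex_pair w;
    w.re = z1.re * z2.re - z1.im * z2.im;
    w.im = z1.re * z2.im + z1.im * z2.re;
    return w;
}

/* Sanity check on finite inputs with exactly representable results. */
void check_mul(void)
{
    __complex__ double a, b, s;
    struct complex_pair p = { 1.0, 2.0 }, q = { 3.0, 4.0 }, t;

    __real__ a = p.re;  __imag__ a = p.im;
    __real__ b = q.re;  __imag__ b = q.im;

    s = mul_builtin(a, b);   /* (1+2i)(3+4i) */
    t = mul_struct(p, q);

    assert(__real__ s == t.re);
    assert(__imag__ s == t.im);
}
```

As with add, the interesting part is the assembly each function compiles to at -O2, not the numerical result.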
I'm also interested in why GCC doesn't generate
ld [%sp+64], %o2
instead of
ld [%sp+64], %o2
mov %o2, %o0
-- Gaby
CodeSourcery, LLC http://www.codesourcery.com