This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: inlining inefficiencies
Gabriel Dos Reis <gdr@codesourcery.com> writes:
> Dan Nicolaescu <dann@godzilla.ICS.UCI.EDU> writes:
>
> | Gabriel Dos Reis <gdr@codesourcery.com> writes:
> |
> | > Dan Nicolaescu <dann@godzilla.ICS.UCI.EDU> writes:
> | >
> | > | There are some problems with inlining as shown by the code below
> | > | (derived from oopack)
> | > |
> | > | class Complex_d {
> | > | public:
> | > | double re, im;
> | > | Complex_d (double r, double i) : re(r), im(i) {}
> | > | Complex_d () {}
> | > | };
> | >
> | > Incidentally, I would like to mention that the compiler seems to have
> | > some unexplained difficulty optimizing similar constructs with
> | > double __complex__ -- this has been mentioned in the past, and I
> | > believe the situation hasn't improved :-(
> |
> | If the difficulties you mention are related to aliasing, and you have
> | some testcases, please send them to me.
>
> I did some preliminary analysis here:
>
> http://gcc.gnu.org/ml/libstdc++/2001-11/msg00038.html
>
> I'm suspecting some aliasing issues, but I can't tell for sure.
The code generated by 3.2 is a little better.
The problem is the same: when functions are inlined, the argument
passing is inlined too. On SPARC v8, float arguments are passed in
integer registers, so a lot of code is generated to move arguments
between the float and integer registers (through memory).
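
To make that round trip concrete, here is a rough C++ sketch of what the
move sequence amounts to: the value is stored to a stack slot from one
register file and reloaded into the other, then the same again in the
opposite direction. The helper names (to_int_regs, to_float_reg,
round_trip) are made up for illustration, and memcpy only stands in for
the store/load pair; treating the integer side as a single 64-bit value
is a simplification. This is not what the compiler literally emits.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: models "store the double to a stack slot, then
// reload the same bytes on the integer side".
static std::uint64_t to_int_regs(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// Hypothetical helper: the reverse trip, integer side back to a double.
static double to_float_reg(std::uint64_t bits) {
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// The value comes back unchanged; the two trips through memory are
// pure overhead once the call itself has been inlined away.
static double round_trip(double d) {
    return to_float_reg(to_int_regs(d));
}
```

Nothing is computed along the way, which is why these moves are pure
overhead once the call has been inlined.
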
Your example looks much better when compiled with -mcpu=v9 -m64:

__complex__ double add(__complex__ double z1, __complex__ double z2)
{
  return z1 + z2;
}

_Z3addCdS_:
        !#PROLOGUE# 0
        !#PROLOGUE# 1
        fmovd   %f0, %f8
        faddd   %f8, %f4, %f12
        faddd   %f2, %f6, %f4
        fmovd   %f4, %f2
        retl
         fmovd  %f12, %f0

struct complex { double re, im; };

complex add(complex z1, complex z2)
{
  complex w;
  w.re = z1.re + z2.re;
  w.im = z1.im + z2.im;
  return w;
}

_Z3add7complexS_:
        !#PROLOGUE# 0
        add     %sp, -224, %sp
        !#PROLOGUE# 1
        faddd   %f0, %f4, %f8
        faddd   %f2, %f6, %f4
        std     %f8, [%sp+192]
        std     %f4, [%sp+200]
        ldx     [%sp+192], %g4
        ldx     [%sp+200], %g1
        stx     %g4, [%sp+176]
        stx     %g1, [%sp+184]
        ldd     [%sp+176], %f0
        ldd     [%sp+184], %f2
        nop
        retl
         sub    %sp, -224, %sp

Well, this one could clearly be improved.
Can one of the SPARC maintainers comment on that?
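
For anyone who wants to compare the two sequences directly, here is a
sketch that puts both variants in one translation unit, so a single -S
run with the flags above shows them side by side. The same_result helper
is my own addition for sanity-checking and is not part of the original
testcase; __complex__, __real__ and __imag__ are GNU extensions, so this
assumes g++ or a compatible compiler.

```cpp
// Both variants of add() in one file, overloaded on the argument type.
struct complex { double re, im; };

__complex__ double add(__complex__ double z1, __complex__ double z2)
{
  return z1 + z2;
}

complex add(complex z1, complex z2)
{
  complex w;
  w.re = z1.re + z2.re;
  w.im = z1.im + z2.im;
  return w;
}

// Hypothetical helper (not in the original testcase): checks that the
// two variants compute the same sum for (a + bi) + (c + di).
bool same_result(double a, double b, double c, double d)
{
  __complex__ double z1, z2;
  __real__ z1 = a; __imag__ z1 = b;
  __real__ z2 = c; __imag__ z2 = d;
  __complex__ double zs = add(z1, z2);

  complex p, q;
  p.re = a; p.im = b;
  q.re = c; q.im = d;
  complex s = add(p, q);

  return __real__ zs == s.re && __imag__ zs == s.im;
}
```

Both functions do the same two additions, so any difference in the
generated code comes from how the struct is passed and returned rather
than from the arithmetic.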