This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
C++ inlining problems
- To: gcc at gcc dot gnu dot org
- Subject: C++ inlining problems
- From: Joe Buck <jbuck at racerx dot synopsys dot com>
- Date: Tue, 31 Oct 2000 17:55:43 -0800 (PST)
Consider this code (a small piece of the Stepanov benchmark):
-------------------------------
struct {
double operator()(const double& x, const double& y) {return x + y; }
} plus;
template <class Iterator, class Number>
Number accumulate(Iterator first, Iterator last, Number result) {
while (first != last) result = plus(result, *first++);
return result;
}
double call_accum(double* first, double* last, double zero)
{
return accumulate(first, last, zero);
}
------------------------------
gcc 2.95.2 does a near-perfect job on this code: thanks to the ADDRESSOF
optimization, the inlining is very clean. Here is the Solaris/SPARC code
at -O2 (unused labels deleted):
--------------------------------------------------------------
double accumulate<double *, double>(double *, double *, double):
!#PROLOGUE# 0
add %sp, -120, %sp
!#PROLOGUE# 1
std %o2, [%sp+96]
ldd [%sp+96], %f2
cmp %o0, %o1
fmovs %f2, %f0
be .LL8
fmovs %f3, %f1
.LL9:
mov %o0, %g2
ldd [%g2], %f2
add %o0, 8, %o0
cmp %o0, %o1
bne .LL9
faddd %f0, %f2, %f0
.LL8:
retl
sub %sp, -120, %sp
------------------------------------------------------------
(It seems the loop could be one instruction shorter, but that's not bad.)
The current CVS snapshot does horribly:
----------------------------------------------------
double accumulate<double *, double>(double *, double *, double):
!#PROLOGUE# 0
add %sp, -120, %sp
!#PROLOGUE# 1
mov %o0, %o4
cmp %o4, %o1
be .LL11
std %o2, [%sp+96]
add %sp, 96, %o2
.LL7:
mov %o4, %o0
ldd [%o0], %f4
add %o4, 8, %o4
ldd [%o2], %f2
cmp %o4, %o1
faddd %f2, %f4, %f2
bne .LL7
std %f2, [%sp+96]
.LL11:
ldd [%sp+96], %f0
retl
sub %sp, -120, %sp
---------------------------------------------------
The loop now does two loads and a store instead of one load.
The reason appears to be that "result" is forced to live in memory
because it is passed by reference (plus's operator() takes const double&).
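To make the difference concrete, here is a source-level model of the two loops. It is only a sketch: the function names are hypothetical, and "volatile" is used purely to force the snapshot's per-iteration memory traffic in portable C++. It is not what GCC actually emits.

```cpp
// Model of the gcc 2.95.2 loop: the accumulator stays in a register,
// so each iteration does one load (*first) and one add.
double sum_in_register(const double* first, const double* last, double result) {
  while (first != last)
    result = result + *first++;
  return result;
}

// Model of the snapshot loop: the accumulator lives in a stack slot
// because its address was taken. Each iteration reloads the slot, adds,
// and stores it back: two loads (*first and the slot) and one store.
double sum_in_memory(const double* first, const double* last, double result) {
  volatile double slot = result;  // hypothetical stand-in for [%sp+96]
  while (first != last)
    slot = slot + *first++;
  return slot;
}
```

Both compute the same value; only the memory traffic per iteration differs.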
In gcc 2.95.2 we had a special optimization (ADDRESSOF) that allowed us
to remove the referencing for objects passed to inline functions, but it
only handled objects that fit in a register (a builtin type or a
one-element struct). It seems that this no longer works. Does anyone have
a clue how hard it would be to make the tree-based inliner do this
transformation? It seems to me that it wouldn't be hard to remove the
address-taking for any passed object, so we could do better than ADDRESSOF.
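A source-level sketch of the proposed transformation follows, assuming the callee neither writes through its reference parameters nor compares their addresses. After "plus(result, *first++)" is inlined, the reference parameter bound to "result" can be replaced by a plain local copy, so "result" no longer needs an address. Both function names are hypothetical illustrations, not GCC internals.

```cpp
// Inlined body as written: x and y are references, so binding x to
// `result` takes result's address and pins it in memory.
template <class Iterator, class Number>
Number accumulate_inlined_by_ref(Iterator first, Iterator last, Number result) {
  while (first != last) {
    const Number& x = result;   // reference binding takes &result
    const Number& y = *first++;
    result = x + y;
  }
  return result;
}

// After the proposed rewrite: x and y are value copies. This is valid
// only because nothing writes to result while x is live and nothing
// inspects &x, so the copies are observationally identical.
template <class Iterator, class Number>
Number accumulate_inlined_by_value(Iterator first, Iterator last, Number result) {
  while (first != last) {
    const Number x = result;    // no address taken; can live in a register
    const Number y = *first++;
    result = x + y;
  }
  return result;
}
```

The rewrite does not depend on the size of Number, which is why it could cover more cases than ADDRESSOF.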
As a result, current snapshots (and gcc-2.96RH) are much worse than
gcc 2.95.2 at inline functions and STL code. (gcc 2.95.2 itself does
badly if the object passed by reference to an inline function is, say,
a 2-element struct.)
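For the two-element case, here is a hypothetical variant of the test case using an illustrative Vec2 type (the type and function names are assumptions). It is the kind of code where ADDRESSOF, limited to objects that fit in one register, cannot help; an inliner that removed the address-taking for any passed object could, in principle, keep each member in its own register.

```cpp
// Hypothetical two-element aggregate; too big for one register, so
// ADDRESSOF (as described above) would not rescue it.
struct Vec2 {
  double a, b;
};

// Inline callee taking both operands by reference, mirroring plus.
inline Vec2 add(const Vec2& x, const Vec2& y) {
  return Vec2{x.a + y.a, x.b + y.b};
}

// Accumulating through the reference-taking callee: once inlined, result
// stays address-taken unless the inliner rewrites x as a copy.
Vec2 sum(const Vec2* first, const Vec2* last, Vec2 result) {
  while (first != last)
    result = add(result, *first++);
  return result;
}
```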