This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



C++ inlining problems


Consider this code (a small piece of the Stepanov benchmark):

-------------------------------
struct {
  double operator()(const double& x, const double& y) {return x + y; }
} plus;


template <class Iterator, class Number>
Number accumulate(Iterator first, Iterator last, Number result) {
  while (first != last) result =  plus(result, *first++);
  return result;
}

double call_accum(double* first, double* last, double zero)
{
    return accumulate(first, last, zero);
}
------------------------------

gcc 2.95.2 does a near-perfect job on this code: thanks to the ADDRESSOF
optimization, the inlining comes out very clean.  Solaris/Sparc code
at -O2 (unused labels deleted):

--------------------------------------------------------------
double accumulate<double *, double>(double *, double *, double):
	!#PROLOGUE# 0
	add	%sp, -120, %sp
	!#PROLOGUE# 1
	std	%o2, [%sp+96]
	ldd	[%sp+96], %f2
	cmp	%o0, %o1
	fmovs	%f2, %f0
	be	.LL8
	fmovs	%f3, %f1
.LL9:
	mov	%o0, %g2
	ldd	[%g2], %f2
	add	%o0, 8, %o0
	cmp	%o0, %o1
	bne	.LL9
	faddd	%f0, %f2, %f0
.LL8:
	retl
	sub	%sp, -120, %sp
------------------------------------------------------------

(It seems that the loop could have one fewer instruction, but this is not bad.)

The current CVS snapshot does horribly:

----------------------------------------------------
double accumulate<double *, double>(double *, double *, double):
	!#PROLOGUE# 0
	add	%sp, -120, %sp
	!#PROLOGUE# 1
	mov	%o0, %o4
	cmp	%o4, %o1
	be	.LL11
	std	%o2, [%sp+96]
	add	%sp, 96, %o2
.LL7:
	mov	%o4, %o0
	ldd	[%o0], %f4
	add	%o4, 8, %o4
	ldd	[%o2], %f2
	cmp	%o4, %o1
	faddd	%f2, %f4, %f2
	bne	.LL7
	std	%f2, [%sp+96]
.LL11:
	ldd	[%sp+96], %f0
	retl
	sub	%sp, -120, %sp
---------------------------------------------------

The loop now does two loads and a store per iteration, instead of a
single load.  The reason appears to be that "result" is forced to live in
memory (the stack slot at [%sp+96]) because it is passed by reference.

In gcc 2.95.2 we had a special optimization (ADDRESSOF) that handled only
the case of an object that fits in a register (a builtin type or a
one-element struct); it allowed us to eliminate the address-taking for
objects passed by reference to inline functions.  It seems that this no
longer works.  Any clue as to how hard it would be to make the tree-based
inliner do this transformation?  It seems to me that it wouldn't be hard
to remove the address-taking for any passed object, so we could do better
than ADDRESSOF.
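
To make the requested transformation concrete, here is a rough source-level
sketch of what removing the address-taking would amount to.  It is only an
illustration, not how the inliner represents things internally; the names
PlusByRef, sum_by_reference, sum_in_register and check_forms_agree are made
up for this example and do not appear in the benchmark.

```cpp
#include <cassert>

// Functor taking its operands by const reference, as in the test case above.
struct PlusByRef {
  double operator()(const double& x, const double& y) const { return x + y; }
};

// The loop as written: 'result' is bound to a reference on every call,
// so its address is (nominally) taken.
double sum_by_reference(const double* first, const double* last, double result) {
  PlusByRef plus;
  while (first != last) result = plus(result, *first++);
  return result;
}

// Hypothetical hand-transformed form: each reference parameter is replaced
// by a plain copy of the value it names, and the body of operator() is
// substituted in place.  Nothing here needs the address of 'result'.
double sum_in_register(const double* first, const double* last, double result) {
  while (first != last) {
    const double x = result;    // stands in for 'const double& x'
    const double y = *first++;  // stands in for 'const double& y'
    result = x + y;             // inlined body of operator()
  }
  return result;
}

// Checks that both forms compute the same sum on a small input.
void check_forms_agree() {
  const double data[] = {1.0, 2.0, 3.5};
  assert(sum_by_reference(data, data + 3, 0.0) ==
         sum_in_register(data, data + 3, 0.0));
}
```

The two functions are equivalent at the source level; the point is that
the second form gives the compiler no reason to keep "result" in a stack
slot across loop iterations.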

The result of this problem is that current snapshots (and gcc-2.96RH)
are much worse at inline functions and STL than gcc 2.95.2.
(gcc 2.95.2 itself does badly when the object passed by reference to an
inline function is, say, a 2-element struct.)
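
For reference, a test case of the 2-element-struct kind might look like the
sketch below.  It is not taken from the Stepanov benchmark; Pair, PairPlus
and sum_pairs are hypothetical names chosen for illustration, and no
generated code is claimed for it here.

```cpp
// A two-element struct: too big for the one-register ADDRESSOF case.
struct Pair {
  double re;
  double im;
};

// Inline functor taking both operands by const reference.
struct PairPlus {
  Pair operator()(const Pair& x, const Pair& y) const {
    Pair r;
    r.re = x.re + y.re;
    r.im = x.im + y.im;
    return r;
  }
};

// Same loop shape as accumulate, but the accumulator is a Pair that is
// passed by reference to the inline operator() on every iteration.
Pair sum_pairs(const Pair* first, const Pair* last, Pair result) {
  PairPlus plus;
  while (first != last) result = plus(result, *first++);
  return result;
}
```

Ideally both fields of the accumulator would stay in floating-point
registers for the whole loop, just as the single double does in the
2.95.2 output above.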

