This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
c++ optimisation fails badly
- From: Andrew Dorrell <andrewdorrell at oztralia dot com>
- To: gcc at gcc dot gnu dot org
- Cc: andrewdorrell at oztralia dot com
- Date: Thu, 24 Jun 2004 16:43:37 +1000
- Subject: c++ optimisation fails badly
- Organization: Dorrell Family
I have been writing some classes that implement multi-component numeric
types and have been having real problems getting the optimisation I was
expecting. It seems that overloading arithmetic operators carries a very
significant optimisation penalty. Before I give the code, the vital
statistics:
Systems: SuSE Linux 9.0, Fedora 1
gcc versions: gcc (GCC) 3.3.1 (SuSE Linux), gcc (GCC) 3.3.2 20031022 (Red
Hat Linux 3.3.2-1), gcc (GCC) 3.4.0 (on Fedora 1)
The problem exists on all the systems and compiler versions I have tried, but
the same code optimises as expected using the Microsoft C++ compiler (.NET
2003).
Anyway I would appreciate any feedback. Here is sufficient code to
reproduce the issue:
/*******************************************************
* start optimise-test.cpp
*/
#include <stdlib.h>
#include <functional>
template<class T>
struct fn : std::binary_function<T,T,T>
{
T operator()(T const &arg1, T const &arg2) const
{
return (arg1 + arg2) / 2;
}
};
template<class T>
struct ThreeComponent
{
typedef T value_type;
ThreeComponent() {}
ThreeComponent(T const &v1, T const &v2, T const &v3) : a(v1), b(v2), c(v3)
{}
T a;
T b;
T c;
};
template <class T>
ThreeComponent<T> operator+(ThreeComponent<T> const &l,
ThreeComponent<T> const &r)
{
return ThreeComponent<T>(l.a + r.a, l.b + r.b, l.c + r.c);
}
template <class T>
ThreeComponent<T> operator/(ThreeComponent<T> const &l, T r)
{
return ThreeComponent<T>(l.a / r, l.b / r, l.c / r);
}
template<class T>
struct tcfn
{
T operator()(T const &arg1, T const &arg2) const
{
return T((arg1.a+arg2.a)/2,(arg1.b+arg2.b)/2,(arg1.c+arg2.c)/2);
}
};
template <class T, class FN>
void test_optimisation()
{
const int size = 10000000;
T *data = (T*)malloc(size * sizeof(T));
T *end = data + size;
FN fn;
for(T *it=data; it<end; ++it)
{
*it = fn(*it, *it);
}
}
int main(int argc, char *argv[])
{
if(argc==1 || (argc > 1 && atoi(argv[1]) == 1))
{
test_optimisation<int, fn<int> >();
}
if(argc==1 || (argc > 1 && atoi(argv[1]) == 2))
{
test_optimisation<ThreeComponent<int>, fn<ThreeComponent<int> > >();
}
if(argc==1 || (argc > 1 && atoi(argv[1]) == 3))
{
test_optimisation<ThreeComponent<int>, tcfn<ThreeComponent<int> > >();
}
}
/***************************************
* end optimise-test.cpp
*/
The code can be compiled simply: g++ -O3 -o optimise-test optimise-test.cpp
Each of the three implementations of the averaging operation can then be
selected with a command-line argument:
optimise-test 1
performs averaging on a large array of integers.
optimise-test 2
performs the same averaging on an array of ThreeComponent<int>. This applies
the same function as is applied to the int array and relies on operator
overloading to achieve the desired result.
optimise-test 3
performs a specially written vector averaging on an array of
ThreeComponent<int>. No operator overloading is relied on in this case.
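The statement that tests 2 and 3 compute the same thing can be checked directly. The following is a minimal sketch that restates the two functors from the listing (without the std::binary_function base, which later C++ standards removed) and adds a hypothetical helper, same_result, that is not part of the original program. It assumes only the definitions already given in the listing.

```cpp
// Sketch only: same_result() is a hypothetical helper, not part of the
// original test program. The types and operators restate the listing.
template <class T>
struct ThreeComponent
{
    ThreeComponent() {}
    ThreeComponent(T const &v1, T const &v2, T const &v3) : a(v1), b(v2), c(v3) {}
    T a;
    T b;
    T c;
};

template <class T>
ThreeComponent<T> operator+(ThreeComponent<T> const &l, ThreeComponent<T> const &r)
{
    return ThreeComponent<T>(l.a + r.a, l.b + r.b, l.c + r.c);
}

template <class T>
ThreeComponent<T> operator/(ThreeComponent<T> const &l, T r)
{
    return ThreeComponent<T>(l.a / r, l.b / r, l.c / r);
}

// Generic average built on the overloaded operators (tests 1 and 2).
template <class T>
struct fn
{
    T operator()(T const &arg1, T const &arg2) const { return (arg1 + arg2) / 2; }
};

// Hand-written component-wise average (test 3).
template <class T>
struct tcfn
{
    T operator()(T const &arg1, T const &arg2) const
    {
        return T((arg1.a + arg2.a) / 2, (arg1.b + arg2.b) / 2, (arg1.c + arg2.c) / 2);
    }
};

// Returns true when both functors produce identical components for x and y.
bool same_result(ThreeComponent<int> const &x, ThreeComponent<int> const &y)
{
    ThreeComponent<int> g = fn<ThreeComponent<int> >()(x, y);
    ThreeComponent<int> h = tcfn<ThreeComponent<int> >()(x, y);
    return g.a == h.a && g.b == h.b && g.c == h.c;
}
```

Calling same_result on a handful of values confirms that the two code paths differ only in how the arithmetic is expressed, so any timing difference comes from code generation rather than from the amount of work requested.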
If optimisation were working as expected, tests 2 and 3 should take roughly
the same time, and about 3 times as long as test 1. This is the case with
the Microsoft compiler. With g++ it is more typical for test 2 to take at
least twice as long as test 3, and when combined with some other (similar)
abstractions it can take up to 5 times as long. The following, for example,
comes from my SuSE Linux build:
for i in 1 2 3; do time ./bug $i; done
real 0m0.102s
user 0m0.023s
sys 0m0.075s
real 0m0.788s
user 0m0.564s
sys 0m0.212s
real 0m0.296s
user 0m0.074s
sys 0m0.213s
Notice that in this case test 2 (user 0m0.564s) takes more than 7 times as
long as test 3 (user 0m0.074s), which performs exactly the same amount of work!
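The figures from time cover the whole process, including allocating the buffer and touching it for the first time, which is probably where much of the sys time goes. A rough way to isolate the loop itself is to time it in-process over an initialised buffer. The sketch below assumes a C++11 compiler for std::chrono (newer than the compilers listed above). The names time_loop and average_int are hypothetical and do not come from the original program.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: applies fn to every element of an already
// initialised buffer and returns the elapsed seconds for the loop only.
template <class T, class FN>
double time_loop(std::vector<T> &data, FN fn)
{
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < data.size(); ++i)
    {
        data[i] = fn(data[i], data[i]);
    }
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Stand-in functor for illustration; any of the functors from the
// listing could be passed to time_loop instead.
struct average_int
{
    int operator()(int const &a, int const &b) const { return (a + b) / 2; }
};
```

Filling a std::vector with known values before calling time_loop also avoids reading uninitialised memory, and comparing the returned durations for the different functors gives a ratio that is less affected by allocation cost.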
Please, please, please! Am I missing something, or is this a real optimiser
problem? Is this a known issue? Am I doing something significantly wrong
in my code? This is driving me to distraction, as I am coding for image
processing and just can't accept that kind of performance hit.
--
Andrew Dorrell