This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



c++ optimisation fails badly


I have been writing some classes that implement multi-component numeric 
types and have had real problems getting the optimisation I expected.  It 
seems that overloading arithmetic operators carries a very significant 
optimisation penalty.  Before I give the code, here are the vital 
statistics:

systems: SuSE Linux 9.0, Fedora 1
gcc versions:
    gcc (GCC) 3.3.1 (SuSE Linux)
    gcc (GCC) 3.3.2 20031022 (Red Hat Linux 3.3.2-1)
    gcc (GCC) 3.4.0 (on Fedora 1)

The problem exists on all the systems and compiler versions I have tried, but 
the same code optimises as expected with the Microsoft C++ compiler (.NET 
2003).

Anyway, I would appreciate any feedback.  Here is enough code to 
reproduce the issue:


/*******************************************************
* start optimise-test.cpp
*/
#include <stdlib.h>

// Generic average: relies on T's own operator+ and operator/.
template<class T>
struct fn
{
    T operator()(T const &arg1, T const &arg2) const
    {
        return (arg1 + arg2) / 2;
    }
};

template<class T>
struct ThreeComponent
{
    typedef T value_type;

    ThreeComponent() {}
    ThreeComponent(T const &v1, T const &v2, T const &v3)
        : a(v1), b(v2), c(v3)
        {}

    T a;
    T b;
    T c;
};

template <class T>
ThreeComponent<T> operator+(ThreeComponent<T> const &l,
                          ThreeComponent<T> const &r)
{
    return ThreeComponent<T>(l.a + r.a, l.b + r.b, l.c + r.c);
} 

template <class T>
ThreeComponent<T> operator/(ThreeComponent<T> const &l, T r)
{
    return ThreeComponent<T>(l.a / r, l.b / r, l.c / r);
} 

template<class T>
struct tcfn
{
    T operator()(T const &arg1, T const &arg2) const
    {
        return T((arg1.a+arg2.a)/2,(arg1.b+arg2.b)/2,(arg1.c+arg2.c)/2);
    }
};


template <class T, class FN>
void test_optimisation()
{
    const int size = 10000000;

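    // calloc zero-fills the buffer, so fn never reads indeterminate values
    T *data = (T*)calloc(size, sizeof(T));
    if(!data)
        return;
    T *end  = data + size;
    FN fn;

    for(T *it=data; it<end; ++it)
    {
        *it = fn(*it, *it);
    }

    free(data);
}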


int main(int argc, char *argv[])
{
    if(argc==1 || (argc > 1 && atoi(argv[1]) == 1))
    {
        test_optimisation<int, fn<int> >();
    }
    if(argc==1 || (argc > 1 && atoi(argv[1]) == 2))
    {    
        test_optimisation<ThreeComponent<int>, fn<ThreeComponent<int> > >();
    }
    if(argc==1 || (argc > 1 && atoi(argv[1]) == 3))
    {    
        test_optimisation<ThreeComponent<int>, tcfn<ThreeComponent<int> > >();
    }
}
/***************************************
* end optimise-test.cpp
*/

The code can be compiled simply with: g++ -O3 -o optimise-test optimise-test.cpp
Each of three different implementations of an averaging operation can then 
be selected with a command-line argument (with no argument, all three run in 
turn):

optimise-test 1
	performs averaging on a large array of integers

optimise-test 2
	performs the same averaging on an array of ThreeComponent<int>.  This
	applies the same function object (fn) that is applied to the int array
	and relies on operator overloading to achieve the desired result

optimise-test 3
	performs a specially written component-wise averaging (tcfn) on an
	array of ThreeComponent<int>.  No operator overloading is relied on in
	this case
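To check that tests 2 and 3 really compute the same thing, the two functors can be compared directly. This is a sketch that copies the type definitions from optimise-test.cpp (leaving out the std::binary_function base, which nothing here needs) and compares fn and tcfn on a grid of inputs that includes negative values; fn_matches_tcfn is a hypothetical helper name.

```cpp
#include <cassert> // for assert in the usage check

template<class T>
struct fn
{
    T operator()(T const &arg1, T const &arg2) const { return (arg1 + arg2) / 2; }
};

template<class T>
struct ThreeComponent
{
    ThreeComponent() {}
    ThreeComponent(T const &v1, T const &v2, T const &v3) : a(v1), b(v2), c(v3) {}
    T a, b, c;
};

template<class T>
ThreeComponent<T> operator+(ThreeComponent<T> const &l, ThreeComponent<T> const &r)
{
    return ThreeComponent<T>(l.a + r.a, l.b + r.b, l.c + r.c);
}

template<class T>
ThreeComponent<T> operator/(ThreeComponent<T> const &l, T r)
{
    return ThreeComponent<T>(l.a / r, l.b / r, l.c / r);
}

template<class T>
struct tcfn
{
    T operator()(T const &arg1, T const &arg2) const
    {
        return T((arg1.a+arg2.a)/2, (arg1.b+arg2.b)/2, (arg1.c+arg2.c)/2);
    }
};

// Returns true if fn (overloaded operators) and tcfn (hand-written) agree
// on every input pair in a small grid, including negative components.
bool fn_matches_tcfn()
{
    typedef ThreeComponent<int> TC;
    for (int i = -20; i <= 20; ++i)
        for (int j = -20; j <= 20; ++j)
        {
            TC x(i, 3 * i + 1, -i * j);
            TC y(j, 2 * j - 5, i + j);
            TC p = fn<TC>()(x, y);   // (x + y) / 2 via operator+ and operator/
            TC q = tcfn<TC>()(x, y); // component-wise average written out
            if (p.a != q.a || p.b != q.b || p.c != q.c)
                return false;
        }
    return true;
}
```

Calling assert(fn_matches_tcfn()) from a main() should pass, since both functors perform one add and one integer divide per component.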

If optimisation were working as expected, tests 2 and 3 should take roughly 
the same time, about 3 times as long as test 1 (each element has three 
components instead of one).  That is indeed the case with the Microsoft 
compiler.  With g++, test 2 typically takes at least twice as long as test 3, 
and when combined with some other (similar) abstractions it can take up to 5 
times as long.  The following, for example, comes from my SuSE Linux build 
(where the binary was named bug):

for i in 1 2 3; do time ./bug $i; done

real    0m0.102s
user    0m0.023s
sys     0m0.075s

real    0m0.788s
user    0m0.564s
sys     0m0.212s

real    0m0.296s
user    0m0.074s
sys     0m0.213s
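The time figures above cover the whole process, including allocating the 10,000,000-element buffer (about 40 MB for int, about 120 MB for ThreeComponent<int> with a 4-byte int) and faulting its pages in, which probably accounts for much of the sys time. As a sketch assuming a C++11 compiler, a hypothetical helper time_loop_only could time just the averaging loop with std::chrono::steady_clock:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical variant of test_optimisation that times only the averaging
// loop, excluding allocation and page-fault costs. Returns elapsed seconds,
// or -1.0 if the allocation fails.
template <class T, class FN>
double time_loop_only(std::size_t size)
{
    assert(size > 0); // an empty buffer has nothing to time

    T *data = static_cast<T*>(std::malloc(size * sizeof(T)));
    if (!data)
        return -1.0;
    // Give every element a defined value and fault the pages in before timing.
    std::memset(data, 0, size * sizeof(T));

    T *end = data + size;
    FN fn;

    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    for (T *it = data; it < end; ++it)
        *it = fn(*it, *it);
    std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();

    // Fold the results into a volatile sink (after the clock has stopped)
    // so the optimiser cannot discard the stores being timed.
    unsigned long checksum = 0;
    const unsigned char *bytes = reinterpret_cast<const unsigned char *>(data);
    for (std::size_t i = 0; i < size * sizeof(T); ++i)
        checksum += bytes[i];
    volatile unsigned long sink = checksum;
    (void)sink;

    std::free(data);
    return std::chrono::duration<double>(stop - start).count();
}
```

With the definitions from optimise-test.cpp it would be called as, for example, time_loop_only<ThreeComponent<int>, fn<ThreeComponent<int> > >(10000000), and likewise with tcfn, so that only the loop bodies are compared.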


Notice that in this case test 2 (user 0m0.564s) takes more than 7 times as long 
as test 3 (user 0m0.074s), which performs exactly the same amount of work!
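The arithmetic behind that comparison can be written out explicitly. This small sketch plugs in the user times from the SuSE Linux run above; the constant names are hypothetical.

```cpp
#include <cassert>

// User times in seconds, copied from the SuSE Linux run above.
const double user_test1 = 0.023; // int array, fn<int>
const double user_test2 = 0.564; // ThreeComponent<int>, fn (overloaded operators)
const double user_test3 = 0.074; // ThreeComponent<int>, tcfn (hand-written)

// One add and one divide per int element versus three of each per
// ThreeComponent<int> element gives an expected cost ratio of about 3.
const double expected_ratio = 3.0;

// How many times longer `slow` took than `fast`.
double ratio(double slow, double fast)
{
    assert(fast > 0.0);
    return slow / fast;
}
```

With these figures, ratio(user_test3, user_test1) is about 3.2, close to expected_ratio, while ratio(user_test2, user_test3) is about 7.6 and ratio(user_test2, user_test1) about 24.5.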


Please, please, please!  Am I missing something, or is this a real optimiser 
problem?  Is this a known issue?  Am I doing something significantly wrong 
in my code?  This is driving me to distraction, as I am coding for image 
processing and just can't accept that kind of performance hit.

-- 
Andrew Dorrell

