This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: C++ math optimization problem...


I think it is a bug, or a "missing feature".
I tried to simplify the testcase below and ended up with a completely different testcase, but it shows the same problem:


It seems to be about FPU registers: if anything causes the compiler to store the value to memory, it then treats the variable as if it were volatile.

#include <iostream>

void otherfunc();

void test(){
  double result=0.0; //stored in FPU register
  otherfunc(); //"result" saved to memory
  for (int j = 1; j < 100000000; ++j)
    result += 1.0; //"result" read from and written to memory on every iteration

  std::cerr << result << std::endl;
}

on i386 the inner loops look like this:

without otherfunc():
.L7:
  decl  %eax
  fadd  %st, %st(1)
  jns .L7


with otherfunc():
.L7:
  fldl  -8(%ebp)
  decl  %eax
  fadd  %st(1), %st //FPU stack ordering problem?
  fstl  -8(%ebp)
  js  .L11
  fstp  %st(0)
  jmp .L7
.L11:
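To compare the two loops side by side, both variants can be put into one file, roughly like the sketch below. The names otherfunc_stub, sum_plain and sum_after_call and the parameter n are made up for illustration and are not part of the testcase above; __attribute__((noinline)) is GCC-specific and is only there to keep the call opaque. Whether this reproduces the memory round-trip depends on the compiler version and target, so treat it as a starting point only.

```cpp
// Hypothetical counter, only here so the stub has a visible side effect.
static volatile int g_calls = 0;

// Hypothetical stand-in for otherfunc(); noinline (a GCC attribute) keeps
// the compiler from inlining the call away.
__attribute__((noinline)) void otherfunc_stub() { g_calls = g_calls + 1; }

// Variant without a call before the loop.
double sum_plain(int n) {
  double result = 0.0;
  for (int j = 1; j < n; ++j)
    result += 1.0;
  return result;
}

// Variant with a call between the initialization and the loop,
// mirroring the order of statements in test() above.
double sum_after_call(int n) {
  double result = 0.0;
  otherfunc_stub();
  for (int j = 1; j < n; ++j)
    result += 1.0;
  return result;
}
```

Compiling this with g++ -O2 -S and looking at the loop bodies of the two functions should show whether the second one reloads and stores "result" on every iteration.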


I don't really know what keywords to search for in Bugzilla. Could anyone please check whether this is a known bug?



Kind regards,





Benjamin Redelings I wrote:
Hi,
I have a C++ program that runs slower under 4.0 CVS than 3.4. So, I am trying to make some test-cases that might help deduce the reason. However, when I reduced this testcase sufficiently, it began behaving badly under BOTH 3.4 and 4.0.... but I guess I should start with the most reduced case first.


Basically, the code just does a lot of multiplies and adds. However, if I take the main loop outside of an if-block, it goes 5x faster. Also, if I implement an array as 'double*' instead of 'vector<double>' it also goes 5x faster. Using valarray<double> instead of vector<double> does not give any improvement.

MATH INSIDE IF-BLOCK
% time ./2h 1
double addition
result = 83283300.006041

real    0m0.995s
user    0m1.000s
sys     0m0.000s

MATH OUTSIDE IF-BLOCK
% time ./2i 1
result = 83283299.999998

real    0m0.218s
user    0m0.220s
sys     0m0.000s

Should I submit a PR? Any help would be appreciated...

-BenRI

------------ begin testcase -------------
#include <cstdio>   // printf
#include <cstdlib>  // atoi, exit
#include <vector>

const int OUTER = 100000;
const int INNER = 1000;

using namespace std;

int main(int argn, char *argv[])
{
  int s = atoi(argv[1]);

  double result;
  if (s == 1) {  //remove this condition to get a 5x speedup
    // initialize d
    vector<double> d(INNER); //change to double* to get 5x speedup
    for (int i = 0; i < INNER; i++)
      d[i] = double(1+i) / INNER;

    // calc result
    result=0;
    for (int i = 0; i < OUTER; ++i)
      for (int j = 1; j < INNER; ++j)
        result += d[j]*d[j-1] + d[j-1];
  }
  else
    exit(-1);

  printf("result = %f\n",result);
  return 0;
}
----------- end testcase --------------
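The 'double*' variant mentioned above is not shown in the message. A minimal sketch of what it might look like, next to the vector version, is below. The function names calc_vector and calc_pointer and the parameter outer are assumptions made for illustration, and the if-block from the testcase is left out so that only the container type differs.

```cpp
#include <vector>

const int INNER = 1000;

// Same computation as the testcase, with d held in a vector<double>.
// "outer" is a hypothetical parameter replacing the OUTER constant.
double calc_vector(int outer) {
  std::vector<double> d(INNER);
  for (int i = 0; i < INNER; i++)
    d[i] = double(1 + i) / INNER;

  double result = 0;
  for (int i = 0; i < outer; ++i)
    for (int j = 1; j < INNER; ++j)
      result += d[j] * d[j - 1] + d[j - 1];
  return result;
}

// Same computation, with d as a plain heap-allocated double*.
double calc_pointer(int outer) {
  double* d = new double[INNER];
  for (int i = 0; i < INNER; i++)
    d[i] = double(1 + i) / INNER;

  double result = 0;
  for (int i = 0; i < outer; ++i)
    for (int j = 1; j < INNER; ++j)
      result += d[j] * d[j - 1] + d[j - 1];

  delete[] d;
  return result;
}
```

Calling both with outer = 100000 and timing them separately would show whether the container type alone accounts for the difference. The last digits of the two results may differ slightly on x87, as they do in the timings quoted above.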








-- Stefan Strasser

