On Fri, 04-07-2003 at 21:56, Toon Moene wrote:
Steven Bosscher wrote:
Hmm, how about initializing the data you use (malloc just allocates
the space) - without it you could run into NaNs which would distort
the timing picture completely.
Maybe so, but in this case this has nothing to do with it. Just look
at the assembly output and you can see that the code is just really
poor.
But just to be sure, I replaced the line:
data1[i][j][k] = data2[i][j][k] * data3[i][j][k];
with:
data1[i][j][k] = 0.0;
and indeed, tree-ssa is still about 31% slower (2.97s avg. for mainline
vs. 3.89s avg. for tree-ssa). Sorry!
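
For anyone who wants to reproduce this, here is a minimal sketch of the
kind of test being discussed. It is not the original test program: the
array size N, the contiguous layout, and the helper names (alloc_cube,
fill_cube, multiply_cubes, time_kernel) are all assumptions made for
illustration. It fills the arrays before timing, as suggested above, so
that uninitialized malloc'ed memory cannot affect the measurement.

```c
#include <stdlib.h>
#include <time.h>

#define N 64                      /* hypothetical edge length; the real size is not given */

typedef double cube_row[N][N];    /* one i-slice of an N x N x N array (assumed layout) */

/* Allocate an N x N x N array of doubles as one contiguous block.
   malloc only reserves the space; the contents are indeterminate. */
cube_row *alloc_cube(void)
{
    return malloc(sizeof(double[N][N][N]));
}

/* Give every element a defined value so the kernel never reads garbage. */
void fill_cube(cube_row *a, double value)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                a[i][j][k] = value;
}

/* The statement from the message: element-wise product of two arrays. */
void multiply_cubes(cube_row *data1, cube_row *data2, cube_row *data3)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                data1[i][j][k] = data2[i][j][k] * data3[i][j][k];
}

/* CPU time in seconds for `reps` runs of the kernel, measured with clock(). */
double time_kernel(cube_row *data1, cube_row *data2, cube_row *data3, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        multiply_cubes(data1, data2, data3);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To compare compilers, allocate three cubes, call fill_cube on the two
inputs, and average time_kernel over several runs for each build.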
It may be interesting for people looking into this that with tree-ssa,
- we create a bigger stack frame
- with -fnew-ra performance is only 20% worse than mainline
- PRE doesn't make a difference at all.