This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/45391] CPU2006 482.sphinx3: gcc4.6 5% regression from prefetching of vectorized loop



------- Comment #2 from changpeng dot fang at amd dot com  2010-08-24 00:03 -------
float f (float *x, float *y, float *z, unsigned n)
{
  float ret = 0.0;
  unsigned i;
  for (i = 0; i < n; i++)
    {
      float diff = x[i] - y[i];
      ret -= diff * diff * z[i];
    }
  return ret;
}

No, this is related to PR 45022 in a certain sense, but the underlying
reason is not yet known.

For the above test case, if I compile with -O3 -march=amdfam10 -m64,
the loop is not vectorized because of the floating-point reduction. To my
surprise, no prefetch is generated. The cost model filtered out the
prefetches (we are trying to prefetch for each of the three memory
references):
Ahead 15, unroll factor 1, trip count -1
insn count 14, mem ref count 3, prefetch count 3
Not prefetching -- instruction to prefetch ratio (4) too small
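
The ratio in the dump above can be reproduced by hand. The following is
only a minimal sketch of that arithmetic, not the actual GCC source: the
function names are hypothetical, and the threshold is passed in as a
parameter because its value is not stated in this comment. It also models
just this one check; the real cost model may apply others.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: integer ratio of loop instructions to prefetches.
   Truncating division matches the "(4)" printed for 14 insns and 3 prefetches. */
static unsigned insn_to_prefetch_ratio(unsigned insn_count,
                                       unsigned prefetch_count)
{
    assert(prefetch_count > 0);
    return insn_count / prefetch_count;
}

/* Hypothetical helper: a loop passes this check only if the ratio reaches
   min_ratio. min_ratio is an assumed stand-in for the compiler's threshold. */
static bool prefetch_is_profitable(unsigned insn_count,
                                   unsigned prefetch_count,
                                   unsigned min_ratio)
{
    if (prefetch_count == 0)
        return false;
    return insn_to_prefetch_ratio(insn_count, prefetch_count) >= min_ratio;
}
```

With the figures from the dump, insn_to_prefetch_ratio(14, 3) is 4, so any
threshold above 4 rejects the loop. Fewer prefetches for a comparable
instruction count raise the ratio, which is why a loop with a single
prefetch can pass the same check.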

However, if we compile with -O3 -ffast-math -march=amdfam10 -m64,
the loop can be vectorized, and one of the array references is
aligned. As a result, and because of PR 45022, we try to prefetch
only for the aligned reference, and one prefetch is inserted (this
time, the insn-to-prefetch ratio is big enough).

The fix for PR 45022 will actually result in no prefetch being generated,
and will thus hide the problem.




-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45391

