This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug target/45391] CPU2006 482.sphinx3: gcc4.6 5% regression from prefetching of vectorized loop
- From: "changpeng dot fang at amd dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 24 Aug 2010 00:03:55 -0000
- Subject: [Bug target/45391] CPU2006 482.sphinx3: gcc4.6 5% regression from prefetching of vectorized loop
- References: <bug-45391-18740@http.gcc.gnu.org/bugzilla/>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Comment #2 from changpeng dot fang at amd dot com 2010-08-24 00:03 -------
float f (float *x, float *y, float *z, unsigned n)
{
  float ret = 0.0;
  unsigned i;
  for (i = 0; i < n; i++)
    {
      float diff = x[i] - y[i];
      ret -= diff * diff * z[i];
    }
  return ret;
}
No, this is related to PR 45022 in a certain sense, but the underlying
reason is not yet known.
For the above test case, if I compile with -O3 -march=amdfam10 -m64,
the loop is not vectorized because of the floating-point reduction. To
my surprise, no prefetch is generated. The cost model filters out the
prefetches (we try to prefetch for each of the three memory
references):
Ahead 15, unroll factor 1, trip count -1
insn count 14, mem ref count 3, prefetch count 3
Not prefetching -- instruction to prefetch ratio (4) too small
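The ratio of 4 in the dump is consistent with integer division of the
instruction count by the prefetch count (14 / 3). A minimal sketch of
such a check follows. It assumes the ratio is computed as
(unroll factor * insn count) / prefetch count; the function names and
the min_ratio parameter are hypothetical and are not taken from the
GCC sources.

```c
#include <assert.h>

/* Hypothetical helper: instructions per prefetch, using integer
   division.  The formula is an assumption that matches the dump
   above, not a copy of GCC's code.  */
static unsigned
insn_to_prefetch_ratio (unsigned unroll_factor, unsigned insn_count,
                        unsigned prefetch_count)
{
  assert (prefetch_count > 0);
  return (unroll_factor * insn_count) / prefetch_count;
}

/* Hypothetical filter: returns 1 if prefetching looks worthwhile,
   0 if the ratio falls below the caller-supplied minimum.  */
static int
prefetch_is_profitable (unsigned unroll_factor, unsigned insn_count,
                        unsigned prefetch_count, unsigned min_ratio)
{
  return insn_to_prefetch_ratio (unroll_factor, insn_count,
                                 prefetch_count) >= min_ratio;
}
```

With the values from the dump (unroll factor 1, insn count 14,
prefetch count 3), the ratio is 4, so any minimum above 4 rejects the
prefetches. With a single prefetch the same instruction count would
give a ratio of 14.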
However, if we compile with -O3 -ffast-math -march=amdfam10 -m64,
the loop can be vectorized, and one of the array references is
aligned. As a result, and due to PR 45022, we try to prefetch only
for the aligned reference, and one prefetch is inserted (this time,
the insns-to-prefetch ratio is big enough).
The fix for PR 45022 will actually result in no prefetch being
generated, and will thus hide the problem.
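To illustrate why the reduction blocks vectorization without
-ffast-math: vectorizing it means keeping several partial sums and
combining them at the end, which changes the order of the
floating-point operations. The sketch below shows that reassociation
by hand with four accumulators. It is only an illustration (the name
f_partial and the choice of four lanes are assumptions), not the code
GCC generates.

```c
/* Hand-written illustration of a reassociated reduction: four
   independent partial sums, combined after the loop.  The result can
   differ from the scalar loop in the last bits because the
   subtractions happen in a different order.  */
float
f_partial (const float *x, const float *y, const float *z, unsigned n)
{
  float acc[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
  float ret;
  unsigned i, k;

  /* Main loop: each lane k handles elements i + k.  */
  for (i = 0; i + 4 <= n; i += 4)
    for (k = 0; k < 4; k++)
      {
        float diff = x[i + k] - y[i + k];
        acc[k] -= diff * diff * z[i + k];
      }

  /* Combine the partial sums.  */
  ret = (acc[0] + acc[1]) + (acc[2] + acc[3]);

  /* Remainder loop for the last n % 4 elements.  */
  for (; i < n; i++)
    {
      float diff = x[i] - y[i];
      ret -= diff * diff * z[i];
    }
  return ret;
}
```

For inputs where every intermediate value is exactly representable,
this returns the same result as the test case above; in general the
two may differ by rounding, which is why the compiler needs
-ffast-math to make the transformation itself.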
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45391