This is the mail archive of the
mailing list for the GCC project.
Re: prefetching on pentium 4
- From: Tim Prince <timothyprince at sbcglobal dot net>
- To: ranjith kumar <ranjit_kumar_b4u at yahoo dot co dot uk>
- Cc: gcc-help at gcc dot gnu dot org
- Date: Tue, 28 Nov 2006 05:44:56 -0800
- Subject: Re: prefetching on pentium 4
- References: <email@example.com>
- Reply-to: tprince at myrealbox dot com
ranjith kumar wrote:
Hi,
P4 isn't well suited to automatic compiler-generated prefetch. The default
hardware prefetch (stride-based and cache line pairs) is already quite
effective. Prefetch intrinsics are available with #include
<xmmintrin.h>. The details of what works vary with the stepping. On the
earliest P4 models, a program could speed up hardware prefetch by issuing
prefetches for 3 cache lines before entering a loop. Since Northwood, that
no longer works. Since Prescott, prefetch hints are ignored on P4, and
prefetched data goes to L2 regardless of the hint. The effect of prefetch
on DTLB misses is also model dependent.
1) Will "gcc" insert prefetch instructions
automatically on a "pentium 4" processor?
Which flags should be enabled while compiling so that
gcc automatically inserts prefetch instructions?
2) Or does the programmer have to include some functions?
If so, what is the syntax of those functions?
Contrary to what certain Windows-related docs say, _mm_prefetch() works
the same on all compilers that implement it.
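For illustration, here is a minimal sketch of how the intrinsic might be used in a simple loop. The function name sum_with_prefetch and the PREFETCH_DISTANCE value are hypothetical, chosen only for this example; a useful distance depends on the CPU model and has to be found by measurement. On 32-bit x86 targets, gcc may need -msse for <xmmintrin.h> to be usable.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_T0 */

/* Hypothetical tuning value: how many elements ahead to prefetch.
   Not taken from the post; tune by measurement on the target CPU. */
#define PREFETCH_DISTANCE 64

/* Sum an array while hinting that data further ahead will be needed. */
float sum_with_prefetch(const float *a, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        /* Only prefetch addresses that lie inside the array. */
        if (i + PREFETCH_DISTANCE < n)
            _mm_prefetch((const char *)&a[i + PREFETCH_DISTANCE],
                         _MM_HINT_T0);
        sum += a[i];
    }
    return sum;
}
```

This issues a prefetch on every iteration to keep the sketch short; one prefetch per cache line would be enough. As noted above, the hint argument may make no difference on Prescott and later P4, and for a plain sequential walk like this the hardware prefetcher may already do the job, so any gain should be checked by timing.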