prefetching on pentium 4

Tim Prince timothyprince@sbcglobal.net
Tue Nov 28 13:45:00 GMT 2006


ranjith kumar wrote:
> Hi,
> 
>    1) Will "gcc" insert prefetch instructions
> automatically on a "pentium 4" processor?
> Which flags should be enabled while compiling so that
> gcc automatically inserts prefetch instructions?
> 
> 2) Or does the programmer have to call some functions?
>    If so, what is the syntax of those functions?
> 
The P4 isn't a good target for automatic compiler-generated prefetch.  
Its default hardware prefetch (stride-based and cache line pairs) is 
quite effective.  If you want explicit prefetch, the intrinsics are 
available with #include <xmmintrin.h>.  What actually helps varies with 
the CPU stepping.  On the earliest P4 models, a program could accelerate 
hardware prefetch by issuing prefetches for 3 cache lines before 
entering a loop.  Since Northwood, that no longer works.  Since 
Prescott, the P4 ignores prefetch hints, and prefetched data goes to L2 
regardless of the hint.  The effect of prefetch on DTLB misses is also 
model dependent.
Contrary to what certain Windows-related docs say, _mm_prefetch() works 
the same on all compilers which implement it.
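
For reference, a minimal sketch of the intrinsic route might look like 
the following.  The function names, the 64-byte line size, and the 
prefetch distance of 8 lines are illustrative assumptions, not tuned 
values; whether any of this helps (or hurts) depends on the stepping, 
as described above.  The leading 3-line prefetch mirrors the trick 
mentioned for the earliest P4 models.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_T0 */

/* Assumed values for illustration only; measure before relying on them. */
#define CACHE_LINE_BYTES     64
#define PREFETCH_LINES_AHEAD 8   /* hypothetical prefetch distance */

/* Hypothetical helper: hint the first (up to) 3 cache lines of a buffer
 * before a loop starts touching it. */
static void prefetch_leading_lines(const void *p, size_t bytes)
{
    const char *c = (const char *)p;
    for (size_t k = 0; k < 3 && k * CACHE_LINE_BYTES < bytes; k++)
        _mm_prefetch(c + k * CACHE_LINE_BYTES, _MM_HINT_T0);
}

/* Sum an array, issuing one software prefetch hint per cache line for
 * data PREFETCH_LINES_AHEAD lines further on. */
float sum_with_prefetch(const float *a, size_t n)
{
    const size_t per_line = CACHE_LINE_BYTES / sizeof(float);
    const size_t ahead = PREFETCH_LINES_AHEAD * per_line;
    float sum = 0.0f;

    assert(a != NULL || n == 0);
    prefetch_leading_lines(a, n * sizeof(float));

    for (size_t i = 0; i < n; i++) {
        /* Only hint addresses that are still inside the array. */
        if (i % per_line == 0 && i + ahead < n)
            _mm_prefetch((const char *)(a + i + ahead), _MM_HINT_T0);
        sum += a[i];
    }
    return sum;
}
```

The hint argument (_MM_HINT_T0 here) must be a compile-time constant.  
A prefetch is only a hint and does not change program results, so the 
function returns the same sum with or without the _mm_prefetch() calls; 
only timing may differ.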


