This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
prefetch optimizations
- From: Niavis Panagiotis <niavis at ceid dot upatras dot gr>
- To: gcc at gcc dot gnu dot org
- Date: Thu, 10 Jul 2003 14:37:21 +0300 (EET DST)
- Subject: prefetch optimizations
I'm not sure whether this is the right mailing list for this question. I'm using the
function block_prefetch (shown below) to prefetch a 4 KB array into the CPU
cache. In the program I allocate a src pointer and a dest pointer (both
50 MB) and copy data from src to dest (with memcpy) in 4 KB
chunks. I observe the following behaviour:
1) The best bandwidth is achieved when, at the beginning of each 4 KB chunk,
the destination pointer is passed to block_prefetch before copying to it.
2) malloc returns pointers 8 bytes past a page boundary. If I align
them to the page boundary, the bandwidth drops a lot.
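
For reference, the setup described above can be sketched roughly as follows. This is only an illustration, not the actual benchmark: the names align_to_page, touch_chunk and copy_in_chunks are hypothetical, and the 64-byte cache line and 4096-byte page size are assumptions. touch_chunk is a loop-form stand-in for block_prefetch, and align_to_page shows the kind of rounding meant in observation 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE 4096u  /* copy granularity used in the post */
#define LINE_SIZE  64u    /* assumed cache-line size */
#define PAGE_SIZE  4096u  /* assumed page size */

static volatile int sink; /* keeps the touching loads from being optimized away */

/* Round p up to the next page boundary (returns p if it is already aligned). */
static void *align_to_page(void *p)
{
    uintptr_t u = (uintptr_t)p;
    return (void *)((u + PAGE_SIZE - 1) & ~(uintptr_t)(PAGE_SIZE - 1));
}

/* Read one int from every cache line of a 4 KB chunk.
   addr must be suitably aligned for int. */
static void touch_chunk(const void *addr)
{
    const volatile int *a = addr;
    int sum = 0;
    for (size_t i = 0; i < CHUNK_SIZE / sizeof(int); i += LINE_SIZE / sizeof(int))
        sum += a[i];
    sink += sum;
}

/* Copy size bytes (a multiple of CHUNK_SIZE) chunk by chunk, touching each
   destination chunk before it is written, as in observation 1. */
static void copy_in_chunks(void *dest, const void *src, size_t size)
{
    assert(size % CHUNK_SIZE == 0);
    char *d = dest;
    const char *s = src;
    for (size_t off = 0; off < size; off += CHUNK_SIZE) {
        touch_chunk(d + off);
        memcpy(d + off, s + off, CHUNK_SIZE);
    }
}
```

With these helpers, the page-aligned case would correspond to calling copy_in_chunks on align_to_page(dest) and align_to_page(src), with each buffer allocated one page larger than needed.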
Can someone explain this behaviour, or point me to some documents that
explain it?
Thank you in advance.
The block_prefetch function follows:

/* Global accumulator; storing into it keeps the loads from being optimized away. */
int pfetch;

/* Touch one int in every 64-byte line of a 4 KB block (assuming 4-byte int):
   a[16*k] is 64 bytes past a[16*(k-1)], and each group of 256 ints covers 1 KB. */
static inline void block_prefetch(void *addr)
{
    int *a = (int *) addr;

    /* first 1 KB */
    pfetch += a[0]   + a[16]  + a[32]  + a[48]
            + a[64]  + a[80]  + a[96]  + a[112]
            + a[128] + a[144] + a[160] + a[176]
            + a[192] + a[208] + a[224] + a[240];
    a += 256;
    /* second 1 KB */
    pfetch += a[0]   + a[16]  + a[32]  + a[48]
            + a[64]  + a[80]  + a[96]  + a[112]
            + a[128] + a[144] + a[160] + a[176]
            + a[192] + a[208] + a[224] + a[240];
    a += 256;
    /* third 1 KB */
    pfetch += a[0]   + a[16]  + a[32]  + a[48]
            + a[64]  + a[80]  + a[96]  + a[112]
            + a[128] + a[144] + a[160] + a[176]
            + a[192] + a[208] + a[224] + a[240];
    a += 256;
    /* fourth 1 KB */
    pfetch += a[0]   + a[16]  + a[32]  + a[48]
            + a[64]  + a[80]  + a[96]  + a[112]
            + a[128] + a[144] + a[160] + a[176]
            + a[192] + a[208] + a[224] + a[240];
}
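
A possible alternative, given only as a sketch, is GCC's __builtin_prefetch, which issues a prefetch hint per cache line instead of performing real loads. The function name block_prefetch_hint is hypothetical, the 64-byte line size is an assumption, and whether the hint helps or hurts in this benchmark has not been measured here. The return value (the number of hints issued) exists only to make the sketch easy to check.

```c
#include <stddef.h>

#define CHUNK_SIZE 4096u  /* same 4 KB block as block_prefetch */
#define LINE_SIZE  64u    /* assumed cache-line size */

/* Hypothetical variant of block_prefetch: hint every cache line of a
   4 KB block rather than loading from it. Returns the number of hints. */
static inline size_t block_prefetch_hint(const void *addr)
{
    const char *p = addr;
    size_t issued = 0;
    for (size_t off = 0; off < CHUNK_SIZE; off += LINE_SIZE) {
        /* rw = 1: the block is about to be written;
           locality = 3: keep the data in all cache levels if possible. */
        __builtin_prefetch(p + off, 1, 3);
        issued++;
    }
    return issued;
}
```

Unlike the load-based version, a prefetch hint may be dropped by the hardware and does not fault on an invalid address, so the two are not guaranteed to behave the same way.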
Niavis Panagiotis
Computer Engineering & Informatics Department
University of Patras
Greece