Scatter/Gather vector operations

Dzonatas dzonatas@dzonux.net
Sun Apr 8 08:04:00 GMT 2007


Tim Prince wrote:
> Dzonatas wrote:
>> Is this the only portable way to do a pack/unpack without asm()? 
>> How do I set it up differently to trigger a pack/unpack optimization?
>>
>> Thank you.
>>
> If you're talking about optimization for a specific CPU, but you don't 
> want to reveal which CPU that is, why even post this?
No. I'm just trying to get an idea of what direction such code may take 
in the future, and what the best form for it is right now.
> This code looks OK to me.  There isn't any special hardware support 
> for this on commonly available CPUs, like Opteron or Xeon. Scalar 
> moves should work as well as anything, and you are within the limits 
> for efficient Write Combine buffering.  If you have problems, you 
> won't get any help if you can't describe them more specifically.
>
The problem is memory bandwidth. Vector operations help greatly with 
that alone, independent of any gain in the matrix math itself.

Currently, the immediate targets are SSE2- and AltiVec-enabled 
architectures. I could probably write assembly code specific to 
SSE2/AltiVec that unpacks a vector and scatters its elements, but I 
don't want to aim that short. I would like to avoid assembly code if 
possible.
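
To make "pack/unpack without asm()" concrete, here is a minimal sketch 
using the GCC generic vector extension (vector_size attribute). The 
names v4sf, pack4 and unpack4 are only illustrative, and whether the 
copies turn into single vector moves depends on the compiler and the 
target flags:

```c
#include <assert.h>
#include <string.h>

/* GCC generic vector type: four floats in 16 bytes.
   The name v4sf is an illustrative choice, not a standard type. */
typedef float v4sf __attribute__((vector_size(16)));

/* Pack: build a vector from four scalars held in a plain array.
   memcpy avoids aliasing problems and needs no asm(). */
static v4sf pack4(const float in[4])
{
    v4sf v;
    assert(sizeof v == 4 * sizeof in[0]);
    memcpy(&v, in, sizeof v);
    return v;
}

/* Unpack: copy the four lanes of v back into a plain array. */
static void unpack4(v4sf v, float out[4])
{
    memcpy(out, &v, sizeof v);
}
```

With these, "v4sf v = pack4(a); v = v + v; unpack4(v, out);" doubles 
four floats with an element-wise vector add and no inline assembly.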

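For example, is there a formal way to use a vector register as a set of 
pointers into main memory, either to fetch the data they point to into 
another vector register (gather) or to store a register's elements 
through them (scatter)? I know this is beyond the basic vector 
operations implemented now, but something like: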

for (i = 0; i < 100; i++) {
    /* scatter each data element from reg2 into the memory pointed to
       by the associated element in reg1 */
    *vector_reg1 = vector_reg2;
    vector_reg1++;
}
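
For comparison, the same idea written as portable scalar C, which is 
roughly what the loop above would have to lower to on hardware without 
a scatter/gather instruction. This is only a sketch: the function names, 
the LANES constant and the use of an index array in place of a register 
full of pointers are my own assumptions.

```c
#include <assert.h>
#include <stddef.h>

enum { LANES = 4 };   /* assumed vector width, in elements */

/* Scatter: store src[i] to base[idx[i]] for each lane.
   idx plays the role of the "pointer" vector register. */
static void scatter4(float *base, size_t base_len,
                     const size_t idx[LANES], const float src[LANES])
{
    for (int i = 0; i < LANES; i++) {
        assert(idx[i] < base_len);   /* stay inside the buffer */
        base[idx[i]] = src[i];
    }
}

/* Gather: load base[idx[i]] into dst[i] for each lane. */
static void gather4(const float *base, size_t base_len,
                    const size_t idx[LANES], float dst[LANES])
{
    for (int i = 0; i < LANES; i++) {
        assert(idx[i] < base_len);
        dst[i] = base[idx[i]];
    }
}
```

Each lane is an independent scalar load or store here, so the memory 
traffic is the same as in hand-written scalar code; the only thing a 
compiler could add is keeping src/dst in a vector register on either 
side of the loop.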

Thanks for the response.

